Scaling goals
- Increase throughput without destabilizing tail latency
- Maintain request success rate during demand spikes
- Keep credit settlement and node discovery consistent
Worker scaling strategy
Registry scaling strategy
| Component | Recommendation |
|---|---|
| API layer | Run multiple stateless replicas behind a load balancer. |
| Database | Use managed PostgreSQL with automated backups and read replicas. |
| Cache | Add Redis for node discovery and session lookup hot paths. |
| Queue | Use durable queues for asynchronous settlement and retries. |
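
As a rough illustration of the cache recommendation in the table above, the following sketch shows a cache-aside lookup for node discovery. It is not the project's implementation: `TTLCache` is a hypothetical in-memory stand-in for Redis, and `lookup_node` and `load_from_db` are names assumed purely for illustration.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def lookup_node(node_id: str, cache: TTLCache,
                load_from_db: Callable[[str], str]) -> str:
    """Cache-aside: try the cache, fall back to the database, then populate."""
    cached = cache.get(node_id)
    if cached is not None:
        return cached
    value = load_from_db(node_id)  # assumed to be the slow, authoritative path
    cache.set(node_id, value)
    return value
```

A short TTL bounds how long a stale node address can be served, which matters if node discovery has to stay consistent with the registry. The right TTL depends on how often nodes join and leave.
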
Autoscaling signals
Use a combination of:
- queue depth
- in-flight request count
- P95 latency
- GPU utilization
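
One possible way to combine the signals listed above is to compute a pressure ratio (observed value over target) for each signal and scale to satisfy the most pressured one. The sketch below assumes that approach, which resembles the proportional rule used by common horizontal autoscalers. The `Signal` type, the `desired_replicas` function, the target values, and the replica bounds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    current: float  # observed value
    target: float   # hypothetical target; must be > 0


def desired_replicas(current_replicas: int, signals: list[Signal],
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale to satisfy the most pressured signal, clamped to the bounds."""
    if not signals:
        return max(min_replicas, min(max_replicas, current_replicas))
    # A ratio above 1 means the signal is over target and needs more capacity.
    worst_ratio = max(s.current / s.target for s in signals)
    proposed = math.ceil(current_replicas * worst_ratio)
    return max(min_replicas, min(max_replicas, proposed))


# Illustrative numbers only; real targets would come from load testing.
signals = [
    Signal("queue_depth_per_worker", current=120, target=50),
    Signal("in_flight_per_worker", current=8, target=10),
    Signal("p95_latency_ms", current=900, target=800),
    Signal("gpu_utilization", current=0.70, target=0.75),
]
print(desired_replicas(4, signals))  # → 10
```

Taking the maximum ratio means a single saturated signal (here, queue depth) is enough to trigger scale-out, even when GPU utilization alone looks healthy.
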
Avoid common scaling mistakes