
Scaling goals

  • Increase throughput without destabilizing tail latency
  • Maintain request success rate during demand spikes
  • Keep credit settlement and node discovery consistent

Worker scaling strategy

1. Scale out first — Add more workers before increasing per-node concurrency.

2. Tune concurrency — Raise max_concurrency incrementally while tracking P95 latency.

3. Pin workloads — Route model families to dedicated node pools when possible.
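The scale-out-first policy above can be sketched as a small decision function. The thresholds (a P95 budget, a concurrency step size, and a per-node concurrency ceiling) are hypothetical placeholders, not values from this system — tune them against your own latency targets.

```python
from dataclasses import dataclass

@dataclass
class WorkerPoolState:
    workers: int
    max_concurrency: int
    p95_latency_ms: float

# Hypothetical thresholds; tune per deployment.
P95_BUDGET_MS = 500.0
CONCURRENCY_STEP = 2
CONCURRENCY_CEILING = 32

def next_action(state: WorkerPoolState) -> str:
    """Scale out first; only raise per-node concurrency while P95 holds."""
    if state.p95_latency_ms > P95_BUDGET_MS:
        # Latency budget exceeded: add workers rather than pushing nodes harder.
        return "add_worker"
    if state.max_concurrency + CONCURRENCY_STEP <= CONCURRENCY_CEILING:
        # Latency is healthy and headroom exists: raise concurrency one step.
        return "raise_concurrency"
    return "hold"
```

The key design choice is that a latency breach always resolves to adding workers, so per-node concurrency only climbs while the P95 budget is intact.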

Registry scaling strategy

Component   Recommendation
API layer   Run multiple stateless replicas behind a load balancer.
Database    Use managed PostgreSQL with automated backups and read replicas.
Cache       Add Redis for node discovery and session lookup hot paths.
Queue       Use durable queues for asynchronous settlement and retries.
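The cache row can be made concrete with a cache-aside sketch for the node discovery hot path. This is an in-memory stand-in for Redis (a dict with TTLs, so it stays self-contained); the `NodeDiscoveryCache` name, the 30-second TTL, and the `fetch_from_db` callback are all illustrative assumptions, not part of the registry's actual API.

```python
import time

class NodeDiscoveryCache:
    """Cache-aside node lookup: serve hits from a TTL'd in-memory map
    (standing in for Redis) and fall through to the database on a miss."""

    def __init__(self, fetch_from_db, ttl_seconds=30.0):
        self._fetch = fetch_from_db      # callback that queries PostgreSQL
        self._ttl = ttl_seconds
        self._store = {}                 # node_id -> (expires_at, record)

    def get(self, node_id):
        entry = self._store.get(node_id)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]              # fresh cache hit
        record = self._fetch(node_id)    # miss or expired: hit the database
        self._store[node_id] = (time.monotonic() + self._ttl, record)
        return record
```

With a short TTL, repeated discovery lookups within the window hit the cache and only the first one touches the database.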

Autoscaling signals

Use a combination of:
  • queue depth
  • in-flight request count
  • P95 latency
  • GPU utilization
Avoid these pitfalls:
  • Scaling only by CPU can under-provision GPU-bound workloads.
  • Large concurrency jumps can increase timeout rates and reduce total throughput.
  • Mixing dissimilar models in one pool can create noisy-neighbor effects.
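One way to combine the signals above is to compute a replica estimate per signal and take the maximum, so a saturated GPU or a latency breach can trigger a scale-out even when CPU looks idle. Everything here — the `desired_replicas` name, per-replica capacity, P95 budget, GPU utilization target, and the 2x step cap — is a hypothetical sketch, not this system's autoscaler.

```python
import math

def desired_replicas(current, queue_depth, in_flight, p95_ms, gpu_util,
                     per_replica_capacity=8, p95_budget_ms=500.0,
                     gpu_target=0.75):
    """Blend queue depth, in-flight count, P95 latency, and GPU utilization
    into a target replica count by taking the max per-signal estimate."""
    # Replicas needed to absorb queued plus in-flight work.
    by_load = math.ceil((queue_depth + in_flight) / per_replica_capacity)
    # Replicas needed to bring GPU utilization back to its target.
    by_gpu = math.ceil(current * gpu_util / gpu_target)
    want = max(by_load, by_gpu, 1)
    if p95_ms > p95_budget_ms:
        # A latency breach forces at least one additional replica.
        want = max(want, current + 1)
    # Cap growth at 2x per step: large jumps raise timeout rates.
    return min(want, current * 2)
```

Taking the max across signals is what prevents the CPU-only blind spot: any single saturated dimension is enough to grow the pool, while the 2x cap keeps each adjustment incremental.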
Last modified on February 21, 2026