Monitoring - ARIS Core Ecosystem

What to monitor

Node heartbeats, registry uptime, and request success rate.

P50/P95/P99 inference duration by model and node.

Credits consumed per route, model, and tenant.

Queue depth, GPU utilization, memory pressure, and saturation.
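As a rough illustration of the P50/P95/P99 item above, the following sketch groups raw durations by model and node and computes nearest-rank percentiles. The sample data, the `latency_summary` helper, and the `model-a` / `node-1` names are hypothetical; a production setup would more likely derive percentiles from histogram buckets in the metrics backend.

```python
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p% of samples are <= it.
    """
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def latency_summary(samples):
    """Summarise (model, node, duration_ms) tuples per (model, node) pair."""
    grouped = defaultdict(list)
    for model, node, duration_ms in samples:
        grouped[(model, node)].append(duration_ms)

    summary = {}
    for key, durations in grouped.items():
        durations.sort()
        summary[key] = {f"p{p}": percentile(durations, p) for p in (50, 95, 99)}
    return summary


# Hypothetical data: 100 requests taking 1 ms .. 100 ms on one model/node pair.
samples = [("model-a", "node-1", float(ms)) for ms in range(1, 101)]
summary = latency_summary(samples)
```

Keeping the grouping key as `(model, node)` makes it easy to see whether a slowdown follows a model or a specific machine.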

| Metric | Type | Why it matters |
| --- | --- | --- |
| `aris_requests_total` | Counter | Total traffic per endpoint/model. |
| `aris_request_latency_ms` | Histogram | Detect slowdowns and regressions. |
| `aris_inference_errors_total` | Counter | Track node and model failures. |
| `aris_queue_depth` | Gauge | Signal backpressure and scale events. |
| `aris_credits_deducted_total` | Counter | Cost visibility and billing audits. |
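To make the three metric types concrete, here is a minimal in-memory sketch of a counter, a gauge, and a histogram wired to the metric names in the table. It is not the ARIS instrumentation API: the classes, the `record_request` helper, the bucket bounds, the label choices, and the `/v1/infer` endpoint are all assumptions made for illustration.

```python
import bisect
from collections import defaultdict


class Counter:
    """Monotonically increasing total, keyed by a tuple of label values."""

    def __init__(self, name):
        self.name = name
        self.values = defaultdict(float)

    def inc(self, labels=(), amount=1.0):
        if amount < 0:
            raise ValueError("counters only go up")
        self.values[labels] += amount


class Gauge:
    """Point-in-time value that can move up or down."""

    def __init__(self, name):
        self.name = name
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations per upper bound, with a final overflow slot."""

    def __init__(self, name, buckets):
        self.name = name
        self.buckets = sorted(buckets)
        self.counts = [0] * (len(self.buckets) + 1)

    def observe(self, value):
        # bisect_left picks the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.buckets, value)] += 1


requests_total = Counter("aris_requests_total")
request_latency_ms = Histogram("aris_request_latency_ms", [50, 100, 250, 500, 1000])
inference_errors_total = Counter("aris_inference_errors_total")
queue_depth = Gauge("aris_queue_depth")
credits_deducted_total = Counter("aris_credits_deducted_total")


def record_request(endpoint, model, latency_ms, ok, credits, queued):
    """Hypothetical hook that updates every metric for one finished request."""
    requests_total.inc((endpoint, model))
    request_latency_ms.observe(latency_ms)
    if not ok:
        inference_errors_total.inc((model,))
    credits_deducted_total.inc((model,), credits)
    queue_depth.set(queued)


record_request("/v1/infer", "model-a", 120.0, True, 2.0, 3)
record_request("/v1/infer", "model-a", 900.0, False, 0.0, 5)
```

Counters only ever increase, so rates and totals survive restarts of the reader; the gauge simply reflects the latest queue depth.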

Alert immediately when the request success rate stays below 97% for 5 minutes or when the registry is unreachable.
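One way to read this rule is sketched below: fire if the latest sample shows the registry as unreachable, or if every sample in a continuous stretch of at least five minutes is under the 97% threshold. The `Sample` shape and the `should_alert` function are hypothetical, and the sketch assumes samples arrive oldest first at a regular scrape interval.

```python
from dataclasses import dataclass

SUCCESS_RATE_THRESHOLD = 0.97  # 97%, from the alert rule
SUSTAIN_SECONDS = 5 * 60       # 5 minutes, from the alert rule


@dataclass
class Sample:
    timestamp: float          # seconds since an arbitrary epoch
    success_rate: float       # fraction of successful requests, 0.0 to 1.0
    registry_reachable: bool  # result of the latest registry probe


def should_alert(samples):
    """Return True if the alert condition holds at the newest sample."""
    if not samples:
        return False

    latest = samples[-1]
    if not latest.registry_reachable:
        return True

    # Track the start of the current unbroken run of below-threshold samples.
    breach_start = None
    for sample in samples:
        if sample.success_rate < SUCCESS_RATE_THRESHOLD:
            if breach_start is None:
                breach_start = sample.timestamp
        else:
            breach_start = None

    return (
        breach_start is not None
        and latest.timestamp - breach_start >= SUSTAIN_SECONDS
    )


# Hypothetical series: one sample per minute for five minutes.
sustained = [Sample(t, 0.95, True) for t in range(0, 301, 60)]
recovered = [Sample(t, 0.99 if t == 120 else 0.95, True) for t in range(0, 301, 60)]
registry_down = [Sample(0.0, 1.0, False)]
```

Resetting the run on any healthy sample keeps brief dips from paging anyone, while the registry check bypasses the waiting period entirely.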

Use a shared request_id across:

This allows end-to-end failure triage in one query.
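A minimal sketch of the idea, using only the Python standard library: each request gets one ID, every component logs it in a structured record, and a single filter on that ID reconstructs the whole path. The component names `component-a` and `component-b` are placeholders rather than actual ARIS components, and the helpers are assumptions made for illustration.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default=None)


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, always including the request ID."""

    def format(self, record):
        return json.dumps({
            "component": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })


def make_logger(component, stream):
    """Build a logger for one (placeholder) component writing to `stream`."""
    logger = logging.getLogger(component)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def find_by_request_id(log_text, request_id):
    """The 'one query': every event that carries the given request ID."""
    events = [json.loads(line) for line in log_text.splitlines() if line]
    return [event for event in events if event["request_id"] == request_id]


stream = io.StringIO()  # stands in for a shared log store
component_a = make_logger("component-a", stream)
component_b = make_logger("component-b", stream)


def handle_request():
    """Assign a fresh ID and log from both components under it."""
    rid = str(uuid.uuid4())
    token = request_id_var.set(rid)
    try:
        component_a.info("request accepted")
        component_b.error("inference failed")
    finally:
        request_id_var.reset(token)
    return rid


first = handle_request()
second = handle_request()
trace = find_by_request_id(stream.getvalue(), first)
```

Across process boundaries the ID would need to travel with the request itself, for example in a header or message field, since a context variable only lives inside one process.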

Last modified on February 21, 2026