What to monitor
Availability
Node heartbeats, registry uptime, and request success rate.
Latency
P50/P95/P99 inference duration by model and node.
Cost
Credits consumed per route, model, and tenant.
Capacity
Queue depth, GPU utilization, memory pressure, and saturation.
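As an illustration of the latency item above, here is a minimal sketch of how P50/P95/P99 could be derived per model and node from raw duration samples. The function name `latency_percentiles` and the `(model, node, duration_ms)` sample shape are assumptions made for this example, not part of any documented API.

```python
from collections import defaultdict
from statistics import quantiles


def latency_percentiles(samples):
    """Return P50/P95/P99 per (model, node).

    `samples` is an iterable of (model, node, duration_ms) tuples;
    this shape is an assumption made for the sketch.
    """
    grouped = defaultdict(list)
    for model, node, duration_ms in samples:
        grouped[(model, node)].append(duration_ms)

    report = {}
    for key, durations in grouped.items():
        if len(durations) < 2:
            continue  # too few samples for a meaningful percentile
        # n=100 yields 99 cut points: index 49 is P50, 94 is P95, 98 is P99.
        cuts = quantiles(durations, n=100, method="inclusive")
        report[key] = {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
    return report
```

Grouping by both model and node keeps a single slow node from hiding inside a healthy fleet-wide average.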
Recommended metrics
| Metric | Type | Why it matters |
|---|---|---|
| aris_requests_total | Counter | Total traffic per endpoint/model. |
| aris_request_latency_ms | Histogram | Detect slowdowns and regressions. |
| aris_inference_errors_total | Counter | Track node and model failures. |
| aris_queue_depth | Gauge | Signal backpressure and scale events. |
| aris_credits_deducted_total | Counter | Cost visibility and billing audits. |
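The sketch below shows one way a request handler might update these five metrics. `MetricsSketch` is a hypothetical in-memory stand-in for whatever metrics client is actually in use, and the `endpoint` and `model` labels and the `record_request` signature are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class MetricsSketch:
    """Hypothetical in-memory stand-in for a real metrics client."""

    def __init__(self):
        self.counters = Counter()            # monotonically increasing totals
        self.gauges = {}                     # most recent value only
        self.histograms = defaultdict(list)  # raw observations

    def inc(self, name, labels, amount=1):
        self.counters[(name, labels)] += amount

    def set_gauge(self, name, labels, value):
        self.gauges[(name, labels)] = value

    def observe(self, name, labels, value):
        self.histograms[(name, labels)].append(value)


def record_request(metrics, endpoint, model, latency_ms, credits, ok, queue_depth):
    """Update every recommended metric for one finished request."""
    labels = (("endpoint", endpoint), ("model", model))  # assumed label set
    metrics.inc("aris_requests_total", labels)
    metrics.observe("aris_request_latency_ms", labels, latency_ms)
    if not ok:
        metrics.inc("aris_inference_errors_total", labels)
    metrics.inc("aris_credits_deducted_total", labels, credits)
    metrics.set_gauge("aris_queue_depth", (), queue_depth)
```

Counters only ever increase, so rates and error ratios are derived at query time; the gauge is overwritten on each update because only the current queue depth matters.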
Alert policy
- Critical
- Warning
- Capacity
Alert immediately when the success rate drops below 97% for 5 minutes or the registry is unreachable.
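The rule above can be sketched as a sliding-window check. This example assumes "below 97% for 5 minutes" means the success rate computed over the trailing 5-minute window; the class name `SuccessRateAlert` and its parameters are hypothetical.

```python
from collections import deque


class SuccessRateAlert:
    """Fires when the trailing-window success rate is below the threshold
    or the registry is unreachable. Window semantics are an assumption."""

    def __init__(self, threshold=0.97, window_s=300):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()  # (timestamp_s, ok), oldest first

    def record(self, timestamp_s, ok):
        self.events.append((timestamp_s, ok))

    def should_alert(self, now_s, registry_reachable=True):
        if not registry_reachable:
            return True
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now_s - self.window_s:
            self.events.popleft()
        if not self.events:
            return False  # no traffic in the window, nothing to judge
        successes = sum(1 for _, ok in self.events if ok)
        return successes / len(self.events) < self.threshold
```

For example, 4 failures out of 100 requests inside the window gives a 96% success rate, so `should_alert` returns `True`; 3 failures out of 100 gives exactly 97% and does not fire.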
Logs and trace correlation
Use a shared request_id across:
- client request logs
- registry routing logs
- node execution logs
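A minimal sketch of this correlation using Python's standard logging module follows. The three loggers stand in for the client, registry, and node, which would normally be separate services; in that case the request_id would have to travel with the request itself (for example in a header, which is an assumption here). The names `request_id_var`, `RequestIdFilter`, `make_logger`, and `handle_request` are hypothetical.

```python
import contextvars
import logging
import uuid

# Holds the request_id for the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request_id to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def make_logger(component):
    logger = logging.getLogger(component)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(name)s request_id=%(request_id)s %(message)s")
        )
        handler.addFilter(RequestIdFilter())
        logger.addHandler(handler)
    return logger


def handle_request(request_id=None):
    """Log one request through all three stages under a single request_id."""
    request_id = request_id or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        make_logger("client").info("request sent")
        make_logger("registry").info("routed to node")
        make_logger("node").info("inference finished")
    finally:
        request_id_var.reset(token)
    return request_id
```

Searching the combined logs for a single request_id value then returns the client, registry, and node entries for that request together.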