
What to monitor

Availability

Node heartbeats, registry uptime, and request success rate.

Latency

P50/P95/P99 inference duration by model and node.

Cost

Credits consumed per route, model, and tenant.

Capacity

Queue depth, GPU utilization, memory pressure, and saturation.

| Metric | Type | Why it matters |
| --- | --- | --- |
| aris_requests_total | Counter | Total traffic per endpoint/model. |
| aris_request_latency_ms | Histogram | Detect slowdowns and regressions. |
| aris_inference_errors_total | Counter | Track node and model failures. |
| aris_queue_depth | Gauge | Signal backpressure and scale events. |
| aris_credits_deducted_total | Counter | Cost visibility and billing audits. |
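
To make the P50/P95/P99 latency view concrete, here is a minimal sketch that computes those percentiles from raw duration samples using the nearest-rank method. The function names and the idea of working from raw samples are assumptions for illustration; a histogram metric such as aris_request_latency_ms would normally be summarized from bucket counts by the monitoring backend instead.

```python
from typing import Dict, Sequence


def nearest_rank_percentile(samples_ms: Sequence[float], percentile: int) -> float:
    """Return the nearest-rank percentile of the given latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Integer ceiling of percentile * n / 100, avoiding float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize samples as P50/P95/P99 (hypothetical helper)."""
    return {f"p{p}": nearest_rank_percentile(samples_ms, p) for p in (50, 95, 99)}
```

Calling latency_summary once per model and node gives the per-dimension breakdown described under Latency.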

Alert policy

Alert immediately when the success rate stays below 97% for 5 minutes, or when the registry is unreachable.
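
One way to read this policy is sketched below: an evaluator that fires at once when the registry is unreachable, and fires on low success rate only after the rate has stayed under the threshold for the full 5 minutes. The class name, the alert labels, and the "continuously below" interpretation are assumptions, not part of any documented API.

```python
from dataclasses import dataclass
from typing import Optional

SUCCESS_RATE_THRESHOLD = 0.97  # 97%, from the alert policy
BREACH_DURATION_S = 5 * 60     # 5 minutes, from the alert policy


@dataclass
class AlertEvaluator:
    """Hypothetical evaluator; call evaluate() on every scrape interval."""

    breach_started_at: Optional[float] = None

    def evaluate(
        self, now: float, success_rate: float, registry_reachable: bool
    ) -> Optional[str]:
        # An unreachable registry alerts right away, with no waiting period.
        if not registry_reachable:
            return "registry_unreachable"
        if success_rate < SUCCESS_RATE_THRESHOLD:
            # Remember when the breach began so its duration can be measured.
            if self.breach_started_at is None:
                self.breach_started_at = now
            if now - self.breach_started_at >= BREACH_DURATION_S:
                return "low_success_rate"
            return None
        # Success rate recovered: reset the breach timer.
        self.breach_started_at = None
        return None
```

Tracking only the start of the breach keeps the state small; a brief recovery resets the timer, so short dips do not page anyone.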

Logs and trace correlation

Use a shared request_id across:
  • client request logs
  • registry routing logs
  • node execution logs
Because every log line carries the same ID, a single query on request_id returns the full path of a failed request, from client to node.
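
The correlation can be sketched as a small filter over JSON log lines from the three sources. Only request_id comes from the text above; the JSON-lines format and the ts, source, and msg field names are assumptions for illustration.

```python
import json
from typing import Dict, Iterable, List


def trace_request(request_id: str, *log_streams: Iterable[str]) -> List[Dict]:
    """Collect every record with the given request_id, ordered by timestamp."""
    events = []
    for stream in log_streams:
        for line in stream:
            record = json.loads(line)
            if record.get("request_id") == request_id:
                events.append(record)
    # "ts" is an assumed numeric timestamp field present on every record.
    return sorted(events, key=lambda r: r["ts"])


# Hypothetical sample logs from the client, registry, and node.
client_logs = ['{"ts": 1, "source": "client", "request_id": "r1", "msg": "sent"}']
registry_logs = [
    '{"ts": 2, "source": "registry", "request_id": "r1", "msg": "routed"}',
    '{"ts": 3, "source": "registry", "request_id": "r2", "msg": "routed"}',
]
node_logs = ['{"ts": 4, "source": "node", "request_id": "r1", "msg": "failed"}']

timeline = trace_request("r1", client_logs, registry_logs, node_logs)
```

In a hosted log backend the same effect is usually a single filter on the request_id field rather than application code.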
Last modified on February 21, 2026