```python
from aris.client import Aris

client = Aris(
    api_key="sk-aris-your-key",
    timeout=45,
    headers={"X-Environment": "staging"},
)
```
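Hard-coding keys as in the snippet above is fine for local experiments, but in shared environments it is safer to read the key from the process environment. A minimal sketch, assuming a hypothetical `ARIS_API_KEY` variable name (not defined by this guide):

```python
import os

def api_key_from_env(var: str = "ARIS_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before constructing the client")
    return key
```

You would then pass `api_key=api_key_from_env()` to the constructor instead of a literal string.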
Retry with exponential backoff
```python
import time

from aris.client import Aris
from aris.errors import ArisNodeError

client = Aris(api_key="sk-aris-your-key")

for attempt in range(4):
    try:
        result = client.generate(prompt="Generate release notes from this changelog")
        print(result["output"])
        break
    except ArisNodeError:
        if attempt == 3:
            raise  # out of retries; surface the error
        time.sleep(2 ** attempt)  # back off 1s, 2s, 4s between attempts
```
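Fixed exponential delays as above can synchronize retries across many clients. A common refinement is "full jitter": draw the sleep uniformly from zero up to the exponential bound, with a cap. A minimal sketch of the delay calculation (a general technique, not part of the Aris SDK):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In the retry loop you would replace `time.sleep(2 ** attempt)` with `time.sleep(backoff_delay(attempt))`.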
Async fan-out requests
```python
import asyncio

from aris.client import AsyncAris

client = AsyncAris(api_key="sk-aris-your-key")

async def run(prompt: str):
    return await client.generate(prompt=prompt, max_tokens=120)

async def main():
    prompts = [
        "Summarize incident 1021",
        "Summarize incident 1022",
        "Summarize incident 1023",
    ]
    # Fan out the independent requests concurrently; results keep prompt order.
    results = await asyncio.gather(*(run(p) for p in prompts))
    for item in results:
        print(item["output"])

asyncio.run(main())
```
Runtime guidance
- **Latency sensitive:** keep max_tokens small, use deterministic prompts, and cap retries to avoid tail latency.
- **Throughput focused:** use AsyncAris, batch independent calls, and apply concurrency limits at your app layer.
- **Cost constrained:** lower output token limits, avoid unnecessary context, and monitor credit burn per endpoint.
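The app-layer concurrency limit mentioned above can be sketched with `asyncio.Semaphore`: an unbounded `gather` fires every request at once, while a semaphore caps how many run concurrently. This helper is an assumption of this guide, not an SDK feature:

```python
import asyncio

async def bounded_gather(coros, limit: int = 5):
    """Run awaitables concurrently, at most `limit` in flight at a time."""
    sem = asyncio.Semaphore(limit)

    async def wrap(coro):
        async with sem:  # blocks while `limit` tasks are already running
            return await coro

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(wrap(c) for c in coros))
```

In the fan-out example you would call `await bounded_gather((run(p) for p in prompts), limit=3)` instead of `asyncio.gather(...)`.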
Pair this guide with /errors to standardize retryable vs non-retryable failures.

Last modified on February 21, 2026