Send a prompt to an eligible worker node and receive generated output plus usage metadata.
Request body
`prompt` (string): Input text for inference.
`model` (string): Model identifier (for example `tinyllama`).
`max_tokens` (integer): Maximum output tokens. Default is 256.
`temperature` (number): Sampling temperature between 0.0 and 2.0. Default is 0.7.
Example request
{
"prompt": "Explain quantum tunneling in one paragraph.",
"model": "tinyllama",
"max_tokens": 256,
"temperature": 0.7
}
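As a minimal sketch, the request body above can be built and validated client-side before sending. The function name, defaults, and validation rules below are assumptions based on the parameter documentation, not an official client.

```python
import json

# Defaults taken from the request-body documentation above.
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7

def build_generate_request(prompt, model, max_tokens=DEFAULT_MAX_TOKENS,
                           temperature=DEFAULT_TEMPERATURE):
    """Build and validate a JSON request body for a generation call.

    Hypothetical helper: validation mirrors the documented parameter
    ranges (temperature 0.0-2.0) and defaults (256 tokens, 0.7).
    """
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return json.dumps({
        "prompt": prompt,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })
```

Validating locally avoids a round trip that would otherwise return a 400 for out-of-range parameters.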
Success response
`id` (string): Unique job ID for this generation request.
`node` (string): Worker node that executed the inference.
`output` (string): Generated completion text.
`usage` (object): Token count and credits metadata.
Response example
{
"id": "job_abc123",
"node": "node-9006",
"usage": {
"tokens": 142,
"credits_deducted": 1.42
},
"output": "Quantum tunneling occurs when..."
}
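The example response above is consistent with a rate of 0.01 credits per token (142 tokens, 1.42 credits deducted). That rate is an inference from the single example, not a documented price; the sketch below shows how a client might estimate cost under that assumption.

```python
# Assumed rate inferred from the example response (142 tokens -> 1.42
# credits). Not an official price; verify against real billing data.
ASSUMED_CREDITS_PER_TOKEN = 0.01

def estimate_credits(tokens, rate=ASSUMED_CREDITS_PER_TOKEN):
    """Estimate the credits a job will deduct, rounded to 2 decimals."""
    return round(tokens * rate, 2)
```

A pre-flight estimate like this can be compared against the account balance to anticipate a 402 before submitting a large job.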
Error responses
| Status | Meaning | Retry |
|---|---|---|
| 400 | Invalid request parameters. | No |
| 401 | Missing or invalid API key. | No |
| 402 | Insufficient credits. | No |
| 429 | Rate limited. | Yes |
| 503 | No healthy nodes available. | Yes |
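Per the table, only 429 and 503 are retryable. A minimal retry sketch with exponential backoff, assuming the caller supplies a `send()` function that returns a `(status, body)` pair (the helper and its parameters are illustrative, not part of the API):

```python
import time

# Retryable statuses per the error table: rate limited, no healthy nodes.
RETRYABLE_STATUSES = {429, 503}

def call_with_retries(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() -> (status, body); retry 429/503 with exponential backoff.

    Delays double each attempt: base_delay, 2*base_delay, 4*base_delay, ...
    Non-retryable statuses (400, 401, 402, or success) return immediately.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body  # exhausted retries; surface the last response
```

The injectable `sleep` parameter keeps the helper testable without real delays.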