Rate limits & concurrency
Two independent throttling mechanisms apply to every request:
- Per-minute rate limit (how fast you submit).
- Concurrency cap (how many runs can process simultaneously).
Rate limits (per minute)
Enforced on the submit endpoint:
| Limit | Value |
|---|---|
| Default | 500 requests/minute per client |
The rate limit is configured per key and visible in the dashboard.
Requests over the limit return:
429 RATE_LIMIT_EXCEEDED
Back off and retry with exponential backoff; don't hot-loop.
Need higher throughput? Contact ZenHire support — enterprise plans can raise the per-minute cap.
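The backoff advice above can be sketched roughly as follows. This is a minimal illustration, not ZenHire client code: `submit` is a hypothetical callable that performs one submit request and returns the HTTP status code and parsed body, and the 1-second base, 60-second cap, attempt count, and random jitter are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound (seconds) for the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def submit_with_backoff(
    submit: Callable[[], Tuple[int, dict]],  # hypothetical: one submit call
    max_attempts: int = 6,                   # assumed retry budget
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry `submit` while it returns HTTP 429, doubling the wait bound each time."""
    for attempt in range(max_attempts):
        status, body = submit()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Random wait in [0, bound] so parallel clients do not retry in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting. The sketch checks only the 429 status code; how the RATE_LIMIT_EXCEEDED code appears in the response body is not assumed here.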
Concurrency cap
The default is 8 simultaneous processing runs per client (contact support
to raise it). The cap is enforced atomically at submit time. When you hit
the cap, the submit endpoint still returns 202 Accepted, but with
status: "queued":
{
  "id": "req_...",
  "status": "queued",
  "queuePosition": 3,
  "activeRequests": 8,
  "pollIntervalSeconds": 20
}
Queued requests start automatically in FIFO order when a slot frees up.
You don't need to resubmit. Just poll using the id returned in the response (the requestId).
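A polling loop that treats a queued response as a normal outcome might look like the sketch below. The names are illustrative assumptions: `poll` is a hypothetical callable that fetches the current state for an id, and `is_terminal` is supplied by the caller because the terminal status values are not listed on this page. The 10-second floor mirrors the documented minimum poll interval.

```python
import time
from typing import Callable

MIN_POLL_SECONDS = 10.0  # documented minimum poll interval per request


def next_poll_delay(response: dict) -> float:
    """Wait suggested by the response, never below the documented minimum."""
    suggested = response.get("pollIntervalSeconds", MIN_POLL_SECONDS)
    return float(max(MIN_POLL_SECONDS, suggested))


def wait_for_result(
    first_response: dict,
    poll: Callable[[str], dict],          # hypothetical: one poll call by id
    is_terminal: Callable[[dict], bool],  # caller decides which statuses are final
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 360,                 # assumed safety limit
) -> dict:
    """Poll until `is_terminal` accepts a response; queued is just another state."""
    response = first_response
    polls = 0
    while not is_terminal(response):
        if polls >= max_polls:
            raise TimeoutError(f"no terminal status after {max_polls} polls")
        sleep(next_poll_delay(response))
        response = poll(response["id"])
        polls += 1
    return response
```

The loop never resubmits: the queued request keeps its place in line, and the client only waits and polls.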
Poll-endpoint rate limit
The poll endpoint has its own minimum interval: 10 seconds per
requestId for non-terminal statuses. Polling faster returns 429 POLL_RATE_LIMITED with a Retry-After header.
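When a poll returns 429 POLL_RATE_LIMITED, the Retry-After header says how long to wait. The helper below is a sketch of reading that header. This page does not say whether the API sends a number of seconds or an HTTP date, so the helper accepts both forms allowed by the HTTP specification. The function name and the 10-second fallback are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str],
    default: float = 10.0,           # assumed fallback: the documented minimum interval
    now: Optional[datetime] = None,  # injectable clock for testing
) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "7"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

A caller would sleep for the returned number of seconds before polling the same id again, instead of retrying immediately.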
Strategy recommendations
- Respect pollIntervalSeconds from every response.
- Implement exponential backoff on 429 RATE_LIMIT_EXCEEDED.
- Don't treat status: queued as an error; it's a normal submit outcome.
- Parallelize freely: queued runs cost nothing until they start processing.
See in the API reference
- POST /api/v1/speech/analyze: 429 RATE_LIMIT_EXCEEDED response, queuePosition/activeRequests fields on 202 queued response
- GET /api/v1/speech/analyze/{id}: 429 POLL_RATE_LIMITED response and Retry-After header