CV DeepSearch — Ingestion (fill the corpus)

Ingestion is the first of the two CV DeepSearch flows: you fill a corpus with parsed CVs once, then search it repeatedly. This guide covers ingesting candidates and reading their embedding status.

For the concepts (corpus, candidate, embedding status), see the CV DeepSearch overview. The full endpoint reference is the Ingestion group in the CV DeepSearch API reference sidebar.

Prerequisites

  • An API key — issued in the ZenHire dashboard. It looks like zh_api_…. Send it on every request as the X-API-Key header.
  • The cvdeepsearch permission on your client. Contact support if your key returns MISSING_PERMISSION.
  • Parsed CVs as JSON (you bring your own CV parser, or reuse the one from your ATS). CV DeepSearch ingests the parsed JSON — it does not parse PDFs.

Ingest a corpus

POST https://platform.zenhire.ai/api/v1/cvds/candidates

Batch your parsed CVs into a corpus. Each candidate is keyed by your own external_id and embedded for semantic retrieval.

curl -X POST "https://platform.zenhire.ai/api/v1/cvds/candidates" \
  -H "X-API-Key: zh_api_…" \
  -H "Content-Type: application/json" \
  -d '{
    "corpus_id": "acme-eng-pool",
    "candidates": [
      { "external_id": "cand-001", "parsed_cv": { "name": "Jane Doe", "skills": ["Node.js", "PostgreSQL"], "experience": [] }, "tags": ["batch-2026-q2"] },
      { "external_id": "cand-002", "parsed_cv": { "name": "John Roe", "skills": ["Python", "AWS"], "experience": [] } }
    ]
  }'

Response (HTTP 200):

{
  "results": [
    { "external_id": "cand-001", "status": "accepted", "embedding_status": "embedded" },
    { "external_id": "cand-002", "status": "accepted", "embedding_status": "embedded" }
  ],
  "summary": { "accepted": 2, "updated": 0, "unchanged": 0, "failed": 0 }
}

Node.js

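// Requires Node.js 18+ (built-in fetch); run as an ES module for top-level await.
const res = await fetch("https://platform.zenhire.ai/api/v1/cvds/candidates", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.ZENHIRE_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    corpus_id: "acme-eng-pool",
    candidates: [
      { external_id: "cand-001", parsed_cv: { name: "Jane Doe", skills: ["Node.js"] }, tags: ["batch-2026-q2"] },
    ],
  }),
});
// fetch does not reject on HTTP error statuses, so check explicitly.
if (!res.ok) throw new Error(`Ingest failed: HTTP ${res.status}`);
const json = await res.json();
console.log(json.summary);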

Python

import requests

res = requests.post(
    "https://platform.zenhire.ai/api/v1/cvds/candidates",
    headers={"X-API-Key": "zh_api_…", "Content-Type": "application/json"},
    json={
        "corpus_id": "acme-eng-pool",
        "candidates": [
            {"external_id": "cand-001", "parsed_cv": {"name": "Jane Doe", "skills": ["Node.js"]}},
        ],
    },
    timeout=60,
)
res.raise_for_status()  # raise on 4xx/5xx responses
print(res.json()["summary"])

Request shape — single or array

candidates is always an array. Send one candidate as a one-element array, or many at once:

// one candidate
{ "corpus_id": "acme-eng-pool", "candidates": [
{ "external_id": "cand-001", "parsed_cv": { "name": "Jane Doe" } }
] }

// many candidates
{ "corpus_id": "acme-eng-pool", "candidates": [
{ "external_id": "cand-001", "parsed_cv": { "name": "Jane Doe" } },
{ "external_id": "cand-002", "parsed_cv": { "name": "John Roe" } }
] }

Things to know

  • corpus_id is fail-closed. Set it at the top level or per candidate. If any candidate has no resolvable corpus_id, the whole batch is rejected with MISSING_CORPUS_ID.
  • Idempotent. Re-posting an unchanged candidate is a no-op (unchanged); a changed CV re-embeds it (updated). Re-sync a corpus by re-posting.
  • Limits: up to 500 candidates and ~10 MB per request. More than 500 candidates returns BATCH_TOO_LARGE; a body over ~10 MB returns PAYLOAD_TOO_LARGE. For larger corpora, split into multiple requests (≤ 500 candidates and ≤ ~10 MB each) or use bulk sync.
  • parsed_cv is PII. It's embedded but never returned by the read endpoints — you keep your own copy.
  • Embedding can be async. Large batches return pending and embed in the background. Poll the candidate-status endpoint (below) until the status is embedded before relying on the candidate appearing in a search.
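
The request limits above can be respected client-side by splitting a large list before posting. The following is a minimal sketch, not an official client: chunk_candidates is a hypothetical helper, and the 9,000,000-byte ceiling is an assumed safety margin under the approximate 10 MB limit. Sizes are estimated from the JSON serialization, so treat them as approximate.

```python
import json
from typing import Any, Dict, Iterator, List

MAX_CANDIDATES = 500        # documented per-request candidate limit
MAX_BODY_BYTES = 9_000_000  # assumption: safety margin under the ~10 MB limit


def chunk_candidates(
    corpus_id: str,
    candidates: List[Dict[str, Any]],
    max_candidates: int = MAX_CANDIDATES,
    max_bytes: int = MAX_BODY_BYTES,
) -> Iterator[Dict[str, Any]]:
    """Yield request bodies that stay within the count and size limits."""
    # Size of the envelope with an empty candidates array.
    base = len(json.dumps({"corpus_id": corpus_id, "candidates": []}).encode("utf-8"))
    batch: List[Dict[str, Any]] = []
    size = base
    for cand in candidates:
        # +2 covers the ", " separator json.dumps places between array items.
        cand_bytes = len(json.dumps(cand).encode("utf-8")) + 2
        if base + cand_bytes > max_bytes:
            raise ValueError(
                f"candidate {cand.get('external_id')!r} alone exceeds max_bytes"
            )
        # Close the current batch if adding this candidate would break a limit.
        if batch and (len(batch) >= max_candidates or size + cand_bytes > max_bytes):
            yield {"corpus_id": corpus_id, "candidates": batch}
            batch, size = [], base
        batch.append(cand)
        size += cand_bytes
    if batch:
        yield {"corpus_id": corpus_id, "candidates": batch}
```

Each yielded body can be posted as the json= argument of the ingest request. Because ingestion is idempotent, re-posting a chunk after a failed request should be safe.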

Check embedding status

A candidate is only returned by a search once its embedding_status is embedded. There are two read endpoints.

List a corpus

GET https://platform.zenhire.ai/api/v1/cvds/candidates?corpus_id=acme-eng-pool

curl "https://platform.zenhire.ai/api/v1/cvds/candidates?corpus_id=acme-eng-pool&limit=50" \
-H "X-API-Key: zh_api_…"

Returns a page of candidate statuses (paginated via cursor / next_cursor). You can filter to one state with embedding_status=pending|embedded|failed. corpus_id is mandatory — a request without it is rejected with MISSING_CORPUS_ID (fail-closed).
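
Walking every page of a corpus might look like the sketch below. It is illustrative only: the cursor and next_cursor names come from the description above, but the key that holds the page items (items_key, defaulting to "candidates") and the convention that a missing or empty next_cursor marks the last page are assumptions. fetch_page is a hypothetical callable you supply, which keeps the HTTP call out of the loop.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# fetch_page receives the query parameters and returns the decoded JSON page.
# With requests it could be, for example:
#   lambda p: requests.get(URL, headers=HEADERS, params=p, timeout=30).json()
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_candidate_statuses(
    fetch_page: FetchPage,
    corpus_id: str,
    embedding_status: Optional[str] = None,
    limit: int = 50,
    items_key: str = "candidates",  # assumption: name of the list in each page
) -> Iterator[Dict[str, Any]]:
    """Yield candidate status records across all pages of one corpus."""
    params: Dict[str, str] = {"corpus_id": corpus_id, "limit": str(limit)}
    if embedding_status:
        params["embedding_status"] = embedding_status
    while True:
        page = fetch_page(dict(params))  # pass a copy so callers can keep it
        yield from page.get(items_key, [])
        cursor = page.get("next_cursor")
        if not cursor:  # assumption: no next_cursor means this was the last page
            break
        params["cursor"] = cursor
```

Passing embedding_status="pending" or "failed" narrows the walk to candidates that still need attention.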

Get one candidate

GET https://platform.zenhire.ai/api/v1/cvds/candidates/{external_id}?corpus_id=acme-eng-pool

curl "https://platform.zenhire.ai/api/v1/cvds/candidates/cand-001?corpus_id=acme-eng-pool" \
-H "X-API-Key: zh_api_…"

Poll this until embedding_status is embedded before searching.
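
A polling loop for this endpoint could be sketched as follows. The pending, embedded, and failed states are the ones listed for the embedding_status filter; the assumption here is that the single-candidate response exposes the same embedding_status field. get_status is a hypothetical callable that performs the GET and returns the decoded JSON, and the timeout and interval defaults are arbitrary illustrative values.

```python
import time
from typing import Any, Callable, Dict


def wait_until_embedded(
    get_status: Callable[[str], Dict[str, Any]],
    external_id: str,
    timeout_s: float = 300.0,  # assumption: illustrative default
    interval_s: float = 2.0,   # assumption: illustrative default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once embedded, False if embedding failed.

    Raises TimeoutError if the candidate is still not embedded after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status(external_id).get("embedding_status")
        if status == "embedded":
            return True
        if status == "failed":
            return False
        if clock() >= deadline:
            raise TimeoutError(f"{external_id} not embedded after {timeout_s}s")
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised in tests without real waiting.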

Error codes

The ingestion endpoints use the shared standard error envelope:

error.code | HTTP | Meaning | Retry?
INVALID_INPUT | 400 | A field failed validation. | After fix
MISSING_CORPUS_ID | 400 | A mandatory corpus_id was missing (fail-closed). | After fix
BATCH_TOO_LARGE | 400 | More than 500 candidates in one ingest. | After fix
PAYLOAD_TOO_LARGE | 413 | Request body over the ~10 MB per-request limit. | Split / smaller batch
MISSING_PERMISSION | 401 / 403 | Missing/invalid key, or missing cvdeepsearch perm. | Contact support
NOT_FOUND | 404 | No candidate with that id for your client. | No
INTERNAL_ERROR | 500 | Uncategorised server error. | After delay
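
The retry guidance in the table can be encoded as a small lookup. This sketch assumes the envelope is a JSON object of the form {"error": {"code": "..."}}, inferred from the error.code column name; the full envelope shape is not shown here, so adjust the access path to the real response. The advice labels and the advice_for helper are hypothetical names chosen for illustration.

```python
from typing import Any, Dict

# Mirrors the "Retry?" column of the error table.
RETRY_ADVICE: Dict[str, str] = {
    "INVALID_INPUT": "fix_request",
    "MISSING_CORPUS_ID": "fix_request",
    "BATCH_TOO_LARGE": "fix_request",
    "PAYLOAD_TOO_LARGE": "split_batch",
    "MISSING_PERMISSION": "contact_support",
    "NOT_FOUND": "do_not_retry",
    "INTERNAL_ERROR": "retry_after_delay",
}


def advice_for(body: Dict[str, Any]) -> str:
    """Map a decoded error response to a retry decision."""
    error = body.get("error") or {}
    # Assumption: the code lives at body["error"]["code"].
    code = error.get("code") if isinstance(error, dict) else None
    return RETRY_ADVICE.get(code, "unknown")
```

Only "retry_after_delay" suggests an automatic retry; every other outcome needs a change to the request or to the client's permissions first.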

Next step

Once your corpus is embedded, move on to the Search guide to query it for a position.