Candidate detection

Most interview recordings contain at least two speakers: an interviewer and a candidate. The scoring pipeline must know which speaker to score — otherwise you'd be scoring your recruiter's English instead of the candidate's.

How it works

After transcription and speaker diarization, the API inspects each speaker's segments and picks the candidate using:

Total speaking time (candidates typically speak more)
Nature of the speech (answering questions vs. asking them)
Word count and segment distribution
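The signals listed above can be illustrated with a toy heuristic. This is only a sketch for intuition and is not the API's actual detection logic: the `Segment` type, the question-mark test, and the scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized utterance (hypothetical shape, for illustration only)."""
    speaker: str   # diarization label, e.g. "speaker_0"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed words


def summarize_speakers(segments):
    """Aggregate speaking time, word count, and question count per speaker."""
    stats = {}
    for seg in segments:
        s = stats.setdefault(
            seg.speaker,
            {"seconds": 0.0, "words": 0, "segments": 0, "questions": 0},
        )
        s["seconds"] += seg.end - seg.start
        s["words"] += len(seg.text.split())
        s["segments"] += 1
        # Crude stand-in for "asking vs. answering": does the segment end in "?"
        if seg.text.strip().endswith("?"):
            s["questions"] += 1
    return stats


def guess_candidate(segments):
    """Toy rule: favor the speaker who talks longest and asks the fewest questions."""
    stats = summarize_speakers(segments)

    def score(item):
        _, s = item
        question_ratio = s["questions"] / s["segments"]
        return s["seconds"] * (1.0 - question_ratio)

    return max(stats.items(), key=score)[0]


segments = [
    Segment("speaker_0", 0, 5, "Can you tell me about your last role?"),
    Segment("speaker_1", 5, 40, "I led a team of four engineers building payment systems."),
    Segment("speaker_0", 40, 44, "What was the hardest part?"),
    Segment("speaker_1", 44, 70, "Migrating the database without downtime."),
]
candidate = guess_candidate(segments)
```

In this sample, `speaker_1` talks longer and asks no questions, so the toy rule selects that speaker.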

The decision is returned on every successful poll response:

```json
{
  "candidateDetection": {
    "speaker": "speaker_1",
    "speakerLabel": "Speaker 2",
    "confidence": "high",
    "reason": "Speaker answers questions and describes experience",
    "wordCount": 432
  }
}
```
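A client might read this object roughly as follows. This is a minimal sketch, assuming the poll response body is a JSON string with the fields shown in the example; the `CandidateDetection` class and `parse_candidate_detection` function are hypothetical names and not part of any SDK.

```python
import json
from dataclasses import dataclass


@dataclass
class CandidateDetection:
    """Typed view of the candidateDetection object (hypothetical helper)."""
    speaker: str
    speaker_label: str
    confidence: str
    reason: str
    word_count: int


def parse_candidate_detection(body: str) -> CandidateDetection:
    """Extract candidateDetection from a poll response body.

    Raises KeyError if the object or one of its fields is missing.
    """
    data = json.loads(body)["candidateDetection"]
    return CandidateDetection(
        speaker=data["speaker"],
        speaker_label=data["speakerLabel"],
        confidence=data["confidence"],
        reason=data["reason"],
        word_count=data["wordCount"],
    )


# Sample body copied from the example response.
sample_body = """
{
  "candidateDetection": {
    "speaker": "speaker_1",
    "speakerLabel": "Speaker 2",
    "confidence": "high",
    "reason": "Speaker answers questions and describes experience",
    "wordCount": 432
  }
}
"""
detection = parse_candidate_detection(sample_body)
```

Failing loudly on a missing field is a deliberate choice here: silently scoring the wrong speaker is worse than surfacing a parsing error.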

Confidence levels

Confidence	When you see it
`high`	One speaker clearly dominates in candidate-like behavior.
`medium`	Less clear; for example, the interview was evenly balanced between the two speakers.
`low`	Ambiguous. Consider reviewing the transcript manually.

Scores are always produced regardless of confidence level, but results with `low` confidence deserve extra human review before being used in hiring decisions.
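One way to act on this is to route results into a review queue based on the confidence value. The sketch below is an assumption about how a consuming application might do that: `needs_human_review` and the `TRUSTED_LEVELS` set are hypothetical, and accepting `medium` without review is a policy choice your team may want to tighten.

```python
# Confidence values accepted without manual review (a policy assumption;
# remove "medium" for a stricter process).
TRUSTED_LEVELS = frozenset({"high", "medium"})


def needs_human_review(candidate_detection: dict, trusted=TRUSTED_LEVELS) -> bool:
    """Return True when the detection should be checked by a person.

    A missing or unrecognized confidence value is treated like "low".
    """
    return candidate_detection.get("confidence") not in trusted


flag_low = needs_human_review({"confidence": "low"})
flag_high = needs_human_review({"confidence": "high"})
```

Treating unknown values as needing review keeps the check safe if new confidence levels are introduced later.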

Single-speaker recordings

If the recording contains only one speaker (e.g., a pre-recorded monologue or screening answer), that speaker is scored and `speakerCount: 1` is returned.

Minimum useful duration

The pipeline needs at least 3 minutes of total audio and a reasonable amount of candidate speech to produce reliable scores. Submissions under 3 minutes are rejected with AUDIO_TOO_SHORT.
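To avoid a rejected submission, a client can check the duration locally before uploading. This is a minimal sketch using Python's standard `wave` module, so it only handles uncompressed WAV files; the helper names are hypothetical, and it assumes the API measures total audio length the same way.

```python
import wave

# Minimum total audio length stated above: 3 minutes.
MIN_SECONDS = 3 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True if the recording meets the minimum duration."""
    return wav_duration_seconds(path) >= min_seconds
```

This check covers only total length. It cannot tell how much of the audio is candidate speech, so a recording that passes may still yield less reliable scores.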

See in the API reference

The candidateDetection object is returned on every successful poll response:

GET /api/v1/speech/analyze/{id} — see the candidateDetection field in the 200 response schema
