Model cards · Picked.ai


One card per model. Updated on every weight change.

Five models in production. Each has a card. Capability, training data, evaluation, known limitations, the version history, the failure modes we monitor for. Anthropic's Claude models handle reasoning; OpenAI's Whisper handles transcription.


Card structure

Every card. Same six fields, plus the date and version of the last update.

Purpose. What the model is for, what task it performs in the Picked pipeline.

Base model. The underlying foundation model (Claude Opus / Sonnet / Haiku, or Whisper) and version.

Inputs. The data the model sees, the data it does not.

Outputs. What the model produces (score, classification, transcript, embedding) and how it is consumed downstream.

Evaluation. How we test the model: golden datasets, accuracy targets, fairness metrics, red-team results.

Limitations. What the model is not good at. What can go wrong. What we monitor for.
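As a rough illustration of the six-field structure above, a card can be held as a small typed record. This is a minimal sketch, not Picked's internal schema: the class name `ModelCard`, the snake_case field names, and the `missing_fields` helper are all assumptions made for the example. The sample values are abbreviated from the triage card on this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelCard:
    """One card per model; field names mirror the six headings above."""
    purpose: str
    base_model: str
    inputs: str
    outputs: str
    evaluation: str
    limitations: str


def missing_fields(card: ModelCard) -> list[str]:
    """Return the names of any fields left blank (hypothetical helper)."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


# Values abbreviated from the picked-triage-1.0 card.
triage_card = ModelCard(
    purpose="Fast read of CV and application data against role must-haves.",
    base_model="Claude Haiku 4.5",
    inputs="CV text, application form responses, role must-haves.",
    outputs="Triage score (0-100), written reason, pass/review/reject.",
    evaluation="Golden dataset of 2,800 labelled triage decisions.",
    limitations="Performs worse on CVs with non-standard career paths.",
)

print(missing_fields(triage_card))
```

A check like `missing_fields` is one way to keep every card to the same six fields whenever a card is updated.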

Models in production

Model

picked-triage-1.0

Purpose

Fast read of CV and application data against role must-haves. First filter in the pipeline.

Base model

Claude Haiku 4.5 (via @anthropic-ai/sdk, zero retention).

Inputs

CV text, application form responses, role must-haves declared by the hiring manager. No protected-class signals.

Outputs

Numeric triage score (0-100), a written reason, a pass/review/reject classification.

Evaluation

Golden dataset of 2,800 labelled triage decisions; target accuracy 0.92 against human reviewers; current 0.93. Adverse-impact monitored per role.

Limitations

Performs worse on CVs from candidates with non-standard career paths (career changers, returners after a break). Surfaces these as `review` rather than `reject` by default.

Last updated

22 May 2026, v1.0.
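The triage card's accuracy figure compares model decisions with human reviewer labels on a golden dataset. The sketch below shows one plain way to compute that kind of agreement and compare it with the stated 0.92 target. It assumes accuracy means exact agreement on the pass/review/reject label, which the card does not spell out. The function names and the four sample labels are illustrative, not real data.

```python
def agreement_accuracy(model_labels, human_labels):
    """Share of decisions where the model label equals the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not model_labels:
        raise ValueError("need at least one labelled decision")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def meets_target(accuracy, target=0.92):
    """True when measured accuracy reaches the card's stated target."""
    return accuracy >= target


# Illustrative labels only; the real golden dataset has 2,800 decisions.
model = ["pass", "review", "reject", "pass"]
human = ["pass", "review", "reject", "review"]

accuracy = agreement_accuracy(model, human)
print(accuracy, meets_target(accuracy))
```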

Model

picked-screen-1.0

Purpose

Conduct the 15-minute conversational screen with the candidate. Score motivation, deal-breakers, role-fit signals.

Base model

Claude Sonnet 4.6 (via @anthropic-ai/sdk, zero retention) with tool use for structured outputs. OpenAI Whisper for STT.

Inputs

Live conversation audio (transcribed by Whisper), role context, candidate's CV summary. No protected-class signals.

Outputs

Per-trait scores (motivation, communication, deal-breakers), written reasoning per score, pass/review/reject for the next stage.

Evaluation

Golden dataset of 1,400 labelled screen conversations from beta; target accuracy 0.88; current 0.90. Calibration checked weekly against human recruiter ratings on a holdout set.

Limitations

Less reliable on candidates with strong accents or low-fluency English. Mitigation: candidates can switch language at any point; the language switch is logged but not used as a score input.

Last updated

22 May 2026, v1.0.
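The screen card mentions tool use for structured outputs. In the Anthropic Messages API a tool is declared with a `name`, a `description`, and a JSON Schema under `input_schema`. The sketch below shows what such a declaration could look like for the outputs listed on this card, plus a local check of a returned payload. The tool name `record_screen_result`, the per-trait shape, and the validator are assumptions for illustration; the API call itself is not shown, and Picked's real schema may differ.

```python
TRAITS = ("motivation", "communication", "deal_breakers")
DECISIONS = ("pass", "review", "reject")

# Hypothetical per-trait shape: a numeric score plus the written reasoning.
TRAIT_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number"},
        "reasoning": {"type": "string"},
    },
    "required": ["score", "reasoning"],
}

# Tool definition in the name / description / input_schema layout.
SCREEN_TOOL = {
    "name": "record_screen_result",  # hypothetical tool name
    "description": "Record per-trait scores, reasoning, and the next-stage decision.",
    "input_schema": {
        "type": "object",
        "properties": {
            **{trait: TRAIT_SCHEMA for trait in TRAITS},
            "decision": {"type": "string", "enum": list(DECISIONS)},
        },
        "required": [*TRAITS, "decision"],
    },
}


def validate_screen_result(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    errors = []
    for trait in TRAITS:
        entry = payload.get(trait)
        if not isinstance(entry, dict):
            errors.append(f"{trait}: missing")
            continue
        score = entry.get("score")
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            errors.append(f"{trait}: score must be a number")
        reasoning = entry.get("reasoning")
        if not isinstance(reasoning, str) or not reasoning.strip():
            errors.append(f"{trait}: reasoning must be non-empty text")
    if payload.get("decision") not in DECISIONS:
        errors.append("decision: must be pass, review, or reject")
    return errors
```

Validating the payload before it is consumed downstream means a malformed response can be routed to human review instead of being scored.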

Model

picked-assess-1.0

Purpose

Score open-ended assessment responses (writing samples, case studies, role-play) against role-specific rubrics. Adaptive item selection on cognitive batteries.

Base model

Claude Sonnet 4.6 for open-ended scoring. Custom adaptive engine for item selection (not LLM-based).

Inputs

Candidate responses, rubric per item, role context. Cognitive-battery responses are scored by deterministic rules, not the LLM.

Outputs

Per-module scores, percentile against role norms, raw responses preserved.

Evaluation

Reliability (alpha) ranges from 0.78 to 0.91 across batteries. Differential item functioning (DIF) checked quarterly. Construct validity established by Neuroworx over 2017 to 2026.

Limitations

Open-ended scoring is less reliable on highly creative responses (e.g. novel framings of a case). Surfaces these as `review` for human reviewer attention.

Last updated

22 May 2026, v1.0.
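The reliability figure on the assessment card is reported as alpha. Assuming this means Cronbach's alpha, the usual internal-consistency statistic, it can be computed from per-item scores as k/(k-1) × (1 − sum of item variances / variance of total scores). The sketch below uses only the standard library; the function name and the sample scores are illustrative, not Picked data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per item, each holding one score per candidate,
    with candidates in the same order in every list.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs the same candidates, at least two")
    item_variance_sum = sum(variance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Three items answered identically by four candidates: perfectly consistent.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(cronbach_alpha(items))
```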

Model

picked-interview-1.0

Purpose

Conduct the 30-minute behavioural first-round by voice. Adaptive follow-ups. Score against role-specific rubric.

Base model

Claude Sonnet 4.6 with tool use; OpenAI Whisper for STT; LiveKit Cloud for transport.

Inputs

Live conversation audio (transcribed by Whisper), screen and assessment context, role rubric, candidate's prior answers in this session. No facial analysis. Voice-only at V1.

Outputs

Per-competency scores, written reasoning per score, full transcript, audio recording (candidate-owned).

Evaluation

Inter-rater reliability against human interviewers 0.84 on the 2026 calibration set. Confidence calibration checked against post-hire outcomes (V3 closed-loop).

Limitations

Voice-only is harder for candidates with hearing differences (mitigation: chat-only interview available on request). Audio quality affects transcription accuracy (mitigation: pre-flight audio check, retry option).

Last updated

22 May 2026, v1.0.
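The interview card reports inter-rater reliability against human interviewers but does not name the statistic. One common choice for categorical ratings is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes ratings have been banded into labels; both that banding and the use of kappa are assumptions, not something this card states.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("both raters used a single label; kappa is undefined")
    return (observed - expected) / (1 - expected)


# Illustrative bands only.
model_ratings = ["high", "high", "low", "low"]
human_ratings = ["high", "high", "low", "high"]
print(cohens_kappa(model_ratings, human_ratings))
```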

Model

picked-rank-3.1.0

Purpose

Aggregate per-stage scores into a final ranking with reasoning. Recommend top 3 finalists.

Base model

Claude Opus 4.7 with tool use for structured outputs.

Inputs

Per-stage scores from triage, screen, assess, interview. Hiring manager calibration. Role weights.

Outputs

Final ranking (1 to N), top-3 recommendation, written reasoning per ranking decision, score decomposition per candidate.

Evaluation

Manager agreement rate (acceptance of top 3 without disagreement) 0.78 in beta. Adverse-impact monitored at the role level on the final ranking, not just individual stages.

Limitations

The model can over-weight a single strong signal (e.g. an outstanding interview) against a balanced profile. Surfaces this via the score decomposition; managers can re-weight on the disagreement loop.

Last updated

22 May 2026, v3.1.0.
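To make the score decomposition concrete, the sketch below combines per-stage scores with role weights as a plain weighted average and returns each stage's contribution alongside the ranking. This is not how picked-rank works internally, since that model is LLM-based. The function names, the equal weights, and the candidate scores are hypothetical and only illustrate what a decomposition and a top-3 list could look like.

```python
STAGES = ("triage", "screen", "assess", "interview")


def decompose(stage_scores, weights):
    """Each stage's contribution to the weighted-average score."""
    total_weight = sum(weights[stage] for stage in STAGES)
    return {
        stage: stage_scores[stage] * weights[stage] / total_weight
        for stage in STAGES
    }


def rank_candidates(candidates, weights, top_n=3):
    """Return (ranking rows, top-N names), highest score first."""
    rows = []
    for name, stage_scores in candidates.items():
        parts = decompose(stage_scores, weights)
        rows.append((name, sum(parts.values()), parts))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows, [name for name, _, _ in rows[:top_n]]


# Hypothetical role weights and candidate scores.
weights = {"triage": 1, "screen": 1, "assess": 1, "interview": 1}
candidates = {
    "a": {"triage": 80, "screen": 80, "assess": 80, "interview": 80},
    "b": {"triage": 60, "screen": 60, "assess": 60, "interview": 100},
    "c": {"triage": 90, "screen": 90, "assess": 90, "interview": 90},
    "d": {"triage": 50, "screen": 50, "assess": 50, "interview": 50},
}

ranking, top_three = rank_candidates(candidates, weights)
print(top_three)
```

With the per-stage contributions visible, a manager can see when one stage, such as candidate b's interview, carries most of a total, and adjust the weights.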

Out of scope on this page

Anti-fraud signals are detection layers, not models.

AI-assistance detection, proxy detection, deepfake liveness, and behavioural consistency are signal layers built on top of the production models. They are described on /product/decision-engine and /trust/responsible-ai. They are not separately versioned model cards.


Fairness

Per-role bias monitoring. Four-fifths rule. Annual independent audit.
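The four-fifths rule compares each group's selection rate with the highest group's rate and flags any ratio below 0.8. A minimal sketch follows. The group names and counts are placeholders, and the function names are assumptions rather than part of Picked's monitoring code.

```python
def selection_rates(outcomes):
    """outcomes maps group -> (selected, total); groups with no applicants are skipped."""
    return {
        group: selected / total
        for group, (selected, total) in outcomes.items()
        if total > 0
    }


def four_fifths_check(outcomes, threshold=0.8):
    """Return group -> (impact ratio, passes) relative to the highest selection rate."""
    rates = selection_rates(outcomes)
    if not rates:
        raise ValueError("no groups with applicants")
    best = max(rates.values())
    if best == 0:
        raise ValueError("no selections in any group; ratios are undefined")
    return {
        group: (rate / best, rate / best >= threshold)
        for group, rate in rates.items()
    }


# Placeholder counts for one role: (selected, total applicants).
outcomes = {"group_a": (50, 100), "group_b": (30, 100)}
result = four_fifths_check(outcomes)
print(result)
```

Here group_b's rate is 0.3 against a best rate of 0.5, so its impact ratio falls below the 0.8 threshold and the role would be flagged for review.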



Security

Infrastructure, encryption, access, audit trail. SOC 2 Type II in progress.



Responsible AI

Six principles. Eight candidate rights. The lines we will not cross.


Last reviewed · 22 May 2026 · v1.0

Still on the legal review?

We'll send you the file.

Pre-filled vendor security questionnaire (SIG Lite), DPA, sub-processor list, SOC 2 letter, model card, latest bias audit. One zip. We reply within one business day.

Request the legal pack by emailing legal@picked.ai.

Operated by Neuroworx Ltd · ICO #09910326623