Trust · Model cards
One card per model. Updated on every weight change.
Five models in production. Each has a card. Every card covers capability, training data, evaluation, known limitations, version history, and the failure modes we monitor for. Anthropic's Claude is the underlying model family for reasoning; OpenAI's Whisper handles transcription.
Card structure
Every card. Same six fields.
01 Purpose. What the model is for, what task it performs in the Picked pipeline.
02 Base model. The underlying foundation model (Claude Opus / Sonnet / Haiku, or Whisper) and version.
03 Inputs. The data the model sees, the data it does not.
04 Outputs. What the model produces (score, classification, transcript, embedding) and how it is consumed downstream.
05 Evaluation. How we test the model: golden datasets, accuracy targets, fairness metrics, red-team results.
06 Limitations. What the model is not good at. What can go wrong. What we monitor for.
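The six fields can be pictured as a small record type. The sketch below is illustrative only: the class name `ModelCard`, the helper `missing_fields`, and the shortened field text are assumptions made for this example, not Picked's internal schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record holding the six card fields."""
    purpose: str
    base_model: str
    inputs: str
    outputs: str
    evaluation: str
    limitations: str


def missing_fields(card: ModelCard) -> list[str]:
    """Return the names of any fields left blank, in declaration order."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


# Field text abbreviated from the picked-triage-1.0 card.
triage_card = ModelCard(
    purpose="Fast read of CV and application data against role must-haves.",
    base_model="Claude Haiku 4.5",
    inputs="CV text, application form responses, role must-haves.",
    outputs="Triage score (0-100), written reason, pass/review/reject.",
    evaluation="Golden dataset of 2,800 labelled triage decisions.",
    limitations="Performs worse on non-standard career paths.",
)

print(missing_fields(triage_card))
```

A check like `missing_fields` is one way to enforce "same six fields" before a card is published.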
Models in production
Model: picked-triage-1.0
Purpose: Fast read of CV and application data against role must-haves. First filter in the pipeline.
Base model: Claude Haiku 4.5 (via @anthropic-ai/sdk, zero retention).
Inputs: CV text, application form responses, role must-haves declared by the hiring manager. No protected-class signals.
Outputs: Numeric triage score (0-100), a written reason, a pass/review/reject classification.
Evaluation: Golden dataset of 2,800 labelled triage decisions; target accuracy 0.92 against human reviewers; current 0.93. Adverse-impact monitored per role.
Limitations: Performs worse on CVs from candidates with non-standard career paths (career changers, returners after a break). Surfaces these as `review` rather than `reject` by default.
Last updated: 22 May 2026, v1.0.
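The triage limitation describes a routing rule: a low score on a non-standard career path is sent to `review`, not `reject`. A minimal sketch of that rule follows. The thresholds `pass_at` and `reject_below` and the `non_standard_path` flag are hypothetical; the card does not publish cut-offs or say how such paths are identified.

```python
def route_triage(score: int, non_standard_path: bool,
                 pass_at: int = 70, reject_below: int = 40) -> str:
    """Map a 0-100 triage score to pass / review / reject.

    pass_at and reject_below are made-up thresholds for illustration.
    """
    if not 0 <= score <= 100:
        raise ValueError("triage score must be between 0 and 100")
    if score >= pass_at:
        return "pass"
    if score < reject_below:
        # Non-standard career paths are not auto-rejected; a human reviews them.
        return "review" if non_standard_path else "reject"
    return "review"


print(route_triage(25, non_standard_path=False))
print(route_triage(25, non_standard_path=True))
```

The same low score produces different routes depending on the flag, which is the behaviour the card describes.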
Model: picked-screen-1.0
Purpose: Conduct the 15-minute conversational screen with the candidate. Score motivation, deal-breakers, role-fit signals.
Base model: Claude Sonnet 4.6 (via @anthropic-ai/sdk, zero retention) with tool use for structured outputs. OpenAI Whisper for speech-to-text (STT).
Inputs: Live conversation audio (transcribed by Whisper), role context, candidate's CV summary. No protected-class signals.
Outputs: Per-trait scores (motivation, communication, deal-breakers), written reasoning per score, pass/review/reject for the next stage.
Evaluation: Golden dataset of 1,400 labelled screen conversations from beta; target accuracy 0.88; current 0.90. Calibration checked weekly against human recruiter ratings on a holdout set.
Limitations: Less reliable on candidates with strong accents or low-fluency English. Mitigation: candidates can switch language at any point; the language switch is logged but not used as a score input.
Last updated: 22 May 2026, v1.0.
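An accuracy target against a golden dataset can be checked as plain agreement with the human labels. The sketch below assumes "accuracy" means the share of items where the model's pass/review/reject label matches the human label; the card does not define the metric, and the ten labels are toy data, not the beta set.

```python
def golden_accuracy(predicted: list[str], human: list[str]) -> float:
    """Share of golden-set items where the model label equals the human label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def meets_target(accuracy: float, target: float) -> bool:
    """True when measured accuracy is at or above the published target."""
    return accuracy >= target


# Toy labels: the model disagrees with the human on one item out of ten.
human = ["pass", "review", "reject", "pass", "review",
         "pass", "reject", "pass", "review", "pass"]
predicted = ["pass", "review", "reject", "pass", "pass",
             "pass", "reject", "pass", "review", "pass"]

accuracy = golden_accuracy(predicted, human)
print(accuracy, meets_target(accuracy, target=0.88))
```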
Model: picked-assess-1.0
Purpose: Score open-ended assessment responses (writing samples, case studies, role-play) against role-specific rubrics. Adaptive item selection on cognitive batteries.
Base model: Claude Sonnet 4.6 for open-ended scoring. Custom adaptive engine for item selection (not LLM-based).
Inputs: Candidate responses, rubric per item, role context. Cognitive-battery responses are scored by deterministic rules, not the LLM.
Outputs: Per-module scores, percentile against role norms, raw responses preserved.
Evaluation: Reliability (alpha) 0.78 to 0.91 across batteries. Differential item functioning (DIF) checked quarterly. Construct validity established by Neuroworx between 2017 and 2026.
Limitations: Open-ended scoring is less reliable on highly creative responses (e.g. novel framings of a case). Surfaces these as `review` for human reviewer attention.
Last updated: 22 May 2026, v1.0.
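The reliability figure is reported as "alpha". Assuming this refers to Cronbach's alpha (the card does not say so explicitly), it can be computed from per-item scores as k/(k-1) × (1 − sum of item variances / variance of total scores). The function name and the toy responses below are illustrative.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha, where item_scores[i][j] is candidate i's score on item j."""
    n_candidates = len(item_scores)
    k = len(item_scores[0]) if item_scores else 0
    if n_candidates < 2 or k < 2:
        raise ValueError("need at least two candidates and two items")
    # Sample variance of each item across candidates.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each candidate's total score.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy battery: four candidates, three items.
responses = [
    [3.0, 4.0, 3.0],
    [4.0, 5.0, 5.0],
    [2.0, 2.0, 3.0],
    [5.0, 4.0, 5.0],
]
print(round(cronbach_alpha(responses), 2))
```

Values closer to 1 mean the items in a battery move together; the card's reported range is 0.78 to 0.91.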
Model: picked-interview-1.0
Purpose: Conduct the 30-minute behavioural first-round by voice. Adaptive follow-ups. Score against role-specific rubric.
Base model: Claude Sonnet 4.6 with tool use; OpenAI Whisper for STT; LiveKit Cloud for transport.
Inputs: Live conversation audio (transcribed by Whisper), screen and assessment context, role rubric, candidate's prior answers in this session. No facial analysis. Voice-only at V1.
Outputs: Per-competency scores, written reasoning per score, full transcript, audio recording (candidate-owned).
Evaluation: Inter-rater reliability against human interviewers 0.84 on the 2026 calibration set. Confidence calibration checked against post-hire outcomes (V3 closed-loop).
Limitations: Voice-only is harder for candidates with hearing differences (mitigation: chat-only interview available on request). Audio quality affects transcription accuracy (mitigation: pre-flight audio check, retry option).
Last updated: 22 May 2026, v1.0.
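The card reports inter-rater reliability of 0.84 but does not name the statistic. One common choice for categorical ratings is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below shows that statistic as an assumption, using made-up "high"/"low" bands; it is not a statement of how the 0.84 figure was computed.

```python
from collections import Counter


def cohens_kappa(model: list[str], human: list[str]) -> float:
    """Cohen's kappa between two raters over the same items."""
    if len(model) != len(human) or not human:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(human)
    # Observed agreement: share of items rated identically.
    observed = sum(m == h for m, h in zip(model, human)) / n
    # Chance agreement: product of each rater's category frequencies.
    model_freq = Counter(model)
    human_freq = Counter(human)
    expected = sum(model_freq[c] * human_freq[c] for c in model_freq) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Toy ratings on four answers.
model_ratings = ["high", "low", "high", "low"]
human_ratings = ["high", "low", "high", "low"]
print(cohens_kappa(model_ratings, human_ratings))
```

For numeric per-competency scores, a correlation-based measure would be an equally plausible reading of the card.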
Model: picked-rank-3.1.0
Purpose: Aggregate per-stage scores into a final ranking with reasoning. Recommend top 3 finalists.
Base model: Claude Opus 4.7 with tool use for structured outputs.
Inputs: Per-stage scores from triage, screen, assess, interview. Hiring manager calibration. Role weights.
Outputs: Final ranking (1 to N), top-3 recommendation, written reasoning per ranking decision, score decomposition per candidate.
Evaluation: Manager agreement rate (acceptance of top 3 without disagreement) 0.78 in beta. Adverse-impact monitored at the role level on the final ranking, not just individual stages.
Limitations: The model can over-weight a single strong signal (e.g. an outstanding interview) against a balanced profile. Surfaces this via the score decomposition; managers can re-weight on the disagreement loop.
Last updated: 22 May 2026, v3.1.0.
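A score decomposition makes a dominant signal visible. The sketch below uses a plain weighted average as a stand-in; the ranking itself is produced by the model, so the functions `decompose` and `dominant_stage`, the 0.5 share limit, and the scores and weights are all hypothetical illustrations of how a manager might spot and re-weight an over-weighted stage.

```python
from typing import Optional


def decompose(stage_scores: dict[str, float],
              weights: dict[str, float]) -> dict[str, float]:
    """Return each stage's weighted contribution to the final score."""
    total_weight = sum(weights[stage] for stage in stage_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {stage: score * weights[stage] / total_weight
            for stage, score in stage_scores.items()}


def dominant_stage(contributions: dict[str, float],
                   max_share: float = 0.5) -> Optional[str]:
    """Name the stage supplying more than max_share of the final score, if any."""
    total = sum(contributions.values())
    if total <= 0:
        return None
    stage, value = max(contributions.items(), key=lambda item: item[1])
    return stage if value / total > max_share else None


scores = {"triage": 60.0, "screen": 55.0, "assess": 50.0, "interview": 98.0}

# Role weights that lean heavily on the interview.
heavy = {"triage": 1.0, "screen": 1.0, "assess": 1.0, "interview": 3.0}
print(dominant_stage(decompose(scores, heavy)))

# A manager re-weights to treat all stages equally.
equal = {"triage": 1.0, "screen": 1.0, "assess": 1.0, "interview": 1.0}
print(dominant_stage(decompose(scores, equal)))
```

With the heavy weights the interview supplies most of the final score and is flagged; after re-weighting, no single stage dominates.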
Out of scope on this page
Anti-fraud signals are detection layers, not models.
AI-assistance detection, proxy detection, deepfake liveness, and behavioural consistency are signal layers built on top of the production models. They are described on /product/decision-engine and /trust/responsible-ai. They are not separately versioned model cards.
Last reviewed · 22 May 2026 · v1.0
Still on the legal review?
We'll send you the file.
Pre-filled vendor security questionnaire (SIG Lite), DPA, sub-processor list, SOC 2 letter, model cards, latest bias audit. One zip. We reply within one business day.
Request the legal pack · Email legal@picked.ai
Operated by Neuroworx Ltd · ICO #09910326623