Assessment · Picked.ai

Eight years of psychometric science, now one stage of the pipeline.

The item bank we have been building at Neuroworx since 2018: validated against real hires, audited for bias, and now running adaptively per role. Not a single test. A library, with the right instrument chosen for the work.

Post a role →
Read the validation methodology →

The four families

Four kinds of signal, mixed per role.

01

Cognitive ability.

Numerical, verbal and logical reasoning in one timed section. Twelve questions, fifteen minutes, one combined score.

02

Work-style profile.

Twenty-five short statements about how someone likes to work. Untimed, and not scored: it maps preferences, it does not rank people.

03

Situational judgement.

Six scenarios written for the role. Untimed. What the candidate would actually do, not what sounds best.

04

Role skills and writing.

Eight role-specific questions, plus two short written answers: a writing prompt and a system trade-off. Generated per role from the brief.

Selection

One library. One assessment chosen per role.

When you post a role, the assessment selector tunes the mix from the library based on the role, the must-haves, and the seniority. Every candidate takes the same short cognitive section; what changes per role is the role skills questions, the situational judgement scenarios, and the two written prompts. An SDR is not asked to reason through a system trade-off written for a staff engineer. The selection is visible to you and editable before the role goes live.

● Role family shapes the battery: the role skills questions and situational judgement scenarios are written for the role.
● Must-haves steer it (e.g. "must have managed a quota team" points the situational judgement toward leadership).
● Two sections are timed, cognitive and role skills; the whole assessment runs about 45 to 50 minutes.
● You can review and adjust the assessment before the role publishes.

Adaptivity

Items adjust to the candidate, not the other way around.

On cognitive batteries, the next item depends on the last response. A strong run pushes the candidate into harder items quickly; a weak run keeps them at a fair level. Two outcomes: a more precise score, and a shorter test for the candidate. The median assessment runs 22 minutes, not 45.
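
To make "the next item depends on the last response" concrete, here is a minimal sketch of one common approach, a shrinking-step staircase. It is an illustration only: this page does not say which adaptive algorithm runs in production, and the names `Item`, `next_item`, `update_ability`, and `run_adaptive`, the difficulty scale, and the step sizes are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float  # hypothetical scale: higher means harder


def next_item(ability, remaining):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    return min(remaining, key=lambda item: abs(item.difficulty - ability))


def update_ability(ability, correct, step):
    """Move the estimate up after a correct answer, down after an incorrect one."""
    return ability + step if correct else ability - step


def run_adaptive(items, answer_fn, max_items=5, start=0.0, first_step=1.0):
    """Administer up to max_items items, shrinking the step as evidence accumulates."""
    ability = start
    remaining = list(items)
    administered = []
    for n in range(1, max_items + 1):
        if not remaining:
            break
        item = next_item(ability, remaining)
        remaining.remove(item)
        correct = answer_fn(item)
        ability = update_ability(ability, correct, first_step / n)
        administered.append((item.item_id, correct))
    return ability, administered


# Made-up item bank and two simulated candidates (not real data).
bank = [Item(f"q{i}", d) for i, d in enumerate(
    [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])]
strong_ability, strong_log = run_adaptive(bank, lambda item: item.difficulty <= 1.0)
weak_ability, weak_log = run_adaptive(bank, lambda item: item.difficulty <= -1.0)
```

Both simulated candidates start on the same mid-difficulty item. The strong one is routed to harder items after each correct answer, the weak one to easier items, which is how an adaptive form can reach a usable estimate in fewer items than a fixed form.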

22 min

median assessment length.

0.78 to 0.91

internal reliability (alpha) across batteries.

14%

reduction in candidate dropout vs the fixed-form test.
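
For readers who want to see what the alpha figure measures, the sketch below computes Cronbach's alpha with the standard formula, k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy score matrix are assumptions made for illustration; the numbers are not Picked.ai data, and this is not presented as the production calculation.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per candidate, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each candidate's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three items that agree perfectly on the candidate ordering.
toy_scores = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
toy_alpha = cronbach_alpha(toy_scores)
```

When the items agree perfectly, as in the toy matrix, alpha comes out at 1. Real batteries land lower because items disagree to some degree.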

Anti-fraud

Four layers, no proctoring.

01

AI-assistance detection.

Paste patterns and timing, response cadence, and answer fingerprinting against known LLM outputs. Flagged for review, not auto-rejected.

02

Proxy detection.

Voice biometrics (where the candidate consents) compared across the screen, assessment, and interview stages. Cross-stage signal, not a single test.

03

Copy-paste detection.

Open-ended items track keystroke patterns and clipboard events. Honest answers do not get flagged; verbatim paste does.

04

Deepfake video (V2).

When video lands, we run liveness and face-consistency checks across the session. V1 is voice-only so this is deferred.

No camera, no screen-share, no remote takeover. Intrusive proctoring punishes honest candidates more than it catches dishonest ones.

Fairness

Monitored per role, per stage, per protected group.

Every assessment in production is monitored for adverse impact against protected groups, per role, per stage, per audit cycle. The 4/5ths rule is the floor. When a flag fires, the role's hiring manager and our internal review team both see it before the next candidate is processed.

● Independent bias audit planned ahead of launch.
● Per-role drift detection: if a role's pass rates shift on a protected attribute, we know and so do you.
● Adverse-impact dashboard in product, shareable with HR or legal.
● See /trust/fairness for the full posture.
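
The 4/5ths rule mentioned above compares each group's pass rate with the highest group's pass rate and treats a ratio below 0.8 as a sign of possible adverse impact. The sketch below shows that arithmetic only. The function names, the group labels, and the counts are hypothetical, and the monitoring in the product may use further statistics that this page does not describe.

```python
def selection_rates(outcomes):
    """Map {group: (passed, total)} to {group: pass rate}, skipping empty groups."""
    return {group: passed / total
            for group, (passed, total) in outcomes.items() if total > 0}


def adverse_impact_flags(outcomes, threshold=0.8):
    """Return {group: impact ratio} for groups below the four-fifths threshold."""
    rates = selection_rates(outcomes)
    if not rates:
        return {}
    best = max(rates.values())
    if best == 0:
        return {}  # nobody passed, so no ratio can be formed
    return {group: rate / best
            for group, rate in rates.items() if rate / best < threshold}


# Made-up counts for one role at one stage: (passed, total) per group.
stage_outcomes = {
    "group_a": (48, 80),  # pass rate 0.60, the highest
    "group_b": (27, 60),  # pass rate 0.45
    "group_c": (33, 60),  # pass rate 0.55
}
flags = adverse_impact_flags(stage_outcomes)
```

With these counts, only group_b falls below four-fifths of the highest pass rate, so it is the only group returned in `flags`.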

Read the science

Every model in production has a card. We publish them.

Capability, training data, evaluation, known limitations, the version history, and the failure modes we monitor for. Updated quarterly.

Read the model cards →

Other product surfaces

PRODUCT · SCREENING

Screening, by AI conversation.

Every applicant. Fifteen minutes. In their language.

PRODUCT · INTERVIEW

A behavioural first-round, by voice.

Live conversation. Structured. Scored. Thirty minutes.