How we test for engineers, without a take-home.

The voice-first behavioural interview, the role-fit assessment, and the calibration loop. No take-home, no LeetCode, no whiteboard.

Picked Team · 28 October 2026 · 8 min read · Engineering · Method

Take-homes are a tax on the candidate. A 6-hour coding exercise, evaluated for 15 minutes by a tired engineer, is a poor signal generator and an even worse candidate experience. We do not run them.

What we run instead is one assessment, one conversation, and a structured score sheet. The whole thing takes the candidate about 45 minutes. The signal is calibrated against eight years of psychometric data from the Neuroworx side of the business. Here is the method, in full.

Step one: the role-fit assessment.

Twelve to eighteen items, depending on the seniority of the role. Each item is a scenario calibrated against the role family: a backend engineer at series B sees a different item set from a staff engineer at mid-market. The items are not multiple-choice trivia. They are short forced-choice tradeoffs (e.g. "you have a flaky test that fails 1 in 20 runs. The release is in 36 hours. What do you do, and why?") scored on the reasoning more than the answer.

The item bank is sourced from the same psychometric battery Neuroworx has been running since 2018. It has been validated against on-the-job performance for over 40,000 hires across software, data, and operations roles. The four-fifths (4/5ths) fairness check is applied to every batch, for every protected group and every role: each group's pass rate must be at least 80% of the pass rate of the highest-passing group.
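
For readers who have not met the four-fifths rule, here is a minimal sketch of how such a check is commonly computed. It assumes a batch arrives as (group, passed) pairs; the function name, the data shape, and the handling of an all-fail batch are illustrative assumptions, not a description of Picked's pipeline.

```python
from collections import Counter


def four_fifths_check(outcomes, threshold=0.8):
    """Apply the four-fifths rule to one batch.

    outcomes: iterable of (group, passed) pairs (hypothetical shape).
    Returns {group: (selection_rate, impact_ratio, ok)}.
    """
    totals, passes = Counter(), Counter()
    for group, passed in outcomes:
        totals[group] += 1
        if passed:
            passes[group] += 1

    # Selection rate per group: passes divided by candidates assessed.
    rates = {g: passes[g] / totals[g] for g in totals}
    best = max(rates.values())
    if best == 0:
        # Nobody passed, so no group is favoured over another (assumption).
        return {g: (0.0, 1.0, True) for g in rates}

    # Impact ratio: each group's rate relative to the highest group's rate.
    return {g: (r, r / best, r / best >= threshold) for g, r in rates.items()}


# Illustrative batch, not real data: group A passes 10 of 20, group B 6 of 20.
batch = ([("A", True)] * 10 + [("A", False)] * 10
         + [("B", True)] * 6 + [("B", False)] * 14)
for group, (rate, ratio, ok) in sorted(four_fifths_check(batch).items()):
    print(group, round(rate, 2), round(ratio, 2), "ok" if ok else "flag")
```

In this made-up batch group B's impact ratio is 0.3 / 0.5 = 0.6, below the 0.8 threshold, so the batch would be flagged for review.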

Step two: the voice-first behavioural interview.

A 20 to 25 minute live voice conversation between the candidate and Picked. Not a recording. Not a one-way Loom. A real adaptive interview, scored against the role rubric in flight. The candidate speaks. Picked listens, asks clarifying questions, and probes on weak answers. The transcript is published to the hiring manager with the score sheet.

The conversation runs on LiveKit, with OpenAI Whisper doing transcription and Anthropic Claude doing the reasoning and scoring. The voice agent stack is documented on /trust/model-cards.

Two things make this different from HireVue-style recorded video:

It is adaptive. If the candidate answers a question shallowly, the interviewer probes. If they answer deeply, the interviewer escalates the next question. No two interviews are identical.
It is voice-only by default, with a video option. We do not use facial recognition. We do not weight tone of voice in the score. The signal is the content of the answer, not the candidate's accent, hair, or webcam quality.
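
The adaptive behaviour described above can be pictured as a small decision rule. This is a sketch under stated assumptions: the 0-to-1 depth score, both thresholds, the difficulty ladder, and the name `next_move` are all hypothetical, and the real interviewer is a language model working from the role rubric rather than a fixed rule.

```python
def next_move(depth, difficulty, shallow=0.4, deep=0.75, max_difficulty=3):
    """Choose the interviewer's next step from the depth of the last answer.

    depth: hypothetical 0..1 score for how deeply the candidate answered.
    difficulty: current question level, from 1 to max_difficulty.
    Returns (action, difficulty_for_next_turn).
    """
    if depth < shallow:
        # Shallow answer: stay on the same question and probe.
        return ("probe", difficulty)
    if depth >= deep:
        # Deep answer: escalate the next question, capped at the top level.
        return ("escalate", min(difficulty + 1, max_difficulty))
    # Adequate answer: move on at the same level.
    return ("continue", difficulty)


print(next_move(0.2, 1))  # ('probe', 1)
print(next_move(0.9, 1))  # ('escalate', 2)
```

Note that only the content-derived depth score enters the rule; nothing about tone, accent, or video does, which matches the scoring stance above.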

Step three: the score sheet.

Every candidate produces a single-page score sheet: eight to twelve competency scores, each with the evidence span from the transcript, and a single role-fit rank. The hiring manager reads the sheet, opens the transcript on any score they want to challenge, and either accepts the rank or overrides it. The override is logged.
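
One way to picture the score sheet as a data structure is sketched below. The class and field names, the 1-to-5 scale, and the character-offset evidence spans are assumptions for illustration; only the eight-to-twelve score count, the evidence link, the single rank, and the logged override come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CompetencyScore:
    competency: str
    score: int   # assumed 1-5 scale; the post does not state the scale
    start: int   # evidence span as character offsets into the transcript
    end: int


@dataclass
class ScoreSheet:
    candidate_id: str
    scores: list[CompetencyScore]
    model_rank: int
    override_log: list[dict] = field(default_factory=list)

    def __post_init__(self):
        # The sheet carries eight to twelve competency scores.
        if not 8 <= len(self.scores) <= 12:
            raise ValueError("expected 8 to 12 competency scores")

    @property
    def rank(self) -> int:
        """Latest override if one exists, otherwise the model's rank."""
        if self.override_log:
            return self.override_log[-1]["new_rank"]
        return self.model_rank

    def evidence(self, transcript: str, competency: str) -> str:
        """Return the transcript text that backs one competency score."""
        for s in self.scores:
            if s.competency == competency:
                return transcript[s.start:s.end]
        raise KeyError(competency)

    def override(self, manager: str, new_rank: int, reason: str) -> None:
        """Log a hiring-manager override; the model's rank is kept intact."""
        self.override_log.append({
            "manager": manager,
            "old_rank": self.rank,
            "new_rank": new_rank,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

Keeping `model_rank` immutable in practice and deriving `rank` from the log means an override never erases what the model originally said, so overrides can be audited later.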

We publish the rubric. We do not hide the prompt. The model card for each role family is on /trust/model-cards, version-controlled, and the bias audit per role family is on /trust/fairness.

Why not a take-home.

A take-home selects for candidates with six hours of free time on a Sunday. That is not the same set as good engineers. It under-selects parents, carers, and candidates with second jobs. It is a quiet bias filter applied early and never measured.

The voice interview also selects, of course. Every assessment selects. But it selects on the thing you want, which is "can this person reason about an engineering tradeoff in real time and explain it to a colleague". That generalises to the job. A clean take-home submission does not.

The 90-second proof.

We have run the method against four held-out cohorts so far. Predictive validity at 6 months on the job, measured against manager performance ratings, comes out at r = 0.49, against industry benchmarks of r = 0.20 for unstructured interviews and r = 0.34 for work-sample tests. The methodology is on /trust/model-cards. The held-out cohorts are not yet large enough to publish externally; we plan to publish them in the first annual report in Q3 2027.
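
For readers unfamiliar with the statistic, r here is a Pearson correlation between an assessment score and a later performance rating. The sketch below shows the standard calculation; the function name is ours and the numbers are invented purely to exercise the code, not drawn from any cohort.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented example values: assessment scores and 6-month manager ratings.
assessment = [62, 71, 55, 80, 68, 74]
rating = [3.1, 3.4, 3.0, 4.2, 3.9, 3.3]
print(round(pearson_r(assessment, rating), 2))
```

For scale, squaring r gives the share of variance in ratings that the score accounts for: 0.49 squared is about 0.24, against 0.04 for r = 0.20 and about 0.12 for r = 0.34.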

Engineering hiring is the hardest part of the funnel, because engineers know what bad assessment looks like and will opt out. The method above is what we built so they would opt in.

Quotes paraphrased from beta sessions, May 2026. Named, consented quotes ship with V1.

About the author

Picked Team

Engineering and research

The people building Picked. Method posts, model cards, fairness audits, product opinions. Edited and signed off by the engineering and research leads.
