What we shipped in week 21: a built-by-Claude diary.


A week of Picked, told in commits. The decision-engine refactor, the LiveKit upgrade, the bias-audit cron, and the JD analyser tool.

Guy Thornton · 11 November 2026 · 6 min read · Founder note · Engineering diary

Picked is built with Claude as the primary engineering partner. I wanted to write this diary because the question I get asked most often, by candidates, customers, and other founders, is what shipping this way is actually like. Not the tooling pitch. The week-on-week reality.

This is week 21 of the V1 build. The week starts on Monday morning with a planning session in Cowork. It ends on Friday afternoon with a smoke-test run across the deployed monorepo. Here is what landed.

Monday: the decision-engine refactor.

The decision engine is the surface where the manager reviews three vetted finalists and either advances, parks, or overrides. We had built it as a single 1,200-line React server component in week 12 because that was the fastest way to ship a working version. Time to refactor.

Claude (Sonnet 4.6 in Cowork) read the file, asked four clarifying questions about the override-logging contract, and proposed a four-file split: the score-sheet read model, the override-mutation handler, the decision-card UI, and the audit-trail tail. I approved the split in two passes; Claude wrote it, ran the smoke test, and opened a draft PR. Four hours start to PR. Review took an hour. Merge by close of play Monday.
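To make the "override-logging contract" concrete, here is a minimal sketch of what an override-mutation handler might enforce once it lives in its own file. It is not the Picked code. `Decision`, `AuditEntry`, and `applyDecision` are hypothetical names. The contract is assumed to be two rules: an override must carry a written reason, and every decision appends one entry to the audit trail.

```typescript
// Hypothetical sketch of an override-mutation handler; not Picked's actual code.
// Assumed contract: overrides require a non-empty reason, and every decision
// (advance, park, override) appends exactly one entry to the audit trail.

type Action = "advance" | "park" | "override";

interface Decision {
  finalistId: string;
  action: Action;
  reason?: string; // required when action === "override"
  decidedBy: string;
}

interface AuditEntry {
  finalistId: string;
  action: Action;
  reason: string | null;
  decidedBy: string;
  at: string; // ISO-8601 timestamp
}

function applyDecision(
  decision: Decision,
  auditTrail: AuditEntry[],
  now: Date = new Date(),
): AuditEntry {
  const reason = decision.reason?.trim() ?? "";
  if (decision.action === "override" && reason.length === 0) {
    // Refuse silent overrides: nothing is logged when the reason is missing.
    throw new Error(`Override of ${decision.finalistId} needs a written reason`);
  }
  const entry: AuditEntry = {
    finalistId: decision.finalistId,
    action: decision.action,
    reason: reason.length > 0 ? reason : null,
    decidedBy: decision.decidedBy,
    at: now.toISOString(),
  };
  auditTrail.push(entry); // the audit-trail tail would read from this log
  return entry;
}
```

One reason to keep a rule like this in the mutation handler rather than in the decision-card UI is that every caller has to go through it.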

Tuesday: LiveKit upgrade.

LiveKit cut a new region in Dublin (eu-west-1b) on the 9th of November. Our existing region (Frankfurt) was fine, but Dublin is lower-latency for our UK-heavy candidate pool. The upgrade required a config change in `infra/livekit/regions.ts` and a sanity check on the WebRTC reconnect logic. Two commits, both Claude-authored, both reviewed inside half an hour. Median (p50) candidate-to-edge latency dropped from 78ms to 41ms.
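A region switch of this kind is mostly configuration. The sketch below guesses at one possible shape for a file like `infra/livekit/regions.ts`. The `Region` type, `REGIONS`, `DEFAULT_REGION_ID`, `pickRegion`, and `latencyReduction` are all hypothetical. The sketch assumes the client has round-trip measurements to each candidate region, prefers the fastest, and falls back to the configured default. It also shows the arithmetic behind the headline number: going from 78ms to 41ms is a reduction of (78 - 41) / 78, or roughly 47%.

```typescript
// Hypothetical sketch; the real infra/livekit/regions.ts may look nothing like this.
interface Region {
  id: string;   // label used in our own config (hypothetical)
  city: string;
}

const REGIONS: Region[] = [
  { id: "dublin", city: "Dublin" },
  { id: "frankfurt", city: "Frankfurt" },
];

const DEFAULT_REGION_ID = "dublin"; // previously "frankfurt"

// Pick the region with the lowest measured round-trip time in ms.
// Regions with no measurement are skipped; with no measurements at all,
// fall back to the configured default.
function pickRegion(rttMs: Record<string, number>, regions: Region[] = REGIONS): Region {
  let best: Region | undefined;
  let bestRtt = Infinity;
  for (const region of regions) {
    const rtt = rttMs[region.id];
    if (rtt !== undefined && rtt < bestRtt) {
      best = region;
      bestRtt = rtt;
    }
  }
  if (best) return best;
  const fallback = regions.find((r) => r.id === DEFAULT_REGION_ID);
  if (!fallback) throw new Error(`Default region ${DEFAULT_REGION_ID} is not configured`);
  return fallback;
}

// Relative improvement between two latency figures, as a fraction of the old one.
function latencyReduction(beforeMs: number, afterMs: number): number {
  return (beforeMs - afterMs) / beforeMs;
}
```

If the week's p50 figures are used as stand-in measurements, `pickRegion({ frankfurt: 78, dublin: 41 })` returns the Dublin entry.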

Wednesday: the bias-audit cron.

We had been running the bias audit by hand every Friday. Time to automate it and surface the result on the hiring-manager dashboard. Claude scoped the cron job (Supabase pg_cron, in a Saturday 03:00 UTC slot), the query against the per-role-family selection-rate views, and the Slack-notify hook for any four-fifths (4/5ths) rule flag.
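The "4/5ths flag" refers to the four-fifths rule of thumb from the US Uniform Guidelines on Employee Selection Procedures. Under that rule, a group whose selection rate is less than 80% of the rate for the highest-rate group is treated as showing possible adverse impact. Below is a minimal sketch of that check. It assumes the selection-rate view returns one rate per group per role family. `RateRow`, `Flag`, and `fourFifthsFlags` are hypothetical names. `AUDIT_SCHEDULE` is the standard five-field cron expression for Saturdays at 03:00, which is the form pg_cron's `cron.schedule` takes. pg_cron evaluates schedules in GMT by default.

```typescript
// Hypothetical sketch of the weekly four-fifths check; not Picked's actual code.
// Five-field cron: minute hour day-of-month month day-of-week (6 = Saturday).
const AUDIT_SCHEDULE = "0 3 * * 6";

const FOUR_FIFTHS = 0.8;

interface RateRow {
  roleFamily: string;
  group: string;         // opaque group label
  selectionRate: number; // selected / applied, in [0, 1]
}

interface Flag {
  roleFamily: string;
  group: string;
  impactRatio: number;   // group rate / highest rate in the same role family
}

function fourFifthsFlags(rows: RateRow[]): Flag[] {
  // The highest selection rate per role family is the comparison baseline.
  const maxRate = new Map<string, number>();
  for (const row of rows) {
    maxRate.set(row.roleFamily, Math.max(maxRate.get(row.roleFamily) ?? 0, row.selectionRate));
  }
  const flags: Flag[] = [];
  for (const row of rows) {
    const max = maxRate.get(row.roleFamily) ?? 0;
    if (max === 0) continue; // nobody selected yet: no ratio to compute
    const impactRatio = row.selectionRate / max;
    if (impactRatio < FOUR_FIFTHS) {
      flags.push({ roleFamily: row.roleFamily, group: row.group, impactRatio });
    }
  }
  return flags;
}
```

In this shape, the Slack-notify hook would post only when the returned array is non-empty. That part is omitted here.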

The piece I had to push back on: Claude's first proposal queried the demographic table directly, which would have given the cron job read access to a column it should never see. We refactored to query a materialised view that exposes only the ratio, not the underlying counts. Two commits. The audit ran for the first time on the following Saturday. No flags.
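The fix works as a projection: counts go in, one ratio per role family comes out, and the cron job can presumably read only the output. In Postgres that means a materialised view plus a grant. The TypeScript below only mirrors the shape so the boundary shows up in the types. It assumes "the ratio" means the lowest group selection rate divided by the highest within a role family, which may not match how the Picked view defines it. `GroupCounts`, `RatioRow`, and `toRatioRows` are hypothetical names.

```typescript
// Hypothetical sketch of a ratio-only projection. In production this would be
// a Postgres materialised view; TypeScript is used here only to show the shape.

// Input: stays inside the database and is never readable by the cron job.
interface GroupCounts {
  roleFamily: string;
  group: string;
  applied: number;
  selected: number;
}

// Output: the only thing the cron job sees. No counts, no group labels.
interface RatioRow {
  readonly roleFamily: string;
  readonly impactRatio: number; // lowest group selection rate / highest
}

function toRatioRows(counts: GroupCounts[]): RatioRow[] {
  // Collect per-group selection rates for each role family.
  const rates = new Map<string, number[]>();
  for (const c of counts) {
    if (c.applied === 0) continue; // no applicants: rate undefined, skip the group
    const list = rates.get(c.roleFamily) ?? [];
    list.push(c.selected / c.applied);
    rates.set(c.roleFamily, list);
  }
  const rows: RatioRow[] = [];
  rates.forEach((list, roleFamily) => {
    const max = Math.max(...list);
    if (max === 0) return; // nobody selected yet: ratio undefined
    rows.push({ roleFamily, impactRatio: Math.min(...list) / max });
  });
  return rows;
}
```

Under this assumption, the cron job's check reduces to flagging any row with `impactRatio < 0.8`. Its credentials would expose no counts and no group labels.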

Thursday: the JD analyser tool.

The JD analyser scores a job description for bias, clarity, and comp transparency. It lives at /tools/jd-analyser. It was scheduled for the Tier 2 batch but was pulled forward because a beta customer asked for it on a call.

Claude wrote the analysis prompt, the scoring rubric, the result UI, and the API route, in that order. The hardest part was the bias-vocabulary list. My draft had 28 terms, and Claude pushed back on five of them as too aggressive. "Rockstar", for example, led to a tradeoff conversation about the difference between vernacular and exclusion. We kept "rockstar" in for now; the override note is in the commit message. The tool went live on Thursday afternoon.
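One way to apply a vocabulary list like this is a plain whole-word match alongside the model's analysis. That makes flagged terms reproducible from run to run. Here is a minimal sketch under that assumption. The three entries are illustrative and are not the 28-term draft. `BiasTerm`, `Hit`, and `scanForBiasTerms` are hypothetical names. The `note` field is one place a decision like keeping "rockstar" could travel with the term.

```typescript
// Hypothetical sketch; the term list below is illustrative, not Picked's list.
interface BiasTerm {
  term: string;
  note?: string; // why the term is listed, or why it was kept after review
}

const TERMS: BiasTerm[] = [
  { term: "rockstar", note: "kept after review; rationale in the commit message" },
  { term: "ninja" },
  { term: "digital native" },
];

interface Hit {
  term: string;
  index: number;       // character offset of the first match
  note: string | null;
}

// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-word, case-insensitive match of each listed term against the JD text.
function scanForBiasTerms(jd: string, terms: BiasTerm[] = TERMS): Hit[] {
  const hits: Hit[] = [];
  for (const t of terms) {
    const re = new RegExp(`\\b${escapeRegExp(t.term)}\\b`, "i");
    const m = re.exec(jd);
    if (m) hits.push({ term: t.term, index: m.index, note: t.note ?? null });
  }
  return hits;
}
```

Whole-word matching skips inflected forms such as "ninjas". A production list would need explicit variants or stemming.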

Friday: smoke-test, deploy, write this post.

The smoke-test suite runs against the deployed preview, hits every public marketing page, posts a synthetic role, runs a synthetic candidate through the assessment, and checks the dashboard render. All 144 cases pass. The week's changes ship to production at 16:42 Friday. I write this post on the Tube back from the Charterhouse Square office. The whole post is drafted in Cowork, edited by me, then routed back through the editorial-policy gate (no em dashes, no exclamation marks, no banned hype words).
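Structurally, a suite like this is a list of named checks run against one base URL, and the deploy goes ahead only when nothing fails. Here is a minimal sketch, assuming each check is an async function that throws on failure. `SmokeCheck`, `runSmokeSuite`, `pageReturns200`, `Fetcher`, and `readyToDeploy` are hypothetical names. The synthetic role and candidate flows are not shown. The fetcher is passed in, so real `fetch` (Node 18+) can be used against a preview and a stub can be used in tests.

```typescript
// Hypothetical sketch of a smoke-test runner; not Picked's actual suite.
interface SmokeCheck {
  name: string;
  run: (baseUrl: string) => Promise<void>; // throws on failure
}

interface SmokeReport {
  passed: number;
  failed: { name: string; error: string }[];
}

// Run checks one at a time and record every failure instead of stopping early.
async function runSmokeSuite(baseUrl: string, checks: SmokeCheck[]): Promise<SmokeReport> {
  const report: SmokeReport = { passed: 0, failed: [] };
  for (const check of checks) {
    try {
      await check.run(baseUrl);
      report.passed += 1;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      report.failed.push({ name: check.name, error });
    }
  }
  return report;
}

// Minimal structural type, so the global fetch or a stub can be passed in.
type Fetcher = (url: string) => Promise<{ status: number }>;

// Example check: a public page must return HTTP 200.
function pageReturns200(path: string, fetcher: Fetcher): SmokeCheck {
  return {
    name: `GET ${path}`,
    run: async (baseUrl) => {
      const url = baseUrl.replace(/\/+$/, "") + path; // avoid a double slash
      const res = await fetcher(url);
      if (res.status !== 200) throw new Error(`${url} returned ${res.status}`);
    },
  };
}

// Gate: ship only when every check passed.
function readyToDeploy(report: SmokeReport): boolean {
  return report.failed.length === 0;
}
```

Running checks sequentially is slower than running them in parallel. One reason to accept that is to keep synthetic flows from interfering with each other on a shared preview deployment.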

The honest part.

Two things I would not have predicted six months ago.

One, the speed compounds. The decision-engine refactor took four hours because Claude already knew the codebase, the conventions, the smoke-test state, and the editorial rules. In week 6, the same refactor would have taken two days. We did not change the model; we changed how much context the model has on us.

Two, the editing matters more, not less. The bias-vocabulary tradeoff on Thursday was a judgement call, and the AI is the wrong place to make a judgement call about exclusion in tech-culture vocabulary. I made it; the commit has my name on it; the override is documented. The model is the engineer, I am the editor, and the customers are the readers. The editor matters more when the engineer is fast.

Six pieces of work. Five days. One Friday deploy. No incidents. Week 22 starts Monday.

This post was written with Claude. Edited and signed off by Guy.

About the author

Guy Thornton

Founder, Neuroworx

Founder of Neuroworx. Eight years building psychometrics into hiring. Writes about the unit economics, the candidate side of the funnel, and what shipping with Claude looks like in practice.
