Essay May 2026

Who you hire, who you grow, who you promote — and how AI gets this wrong

AI does not remove bias from people decisions. It removes the randomness, then applies whatever bias was already there, at scale, every time.

Who you hire, who you grow, who you promote. These are among the most consequential decisions a company makes. They are also deeply flawed, in ways most organizations have accepted as just how things work. I have been making and living through these decisions for over twenty years. The errors are frustratingly consistent: good people overlooked, wrong people advanced, processes that feel rigorous but are not. Companies pay for this in culture, in attrition, and in the quiet departure of the people who were actually doing the work.

AI is being deployed into these processes right now, at scale, with the promise of making them more objective. Some of that promise is real. Some of it is not. Understanding which is which requires being precise about the problem you are actually trying to solve.

Behavioral economists distinguish between two ways decisions go wrong.

| Type of error | What it means |
| --- | --- |
| Noise (random error) | The same decision is made differently depending on who decides, when, and in what state of mind. No consistent direction. Just variability. |
| Bias (systematic error) | Decisions that consistently go in the same wrong direction. Predictable. Repeatable. Often invisible to the people making them. |

They look similar from the outside. They have different causes. AI does very different things to each.
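The distinction can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes we somehow know a candidate's "true" score, which in real hiring we never do, and the panel ratings are invented. Under that assumption, bias is the average miss and noise is the spread around that average.

```python
from statistics import mean, pstdev

def split_error(scores, true_value):
    """Split a set of judgments of the same case into (bias, noise).

    true_value is a hypothetical ground truth, assumed here for illustration.
    """
    errors = [s - true_value for s in scores]
    bias = mean(errors)      # systematic error: the average miss
    noise = pstdev(errors)   # random error: spread around the average miss
    return bias, noise

# Invented ratings: five reviewers score the same resume, assumed true score 7.0.
noisy_panel = [5.0, 9.0, 6.0, 8.0, 7.0]    # scattered, but centered on the truth
biased_panel = [5.0, 5.0, 5.0, 5.0, 5.0]   # perfectly consistent, consistently low

print(split_error(noisy_panel, 7.0))    # bias 0, noise about 1.41
print(split_error(biased_panel, 7.0))   # bias -2, noise 0
```

The second panel is what a perfectly consistent process looks like when its criteria are off: zero noise, and every decision wrong in the same direction.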

Resume Screening

The first pass through a resume takes about five seconds. You are scanning for pattern matches: company names, schools, titles, years of experience. Does this person fit the picture you already have in your head of what this hire should look like? That picture forms before you open the first file. Most resumes do not make it past this pass.

Whether a candidate gets even five real seconds of attention depends partly on the reviewer. Whether you just came off a hard meeting. Whether it is 4:30 p.m. and you are rushing to pick up your kid from school. Whether the candidates before this one were strong or weak. The candidate has not changed. You have.

This is a noise problem. AI handles it well. A model evaluating 200 candidates against defined criteria does not get tired, does not carry the weight of the previous meeting, does not score the last resume differently than the first. That consistency matters.

Amazon found out what happens when you skip the next question: what did the model learn from? Its AI screening tool was trained on ten years of the company's own hiring data. The model learned to downgrade resumes containing the word "women's" and to penalize graduates of women's colleges, because the historical workforce it learned from was predominantly male. Nobody programmed it to discriminate. It learned the pattern from the data and applied it to every decision, without exception (Reuters, October 2018).

AI does not remove bias from people decisions. It removes the randomness, then applies whatever bias was already there, at scale, every time.

The noise was gone. The bias was multiplied. Not the same thing.
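One way to check whether a screening tool has carried bias forward is to compare how often each group advances. The sketch below is a minimal example, not a description of what Amazon or anyone else ran. The group labels and counts are invented, and the 0.8 cutoff is the "four-fifths" rule of thumb from US employment-selection guidance, used here as an assumed starting threshold rather than a verdict.

```python
from collections import Counter

def selection_rates(decisions):
    """decisions: iterable of (group, advanced) pairs. Returns {group: rate}."""
    totals, advanced = Counter(), Counter()
    for group, was_advanced in decisions:
        totals[group] += 1
        if was_advanced:
            advanced[group] += 1
    return {g: advanced[g] / totals[g] for g in totals}

def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    top = max(rates.values())
    if top == 0:                      # nobody advanced: nothing to compare
        return {g: 1.0 for g in rates}
    return {g: r / top for g, r in rates.items()}

def flag_groups(decisions, threshold=0.8):
    """Groups whose impact ratio falls below the (assumed) threshold."""
    ratios = impact_ratios(selection_rates(decisions))
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)

# Invented screening log: 100 candidates per group.
log = (
    [("A", True)] * 40 + [("A", False)] * 60 +
    [("B", True)] * 20 + [("B", False)] * 80
)
print(selection_rates(log))   # A advances at 0.4, B at 0.2
print(flag_groups(log))       # ['B']
```

A flag is a prompt to go look at what the model learned, not proof of discrimination. It is also only as good as the log behind it, which is an argument for keeping one.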

Promotion and Performance Calibration

Calibration meetings are where careers get decided. A VP holds the budget and rating distribution. Each manager walks through their people and makes the case for a rating or a promotion. The stakes are real: compensation, trajectory, whether someone stays.

Most companies have rubrics. Competencies mapped to levels: technical depth, leadership, influence. The frameworks exist. But the calibration conversation is almost always a subjective interpretation of those rubrics, because the rubrics themselves are rarely precise enough to settle a disagreement. What does "expert in leadership" actually look like at this level? Nobody fully understands or agrees.

What fills that gap is perception. The person who has cultivated their visibility, who made the VP look good in the one meeting the VP attended all year, gets the nod. The person who has been heads-down and is not naturally self-promotional has to fight to justify work that should speak for itself. Over time, the people actually driving results watch someone else get rewarded for talking about it. Then they leave. This is where you lose the 20% who were actually giving a damn.

AI can help with part of this. Surfacing a full year of documented goals, peer feedback, and outcomes fights recency bias with evidence. Flagging gendered language in written feedback catches a documented pattern. Useful.

But when AI scores promotion readiness or predicts leadership potential, it has to learn what those terms mean from somewhere. It learns from your historical promotion data. Which means it learns who your organization has promoted before. In most organizations, that history reflects the same perception-over-performance dynamic the tool was supposed to fix.

So where does AI actually add value?

Not where one would think. When an AI score agrees with the room's consensus, it confirms what everyone already thought. That is not insight. The value is when it disagrees: when it flags a candidate the committee was ready to pass on, or surfaces evidence that contradicts the story being built around someone. That friction is accountability. It forces a harder conversation in a room that might otherwise let things slide.

But only if you know what the model is measuring. A score is only as meaningful as your understanding of what produced it. Point AI at historical outcomes and it reproduces those outcomes, embedded patterns and all. Point it at an explicit, forward-looking definition of what good looks like and it becomes an evidence-based check that does not know or care who presents well in meetings.
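Here is what that could look like in miniature. Everything in this sketch is hypothetical: the criteria, the weights, the names, and the numbers are placeholders for a definition your organization would have to write down itself. The point is the shape: score people against explicit criteria, then surface only the cases where that score and the committee's rating pull apart.

```python
# Hypothetical definition of "good": criteria and integer weights summing to 10.
CRITERIA = {"goals_met": 5, "peer_feedback": 3, "scope_growth": 2}

def evidence_score(evidence):
    """Weighted score on a 0-100 scale; each criterion is rated 0-10."""
    return sum(weight * evidence[name] for name, weight in CRITERIA.items())

def disagreements(people, min_gap=25):
    """Return (name, evidence_score, committee_rating), largest gap first.

    people: list of dicts with 'name', 'evidence', and 'committee' (0-100).
    min_gap is an assumed cutoff for what counts as a real disagreement.
    """
    flagged = []
    for p in people:
        score = evidence_score(p["evidence"])
        if abs(score - p["committee"]) >= min_gap:
            flagged.append((p["name"], score, p["committee"]))
    return sorted(flagged, key=lambda row: abs(row[1] - row[2]), reverse=True)

# Invented calibration data.
people = [
    {"name": "A", "committee": 50,
     "evidence": {"goals_met": 9, "peer_feedback": 8, "scope_growth": 7}},
    {"name": "B", "committee": 65,
     "evidence": {"goals_met": 6, "peer_feedback": 6, "scope_growth": 6}},
    {"name": "C", "committee": 80,
     "evidence": {"goals_met": 3, "peer_feedback": 4, "scope_growth": 2}},
]
for name, score, committee in disagreements(people):
    print(name, score, committee)
```

In this invented data, A is the heads-down performer the room underrates and C is the visible one it overrates. B, where score and room agree, never comes up. The output is an agenda for a harder conversation, not a decision.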

This is not a new problem. Every leader has watched a metric drive the wrong behavior because it was measuring a proxy. A sales team optimizing for call volume instead of revenue. A product team shipping features instead of solving problems. AI does not change the principle. It raises the cost of skipping the definition work, because the shortcut now runs in every decision, consistently, with a confidence score on top.

| | Resume Screening | Promotion Decision |
| --- | --- | --- |
| Dominant error | Noise. High volume, repeating, different reviewers on different days. | Bias. Recency, affinity, visibility. Who the VP has seen in a meeting. |
| What AI does well | Applies identical criteria to every candidate. Removes fatigue, sequence effects, and occasion noise entirely. | Surfaces the full year of evidence. Fights recency bias. Flags language patterns in written feedback. |
| Where AI creates risk | If trained on historical hires, it learns and scales whatever bias shaped those hires. Consistently. Invisibly. Manageable with explicit criteria and regular auditing. | A readiness score trained on past promotions learns who got promoted before, including by the same flawed process. Higher stakes. One decision, one career. No volume to average errors out. |

Unexpected value (both decisions): When the score disagrees with the room. Not confirmation but friction. A candidate flagged that the committee was ready to pass on. Evidence that contradicts the narrative. That is accountability. A score you agree with tells you nothing new. A score that challenges the consensus is where the value is.

The principle

Define what good looks like before you measure it. Point AI at historical outcomes and it reproduces them, embedded patterns and all. Point it at an explicit, forward-looking definition and it becomes an evidence-based check that does not know or care who presents well in meetings. That was true before AI. AI just made skipping that step expensive in a new way.


Before your next AI deployment on a people decision: do you know what the model was trained on? And does your definition of good exist anywhere outside someone's head?