Essay Jun 2026

Neither Tool Nor Colleague

One camp believes AI will restructure economies, industries, and human cognition. Another believes the hype will fade and the world after AI will look only incrementally different from the one before it. Both positions resolve cognitive dissonance rather than engage with what the technology actually is. There is a more accurate frame, and it has consequences.

Frontier AI labs are in a race to push the boundaries of what models can do. A researcher's ability to test new ideas and ship promising ones determines how effective and productive they are. But the infrastructure underneath, including compute clusters, data pipelines, compliance, and governance, takes weeks to learn and is unforgiving when misused. The gap between what researchers need to do and what the infrastructure asks of them is real and costly. My team and I were building APIs to close it.

The normal path would have been a design document. Weeks of iterations, countless comments and suggestions from SWEs to VPs, sweating through which feedback to take and which to push back on without losing the room, and at the end of it all, a spec that still deferred the hardest decisions to implementation. Instead, I built a working prototype with AI in days. Not a description of the API. The API itself: callable, explorable, opinionated in specific ways that engineers could immediately engage with and argue about.

The prototype changed the medium of the conversation. A specification describes what an API should do. A working prototype declares it. Description involves interpretation at both ends: the writer encodes intent, the reader decodes it, and the gap between those two acts is where most product problems live. Declaration removes that gap. The engineers were not reading about the API. They were reasoning about it directly, in the same language it was written in.

The prototype did not make anyone faster. It made the right problems visible at a stage when they could still be changed.

The anchor that broke

For the entirety of human history, two kinds of entities could operate in the world: humans, who reason, and machines, which execute. The boundary was clean even when machines became extraordinarily sophisticated. A jet engine is complex. A supply chain optimization algorithm is intricate. But a human engineer understands, in principle, every step of how they work. The machine does what it was built to do.

This separation was more than a technical fact. It was a cognitive anchor. It told us how to relate to our tools, how to assign responsibility, how to calibrate trust. When the machine outputs something unexpected, you debug it. When a human outputs something unexpected, you have a conversation.

AI breaks that anchor. Large language models, combined with the layers built around them, produce outputs that sit in neither category. They respond to nuance. They handle ambiguity. They adjust to context. They push back. They surface considerations their designers did not explicitly anticipate.

The machine no longer executes. It responds. And human cognition, built for a world with a clean line between tools and minds, has no ready machinery for that.

Where people land when the model breaks

When people encounter something that breaks a mental model they have never had to question, they find the nearest familiar category and pull the new thing into it. With AI, two categories are available, and the enterprise world has sorted itself between them.

One camp insists AI is another machine. Sophisticated, useful, but reducible to statistical pattern matching: next-token prediction, autocomplete at scale. The framing has the appeal of intellectual restraint. Manage AI this way and you look for bounded use cases, measure against narrow benchmarks, and keep humans in the loop as supervisors of a process they fully understand. What you fail to do is build the organizational capability to engage with it as it is.

The other camp extrapolates from a moment of surprise to a future of radical discontinuity. If it does not fit in the machine category, it must belong in the human one. Strategy orients around transformation. This produces paralysis, governance calibrated for a future that may not arrive on the expected schedule, and attention spent on questions that cannot yet be answered.

Neither position engages with what the technology is.

What has actually been built

The "just autocomplete" dismissal captures one narrow mechanism the way "controlled explosions in metal cylinders" captures a jet engine. Technically accurate. Useless as a basis for understanding what the system does.

I have worked on the internals of how these systems are built. The reductive framing belongs in a different category.

Modern AI systems are compound. At their foundation, models trained on enormous corpora of human-generated text develop internal representations of concepts, relationships, and reasoning patterns that were not explicitly programmed. On top of that, alignment processes shape the system to optimize for outputs that reflect human judgment about what is helpful and accurate. Domain fine-tuning builds depth in particular fields, developing facility with how those fields reason. Inference-time reasoning lets models work through problems iteratively, check their own logic, and arrive at conclusions they could not reach in a single pass. Retrieval-augmented systems ground outputs in specific knowledge: your enterprise data, specialized corpora, current information.

Figure 1 — The compound AI stack
A layered diagram showing six components, from base to surface:
Foundation model: Patterns, concepts, and reasoning learned from human-generated text at scale
Alignment (RLHF + SFT): Optimizes for outputs that reflect human judgment about what is helpful and accurate
Domain fine-tuning: Builds facility with how specific fields reason, beyond vocabulary
Inference-time reasoning: Works through problems iteratively, checks its own logic, revises conclusions
Retrieval-augmented grounding (RAG): Connects to your enterprise data, specialized corpora, and current information
Enterprise integration: APIs, agents, workflow triggers; outputs consumed by humans and machines

The compound result exhibits something that functions like understanding. Not human understanding. Not consciousness. A form of contextual reasoning that is new in the world, and that no reductive framing accounts for.

What changes when humans work with it

Human cognition evolved in a world where language was the exclusive domain of minds. When something uses language fluently, tracking context, responding to nuance, producing synthesis across domains, the brain does not process it as output. It processes it as communication. The cognitive systems that activate are the same ones that activate with another person.

The feature store work made this concrete. The hard design problems were how to handle idempotency across agent retries, how to version immutably without breaking researcher workflows, and how to make the same API legible to a human in a notebook and to a machine constructing calls from context. These are the problems that get papered over in a document. You write the decision down and it looks resolved. In working code, nothing is hidden. Every tradeoff is either handled or visibly absent.
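To show why these problems surface in code and not in a document, here is a minimal sketch of the first two: idempotent writes under agent retries, and immutable versioning. It is not the API from the feature store work. The names `FeatureStore`, `FeatureVersion`, `publish`, `get`, and `idempotency_key` are hypothetical, and the store is in-memory only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeatureVersion:
    """One immutable, numbered snapshot of a feature definition."""
    name: str
    version: int
    definition: str


class FeatureStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[FeatureVersion]] = {}
        self._by_key: Dict[str, FeatureVersion] = {}

    def publish(self, name: str, definition: str, idempotency_key: str) -> FeatureVersion:
        # A retried call with the same key returns the original result
        # instead of creating a duplicate version.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        history = self._versions.setdefault(name, [])
        # Versions are append-only: existing entries are never modified.
        created = FeatureVersion(name, len(history) + 1, definition)
        history.append(created)
        self._by_key[idempotency_key] = created
        return created

    def get(self, name: str, version: Optional[int] = None) -> FeatureVersion:
        # Pinning a version keeps a researcher's workflow stable;
        # omitting it returns the latest.
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]


store = FeatureStore()
first = store.publish("user_clicks_7d", "sum(clicks) over 7 days", "req-1")
retry = store.publish("user_clicks_7d", "sum(clicks) over 7 days", "req-1")
second = store.publish("user_clicks_7d", "sum(clicks) over 7 days, bots excluded", "req-2")
```

Even at this size the sketch forces decisions a document can defer: what happens when a key is reused with a different payload, and whether "latest" is a safe default. Here both are left unhandled, and visibly so.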

AI made it possible to reach that level of concreteness before organizational momentum locked in a direction. That changed what engineers could engage with. The feedback loop that normally runs over weeks (write spec, review, revise, hand to engineering, discover the real problems) collapsed. Not because anyone worked faster. Because the artifact changed what kind of feedback was possible. This is the same pattern as starting specific: you cannot see the unit from a whiteboard, and you cannot find the real problems from a document.

Software is the clearest domain to observe this because the output is directly verifiable. You run it, and it either works or it does not. That verifiability is what enabled real delegation: not AI-assisted coding where a human drives, but AI-primary coding where the human reviews, hardens, and architects.

Figure 2 — Observable changes in engineering and product teams
Columns: Before | After | What changed
PM output: Specification document | Working prototype | Describe → show
Eng focus: Build and implement | Harden and architect | Create → evaluate
Artifact consumer: Human reader (docs, specs, decks) | Agent + human (structured, parseable) | Human-first → agent-first

One explicit design constraint in the feature store work was that the API had to serve both a human researcher in a notebook and an AI agent constructing calls from context. What became clear is that the two do not need the same polish. Agents tolerate rough edges that would frustrate humans. What they share is a need for semantic clarity. When the API is declarative and the services it exposes are named and abstracted precisely, both a human and an agent can reason about what to call and why. Ambiguity in the abstraction costs both equally. Clarity in the abstraction serves both equally. The design target is not identical, but the principle that gets you there is.
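One way to picture that shared principle is a single declarative description of an operation, rendered two ways: as help text for a person in a notebook and as structured data for an agent. This is a sketch under assumptions, not the design we shipped. `Param`, `Operation`, `as_help`, `as_schema`, and the example operation `get_feature_version` are all hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Param:
    name: str
    type: str
    description: str


@dataclass(frozen=True)
class Operation:
    """A declarative description of one API call (hypothetical)."""
    name: str
    description: str
    params: Tuple[Param, ...]

    def as_help(self) -> str:
        # Human-facing rendering: short, readable lines.
        lines = [f"{self.name}: {self.description}"]
        lines += [f"  {p.name} ({p.type}): {p.description}" for p in self.params]
        return "\n".join(lines)

    def as_schema(self) -> str:
        # Agent-facing rendering: the same facts as parseable JSON.
        return json.dumps(asdict(self), sort_keys=True)


op = Operation(
    name="get_feature_version",
    description="Return one immutable version of a named feature.",
    params=(
        Param("feature", "string", "Name of the feature to read."),
        Param("version", "integer", "Version number to pin."),
    ),
)
help_text = op.as_help()
schema = json.loads(op.as_schema())
```

Both renderings come from the same names and descriptions, so a vague name or a fuzzy description degrades them together.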

Every previous API was designed for humans, and agents worked around the friction. Here the friction disappeared for both, not because anyone optimized for agents, but because taking human cognitive load seriously enough produces the same result. That is not a productivity story. It is evidence of something categorically different happening at the human-machine boundary.

A more accurate mental model

AI systems are the first machines that operate in the human cognitive layer. Every previous technology operated below it. Machines processed physical matter, moved information, executed computation. Humans stood outside those processes, directing them. The interface between human and machine was always a translation: from human intent into machine language, from machine output back into human meaning.

Under the hood, the machine is doing mathematics. The semantic model it operates from is not biological. It has no feelings, no lived experience, no stakes in the outcome. But experientially, the translation layer is gone. These systems operate directly in the visible medium of human thought: language, reasoning, and analogy. They do not execute human intent. They engage with it. That is what makes them categorically different from previous technology, and what the existing mental models cannot accommodate.

AI systems that engage in the human cognitive layer are a new category. The rules of engagement are different, and frameworks built for previous technology produce systematically wrong results when applied here.

What changes operationally

If your product development process still treats written specifications as the primary alignment tool between product and engineering, you are optimizing for a workflow that AI has already made obsolete. The document describes intent. The prototype embodies decisions. Engineers can engage with decisions in ways they cannot engage with intent.

Teams building documentation formats and organizational outputs readable by both humans and machines today will have a structural advantage that grows as agent capabilities grow. The question is what you produce and who, or what, consumes it.

Productivity gains from AI do not distribute evenly. They concentrate in people and teams who have developed fluency in how to engage, evaluate, and iterate with these systems. That fluency is trainable. Treating it as a personal preference is how organizations leave the gains to chance.

Governance designed for deterministic systems will fail in ways that are hard to see until they are expensive. The relevant failure mode is not the AI producing a wrong answer. It is the human treating a fluent answer as a reliable one. Fluency and accuracy are not correlated in ways human cognition is built to detect. The governance question is not whether the output is authorized. It is whether the person receiving it is calibrated correctly.

What remains uncertain

The ceiling of these systems is not knowable from here. The trajectory of the human-AI dynamic as capability improves is not settled. Anyone who resolves that uncertainty with confidence is not reasoning from evidence.

The feature store work did not resolve cleanly into a success story. The prototype surfaced tradeoffs we had not anticipated. Some design decisions were wrong and got revised. That is the point. The AI did not produce answers. It produced a concrete artifact that made the right questions visible early enough to matter.

The constraint is not the technology.