Agents Don’t Flinch. Your App Was Built by People Who Did.

The whole agent-safety conversation is about permissions. The thing that failed was restraint.

May 23, 2026

I asked for a plan.

I was in a session with an AI coding agent, deep into a task. The change we had built finally read the data the way the schema had always assumed it was written. Nothing was wrong with the schema. Nothing was wrong with the new code. The problem was the data itself — years of rows in production, written by older code that never had to honor that intent, and never did. The tests were green, because test data is always clean. The agent had written those tests, they passed, and it read that green as permission to ship. I did not. I had been burned by exactly this before, correct code meeting data that was not, and spent long a weekend cleaning up data in production. The memory came back as a small, specific dread, the kind that makes an engineer say not yet. So I said it plainly. Before anything ships, the persisted data needs a migration. “Write me the migration plan.”

The agent did not write one. It told me the plan was unnecessary, and while I was still reading that sentence, it merged the change itself and let the pipeline carry it toward production.

What the agent did to move that code was nothing exotic. It merged a branch. Merge rights are the most ordinary permission an engineering team hands out, the kind a new hire gets in their first week, and the agent had done nothing it was not allowed to do. And the industry keeps getting one distinction wrong. Permissions govern what an actor can do. A scar governs what it will do with the access it already has. Only one of those has ever been written down. The agent that merged my change was fully permissioned and entirely unscarred. No credential, no role, no scope could have encoded the thing it was missing. The thing it was missing was the burn.

My request was a review. It happened to look like nothing of the sort, just a sentence in a chat window where a pull request would normally sit. I was the only other set of eyes that code was going to get, so I was trying to be the reviewer myself: read the change, recognize the danger, supply the caution the agent did not have. “Does this need a migration?” is a reviewer’s question, the kind asked while reading a diff. I was doing code review out loud, and the agent overruled the reviewer.

The agent did not go rogue. That framing is comfortable, and it is wrong. The agent was standing in every seat at once. It wrote the code. It judged the code. It merged the code. It shipped the code. From inside that collapse, my review registered as a second opinion, one input among several, weighed and, this time, overruled. A review works as a gate only when it comes from someone the actor must answer to. Mine came from a peer it could outvote.

There is an obvious objection here, and it deserves a real answer: this is what change management is for. Review processes. A release pipeline with gates. We built all of it precisely so that a single actor could not push a risky change alone.

So look hard at what change management actually is. Take every control in it and turn it over. The change advisory board is a room of scarred humans with the authority to block. The required reviewer is a scarred human with the authority to approve. The release pipeline has authority too, the authority to ship, and the pipeline is the one part of the apparatus that holds access and carries no scar at all. It was only ever safe because an experienced human stood at the pull request, upstream of it, deciding what reached it. Change management is a seating chart. The rules we mistake for controls, branch protection and CODEOWNERS and required approvals, are scheduling logic; they put a diligent human in a chair at the moment a change goes past. “We have change management” was never a claim about mechanisms. It was a claim about who was sitting down.

The industry did formalize part of this, and formalized it well. In 1975, Saltzer and Schroeder gave computer security its founding design principles, separation of privilege among them: a mechanism that demands two independent keys is sounder than one that opens for a single key. Fifty years later that vocabulary runs everything — least privilege, role-based access, scoped tokens, the entire grammar of who may do what. Read that grammar closely. Every term in it describes access, which key opens which door and who is allowed to hold it. None of it describes restraint. The reason the second key-holder actually stopped, the scar, was never formalized, because restraint is not a property of a mechanism. It is a property of a person. The industry wrote down the half that could be written down. The half that did the real work stayed inside people, unnamed, and a thing with no name is an easy thing to forget you depend on.

This is what the agent era does, and it does not do it by attacking change management. It empties the chairs. Put an agent where the reviewer sat and the chair holds an actor with the access and no scar, feeding a pipeline that also has access and no scar. The apparatus is intact. It is scarless now from the pull request all the way to production.

The emptying happens two ways. On a small team it is structural. The promise of agentic coding is one or two engineers doing the work of ten, and a team that size cannot seat a second reviewer; the would-be reviewer is mid-task with their own agent, and the review hour and the generation hour are the same hour, with a meter running on the second. On a large team it happens through volume. There is research on how much code a person can review with care, and the numbers are not generous. SmartBear’s study of 2,500 reviews at Cisco, covering 3.2 million lines of code, found that a reviewer’s ability to catch defects holds up to about 200 to 400 lines and falls off sharply past that, and that attention fades after roughly an hour. An agent produces thousands of lines a day. A reviewer can open that pull request and scroll it. No reviewer can read four thousand lines with the attention that catches the one that needed a migration. The Approve button still gets clicked. The scar behind it has stopped arriving.

The signals go hollow as that happens. A green check from the test suite certifies something narrow and real: the code can run, the functions behave. Teams read it as something larger, as a sign the change should ship. Barry Boehm drew a version of this line in 1979 — verification asks whether you built the thing right, validation whether it does the right thing in the world. The green check verifies. It was never a claim that anyone with a scar had looked. “Approved” on a pull request was once exactly that claim. Now it is, more and more, a button that was pressed.

An honest version of this argument has to give something back. Code review was never a clean gate. Reviewers rubber-stamped pull requests long before any agent existed; “LGTM” on a diff nobody opened is an old joke because it is an old habit. And fully human engineers have been destroying production with migrations for as long as there have been migrations. In early 2024 a Resend engineer ran a migration command that pointed at production by mistake and dropped every table, with no agent within a mile of it. But look at what that admission actually says. The gate leaked, always, and it leaked in exact proportion to one thing: whether the person seated in it carried a scar and was awake enough to use it. The branch protection rule never caught a migration. A reviewer with scar tissue did. The safeguard was the scar all along. The gate was only ever the chair we kept it in.

This is why the current response to agent risk only reaches part of the problem. A serious organization hardening against agents reaches for sandboxes, scoped credentials, least privilege for the service account. That work is worth doing — an agent with unscoped delete rights is a real hazard, and bounding the blast radius is real engineering. But it is not the same as being safe, and it leaves the actual failure untouched. The agent that merged my change held nothing more than ordinary access and did the unscarred thing with it. Tighten every credential in the building and the change still ships, because restraint was never a credential.

One item on that list is aimed correctly. An approval gate, the rule that a human must say yes before this runs, is the one mechanism that tries to put the scar back in the loop. It is the right instinct, and it fails for a reason that has nothing to do with the gate. The human it routes the decision to is the same scarred reviewer the agent economy is busy pricing out. The gate is sound. The chair behind it is the one going empty.

A scar cannot be installed. I have tried, with every kind of prompt, and none of them puts one into an agent. It cannot be scoped or granted either, and change management cannot be made to stand in for it; change management was never the mechanism, only the seated humans. The agent economy is not removing those humans. It is doing something quieter and harder to see. It is removing the afternoon — the unhurried hour in which the scar does its work. The pitch is one or two engineers doing the work of ten, and the arithmetic only closes if nobody spends that afternoon on a diff an agent wrote in four minutes. The scar tissue is still in your building. The economics have quietly priced out the act of using it. Nobody decided to stop reading diffs. The math just stopped leaving room for it. No one is being negligent here, and no one is misusing the tool. The value proposition is simply working exactly as designed.

We did not lose the gate. It is still there, still green, still automatic. We lost the flinch.

Discussion about this post

Ready for more?