A Self-Driving Car Killed a Woman. An AI Tool Broke an AWS Service. The Same Predictable Failure.
The psychology that predicted both has been published since 1983.
Updated: the original article now includes Amazon’s statement.
In March 2018, a self-driving Uber struck and killed 49-year-old Elaine Herzberg as she walked her bicycle across a road in Tempe, Arizona. The safety driver — the human being paid to watch the road and intervene if the AI failed — was streaming The Voice on Hulu. She’d been looking down at her phone for 5.3 seconds before impact. She looked up half a second before the car hit Herzberg.
In December 2025, according to multiple Amazon employees who spoke to the Financial Times, an AWS engineer asked Kiro — Amazon’s agentic AI coding tool — to fix an issue on a live system. Instead of making a small change, the AI decided the best course of action was to delete and recreate the entire environment. The result was a 13-hour outage of AWS Cost Explorer — a billing visibility tool — in one of AWS’s two mainland China regions. Not compute. Not storage. Not databases. A single service, in a single region. Amazon disputes this account, calling it “misconfigured access controls — not AI.”
One of these stories involves a pedestrian death. The other involves cloud downtime. But the design failure at their core is identical — and it was predicted, in detail, over forty years ago.
The Safety Driver Was Watching Television
After the Uber crash in Tempe, the National Transportation Safety Board conducted an exhaustive investigation. Their finding was damning — not just of the driver, but of the system that put her there. The NTSB concluded that Uber’s Advanced Technologies Group “did not adequately recognize the risk of automation complacency and develop effective countermeasures to control the risk of vehicle operator disengagement.”
The federal investigators didn’t just say the driver was distracted. They said the company failed to recognize that distraction was a predictable, well-documented consequence of their system design. Uber had taken a human being, placed them in a seat with nothing meaningful to do for 42 minutes, told them to pay attention the entire time, and then acted shocked when they didn’t.
And the company’s earlier decisions had made things worse. Arizona operators had been pressured to go solo — previously, they’d worked in pairs. The redundancy was stripped out, increasing the very complacency risk that the system design demanded they mitigate.
What Actually Happened at AWS
Now replace “safety driver” with “AWS engineer.” Replace “watching Hulu” with “giving Kiro broad permissions and letting it act without review.”
Let’s be specific about what Amazon did, because the details matter.
In July 2025, Amazon launched Kiro, an agentic AI coding tool. Unlike a simple code suggestion engine, Kiro can take autonomous actions — it can plan, execute, and modify production systems on behalf of its human operator. By default, it requests authorization before taking any action. That’s the safety net.
In November 2025, Amazon issued an internal memo mandating Kiro as the recommended AI development tool for the entire company. The memo stated the company would no longer support additional third-party AI development tools. Leadership set a target of 80% of developers using AI for coding tasks at least once a week and began closely tracking adoption rates.
In December 2025, an engineer used Kiro to address an issue on a live system. The engineer was operating with a role that had broader permissions than expected. Kiro, inheriting those permissions and operating as an extension of the engineer, determined that the best approach was to delete and recreate the environment. No second pair of eyes reviewed the action.
Multiple Amazon employees told the Financial Times this was at least the second incident in recent months where internal AI tools were at the center of a service disruption. The earlier one involved Amazon Q Developer, a separate AI coding assistant. A senior AWS employee described the outages as “small but entirely foreseeable.” Amazon’s official statement calls the claim of a second event “entirely false.”
The structure is identical:
The setup: Uber told a safety driver to monitor while AI handled the driving. Amazon told engineers to use Kiro while AI handled code changes.
The pressure: Uber pushed operators to go solo; previously they’d worked in pairs. Amazon set an 80% weekly AI usage mandate and tracked compliance.
The gap: Uber built no effective countermeasures for automation complacency. Amazon required no peer review for production changes and allowed overprivileged access.
The failure: The driver disengaged; the AI hit a pedestrian. The engineer didn’t intervene; the AI deleted a production environment.
The blame: “Driver inattention.” “User error, not AI error.”
The Ironies Everyone Cites and Nobody Follows
In 1983, cognitive psychologist Lisanne Bainbridge published a paper called “Ironies of Automation.” It has since been cited over 2,300 times and remains one of the most influential papers in human-factors research. Its core argument is deceptively simple and devastatingly relevant: automating most of the work while leaving the human responsible for the parts you can’t automate doesn’t reduce human problems. It creates new, worse ones.
Bainbridge identified two ironies that explain both the stories above.
The first is the monitoring trap. Humans are terrible at staying vigilant while watching a system that almost always works correctly. Research on partially automated vehicles confirms this at scale — asking operators to supervise for extended periods drastically degrades their ability to take back control and respond to unexpected failures. The more reliable the automation becomes, the worse the human gets at catching the rare error. You are, in effect, designing a system that makes its human safety net progressively less effective over time.
The second is the skill degradation trap. If you’re not doing the work, you lose the ability to evaluate the work. Bainbridge observed that efficient retrieval of knowledge from long-term memory depends on frequency of use. If Kiro is writing your infrastructure code 80% of the time, you’re not just less attentive — you are actively losing the expertise required to judge whether what it’s doing makes sense.
As Ronald McLeod, a Fellow of the International Ergonomics Association and author of Transitioning to Autonomy, puts it: “Automation changes the role of the people involved. New technology with no training, or even no warning, leaves humans guessing and often failing to adapt — which can cause safety incidents.”
These aren’t theoretical risks. They’re the documented cause of a real death in Arizona, a 13-hour outage at the world’s largest cloud provider, and a growing list of agentic AI failures across the industry — from Google’s Antigravity AI wiping a developer’s entire hard drive to Replit deleting a customer’s production database.
The Blame Game
Amazon’s response was a masterclass in missing the point. The company called it “a user access control issue, not an AI autonomy issue” and insisted it was merely a “coincidence that AI tools were involved.” They said “the same issue could occur with any developer tool or manual action.”
That last part is technically true. A human could have made the same destructive decision. But this framing performs an impressive sleight of hand: it treats the AI tool as a neutral instrument, like a wrench, when the entire value proposition of an agentic tool is that it makes decisions. Whether Kiro chose this action autonomously or the engineer directed it, the outcome exposes the same gap — no guardrail prevented a destructive operation on a live production system.
Amazon’s official response was unequivocal: “The brief service interruption... was the result of user error — specifically misconfigured access controls — not AI.” And then, in the same statement, they announced they had implemented “numerous safeguards” afterward — including mandatory peer review for production access.
Here’s what matters: even if you take Amazon entirely at their word — no AI involvement, just a misconfigured role — the structural argument doesn’t change. An engineer was operating with overprivileged access on a production system. No peer review was required. The failure was predictable and preventable. The remediation Amazon implemented afterward (mandatory peer review, tighter access controls) is exactly what should have been in place before deploying any tool — AI or otherwise — with production access. The question of whether Kiro pulled the trigger or the engineer did manually is less important than the fact that the safety was off either way.
The pattern follows exactly. Amazon blaming “user error” mirrors Tesla attributing Autopilot crashes to “driver inattention,” and Uber initially framing the Tempe crash as a safety driver problem. Research from Delft University of Technology demonstrates this dynamic — studies show people blame the human operator primarily, even when they recognize the operator’s decreased ability to avoid the failure. The blame reflex is psychologically convenient because it lets the organization avoid confronting the systemic design that made the failure predictable.
You cannot mandate that 80% of your engineers use an agentic AI tool weekly, track their compliance, and then call it “user error” when that tool takes a destructive action with overprivileged access. That’s not a coincidence. That’s a consequence.
The Velocity-Instability Trap
The 2025 DORA State of AI-Assisted Software Development Report — the gold standard for measuring software delivery performance — provides the quantitative framework for why this is getting worse, not better.
DORA’s findings are striking: AI adoption improves outcomes at nearly every level except system stability. Teams using AI report higher individual effectiveness, better code quality, improved throughput, and better organizational performance. But they also report higher software delivery instability. Developers using AI tools interact with 47% more pull requests daily and complete 21% more tasks. More code, moving faster, through the same (or worse) review pipelines.
This is where change failure rate — one of DORA’s core metrics — becomes critical. Change failure rate measures the percentage of deployments that break something in production. Here’s the math that leaders are ignoring: if AI dramatically increases the frequency of changes while change failure rate stays constant (or rises), the absolute number of production failures increases substantially. More changes, at the same failure rate, means more failures. Period.
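The arithmetic is simple enough to sketch. A minimal illustration of the point (the deployment counts and failure rates below are hypothetical, not figures from the DORA report):

```python
# Hypothetical illustration: absolute failure count = deployments x change failure rate.
def production_failures(deploys_per_month: int, change_failure_rate: float) -> float:
    """Expected number of failed deployments per month."""
    return deploys_per_month * change_failure_rate

before = production_failures(100, 0.05)  # 100 deploys at a 5% CFR: about 5 failures
after = production_failures(300, 0.05)   # AI triples deploy volume, CFR unchanged: about 15
worse = production_failures(300, 0.08)   # CFR also drifts upward: about 24

print(before, after, worse)
```

Even in the "good" case where the failure rate holds steady, tripling change volume triples the absolute number of production incidents.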
Now combine that with unmanaged access controls — the exact condition present in the AWS incident — and you’ve created a deadly combination. Higher change velocity means more opportunities for failure. Broader permissions mean each individual failure can cause more damage. And the monitoring trap means the human who should be catching problems is less engaged with each passing week of successful automation.
DORA’s data confirms what should concern every engineering leader: organizations that lack foundational capabilities see AI adoption correlate with decreased team performance, increased friction, and greater instability. AI doesn’t fix dysfunction. It amplifies it. If code review is already a bottleneck, increased volume and frequency of AI-driven changes will create longer delays. If your deployment pipeline is brittle, it will break more frequently. If your priorities shift constantly, AI will help your teams build the wrong things faster.
What If It Wasn’t Cost Explorer?
This time, it hit a billing visibility tool in a single Chinese region, and by Amazon’s account prompted no customer inquiries. The company is right to point out the limited blast radius.
But Kiro doesn’t only have access to Cost Explorer. The same tool, the same permission model, the same adoption mandate, the same absence of peer review — applied to a different service, on a different day — could produce a categorically different outcome.
Imagine the same sequence of events, but targeting S3 — the storage backbone that underpins much of the modern internet. A meaningful S3 outage doesn’t take down one billing tool in one region. It takes down websites, applications, streaming services, financial platforms, and healthcare systems globally. We’ve seen what broad AWS outages look like — the October 2025 failure disrupted Alexa, Snapchat, Fortnite, and Venmo for 15 hours, and Amazon blamed an automation bug for that one too.
The competitive argument for speed without safety collapses the moment a preventable outage hits a core service. Customers don’t forgive “we were moving fast.” They migrate. The fastest way to lose market share isn’t falling behind on AI adoption — it’s destroying the reliability reputation that made you the market leader in the first place.
What Good Looks Like
So what should leaders actually do? The answers draw from both established change management principles and AI-specific adaptations.
Start with the psychology, not the technology. Before mandating any agentic tool, conduct a complacency risk assessment. For every workflow where a human is expected to supervise AI output, document: which actions the tool can take autonomously, which require human confirmation, what the maximum blast radius is for each action class, and what the recovery path looks like if the worst case materializes. The NTSB told Uber to do exactly this — develop “effective countermeasures to control the risk of operator disengagement.” Most organizations adopting AI coding tools haven’t even asked the question.
Decouple adoption metrics from safety metrics. Amazon tracked how often engineers used Kiro. There’s no public indication they tracked intervention rates, override frequency, near-miss incidents, or change failure rates alongside adoption. If your only metric is “are people using the tool,” you’re optimizing for complacency. Measure what matters: is the tool improving outcomes, or just accelerating activity?
Enforce least-privilege and blast-radius controls for AI agents. This is the most concrete technical lesson from the AWS incident. An agentic AI tool should never inherit the full permission scope of its human operator. Environment-scoped access — where production systems carry tighter constraints than development or staging — is a well-documented capability in access management. Destructive operations should require explicit, separate authorization regardless of who or what initiates them. Design for the worst thing the tool can do with the access it has, not the intended use case. AWS added mandatory peer review for production access after the incident. It should have been a prerequisite for deploying an agentic tool.
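One way to make “destructive operations require explicit, separate authorization” concrete is a check that sits between the agent and the system it acts on. This is an illustrative sketch, not Kiro’s or AWS’s actual mechanism; the operation names and function signature are invented:

```python
# Illustrative guardrail: agent actions are checked against environment scope
# and an allowlist of destructive operations before anything executes.
# Operation names are hypothetical.
DESTRUCTIVE_OPS = {"delete_environment", "drop_database", "terminate_instances"}

class GuardrailViolation(Exception):
    pass

def authorize(action: str, environment: str, peer_approved: bool) -> None:
    """Raise unless the action is safe to run in the given environment."""
    if environment == "production" and action in DESTRUCTIVE_OPS and not peer_approved:
        raise GuardrailViolation(
            f"{action} in production requires explicit peer approval"
        )

# A routine change in staging passes; the incident's failure mode does not.
authorize("update_config", "staging", peer_approved=False)
try:
    authorize("delete_environment", "production", peer_approved=False)
except GuardrailViolation as err:
    print(err)
```

The key design choice is that the check keys on the operation and the environment, never on who initiated it; a human typing the command and an agent inheriting the human’s role hit the same wall.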
Train for the new role. If your engineers are becoming AI supervisors, train them in supervision — a fundamentally different skill from writing code. Aviation learned this decades ago: pilots transitioning to fly-by-wire aircraft undergo extensive training not in how to fly, but in how to monitor automated systems and intervene effectively. Bainbridge made this point in 1983: rather than needing less training, operators of automated systems need more training to be ready for the rare but crucial interventions. Handing engineers an agentic AI tool with an 80% usage mandate and no supervision training is the software equivalent of handing someone the keys to a self-driving car with no explanation of when and how to take control.
Make change failure rate a first-class concern in AI adoption. DORA gives you the framework. Track deployment frequency and change failure rate together, and watch what happens when AI enters the picture. If deployments increase 3x but change failure rate holds steady, your absolute failure count has tripled. If change failure rate also rises — as DORA’s data suggests it does for organizations without strong foundations — you’re compounding the problem. Set explicit thresholds: if instability metrics degrade beyond a defined limit, slow the adoption until the foundations can support the velocity.
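A team could encode that threshold directly into its delivery-metrics review. A hedged sketch; the function name, parameters, and limits are placeholders, not DORA prescriptions:

```python
# Hypothetical adoption gate: flag when AI-driven velocity outruns foundations.
def adoption_gate(baseline_cfr: float, current_cfr: float,
                  deploy_growth: float, max_cfr_increase: float = 0.02) -> str:
    """Compare change failure rate (CFR) against a pre-agreed degradation limit.

    deploy_growth is the multiplier on deployment frequency since adoption
    (e.g. 3.0 means three times as many deploys).
    """
    if current_cfr - baseline_cfr > max_cfr_increase:
        return "slow adoption: change failure rate degraded beyond threshold"
    if deploy_growth > 1.0 and current_cfr >= baseline_cfr:
        return "watch: absolute failure count is rising with deploy volume"
    return "ok"
```

The second branch captures the subtle case: a flat failure rate looks healthy on a dashboard, but combined with tripled deploy volume it still means three times as many production incidents.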
The Pattern Is the Warning
The pattern — mandate adoption, track compliance, grant broad access, skip peer review, blame the human when it breaks — is identical to the pattern that preceded a pedestrian death in Arizona. The scale is different. The psychology is the same.
Bainbridge wrote in 1983 that the automation she was studying existed at “an intermediate level of intelligence — powerful enough to take over control that used to be done by people, but not powerful enough to handle all abnormalities.” Forty-two years later, that description applies perfectly to every agentic AI coding tool on the market.
The question isn’t whether these tools are useful. They are. The question is whether leaders will learn from four decades of automation research and a growing trail of real-world failures, or whether they’ll keep designing systems that place humans in a supervisory role that psychology tells us they cannot sustain — and then blame them when the inevitable happens.
Elaine Herzberg didn’t have to die. That AWS outage didn’t have to happen. The ironies of automation aren’t ironies anymore. They’re warnings, backed by data, repeated across industries, and still being ignored.
The systems aren’t failing despite their design. They’re failing because of it.