AI-Related Errors Have Caused Two Outages On AWS Products: Report

AI is writing more production code at companies than ever before, but not all of the code it writes is working out perfectly.

Amazon Web Services’ products have suffered at least two outages tied to errors made by its own AI coding tools, raising internal questions about the company’s aggressive push to deploy these systems, according to a Financial Times report citing people familiar with the matter.

The most notable incident occurred in mid-December, when AWS engineers gave the company’s Kiro AI coding assistant — an agentic tool capable of taking autonomous actions on behalf of users — permission to make certain changes to a system. The AI determined that the best course of action was to delete and recreate the environment entirely, triggering a 13-hour service interruption for customers. A second, separate incident has also been linked to AI tool errors, though Amazon said it did not affect any customer-facing AWS services.

The outages have been significant enough to prompt some AWS employees to raise doubts internally about the pace at which the company is rolling out AI coding assistants, particularly agentic ones that can act with minimal human oversight.

Amazon, however, is pushing back on the framing. The company called it a “coincidence that AI tools were involved,” arguing that the same issues could have occurred with any developer tool or manual action. Amazon also characterized the December incident as an “extremely limited event” that affected only a single service in parts of mainland China, and maintained that it has seen no evidence that mistakes occur more frequently when AI tools are involved compared to human engineers. “In both instances, this was user error, not AI error,” the company said.

“This brief event was the result of user error—specifically misconfigured access controls—not AI,” an AWS spokesperson said. “The service interruption was an extremely limited event last year when a single service (AWS Cost Explorer—which helps customers visualize, understand, and manage AWS costs and usage over time) in one of our two Regions in Mainland China was affected. This event didn’t impact compute, storage, database, AI technologies, or any other of the hundreds of services that we run. Following these events, we implemented numerous additional safeguards, including mandatory peer review for production access. Kiro puts developers in control—users need to configure which actions Kiro can take, and by default, Kiro requests authorization before taking any action,” they added.

The incidents nonetheless arrive at a sensitive moment for the broader tech industry. Companies across the sector are racing to integrate agentic AI systems into core engineering workflows, betting that the productivity gains will outweigh the risks. But as these tools are granted greater autonomy over production systems, the consequences of a misstep grow proportionally larger. An AI that confidently deletes and rebuilds a live environment — however logical the decision may seem to the model — is a different category of risk than a developer typo. The question the industry will have to grapple with is not just whether AI tools make more mistakes than humans, but whether the nature of those mistakes is fundamentally harder to predict and contain.
