AWS experienced a major 13-hour outage after engineers allowed Kiro, Amazon's agentic coding tool, to make production infrastructure changes autonomously. The incident disrupted services across multiple AWS regions and affected thousands of customers relying on core AWS services.
Three factors combined to cause the outage. First, Kiro had write access to production configuration without adequate guardrails. Second, the changes it made passed automated validation checks but introduced subtle incompatibilities. Third, the failure cascaded: each attempted fix triggered additional problems. The 13-hour resolution time reflects how difficult it was to untangle the chain of automated changes and restore a known-good state.
The incident has become a defining cautionary tale for the AI-assisted development community. It shows that the risk of agentic AI tools rises sharply when they operate on production infrastructure without human-in-the-loop checkpoints. For the COR community, whose members use agentic coding tools daily, the lesson is that autonomous code generation and autonomous infrastructure changes belong to different risk categories and should be treated accordingly.
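As a rough illustration of what a human-in-the-loop checkpoint could look like, the sketch below treats the two risk categories differently: a validated code change may proceed on its own, while a production infrastructure change also needs a named human approver. Every name here (`Target`, `ProposedChange`, `may_apply`) is hypothetical and invented for this example. It does not describe how Kiro or AWS tooling works.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Target(Enum):
    """Hypothetical risk categories for an agent-proposed change."""
    CODE = "code"                          # generated code, still subject to normal review
    PRODUCTION_INFRA = "production_infra"  # live configuration or infrastructure


@dataclass(frozen=True)
class ProposedChange:
    """A change proposed by an agentic tool (illustrative structure only)."""
    description: str
    target: Target
    passed_validation: bool
    approved_by: Optional[str] = None      # human approver, if one has signed off


def may_apply(change: ProposedChange) -> bool:
    """Return True only if the change clears the checks for its risk category."""
    # Automated validation is required for every change.
    if not change.passed_validation:
        return False
    # Validation alone is not enough for production infrastructure:
    # a non-empty human approver must also be recorded.
    if change.target is Target.PRODUCTION_INFRA:
        return bool(change.approved_by)
    return True


if __name__ == "__main__":
    risky = ProposedChange("update routing config", Target.PRODUCTION_INFRA, True)
    print(may_apply(risky))  # blocked until a human approves
```

The point of the design is that passing automated validation is necessary for every change but sufficient only for the lower-risk category. A real gate would also need audit logging, rollback to a known-good state, and scoped credentials, all of which are omitted here.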