We lived through the 3 AM pages, the cascading failures, and the post-mortems where everyone asked "why didn't anyone know how to handle this?" So we built the training tool we wish we'd had.
YouBrokeProd was born from a simple frustration: incident response training was terrible. Pages of runbooks nobody read. Tabletop exercises that felt nothing like the real thing. On-call engineers going into their first real outage with zero hands-on experience.
The best way to get good at incident response is to handle incidents. But that means your production users pay the price while you learn. That's backwards.
We built realistic incident simulations based on real post-mortems from across the industry - the actual failure modes that take down real systems. Not toy examples. Not contrived puzzles. The exact categories of problems that will wake you up at 3 AM.
Now you can build the muscle memory, learn the debugging patterns, and develop the calm confidence that separates good on-call engineers from great ones - without any users getting hurt.
Make every engineer confident handling production incidents - so the next real outage is just another problem to solve, not a panic spiral.
Pick an incident type and difficulty. You get realistic symptoms, logs, metrics, and access to the same debugging tools you use in real life.
Use the terminal to run commands, check logs, and analyze metrics. Race against the clock to find the root cause before the situation gets worse.
Apply the correct fix, earn points and reputation, then read the post-mortem. Every scenario comes with a detailed explanation of the real incident it is based on.
Join engineers who are building real incident response skills through realistic practice.
Start Training Free