Master production incidents through realistic simulations. Debug databases, wrestle Kubernetes, and save the day - without the 3 AM wake-up calls.

From junior devs to SRE directors - here's what the community is saying.
just spent 2 hours on YouBrokeProd instead of sleeping. diagnosed a connection pool leak in 47 seconds. my team lead is going to love this
"We replaced our entire incident response training program with YouBrokeProd. Our MTTR dropped 40% in 3 months."
the kubernetes crashloop scenario is TOO real. i got vietnam flashbacks from my last on-call shift. 10/10
finally a way to practice incident response without the 3am adrenaline. my juniors went from deer-in-headlights to confident in 2 weeks
"Our team treats the daily challenge like Wordle. The Slack channel is chaos every morning comparing times."
showed this to my CTO and now the whole engineering org has accounts. the leaderboard is getting competitive
Connection pool exhaustion, replication lag, deadlocks, and that one query that's doing a full table scan.
Pods crashlooping, OOMKilled, PVCs stuck in Pending, and networking that makes no sense.
Credential leaks, suspicious traffic, rate limiting gone wrong, and that JWT that expired 6 months ago.
Pick an incident type and difficulty. You'll get realistic symptoms and access to logs, metrics, and debugging tools.
Use the terminal to run commands, check logs, and analyze metrics. Find the root cause before time runs out.
Apply the fix, earn points, and climb the leaderboard. Share your victory and challenge your team!
Join thousands of engineers who are leveling up their incident response skills - one simulated outage at a time.
Start Training Free β