Here’s a common situation that plagues many development teams. You run an application through your CI/CD pipeline and all of the tests pass, which is great. But when you deploy it to a live target environment the application just does not function as expected. You can’t always predict what will happen when your application is pushed live. The solution?
It’s 2 AM and you’re paged when you’re still awake – how well can you find what you need to fix the latest mistake? When the incident begins it might only be impacting a single service, but as time progresses, your brain boots, the coffee is poured, the docs are read, and all the while as the incident is escalating to other services and teams that you might not see the alerts for if they’re not in your scope of ownership.