The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to empower your team to own incident response

Responding to and managing incidents feels fairly straightforward when you’re in a small team. As your team grows, it becomes harder to work out who owns which service, especially during critical times. In those moments, everyone needs to know exactly what their role is so the team can recover quickly. Moving to incident.io as the 7th engineer, from a scaleup of around 70 engineers, has given me a new perspective on what it means to own your code.

New Feature: Adding more options to informational status updates

Not all status updates are published because of an incident or scheduled maintenance event. Sometimes, IT teams simply want to post an informational status update without affecting the overall status. With StatusCast’s newly released option, you can set informational updates to have no effect on your status.

What SREs Can Learn from the Atlassian Nightmare Outage of 2022

What happens when the tools and services you depend on to drive Site Reliability Engineering turn out to be susceptible to reliability failures of their own? That’s the question that teams at about 400 businesses have presumably had to ask themselves this month in the wake of a major outage in Atlassian Cloud.