The latest news and information on incident management, on-call, incident response, and related technologies.

The fastest and most robust path to incident declaration from monitoring tools

Here’s a crazy question: why do we still require a human to manually declare an incident for the things that we know are incidents? If we have enough confidence to build SLOs and high-severity alert routes for these specific scenarios, why are we still asking a human to confirm it’s an incident and get the assembly process in motion? Isn’t that just another button to push when we could be problem solving instead?

Process Automation v4.12.0 and v4.13.0 Release Notes

Product Managers Jake Cohen and Forrest Evans are back to update us on what’s new in the 4.12.0 and 4.13.0 releases of PagerDuty Process Automation. New in these releases are features to support Kubernetes automation, managing resources in multiple AWS accounts, and a new plugin suite for Sensu.

7 Types of Incident Response Tools

Incident response tools are software applications or platforms designed to assist security teams in identifying, managing, and resolving cybersecurity incidents. Incident response is a crucial part of an organization’s cybersecurity strategy, making it possible to detect threats, analyze vulnerabilities, respond to attacks, and recover from security breaches. Incident response tools are vital for safeguarding organizations against evolving cyber threats.

Welcome To xMatters - Ep 2 - Organizing Your Teams

Even the most gifted and powerful people could do with a helping hand now and again. Thankfully, they are not alone in the multiverse! xMatters makes organizing your teams and creating a customized on-call schedule work as if by magic. This way, when help is urgently needed, the appropriate on-call individual will quickly join the team to save the day. To learn more about organizing your teams with xMatters, check out our tutorial videos on how to get started.

How Sony Interactive Entertainment drives better IT operations based on alert data

Sony Interactive Entertainment (SIE) is a multinational video game and digital entertainment company owned by global conglomerate Sony. SIE primarily operates the PlayStation brand of video game consoles and products.

Learning from incidents is not the goal

Learning from incidents has become something of a hot topic within the software industry, and for good reason. Analyzing mistakes and mishaps can help organizations avoid similar issues in the future, leading to improved operations and increased safety. But too often we treat learning from incidents as the end goal, rather than a means to achieving greater business success. The goal is not for our organizations to learn from incidents: it’s for them to be better, more successful businesses.