
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

On-Call Management Models

In today's fast-paced digital landscape, incident management is crucial for maintaining operational excellence, and on-call management models play a critical role in it. On-call management is the organization of teams so that incidents get a prompt response and resolution. It is necessary to streamline incident resolution, ensure 24/7 availability, and allow for fair and transparent on-call rotations.
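As a rough illustration of what a fair and transparent rotation can look like, here is a minimal round-robin sketch. It is not taken from the article: the function name `build_rotation`, the weekly shift length, and the engineer names are all assumptions made for the example.

```python
from datetime import date, timedelta
from itertools import cycle


def build_rotation(engineers, start, weeks):
    """Assign one engineer per week, cycling through the list in order.

    Hypothetical helper: weekly shifts and strict round-robin order are
    assumptions for illustration, not a model described by the article.
    """
    schedule = []
    people = cycle(engineers)
    for week in range(weeks):
        shift_start = start + timedelta(weeks=week)
        schedule.append((shift_start, next(people)))
    return schedule


if __name__ == "__main__":
    # Example names and start date are made up.
    for shift_start, engineer in build_rotation(
        ["ana", "ben", "chloe"], date(2024, 1, 1), 6
    ):
        print(shift_start.isoformat(), engineer)
```

Because the order is fixed and the whole schedule is computed up front, everyone can see when their next shift falls, and each person carries the same number of shifts whenever the number of weeks is a multiple of the team size.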

The Unplanned Show, Ep. 22: CSOps at PagerDuty with Arturo Suarez Martin

Even with the best monitoring in the world, some customer-impacting issues still go undetected and are ultimately reported by customers. In this episode, we'll hear from PagerDuty's Senior Director of Global Support, Arturo Suarez Martin, about the journey that PagerDuty has been on to tighten feedback loops between Customer Support and Engineering and mitigate the risk of poor customer experiences.

Ping Command: A Comprehensive Guide to Network Connectivity Tests

The ping network test, a core utility since the 1980s, plays a crucial role in confirming connectivity between IP-networked devices. In this guide, we'll delve into what the ping command is, how to run a ping network test, common IP addresses to ping, interpreting results, and troubleshooting errors.
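For readers who want to script a ping network test, the sketch below wraps the system `ping` command. It is not from the guide itself. The helper names `build_ping_command` and `is_reachable` are hypothetical. The only flags used are the packet-count options: `-c` on Linux and macOS, and `-n` on Windows.

```python
import platform
import subprocess


def build_ping_command(host, count=4, system=None):
    """Return the ping argument list for the given (or current) OS.

    Hypothetical helper. Windows uses -n for the packet count;
    Linux and macOS use -c.
    """
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=4):
    """Run ping and report whether it exited with status 0.

    Assumption: exit status 0 is treated as "host replied". Exit-code
    behaviour differs between platforms, so check the printed output
    before relying on this in production.
    """
    result = subprocess.run(
        build_ping_command(host, count),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling `is_reachable("127.0.0.1")` pings the loopback address, which checks the local network stack without needing any outside connectivity.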

Events vs. Alerts vs. Incidents

Event. Alert. Incident. These terms are bandied about, often interchangeably, in IT operations management. Broadly speaking, they all refer to situations where something is potentially amiss and needs to be investigated and resolved. Each of these three words does, however, have a distinct definition. Because they are used in scenarios where clear communication and timeliness are critical, it’s important to understand the differences and use them appropriately.

Reducing the burden of incident response on your teams

In this webinar, a panel of engineering leaders, including Chris Evans, CPO at incident.io, shares how they reduce the burden of incident response for their teams. The panelists advocate for a culture of shared responsibility across the board and offer practical strategies for educating the business about engineering practices during the chaos of an outage.