Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Incident Management: A Complete Introduction

In the dynamic landscape of IT operations, incidents are bound to occur. Incident management is a structured, proactive approach to addressing and resolving these unexpected events promptly and effectively. It is a crucial component of IT service management (ITSM), keeping operations running smoothly and minimizing the impact of incidents on an organization’s productivity and customer experience.

PagerDuty Recognized in 12 2023 Gartner Hype Cycle Reports

While most of the world knows us for on-call management, we’ve been hard at work expanding the PagerDuty Operations Cloud to other areas like AIOps, Process Automation and Customer Service Operations (CSOps). Our commitment to R&D and to redefining digital operations management for our customers has resulted in PagerDuty being recognized in 12 distinct 2023 Gartner Hype Cycle reports across nine unique categories.

More than downtime: the explicit costs of poor incident management

A cold fact of SaaS Life™ is that you can’t make money when your product or website doesn’t work — and those lost dollars add up fast. Downtime, SLA breach paybacks, compliance fines, and other explicit costs are the easiest to quantify and they’re what most people think of when they think about incidents.

Reduce MTTR with Grafana, Grafana k6, and Prometheus: Inside DHL's observability stack

Each year, more than 296 million packages are shipped around the world via DHL and its premium service, Time Definite International. At DHL Express Switzerland, a local unit of the international logistics and shipping company, the IT team provides solutions for tracking customs clearance progress, analytics, mobile and optical character recognition (OCR) scanning, and warehouse management for every package that moves through Switzerland.

CloudOps: Transforming IT Operations in the Cloud

CloudOps, or Cloud Operations, is quickly becoming the standard for managing IT operations in the cloud computing ecosystem. By transforming traditional IT operations to harness the full potential of the cloud, businesses are experiencing greater automation, collaboration, agility, and resilience. This article is a deep dive into the concept of CloudOps, its core components, the advantages it offers, and the steps necessary to implement it effectively within an organization.

Welcome To xMatters - Ep4 - Initiating Incidents

Everyone makes mistakes. When they happen, it is important to act quickly, resolve the problem, and understand what went wrong to reduce the chances of it happening again. When your business is suddenly impacted by an unforeseen event, you need to report the problem efficiently and call for help as soon as possible. With xMatters, you can initiate incidents quickly and target specific groups with the vital information they need.

But It's Not Our Fault! When Third-party Incidents Affect Your Service

Very few SaaS products exist completely independently. Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. When these systems fail, it can leave you feeling pretty helpless. In some cases you might have fallback options, but oftentimes all you can do is wait for recovery and clean up the fallout.