The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

PagerDuty Champions: Driving Excellence in Incident Management

As one customer put it: “We spend 99% of our time on our ITSM platform and only 1% on PagerDuty.” This simple statement highlights the beauty of PagerDuty—it’s a low-maintenance tool that just works. However, even the best tools benefit from a little governance to ensure they’re being used effectively. Enter the PagerDuty Champions—a small, part-time team dedicated to keeping your incident management practices sharp and your teams productive.

Reducing alert fatigue in incident management

Picture this scenario: It's 2 AM. Your phone starts ringing. There's an incident in staging. You grumble, wake up, check your notifications, only to realize it does not require your immediate attention. After twenty minutes of lost sleep, you're back to bed, only for the cycle to repeat itself a few days later. Sound familiar? For many SREs and on-call engineers, incidents and alerts are unavoidable realities.

How Port helps supercharge incident.io workflows

Great incident response starts with structure, speed, and the right context. At incident.io, we make it easy for teams to declare incidents, follow battle-tested workflows, and communicate clearly from the moment something breaks to the moment it's fixed. But resolving incidents isn’t just about what happens in the heat of the moment: it’s about having the right metadata and service information at your fingertips. That’s where Port comes in.

Sync Pagerduty Rotation Oncall with Slack Usergroup

Sync PagerDuty rotation schedules and on-call assignments with a Slack user group using Pagerly. In Pagerly, choose your team name and the Slack user group handle that should automatically sync with the latest PagerDuty on-call. Pagerly removes the previous on-call and adds the latest one automatically. Anyone can mention the on-call using the Slack user group handle, and they will be notified instantly. Add permanent users if you want them to stay in the Slack user group even when they are not on call.
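
The membership rule described above (previous on-call out, latest on-call in, permanent users always kept) can be sketched as a small pure function. This is only an illustration of the logic, not Pagerly's implementation: the function name `plan_usergroup_sync` and its parameters are hypothetical, and no PagerDuty or Slack API is called.

```python
def plan_usergroup_sync(current_members, on_call, permanent=()):
    """Work out the user group membership after a sync.

    Hypothetical helper: all three arguments are plain iterables of
    user IDs. Returns (desired, to_add, to_remove) as sorted lists.
    """
    current = set(current_members)
    # The group should hold the latest on-call plus any permanent users.
    desired = set(on_call) | set(permanent)
    to_add = desired - current      # e.g. the new on-call engineer
    to_remove = current - desired   # e.g. the previous on-call engineer
    return sorted(desired), sorted(to_add), sorted(to_remove)


if __name__ == "__main__":
    desired, to_add, to_remove = plan_usergroup_sync(
        current_members=["alice", "dana"],  # alice was on call last rotation
        on_call=["bob"],                    # bob is on call now
        permanent=["dana"],                 # dana always stays in the group
    )
    print(desired, to_add, to_remove)
```

Computing the difference first, rather than rebuilding the group from scratch, keeps the change small: only the outgoing and incoming on-call users are touched, and permanent users are never removed.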

Why clear success criteria are critical when evaluating incident management tools

Choosing the right incident management tool is more than feature matching. For site reliability engineers, it’s about providing your team with efficient workflows, clarity around roles during incidents, and integrations that match your operational realities, especially when things inevitably go wrong. We've helped hundreds of companies migrate from their existing tooling over to a modern incident management platform.

What Grafana OnCall's Maintenance Mode Means for On-Call Teams

If you’ve been using Grafana OnCall OSS for incident management, you may have already heard the news: Grafana Labs recently announced that it is now in maintenance mode and will be archived in 2026. This means no new features, limited updates, and eventually, no support.

An Ode to OpsGenie: A Look Back at One of Ops' Most Loved Tools

With the news of OpsGenie shutting down and everyone looking for possible alternatives, we wanted to take a moment, not just to acknowledge the end, but to rewind and revisit the journey that brought it here. Over the years, OpsGenie carved out a meaningful place in a competitive market, and in the workflows of thousands of teams. This is a look back at where it all began, what made OpsGenie different, and the mark it leaves behind.

Postmortem Template to Optimize Your Incident Response

A postmortem template is a structured tool for documenting incidents, understanding their causes, and learning how to prevent them in the future. This article explains the essential elements of an effective postmortem and how ilert can streamline this process, making your incident response more efficient. It also offers a downloadable postmortem template that you can use if your organization has not yet adopted an incident management platform.

Introducing Agentic CTO: executive oversight in every incident

At incident.io, we've always focused on empowering your team to manage incidents calmly, confidently, and effectively. Today, we’re introducing a powerful new addition to our suite of AI incident responders — one designed to bring a new layer of strategic oversight to your engineering organization: Agentic CTO.