The latest news and information on incident management, on-call, incident response, and related technologies.

Introducing Runner Replicas: Scalable, Reliable Automation for Modern Ops

When you’re responsible for the reliability of complex systems, the execution layer of your automation is not something you want to think about; it should just work. Whether you’re deploying code, patching servers, or responding to an incident at 3 a.m., your automation engine should be as resilient and scalable as the infrastructure it operates on.

Service Intelligence Is the Future of Proactive Incident Management

This is the third post in our series on the future of incident management, which builds upon The Future of Incident Management: Your Blueprint for Operational Excellence and How Native Process Automation and Auto-Remediation Drive Operational Excellence. Organizations are facing increasing complexity across their IT landscapes.

What Does a Customer Support Technician Do?

A customer support technician is a technical professional who helps customers solve issues with hardware, software, and IT systems. They’re often the first point of contact when something breaks, whether that’s a computer glitch, a network outage, or a software error. The role is all about troubleshooting, guiding users through solutions, and making sure technology runs the way it’s supposed to.

Demo Roundups! Breaking the MTTR Bottleneck: Automating Diagnostics for Modern Incident Response

Discover how PagerDuty Automation eliminates the manual triage bottleneck that’s slowing down your incident response. In this demo, you’ll see how automating diagnostics can compress resolution times from hours to minutes by instantly analyzing your environment, correlating events across systems, and identifying root causes with transparent AI reasoning.

My Criteria for Automated Incident Response Tools

Managing incidents manually isn’t realistic when their number keeps growing. That’s where automated incident response tools come in. They handle routine tasks so you can focus on actual problem-solving. In this post, I’ve put together a list of the 9 best automated incident response tools. I evaluated each one against four key areas of the incident response process, so you can see how it handles an incident from start to finish.

The Next Wave of Automation Makes More Room for Humans

When a system goes down, the impact isn’t just technical. It’s the people at the center of it who adapt, improvise, apply their judgment, and keep the business moving forward. I’ve worked in operations for more than 25 years, and one thing I’ve learned is that in any system, the humans are the truly resilient part.

From plan to practice to prevail: my conversation with Chris Johnson, host of the MSSP 1337 podcast

In cybersecurity, prevention often gets most of the attention. But no matter how strong your defenses are, incidents will happen, and how you respond in that moment of truth defines resilience. That’s why I really connected with a framework Chris Johnson shared with me on the MSSP 1337 podcast, the 3 P’s: plan, practice, prevail.

PagerDuty Joins Glean's AI Ecosystem: Unlocking More Seamless Incident Management

Today, we announced that PagerDuty is now officially part of the Glean MCP Directory! This partnership brings together two leaders in AI-powered productivity and operations, making it easier than ever for organizations to connect PagerDuty’s incident data directly to any AI tool or agent in their stack through the standardized Model Context Protocol (MCP). PagerDuty is the first (and currently only) incident management partner that is available via Glean’s AI ecosystem.