The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Demo Roundups! Identifying System Weaknesses to Improve Resilience

How do you proactively identify weaknesses before they lead to costly incidents? Find out how PagerDuty empowers teams to uncover vulnerabilities, streamline incident response, and enhance operational performance to build more resilient systems. Host: Mandi Walls, DevOps Advocate at PagerDuty. Guests: Alex Nauda, CTO at Nobl9; Rich Lafferty, Principal SRE at PagerDuty.

War rooms? Finger-pointing? We can help you.

Say goodbye to late-night firefighting and endless finger-pointing. Explore how Catchpoint helps eliminate the need for “war rooms” by giving teams the visibility and insight they need to detect, diagnose, and resolve internet performance issues before they impact users. Learn how Internet Performance Monitoring (IPM) empowers IT, SRE, and DevOps teams to pinpoint root causes across the entire internet stack, collaborate effectively across teams and vendors, proactively prevent outages and performance degradation, and replace reactive chaos with data-driven confidence.

Transforming the Incident Lifecycle With AI Agents

We’re in the midst of a fundamental shift in how organizations run operations. 51% of companies have already deployed AI agents. What was once reactive and manual is becoming intelligent, automated, and AI-driven. The organizations that embrace this shift gain more than just operational efficiency; they develop a strategic competitive advantage that directly impacts business outcomes.

Operational excellence in the age of AI and Automation

The future of operations is here with PagerDuty's groundbreaking AI and automation innovations. Learn how PagerDuty AI agents, powered by PagerDuty Advance, and new use cases like security incident management and LLMOps can help your organization achieve operational excellence to reduce cost, mitigate the risk of outages, and accelerate innovation.

xMatters Zaxxon Release

Incident management can sometimes feel like piloting a spaceship through enemy fortresses while trying to hit as many targets as possible without, you know... game over. But even if your response processes don't quite involve pixelated robots and laser beams like those in the video game Zaxxon, our latest release is here to make sure your feet stay firmly on the ground, whatever incidents may appear in your stratosphere! Let’s take a look...

From AI-pocalypse to AI-driven Resilience: 4 Lessons from The Last of Us

The critically acclaimed TV show The Last of Us is back. As a huge fan, I find striking parallels between the series’ post-apocalyptic environment and modern digital operations. Just as the world of Ellie and Joel, the main characters, was fundamentally changed by an unstoppable force of nature, today’s operations are being radically transformed by increasingly complex, interconnected systems and by the power of AI and automation.

Reduce the impact of hybrid cloud incidents with AI-powered ITSM

Hybrid and multicloud IT environments have become standard for enterprises, and with good reason. These environments offer greater flexibility, improved resilience, and optimized performance by allowing organizations to leverage the best features of multiple cloud providers while maintaining the security of on-premises infrastructure.

How to Combat MSP Alert Fatigue

Managed service providers (MSPs) are responsible for monitoring hundreds or even thousands of devices, meaning that they must have a practical way of identifying incidents, vulnerabilities, and outages. The obvious choice is employing an incident alerting tool that can deliver alerts to the on-call engineers responsible for maintaining system health and performance.

Incident Alerting and On-Call Management for MSP (Managed IT Services) Explainer

Managing incidents, on-call, and mass notifications as an MSP just got easier. OnPage helps Managed Service Providers cut down MTTR, hit SLAs, and make sure critical alerts from tools like Jira, ConnectWise, Autotask, and ServiceNow reach the right people—fast. Plus, when urgent updates need to go out to your entire business ecosystem, BlastIT delivers instant mass notifications.