
The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.

The EU AI Act and what it means for managing incidents

If you've been in earshot of tech leadership lately, you've probably heard the words 'EU,' 'AI,' and 'compliance' in conversation. The EU AI Act is officially upon us, and with it comes a whole new set of incident response and reporting requirements that might feel like yet another bureaucratic burden to worry about. But there's a different way to look at this legislation.

3 Ways to Use FinOps Automation for Cloud Cost Optimization

The cloud is the backbone of modern businesses, revolutionizing the trajectory of innovation, technology and business itself. While its promise of instant scalability and flexibility drives unprecedented growth, these same advantages can become a double-edged sword. The ease of spinning up new resources, automating deployments, and expanding services across regions—all of which make the cloud so powerful—can quickly lead to sprawling infrastructure and runaway costs if not carefully managed.

What's New: Gentle High Priority Alerts

A calmer way to respond quickly, without the shock. I’m really excited to share a new feature that’s been close to our hearts (and ears): Gentle High Priority Alerts. This one’s for everyone who’s ever been jolted out of sleep, or even deep focus, by a high-priority notification (or “page”) that felt more like an alarm clock than an alert.

April Wrap-Up: Product Updates Across the PagerDuty Operations Cloud

PagerDuty is committed to redefining what digital operations look like in the era of AI and automation. This vision drives us to continuously innovate and enhance the PagerDuty Operations Cloud, ensuring every update brings our customers closer to achieving operational excellence. Building on the momentum of our Spring product launch at PagerDuty On Tour, we’re excited to showcase what we’ve shipped this quarter.
Sponsored Post

Incident Response Software: Master Operational Resilience

If your business or work depends heavily on technologies where reliability is a concern, you already know how critical a quick recovery from a technical crisis is. Robust incident response software, backed by a sound strategy, is what separates companies that recover swiftly from technical crises in today's fast-paced, ever-evolving digital environment from those that suffer prolonged outages.

April 2025 Update - Fully Redesigned Signl Center, Shift Tiers with Escalations, AI Shift and Duty Scheduling, and a new Chat View for the Mobile App

With our latest April update, we are setting a new benchmark in incident management excellence. The Signl Center in our web portal has undergone a major redesign, delivering a superior, more intuitive layout, enhanced tracking of notifications and escalation workflows, and an upgraded incident chat — redefining how operations and maintenance teams coordinate under pressure.

DevOps - Roles and Responsibilities

As DevOps grows within the tech industry, it continues to play a vital role in modern software development by bridging the gap between development and operations. DevOps engineers juggle a wide range of tasks in their daily work, combining coding, automation, system management, and team collaboration. In this blog, we’ll explore their core responsibilities, highlight essential best practices, and show how solutions like OnPage can help streamline their workflows.

Gett replaces paging tool with Exigence to achieve IR excellence

“By the time a pager alerts you to a problem, it’s too late to think about how to manage the incident.” (Google SRE Workbook) Gett, a global leader in urban mobility and corporate travel tech, knew that relying on its incumbent paging system and siloed manual processes for incident management was no longer sustainable. Any delay in response and service restoration could jeopardize customer satisfaction and business continuity.

How We Built Internet's Largest Incident Response Glossary for the Wider Community

Today, I’m excited to share the Internet’s Largest Incident Response Glossary. It’s a collection of over 500 terms covering on-call, alerting, monitoring, and system reliability. The project took us over two weeks from ideation to completion, and in this post I would like to share how we approached this beast!