From Reactive Response to Systemic Resilience: The System That Gets Smarter With Every Incident

Most operations teams are stuck in a reactive loop: resolving incidents as they happen, then moving on to fight the next fire. This approach keeps things running in the short term, but it prevents responders from documenting what they learn in a way that improves overall system resilience. There are practical reasons for this.

Five key takeaways from EDUCAUSE 2025: Adopting AI while navigating change

Having just returned from the 2025 EDUCAUSE Annual Conference in Nashville, I want to share some insights on the future of campus IT from the higher education technology leaders in attendance. Every year, this conference provides an opportunity for technology providers and higher ed professionals to connect and explore the latest innovations in higher education technology. Two themes emerged as critical priorities.

Why Agentic AI Adoption Is Accelerating in Europe and What Comes Next

Across Europe, the cautious optimism business leaders held towards AI agents has evolved into more widespread enthusiasm. What was once a curiosity is now core to how many European organizations operate, respond, and innovate. According to PagerDuty’s latest agentic AI survey, three-quarters or more of organizations in France, Germany, and the UK are deploying multiple AI agents. This growing confidence reflects a broader trend.

How to Choose an AI SRE Solution

The AI SRE landscape has exploded over the past year, with vendors racing to add artificial intelligence capabilities to their platforms. For engineering leaders evaluating these solutions, the sheer number of options can feel overwhelming. Some vendors are building AI-native solutions from scratch, while others are retrofitting AI onto existing workflows. Cloud providers are embedding agents into their ecosystems, and observability platforms are adding intelligence layers to their telemetry data.

Work Where Your Teams Already Are with PagerDuty's AI Agents for Slack

Modern operations happen in Slack, where teams spend their days collaborating, troubleshooting, and resolving incidents. And while many incident management tools offer Slack-friendly experiences, they lack the end-to-end capabilities that teams need. During critical moments, other tools may require users to switch between Slack and their own interfaces, creating friction.

We Built an SRE Agent With Memory And It's Transforming Incident Response

If you feel like your incidents are multiplying while your stack gets more complex by the week, you’re not alone. Event volumes keep climbing, signals live in a dozen tools, and human responders are stretched thin. That’s exactly why we built the PagerDuty SRE Agent, a vendor-agnostic AI teammate that improves with every response to make the next one faster, smarter, and more reliable.

Too Late to Learn: Why Security Post-Mortems Fail and How AI Can Help

An effective post-mortem can turn a security breach into a blueprint for lasting resilience. But too often, in the stress of an incident, documenting what happened takes a back seat to containment and recovery. The resulting analysis relies heavily on memory, scattered notes, and competing narratives. Valuable context gets lost, timelines blur, and lessons that could strengthen defenses never become institutional knowledge.

Your Next Incident Has Already Started. You Just Haven't Noticed Yet.

The best way to minimize the impact of an incident is to catch it early, before small issues snowball into major disruptions. That requires maintaining healthy systems and ensuring sufficient resources are available when problems arise. But developers and IT operations pros working in large enterprises face a challenge: Complex systems operate in an inherently degraded state. In his essay “How Complex Systems Fail,” Dr.

Your Top Engineers Should Be More than Expensive Button-Pushers

The engineer you pay $200,000 a year just spent an hour copy-pasting data between dashboards. Again. Software engineers have critical skills that are in high demand. And yet, many world-class engineers spend too much of their time clearing tickets, routing alerts, and responding to the same types of incidents over and over again. This operational toil is costing you.

Meeting Developers Where They Work: PagerDuty + Spotify Portal for Backstage

From the beginning, PagerDuty has been built by developers, for developers. Our mission has always been to help development teams build faster and resolve incidents more efficiently by meeting them where they work. Building on PagerDuty’s existing plugin for Spotify for Backstage, we are thrilled to announce the PagerDuty plugin for Spotify Portal for Backstage to continue bringing enterprise-grade incident management into even more developer workflows.