
Slash Observability Costs Without Sacrificing Reliability: The OTEL + PagerDuty Advantage

In a time when budgets are tight but reliability still needs to be high, observability is under the spotlight. Monitoring and observability tools are some of the most expensive parts of a tech stack, often eating up the bulk of the budget. Luckily, there are strategies organizations can implement to reduce costs, such as utilizing open-source solutions like OpenTelemetry (OTEL), which provides a flexible, open standard for data collection without the price tag of proprietary tooling.

When the Internet Blinked: What the June 12 Outage Teaches Us About Resilience

On June 12, 2025, the internet blinked. Email vanished, apps froze, and many of us lost contact with our digital coworkers (both AI and human). The world felt it instantly; businesses stalled, teams scrambled, and digital operations everywhere took a hit. It felt a little like déjà vu. Does anyone remember July 19, 2024?

PagerDuty Advance and Amazon Q Business announce General Availability of their AI-powered, chat-first integration

When it comes to incident management, the ability to quickly access and act on operational data can mean the difference between brand loyalty and costly downtime. PagerDuty’s integration with the Amazon Q Business index addresses this challenge head-on by providing a seamless, more secure, and faster way to search and access enterprise knowledge across the IT ecosystem.

Engineering Time is Your Most Valuable Asset: Are You Spending It Right?

Technology leaders often face a tempting proposition from their engineering teams: “We could build this ourselves.” It’s a natural instinct, especially when discussing incident management systems. Your team’s confidence isn’t misplaced – they absolutely could build a basic alerting system. However, the question isn’t about capability; it’s about strategic resource allocation and long-term operational excellence.

Beyond Playbooks: Unleashing Enterprise-Wide Automation with Ansible + PagerDuty Runbook Automation

Playbooks are nice. Results are better. This simple truth highlights a critical challenge in modern enterprises: technical teams have mastered infrastructure automation with Ansible, but technical playbooks that only SMEs can use are not enough. They need comprehensive automation that drives measurable business outcomes.

Accelerate Government IT Innovation

Government IT operations across the public sector face unprecedented challenges this year. As digital demands intensify and legacy systems strain under pressure, agencies must accelerate IT innovation while delivering measurable ROI. The PagerDuty Operations Cloud emerges as the catalyst for government transformation, enabling agencies to revolutionize their digital operations while achieving operational excellence, according to The Government Guide for Agency Innovation ebook.

PagerDuty + Microsoft Build 2025: Transforming critical work with AI and automation

At Microsoft Build 2025, PagerDuty was featured in key announcements showcasing how intelligent agents and real-time automation redefine digital operations. From Microsoft Copilot to the launch of a new Azure SRE Agent, PagerDuty was highlighted as a strategic partner in enabling intelligent, scalable incident response.

Healthcare and Crisis Teams Harness PagerDuty to Stay Ready and Resilient

For organizations that provide vital mental health assistance and safety crisis services, or that deliver critical humanitarian support when disaster strikes, reliable digital infrastructure is essential. Whether connecting individuals to crisis counselors via text or coordinating face-to-face healthcare support, these digital services must operate seamlessly.

Your Observability Platform Has a Blind Spot: Don't Risk Your Operations on Bolt-on Incident Response Modules

Observability platforms want to do it all, from data collection to incident response. Their pitch is appealing: one platform to eliminate context switching and reduce overhead. But when critical systems fail (and they will fail), add-on incident management modules won’t save you. You need an end-to-end system built specifically for high-stakes incident management.

When Minutes Matter: The Iberian Peninsula Outage and the Future of Digital Resilience

On April 28, 2025, Spain, Portugal, and briefly some parts of France experienced what would become one of Europe’s most significant power outages in recent history. As millions across the Iberian Peninsula found themselves suddenly disconnected, a stark reality emerged: in our interconnected world, the ripple effects of major incidents extend far beyond their immediate impact zone.