
The latest News and Information on IT Operations Management and related technologies.

The Next Wave of Automation Makes More Room for Humans

When a system goes down, the impact isn’t just technical. It’s the people at the center of it who adapt, improvise, apply their judgment, and keep the business moving forward. I’ve worked in operations for more than 25 years, and one thing I’ve learned is that in any system, humans are the truly resilient part.

Agentic AI Becomes Essential: Why Adoption Is Accelerating and What Comes Next

The cautious optimism business leaders once held toward AI agents has grown into widespread enthusiasm. In our previous survey, from April 2025, just over half (51%) of companies had deployed AI agents. Six months later, according to PagerDuty’s latest research, 75% of companies are deploying more than one agent.

Automate or Elevate? 5 Steps to Build an AI-Powered Incident Playbook

Modern development tools, CI/CD infrastructure, and AI have accelerated the pace at which companies release software. This speed supports innovation, but it also increases complexity and the chance that something breaks in ways that aren’t immediately obvious. Teams now deal with more operational data, more complex failure patterns, and systems where a small configuration change can ripple across dozens of microservices.

Why Comprehensive IT Risk Mitigation Is Essential in Modern Operations

The digital economy offers unprecedented opportunities for innovation, but it also carries high-stakes risks that must be managed effectively to ensure operational resilience. Organisations that rely heavily on IT to deliver services, control data, and build trust with customers must prioritise risk avoidance in their operational resilience plans.

You Don't Need a Five-Year AI Plan. You Need a Five-Week One.

In my travels, I constantly hear about plans that promise to “unlock the full power of AI” down the road. The usual advice is to start small with a few pilots and gradually scale up from there. It looks good on paper, but in practice it becomes a months-long slog of one-off experiments that burn a lot of capital yet usually generate little impact on their own.

How to Choose Incident Management Software

Choosing the right incident management software can make or break your organization’s operational resilience. Modern IT environments are growing more complex, and so are customer expectations for always-on services. Robust incident management capabilities aren’t just nice to have; they’re essential for business continuity.

A Leader's Guide to Upskilling Teams for the AI Era

Every week, we hear about new AI breakthroughs. AI models write code, create videos, and analyze data in ways we couldn’t imagine just months ago. But there’s a gap: while most companies have adopted AI tools, the majority of employees still don’t use AI in their everyday work. As a manager, you see AI’s potential to change how your team works, yet your employees struggle to figure out how AI fits into their daily tasks.

Space-Conscious Ops: Why Commercial Bar Stools Are a Smart Switch in NOC Areas

Inside a Network Operations Center, every square foot is valuable. Operators spend long shifts monitoring data, responding to alerts, and making decisions that affect critical systems. In such a demanding environment, furniture is not just decoration. It directly influences how people move, how they see their screens, and how alert they remain.

From Alert to Resolution: How Incident Response Automation Cuts MTTR and Closes Gaps

Every minute of downtime costs money. Every manual handoff adds risk. And every incident without a standardized fix becomes an opportunity for inconsistency, delay, and escalation. That’s why more operations and SRE teams are turning to Incident Response Automation. With the PagerDuty Operations Cloud, teams can run safe, predefined remediation actions, so responders go from alert to resolution in minutes rather than hours. The result is lower MTTR and more consistent responses.
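To make the MTTR claim concrete, here is a minimal sketch of how the metric is commonly calculated, assuming MTTR means mean time to resolve: the average of (resolved time minus triggered time) across resolved incidents. The incident records and the `mean_time_to_resolve` helper are hypothetical and illustrative only; this is not PagerDuty's API or its exact definition of the metric.

```python
from datetime import datetime, timedelta

# Hypothetical incident records as (triggered_at, resolved_at) pairs.
# Illustrative timestamps only, not real data.
incidents = [
    (datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 9, 45)),     # 45 min
    (datetime(2025, 1, 8, 14, 10), datetime(2025, 1, 8, 16, 10)),  # 120 min
    (datetime(2025, 1, 9, 22, 30), datetime(2025, 1, 9, 22, 45)),  # 15 min
]


def mean_time_to_resolve(records):
    """Return the average of (resolved - triggered) across resolved incidents."""
    if not records:
        raise ValueError("need at least one resolved incident")
    # Start the sum from an empty timedelta so durations add up correctly.
    total = sum((resolved - triggered for triggered, resolved in records), timedelta())
    return total / len(records)


mttr = mean_time_to_resolve(incidents)
print(f"MTTR: {mttr}")  # prints "MTTR: 1:00:00"
```

Because MTTR is an average, automating the remediation of frequent, well-understood incidents shortens many individual durations at once, which is why standardized fixes tend to move the metric noticeably.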