The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Is Your Incident Management Tool a Single Point of Failure? The Case for a Multi-Channel Approach

When we’re talking about incidents, we know it’s not a matter of if, but when. Incidents spare no system: not ours, not yours, not your vendors’. We’ve all seen widely used products suffer incidents, and we've seen the domino effect on every operation that relies on them to function seamlessly. Vendors offering narrow, chat-centered incident management tools might seem attractive at first glance, but they fundamentally misunderstand the complexity of enterprise operations.

Enhancing SAP Monitoring and Incident Management with IT-Conductor and ilert

We are excited to announce the integration of ilert with IT-Conductor, a SaaS-based IT operations management and automation platform. This partnership enhances IT-Conductor’s powerful capabilities with ilert’s advanced alerting and incident management, ensuring that IT teams can address issues faster and more efficiently.

How AI broke serverless and what to do about it with Vercel's Mariano Fernández Cocirio

Mariano, Staff Product Manager at Vercel, explains why serverless architectures are hitting an unexpected limit: they’re too fast. The industry has spent millions optimizing serverless for speed, but AI workloads are changing the game. In the AI realm, slower execution often leads to better results. The challenge? Paying for all that idle compute time while waiting for AI responses.

Getting MTTR to zero: the failed promise of observability

There’s an old cliché about sales and jobs to be done: no one wants to buy a drill; they want a hole… actually, they want a home with pictures on the wall. To get to that beautifully designed home, they will buy a drill, make holes for brackets that can support their artwork and family photos, and move toward their dream home. Similarly, no one wants to buy observability software. They want their mean time to resolve (MTTR) issues to drop to zero.

What is Digital Customer Experience? Create a Great Online Experience

Customer expectations for a great online experience are higher than ever. Customers expect a seamless, intuitive, and personalized experience across every digital interaction, whether they are browsing a website, using a mobile app, or getting their questions answered by customer support. A successful digital customer experience isn’t just a competitive advantage; it’s essential for building brand loyalty and driving business success.

Personal resilience boosts operational resilience

Winter is a grinding time. The cold, the darkness and the rain all take a toll on people. Businesses should remember that the people behind IT operations need looking after just as much as the technology they maintain. Business leaders can't have one without the other.

Operations as Code: Operational Excellence with PagerDuty

The push towards digital transformation and cloud-native infrastructure is massive, yet organizations also need to maintain legacy capabilities. This pressure creates a need to manage operations with the same rigor and automation we apply to infrastructure, coding, and security. Many organizations have embraced the ideas of "everything in a pipeline" and "everything as code."

Revolutionizing Incident Management with AI: Meet Mo Copilot

Join us for this webinar as we explore how our newly launched Sumo Logic Mo Copilot redefines incident management with the power of AI. We'll examine the limitations of traditional troubleshooting methods and why they fall short in today’s fast-paced environments. Discover how Mo Copilot uses machine learning and automation to streamline root cause analysis and reduce mean time to resolution (MTTR). We'll also give a live demonstration and show how Mo Copilot fits into your workflow, transforming how you manage operational reliability.

Introducing Audit Logs: Ensuring Visibility, Security, and Compliance in FireHydrant

When something goes wrong, the first question is always: what changed? Whether it’s an unexpected change to your on-call schedule, a broken automation, or a modified Runbook that just seems off, understanding the issue starts with knowing who made what change, when it happened, and what exactly changed. But in an organization with many users, keeping track of every action can feel impossible.