Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Everbridge Flow Designer - Overview

Flow Designer is a stunningly simple, visual workflow builder that’s as easy as drag, drop, and done. Built-in steps make it easy to create virtually any workflow connecting your applications. Just drop in the steps you need to launch a critical event management process, post progress updates to a public page, and create spaces for personnel to collaborate.

Adobe Experience Cloud Outage: The Impact of Relying on Third-party Services

On December 8, 2023, Adobe's extensive customer base was hit by a series of outages in the Adobe Experience Cloud that began at 8:00 AM EST and continued until 1:45 AM EST on December 9, a span of nearly 18 hours. We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.

The Debrief: Incident management for data teams

If you're on a data team, have you ever considered using an incident management tool to respond to pipeline issues? If the answer is no, then you might want to check out this episode. Here, we chat with Jack, Data Analyst at incident.io, to better understand why data teams can, and should, look to incident management tools like incident.io to manage issues. For more, read Jack's blog post about incident management for data teams.

How BookMyShow Empowered SREs - Incidentally Reliable Podcast

Episode 4 of Incidentally Reliable drops this Thursday the 14th. It covers BookMyShow's journey from inception to the entertainment behemoth it is today, its experience innovating at the forefront of the mobile and e-commerce revolutions, and its approach to reliability in the colourful yet challenging world of movies. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

What is Mean Time to Resolution - and why does it matter?

Mean Time to Resolution (MTTR) is a key performance indicator (KPI) that measures the average time needed to restore normal operation for an application, service, or infrastructure component. Your MTTR directly affects customer satisfaction, so you need a clear understanding of how it influences the reliability and availability of your services and applications in order to make informed decisions, operate efficiently, and ensure a seamless customer experience.
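
As a rough illustration of the definition above, MTTR can be computed as the average of (resolved time minus start time) across resolved incidents. The sketch below is a minimal example, not any vendor's implementation: the function name and the sample incident timestamps are hypothetical, and teams differ on exactly when the clock starts and stops.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - started) over a list of (started, resolved) pairs."""
    durations = [resolved - started for started, resolved in incidents]
    if not durations:
        raise ValueError("no resolved incidents to average")
    # Summing timedeltas needs an explicit timedelta() start value.
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident records: (started, resolved)
incidents = [
    (datetime(2023, 12, 1, 9, 0), datetime(2023, 12, 1, 9, 30)),    # 30 minutes
    (datetime(2023, 12, 3, 14, 0), datetime(2023, 12, 3, 15, 30)),  # 90 minutes
    (datetime(2023, 12, 5, 22, 0), datetime(2023, 12, 5, 23, 0)),   # 60 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:00:00
```

Because a plain mean is sensitive to a single very long incident, it is often worth looking at the median or a percentile alongside it.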

Incident vs Bug: Understanding the Key Differences

Incidents and bugs are two common occurrences that can disrupt the smooth operation of systems and applications. While these terms may seem similar, they represent distinct concepts with different implications. Understanding the nuances between incidents and bugs is crucial for effective incident management and proactive problem resolution.

What is Mean Time to Detect (MTTD) - and why does it matter for ITOps?

Have you ever wondered how efficiently your IT team detects incidents? Mean Time to Detect (MTTD) is an incident management key performance indicator (KPI) that reveals how you perform during the first stage of incident resolution and points to opportunities for improvement. ITOps and DevOps teams that lower their MTTD can identify issues more quickly, minimize potential downtime, and maintain system reliability.
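
To make the metric concrete, MTTD is commonly calculated as the average gap between when a fault began and when it was first detected. The following is a small sketch under that assumption. The `Incident` class, its field names, and the sample data are hypothetical and not drawn from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    occurred_at: datetime  # when the fault actually began
    detected_at: datetime  # when monitoring or a person first noticed it


def mean_time_to_detect_minutes(incidents):
    """Average detection gap in minutes; raises StatisticsError if empty."""
    gaps = [
        (i.detected_at - i.occurred_at).total_seconds() / 60
        for i in incidents
    ]
    return mean(gaps)


# Hypothetical sample: detection gaps of 4, 10, and 1 minutes
sample = [
    Incident(datetime(2023, 12, 1, 9, 0), datetime(2023, 12, 1, 9, 4)),
    Incident(datetime(2023, 12, 2, 13, 0), datetime(2023, 12, 2, 13, 10)),
    Incident(datetime(2023, 12, 4, 18, 30), datetime(2023, 12, 4, 18, 31)),
]

mttd = mean_time_to_detect_minutes(sample)
print(f"MTTD: {mttd:.1f} minutes")  # MTTD: 5.0 minutes
```

In practice the hard part is usually `occurred_at`, which often has to be reconstructed after the fact from logs or metrics.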

Understanding IT event analytics: From basics to AIOps

A wise person once said, “What’s measured is what matters.” This couldn’t be more true than in the high-stakes world of IT operations, where the ability to swiftly measure, analyze, and respond to events is crucial for improving operational performance. This blog defines IT event analytics, guides you on getting started, showcases real-world examples, and introduces essential methods for transforming your incident response strategy.