
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Everbridge Signal - Open Source Threat Intelligence to Keep People Safe and Operations Running

There are billions of people online right now. Among that noise is information that could be vital to your organization's safety and security. Everbridge Signal helps you find relevant information using artificial intelligence and machine learning. Detect incidents in real time by gathering data from public sources, including the dark web, the deep web, and social media. Whether your issues are cyber or physical, Signal can help.

Everbridge Flow Designer - Overview

Flow Designer is a stunningly simple, visual workflow builder that’s as easy as drag, drop, and done. Built-in steps make it easy to create virtually any workflow connecting your applications. Just drop in the steps you need to launch a critical event management process, post progress updates to a public page, and create spaces for personnel to collaborate.

Adobe Experience Cloud Outage: The Impact of Relying on Third-party Services

On December 8, 2023, Adobe's extensive customer base was hit by a series of outages in Adobe Experience Cloud that began at 8:00 AM EST and lasted until 1:45 AM EST on December 9, roughly 17 hours and 45 minutes. We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.

The Debrief: Incident management for data teams

If you're on a data team, have you ever considered using an incident management tool to respond to pipeline issues? If not, this episode is worth a listen. We chat with Jack, a data analyst at incident.io, to understand why data teams can, and should, turn to incident management tools like incident.io to manage issues. We chat about:

Read Jack's blog post about incident management for data teams.

How BookMyShow Empowered SREs - Incidentally Reliable Podcast

Incidentally Reliable Episode 4 drops this Thursday the 14th. We chat about BookMyShow's journey from inception to the entertainment behemoth it is today, their experience innovating at the forefront of the mobile and e-commerce revolutions, and how they keep reliability in harmony with the colourful yet challenging world of movies. Zenduty is an incident management platform that gives you greater control and automation over the incident management lifecycle.

What is Mean Time to Resolution - and why does it matter?

Mean Time to Resolution (MTTR) is a key performance indicator (KPI) that measures the average time needed to restore normal operation of an application, service, or infrastructure component. Because MTTR directly affects customer satisfaction, you need a clear understanding of how it influences the reliability and availability of your services and applications in order to make informed decisions, operate efficiently, and deliver a seamless customer experience.
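As a rough illustration, MTTR is the sum of each incident's resolution duration divided by the number of incidents. The sketch below assumes each incident is recorded as a hypothetical `(started_at, resolved_at)` pair; real tools may start the clock at detection or acknowledgement instead, so treat the field choice as an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (started_at, resolved_at).
incidents = [
    (datetime(2023, 12, 1, 9, 0), datetime(2023, 12, 1, 10, 30)),   # 90 min
    (datetime(2023, 12, 3, 14, 0), datetime(2023, 12, 3, 14, 45)),  # 45 min
    (datetime(2023, 12, 5, 22, 15), datetime(2023, 12, 6, 0, 0)),   # 105 min
]


def mean_time_to_resolution(records):
    """Return the average of (resolved_at - started_at) across incidents."""
    if not records:
        raise ValueError("need at least one incident to compute MTTR")
    # Start the sum from an empty timedelta so durations add up correctly.
    total = sum((resolved - started for started, resolved in records), timedelta())
    return total / len(records)


print(mean_time_to_resolution(incidents))  # 1:20:00
```

Here the three incidents take 240 minutes in total, so MTTR comes out to 80 minutes. Tracking this value over time (for example per week or per service) is usually more informative than a single snapshot.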

Incident vs Bug: Understanding the Key Differences

Incidents and bugs are two common occurrences that can disrupt the smooth operation of systems and applications. While these terms may seem similar, they represent distinct concepts with different implications. Understanding the nuances between incidents and bugs is crucial for effective incident management and proactive problem resolution.

What is Mean Time to Detect (MTTD) - and why does it matter for ITOps?

Have you ever wondered how efficiently your IT team detects incidents? Mean Time to Detect (MTTD) is an incident management key performance indicator (KPI) that measures the average time between an incident starting and your team detecting it. It shows how well you perform during the first stage of incident resolution and points to opportunities for improvement. ITOps and DevOps teams that lower their MTTD can identify issues sooner, minimize potential downtime, and maintain system reliability.
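A minimal sketch of the MTTD calculation, assuming each incident is stored as a hypothetical `(occurred_at, detected_at)` pair. In practice, the "occurred" timestamp is often estimated after the fact from logs or metrics, so the result is only as accurate as that estimate.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: (occurred_at, detected_at).
incidents = [
    (datetime(2023, 12, 1, 8, 50), datetime(2023, 12, 1, 9, 0)),     # 10 min
    (datetime(2023, 12, 3, 13, 58), datetime(2023, 12, 3, 14, 0)),   # 2 min
    (datetime(2023, 12, 5, 21, 45), datetime(2023, 12, 5, 22, 15)),  # 30 min
]


def mean_time_to_detect_minutes(records):
    """Return the average detection delay, in minutes, across incidents."""
    if not records:
        raise ValueError("need at least one incident to compute MTTD")
    delays = []
    for occurred, detected in records:
        if detected < occurred:
            # A negative delay usually means a bad timestamp in the data.
            raise ValueError("detected_at is earlier than occurred_at")
        delays.append((detected - occurred).total_seconds() / 60)
    return mean(delays)


print(mean_time_to_detect_minutes(incidents))  # 14.0
```

Reporting the result in minutes keeps it easy to compare against alerting thresholds. Splitting MTTD by alert source can also show which monitors catch problems slowly.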