Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

All I want for Christmas... from Slack

When declaring and responding to an incident with incident.io, most of your interactions with our product will go through Slack. You might configure your forms in our web dashboard, but the responder using them to declare an incident is most likely doing so from a Slack modal, and the incident announcement will be posted as a Slack message. This means a lot of our product design falls within the constraints of what we can build using Slack’s Block Kit.

Understanding ServiceNow Incident Management: A comprehensive guide

You’re focused on swiftly identifying, analyzing, and resolving disruptions in IT services. And you know all too well that correctly deploying and adopting incident management holds the key to delivering a more reliable and responsive IT environment for your applications and services. That’s why you’re using or are considering using ServiceNow’s incident management to ensure a structured and efficient approach to handling your IT service incidents.

Better Incidents Winter Bonfire: Inside On-Call

Engineers are bombarded with pages left and right. There's uncertainty about how to escalate. A constant blur exists between what's urgent and what can wait. This never-ending ping-pong game takes a toll. Burnout creeps in, and before you know it, your engineering culture has taken a nosedive.

Automated incident response in ITOps: Here's everything you need to know

If you’re like most IT leaders, you realize that automating repetitive, low-level incident response actions is key to unlocking enhanced workforce productivity, improved IT services, minimized downtime, better user experiences, cost savings, and the freedom to focus on innovation. Yet you don’t know where to start – or maybe aren’t sure of the best approach.

BookMyShow's Cinematic Product Journey - Incidentally Reliable Podcast with Viraj Patel

Grab some popcorn and catch Viraj talk about his experiences and BookMyShow's journey from its inception in the early 2000s to the entertainment behemoth it is today, their stints innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability engineering in the colourful, ever-changing yet challenging world of movies and online ticketing. Exclusively on The Incidentally Reliable podcast — made by SREs for SREs, hosted by Zenduty.

LLM Monitoring and Observability

Large Language Models (LLMs) are advanced artificial intelligence models designed to comprehend and generate human-like language. With millions or even billions of parameters, these models, like GPT-3, excel in natural language processing, understanding context, and generating coherent and contextually relevant text across various applications.

The Everbridge Risk Intelligence Monitoring Center (RIMC) real-time alerting

The Everbridge Risk Intelligence Monitoring Center (RIMC) analyzes thousands of trustworthy, vetted, and hyper-local data sources – across over 100 risk categories – using machine-learning and AI technology, complemented by an experienced team of global risk analysts. The RIMC team’s real-time alerting streamlines your organization’s ability to monitor and analyze worldwide incidents and events, dramatically increasing your ability to respond to risks that threaten your people, organization, supply chain, and more.