Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

What are AIOps platforms?

Oct 12, 2023 By BigPanda In BigPanda

IT operations teams are challenged to keep pace with the rapid speed of digital transformation. As companies use more cloud-based apps, increase agile deployments, and develop new microservices-based applications, they add layers and complexity to their technology stacks, which makes the work of ITOps teams increasingly challenging.

A Detailed Guide to Setting Up Effective On-Call Rotations

Oct 11, 2023 By Chitra Bisht In Squadcast

On-Call Schedules are predefined rotations or shifts that assign team members to be available for incident response at specific times. They are essential for ensuring round-the-clock support, swift incident resolution, and continuous service availability. Proper schedules serve as the backbone of reliable Incident Response and ensure your team is well prepared to address technical challenges effectively.

The Debrief: Build vs buy

Oct 11, 2023 By Incident.io In Incident.io

Almost every organization will eventually face an important crossroads: should I build the tooling I need, or buy it? More often than not, the decision to buy is the most sensible one, saving you the most time, effort, and even money. But there are some edge cases where building can be the right choice. In this chat with Isaac, product engineer at incident.io, we dive into this nuanced debate and explain why buying is your best bet... most of the time.

After Hours Alerting for ConnectWise

Oct 11, 2023 By SIGNL4 In SIGNL4

A short demo video on how to add After Hours Alerting with SIGNL4 to your ConnectWise PSA. We show you the complete workflow and what to keep in mind for seamless connectivity and targeted mobile alerting including duty scheduling for your teams.

SLA vs. SLO vs. SLI: What's the Difference?

Oct 11, 2023 By Laura Clayton In Uptime Robot

When it comes to managing services effectively, terms like SLA, SLO, and SLI are often thrown around like confetti at a parade. They’re in meetings, in documents, and even in casual office conversations. But if you’re new to the field or simply haven’t had the chance to dig into these acronyms, they can feel like a bewildering alphabet soup. And an uptime monitoring blog such as ours can’t go without covering them! So, what do these terms really mean?

A guide to post-mortem meetings and how we run them at incident.io

Oct 11, 2023 By Luis Gonzalez In Incident.io

You've just made it through a particularly tough incident. It was a short outage affecting a subset of customers, so not exactly the end of the world, but bad enough that it involved multiple people across a number of teams to resolve. Either way, the incident was well managed, and the dust has settled. Now what? Most guidance would say that putting together a post-mortem document is a good idea, given the severity of the incident. You've also done this, so what's next?

Introduction to ilert AI

Oct 10, 2023 By iLert In iLert

During the intensity of incident response, it is crucial to maintain concentration on resolving the problem promptly. At times, crafting a thorough and precise incident communication can be difficult, particularly when under pressure. This is where ilert's AI-powered incident communication feature becomes valuable.

Three Ways to Better Appreciate your SREs and DevOps Engineers

Oct 10, 2023 By Emily Arnott In Blameless

DevOps engineers and Site Reliability Engineers are vitally important to the continued health of your product and business. We all know it’s true, and yet people in these roles often feel underappreciated and undervalued. This sort of work runs into the issue of “when process and infrastructure break, it gets shoved in the spotlight; but when everything works perfectly, no one notices.”

The Unplanned Show, Episode 16: Resiliency with Sam Newman

Oct 10, 2023 By PagerDuty In PagerDuty

When the author of Building Microservices (O'Reilly) tweets asking for a "plurality of views" on resiliency, I, for one, am intrigued. In this episode, we'll hear from Sam Newman about his latest thinking on resiliency.

How AIOps modernizes CMDBs to drive accuracy and value

Oct 10, 2023 By Blair Sibille In BigPanda

Maintaining your Configuration Management Database’s (CMDB) accuracy, keeping it fully updated, and improving its performance is a frustrating and elusive goal for ITOps and IT leaders. Aiming for this ‘golden’ CMDB standard can feel like running on a treadmill where you’re putting in a lot of work, but remain as distant as ever from your goal. Can IT leaders ever catch up?