The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Runbook Automation and Rundeck v5.7 Release Notes

Nov 7, 2024 By PagerDuty In PagerDuty

Product Managers Jake and Forrest join us for a spooky stream to talk about the Runbook Automation and Rundeck release v5.7. Project Runner Management is now generally available.

View Video

Engineering an AI Proxy for ilert

Nov 7, 2024 By Daria Yankevich In iLert

Building an AI proxy for our AI features was one of the best decisions we made a year ago. In this article, we will share why and what challenges we faced.

Read Post

Lessons from 4 years of weekly changelogs

Nov 7, 2024 By Pete Hamilton In Incident.io

Writing a meaningful update for customers every week has been held sacred at incident.io since we started the company. We've written over 200 of them in the past 4 years, and we recently celebrated going 2 years straight without missing a single week. The numbers themselves are not the goal, but the consistency of this habit and what it represents for our customers and our team is very real, and special to me.

Read Post

Operationalizing AI for IT operations

Nov 6, 2024 By Conor Castronovo In BigPanda

Advances in artificial intelligence are rapidly transforming the IT operations landscape. According to Enterprise Strategy Group, 85% of organizations use or plan to deploy AI across many functional areas, including IT operations. AI has immense potential to transform how IT operations, service management, and infrastructure teams function. Adoption is the first step toward creating organizational change.

Read Post

Did Delta's slow web performance signal trouble before CrowdStrike?

Nov 6, 2024 By Denton Chikura In Catchpoint

The CrowdStrike outage was a reminder of how quickly the dominoes can fall, especially when the foundation is shaky. Delta Air Lines was hit harder than its competitors. While United and American Airlines were able to recover within days, Delta faced ongoing struggles, leading to the cancellation of 7,000 flights over five days.

Read Post

What is Incident Management? Keys to Business Continuity and Resilience

Nov 5, 2024 By InvGate In InvGate

Learn about the benefits and types of Incident Management. Discover how to build an effective process and follow best practices to ensure your organization is prepared for any incident.

View Video

Against Incident Severities and in Favor of Incident Types

Nov 4, 2024 By Fred Hebert In Honeycomb

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually using a SEV scale), but decided to adopt an approach based on types, aiming to better play the role of quick definitions for multiple departments put together. This post is a short report on our experience doing it.

Read Post

Observability as a superpower

Nov 4, 2024 By Sam Starling In Incident.io

With every job I have, I come across a new observability tool that I can’t live without. It’s also something that’s a superpower for us at incident.io: we often detect bugs faster than our customers can report them to us. A couple of jobs ago, that was Prometheus. In my previous job, it was the fact that we retained all of our logs for 30 days, and had them available to search using the Elastic stack (back then, the ELK stack: Elasticsearch, Logstash, and Kibana).

Read Post

#8 Virtual Meetup EMEA Rundeck by PagerDuty

Nov 4, 2024 By PagerDuty In PagerDuty

Join us for an informal 1-hour virtual gathering where the open-source Rundeck by PagerDuty community comes together to share knowledge and insights. Whether you're new to Rundeck or looking to elevate your automation game, this meetup is packed with valuable takeaways for everyone!

View Video

The No-Nonsense Guide to Runbook Best Practices

Nov 2, 2024 By Hrishikesh Barua In IncidentHub

Runbooks are a key part of incident management and preserve institutional knowledge. They can be used for both incident response and routine tasks like database maintenance and generating a complex report. We are mostly focused on incident response runbooks here.

Read Post