
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Now in beta: alerting for modern DevOps teams

Although FireHydrant has spent five years focused on what happens after your team (erg, I mean service 🙄) gets paged, the topic of alerting often comes up in discussions with our community. People are tired of paying big bucks for software that’s expensive, bloated, and hasn’t seen much innovation. Clearly, there’s a problem here – and we’re tackling it head on.

Autocorrelate Alerts With Squadcast's Key-Based Deduplication

With the increasing complexity of technology stacks and monitoring tools, managing incidents can become overwhelming, leading to alert noise, alert fatigue, and delayed responses. This is where Key-Based Deduplication comes to the rescue, streamlining incident handling and enhancing the effectiveness of your Incident Management platform.
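As a rough illustration of the general idea (not Squadcast's actual implementation), key-based deduplication can be sketched as grouping incoming alerts under a key built from chosen fields, so that repeated alerts attach to one incident instead of opening new ones. The field names `service` and `check`, the `Incident` class, and the `deduplicate` function below are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One incident that collects every alert sharing the same key."""
    key: str
    alerts: list = field(default_factory=list)


def deduplicate(alerts, key_fields=("service", "check")):
    """Group alerts into incidents by a key derived from key_fields.

    The choice of key fields is an assumption made for this sketch;
    a real platform would let you configure the key.
    """
    incidents = {}
    for alert in alerts:
        # Build the deduplication key from the selected alert fields.
        key = "|".join(str(alert.get(name, "")) for name in key_fields)
        # Reuse the existing incident for this key, or create a new one.
        incidents.setdefault(key, Incident(key)).alerts.append(alert)
    return list(incidents.values())


alerts = [
    {"service": "api", "check": "latency", "value": 900},
    {"service": "api", "check": "latency", "value": 950},
    {"service": "db", "check": "disk", "value": 91},
]
incidents = deduplicate(alerts)
for incident in incidents:
    print(incident.key, len(incident.alerts))
```

Here the two latency alerts for the same service collapse into a single incident, while the disk alert opens its own.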

How to create an on-call policy and rotation in OneUptime?

In this tutorial video, we walk you through creating an on-call policy and rotation in OneUptime. We start by explaining what an on-call policy is and why it is crucial for your organization. We then guide you step by step through setting up a policy: defining the policy name, setting the escalation rules, and adding users. Next, we create a rotation for the policy, explaining how to set the rotation length, start time, and participants. We also show you how to handle holidays and time-off requests within the rotation.
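To make the rotation settings concrete, here is a minimal sketch of how a rotation length, start time, and participant list could determine who is on call at a given moment. It is a generic round-robin calculation, not OneUptime's API or scheduling logic; the function name `on_call_at` and the participant names are hypothetical, and holidays or time-off overrides are deliberately left out.

```python
from datetime import datetime, timedelta


def on_call_at(moment, start, rotation_length, participants):
    """Return the participant on call at `moment` in a round-robin rotation."""
    if moment < start:
        raise ValueError("moment is before the rotation start time")
    # Number of complete shifts that have elapsed since the rotation began.
    shifts_elapsed = (moment - start) // rotation_length
    # Cycle through the participants in order.
    return participants[shifts_elapsed % len(participants)]


start = datetime(2024, 1, 1, 9, 0)      # rotation start time
rotation_length = timedelta(days=7)     # weekly hand-off
participants = ["alice", "bob", "carol"]

print(on_call_at(datetime(2024, 1, 10, 0, 0), start, rotation_length, participants))
```

With a weekly rotation starting on 1 January, a moment in the second week falls in the second participant's shift.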

How to build workflows in OneUptime and integrate OneUptime with anything?

OneUptime is a complete open-source observability platform. It allows you to create workflows and integrate with more than 5,000 different services and products without writing any code, so OneUptime can connect with the rest of your software stack. Building a workflow generally means defining the sequence of operations that should run when certain triggers or conditions occur. These workflows can help automate processes such as incident management and alerting the right people at the right time.

When More Incident Commanders are Better

This article has been lightly revised and reposted with the author's permission from the original on Medium. Leading major incident responses can be extremely stressful. You have to quickly gather an ad hoc team, figure out what went wrong, identify a fix, and make sure the fix doesn't make things worse, all with senior leadership breathing down your neck. Are we having fun yet? Many people think having a dedicated incident commander role will solve the problem.

On-Call Management Models

In today's fast-paced digital landscape, incident management is crucial for maintaining operational excellence, and on-call management models play a critical role in it. On-call management is the organization of teams to respond to and resolve incidents promptly. It is necessary to streamline incident resolution, ensure 24/7 availability, and allow for fair and transparent on-call rotations.

The Unplanned Show, Ep. 22: CSOps at PagerDuty with Arturo Suarez Martin

Even with the best monitoring in the world, some customer-impacting issues still go undetected and are ultimately reported by customers. In this episode, we'll hear from PagerDuty's Senior Director of Global Support, Arturo Suarez Martin, about the journey that PagerDuty has been on to tighten feedback loops between Customer Support and Engineering and mitigate the risk of poor customer experiences.