Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Unveiling Past Incidents: Accelerating Incident Resolution with Historical Context

Having the context of how similar issues were handled before can be invaluable. It can help incident responders grasp the nature of recurring problems, their causes, and the solutions that have worked in the past. Squadcast’s Past Incidents feature assists incident responders by presenting them with a list of similar past incidents related to the same service they are currently investigating.

Introducing Grafana OnCall shift swaps: A simpler way to exchange on-call shifts with teammates

A family member’s birthday, that concert you’ve waited all year to see, an impromptu weekend getaway with friends — there are a lot of reasons software engineers might want to switch on-call shifts. And rather than have to frantically send Slack messages to your teammates, wouldn’t it be nice to automate the process and quickly find the coverage you need?

Product Spotlight: Enhancing Incident Resolution with Blameless' Microsoft Teams Integration

In today's fast-paced digital landscape, swiftly responding to incidents is paramount for engineering teams. Downtime is not just costly; it can tarnish your organization's reputation. The pressure felt by engineering operations, DevOps, and SRE leaders to architect and run an effective incident response process is immense. Fortunately, over the last several years, effective engineering organizations have developed a standard toolkit for running a good incident response process.

The importance of testing emergency warning systems

On Oct. 4, 2023, the Federal Emergency Management Agency (FEMA) plans a nationwide mobile alert test which will send an emergency SMS to all cellphones in the United States. In coordination with the Federal Communications Commission (FCC), the national test will be administered at approximately 2:20 p.m. ET on Wednesday, Oct. 4. It will consist of two portions that will test Wireless Emergency Alerts (WEA) and Emergency Alert System (EAS) capabilities.

Better learning from incidents: A guide to incident post-mortem documents

If you’re just starting out in the world of incident response, then you’ve probably come across the phrase “post-mortem” at least once or twice. And if you’re a seasoned incident responder, the phrase probably evokes mixed feelings. Just to clarify, here, we’re talking about post-mortem documents, not meetings. It’s a distinction we have to make since lots of teams use the phrase to refer to the meeting they have after an incident.

Sponsored Post

Status Pages 101: Everything You Need to Know About Status Pages

Status Pages are critical for effective Incident Management. Just as an ill-structured On-Call Schedule can wreak havoc, ineffective Status Pages can leave customers and stakeholders adrift, underscoring the need for a meticulous approach. Two examples are Matsuri Japon, a non-profit organization, and Sport1, a premier live-stream sports content platform, both of which integrate Squadcast Status Pages to discreetly enhance their incident response strategies. You may read about them later. Crafting these Status Pages demands precision, offering dynamic updates and collaboration.