Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

How to consolidate your incident response stack using PagerDuty

PagerDuty is a comprehensive incident response solution that unifies disparate tools into a single platform. This helps teams respond to incidents faster and more effectively while reducing operational costs. PagerDuty also supports a shift from manual, reactive incident management to an automated, proactive approach, making the incident response process more efficient and resilient.

Here's what to focus on when reviewing an incident

Incidents can be noisy. Higher-severity incidents in particular have many moving parts, which makes it hard to come away with the information you want at a glance. And if you aren’t tapped into the day-to-day of incident response, as is often the case for a department head or executive, you’ll want to glean the most actionable information in a few seconds without digging through dense documents.

Top 5 Tools for SRE 2023 (Updated)

Site reliability engineers (SREs) are involved in scaling systems and making them reliable and efficient for organizations. But SREs often fail to build system resiliency when they do not have the right tools at their disposal. In this post, we’ll uncover the top 5 tools for SRE that can be used to drive the reliability and stability of software systems. We’ll also examine how SREs can use these tools to improve operations tasks and infrastructure processes.

Enterprise Alert 9.4.1 comes with fixes and the revised version of the sentinel connector app

In this release, we have addressed a number of bugs that were impacting the performance and functionality of the system. In the Kernel, we have resolved an issue where the broadcast was not being stopped after the first user acknowledged it. Additionally, we have fixed a crash that was occurring when loading component infos and an error log that was being generated when the Kernel started in suspended mode.

Announcing: Blameless + OpsGenie Integration

In the opening moments of an engineering incident, the most important aspect of a response plan is speed. Getting out of the gate quickly by leveraging automation to assemble the team can save precious moments during a critical engineering incident and make the difference between happy and unhappy customers downstream. This is why we’re excited to announce the integration of Blameless with OpsGenie.

Extend the Power of Your ServiceNow Application with PagerDuty for Customer Service

The last few years have led to an increasingly digital world. We are all online, streaming, shopping, or simply surfing. In this new world, customer experience is more critical than ever. Customers want things to work as seamlessly as possible, and when things go wrong, so goes their trust and business. The key priority for many businesses is keeping those systems running as smoothly as possible to keep customers happy and build their loyalty.

Analytics in Squadcast | Visualize Team and Organization Level Analytics | MTTA MTTR | Squadcast

Analyzing incident data plays a key role in doing SRE well. Squadcast's Analytics Dashboard helps you analyze the performance of your organization or team over a given time period. It also gives you more insight into past outages that affected your systems.
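
For readers unfamiliar with the MTTA and MTTR metrics named in the headline, here is a minimal sketch of how they are commonly computed from incident timestamps. It is not Squadcast's implementation: the record fields (`triggered`, `acknowledged`, `resolved`), the sample data, and the helper `mean_minutes` are assumptions made up for illustration, and MTTR is taken here as mean time to resolve, measured from the trigger time (definitions vary between tools).

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; the field names are assumptions,
# not the schema of any particular incident management tool.
incidents = [
    {
        "triggered": datetime(2023, 1, 1, 10, 0),
        "acknowledged": datetime(2023, 1, 1, 10, 4),
        "resolved": datetime(2023, 1, 1, 10, 34),
    },
    {
        "triggered": datetime(2023, 1, 2, 22, 0),
        "acknowledged": datetime(2023, 1, 2, 22, 6),
        "resolved": datetime(2023, 1, 2, 23, 26),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed time in minutes between two timestamps per record."""
    return mean(
        (r[end_key] - r[start_key]).total_seconds() / 60 for r in records
    )


# MTTA: mean time from trigger to acknowledgement.
mtta = mean_minutes(incidents, "triggered", "acknowledged")
# MTTR: mean time from trigger to resolution (one common definition).
mttr = mean_minutes(incidents, "triggered", "resolved")

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```

Restricting `incidents` to a single team or time window before calling `mean_minutes` gives the kind of per-team, per-period view the teaser describes.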