Incident Management

The latest news and information on Incident Management, On-Call, Incident Response and related technologies.

Getting AWS CloudTrail alerts via SNS Endpoint

Logging and auditing have long been an essential part of troubleshooting application and infrastructure performance. With them, you can quickly spot areas of risk and correct or prevent issues early. In this blog post, we will explore the AWS CloudTrail service and discuss how integrating it with Squadcast can help you route alerts to the right users for quick and efficient incident response. Let's get started.

xMatters Notification Override Feature

Now you can sleep easy knowing xMatters notification override will let you know when a critical alert happens, regardless of your device status. Discover more about how xMatters can help ensure applications are always working, automate workflows, and deliver remarkable products at scale with the xMatters service reliability platform.
Sponsored Post

Simplifying SLO and Error Budget tracking for SRE teams

Service level objectives (SLOs) and the service level indicators (SLIs) behind them are the foundation of a strong SRE culture, promoting accountability, trust and timely innovation. We are on a mission to simplify SLO and Error Budget tracking, and with that aim in mind we have added the SLO Tracker feature to the Squadcast platform. SLO Tracker seeks to provide a simple and effective way to keep track of your error budget burn rate without the hassle of configuring and aggregating multiple data sources.

5 Tips If You're the 1st SRE Hire by Instacart's First SRE

Site Reliability Engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first SRE to join an organization – as many SREs are these days, given that the SRE trend is trickling down into smaller and smaller companies – you face a special group of challenges. You may find it difficult to get buy-in for SRE from other technical teams.

Introducing Incident Types

We believe incident.io should be used across an organisation, from SRE teams to Customer Success and People Ops. Until now, the way you set up your incident response flows has relied on having one set of roles and fields for every incident, meaning you had to choose between having lots of irrelevant fields to cover every use case, or not getting the full incident.io experience on some incidents. That’s changing today with incident types, conditional fields and roles!

Webinar: combating tool sprawl with AIOps

Dexcom is more than a business. The organization’s innovative continuous glucose monitoring platform gives its customers a way to take control of their health and better manage their diabetes. Given the critical services Dexcom provides to its customers, its IT Operations teams have highly specific needs when it comes to the many tools and platforms they rely on to keep the organization’s services up and running.

Everbridge Live: Don't Be Afraid of the Dark Web

The dark web is often seen as a den of iniquity and an underworld of illegal activity. However, this den can be a valuable source of information. Right now, more than 4 billion active internet users are online, creating posts, pictures, and videos. That means billions of different points of view shared in real time across the open web, social media, and the dark and deep web. Watch our short webinar, where you will see just how easy it is to use Everbridge Signal to find valuable open source intelligence.