Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Zen Your Life With IT Event Noise Reduction

IT incident responders have been inundated with alerts since the start of the COVID-19 pandemic. These engineers must dig through their messages to find and act on the alerts that signal genuinely critical events. This process wastes time and prolongs incident response. The goal of IT event noise reduction is to help teams recognize and resolve real incidents promptly.

Incident Management in Mattermost: Creating an Incident Playbook

The idea behind incident management is to be ready. Not ready for anything, as that can be an unrealistic expectation, but ready to respond when the unexpected inevitably happens. DevOps teams often create incident playbooks to ensure they are as ready as possible to handle situations as they arise. Luckily, our friends at PagerDuty have some amazing documentation on how to do just that.

Escalating Prometheus alerts to SMS/Phone/Slack/Microsoft-Teams via AlertManager and Zenduty

Prometheus is by far one of the most popular open-source monitoring tools, used by millions of engineering teams globally, with a robust community and continued adoption and evolution. We at Zenduty shipped our Prometheus integration a while back, and we're happy to report that its adoption has been absolutely through the roof!

Improve Customer Satisfaction With Customer Service Incident Commanders

The global pandemic has drastically accelerated digital transformation initiatives and forced organizations to reimagine customer service, with customer service teams taking on the incident commander role in managing and responding to customer issues and engaging with customers. In addition to prioritizing digital services, many businesses have migrated to the cloud to increase business agility, develop and deliver new features faster, and meet the growing demands of end users.

An end-to-end incident in Blameless and PagerDuty

PagerDuty is a leading on-call management platform that aggregates monitoring and alerting data, notifies on-call teams, and accelerates incident resolution. The platform is used by thousands of teams responsible for software experiences. It integrates incident triage with rapid responder mobilization, so teams can resolve incidents in real time.

Curb Alert Noise for Better Productivity: How-Tos and Best Practices

In the quest to provide the best uptime, software platforms depend on complex, interconnected microservices. This often leaves them vulnerable to cascading failures that create a massive deluge of alerts from monitoring tools when things go wrong. In this blog, we explore how Squadcast can be configured to curb alert noise for better productivity using its advanced deduplication features.

How to create a custom ServiceNow incident report dashboard in Canvas

Welcome back once again! This is the third and final part of this series on using the Elastic Stack with ServiceNow for incident management. In the first blog, we introduced the project and set up ServiceNow so changes to an incident are automatically pushed back to Elasticsearch. In the second blog, we implemented the logic to glue ServiceNow and Elasticsearch together through alerts and transforms as well as some general Elasticsearch configuration.

Automating Incident Callouts for Canadian Pacific's Engineering Team

Canadian Pacific (CP) is a historic Canadian Class I railroad incorporated in 1881. It was CP that connected the country and became Canada's first transcontinental railway. Headquartered in Calgary, Alberta, it owns approximately 13,000 miles of track across Canada and the United States. Canadian Pacific introduced Enterprise Alert in 2016 to increase the speed and effectiveness of incident callouts to information workers and staff in various departments.