Alerting

Then and Now: Distributed Systems Alerting and Monitoring

Distributed systems are everywhere. Although many teams don’t think of their applications as distributed systems, if they’re building with container-based microservices and serverless functions instead of a monolith, they’re creating a distributed system. This change also means that monitoring needs are becoming more complex.

From Metrics to Valuable Insights: Incident Post-Mortem Reports

IT organizations, such as managed service providers (MSPs), deploy incident alerting and on-call management solutions to accelerate software delivery and ensure seamless customer experiences. Incident alert management platforms orchestrate the distribution of alerts to ensure that technicians continue to maintain system uptime and minimize service disruptions.

Troubleshooting Outages at 3 AM with Alert Response

Imagine you are an on-call engineer who receives an alert at 3 AM informing you that customers are experiencing high latency on your website and are unable to shop. As an incident response coordinator at Sumo Logic myself, I can tell you that I don’t envy that engineer. If this alert fired, this is what would likely follow: The biggest challenge is how to gather this information quickly, so you can decide whether to jump out of bed or go back to sleep.

Automate, Group, and Get Alerted: A Best Practices Guide to Monitoring your Code - Part 1

As companies grow, so do their products, teams, and the number of external tools. For engineers, that can mean code sprawl, data silos, notification fatigue, and some “what the…?” moments along the way as they try to make sense of it all.