Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Incident management vs problem management

The video explores the difference between incident management and problem management in modern organizations. It describes a common scenario where operations teams focus on immediate fixes, like rebooting systems, without addressing the root causes. Once the immediate issue is resolved, these teams pass the incident report to the developers, who are then responsible for digging deeper to prevent future occurrences.

Generative AI for IT Operations: Your Questions Answered

IT leaders are thrilled about the potential of Generative AI for IT Operations. But they also want to know how it works, why it works, and what it will do for them before taking the leap and adopting this new technology. Allow me to share my perspective on the hype and the truth behind Generative AI. I’m the Field CTO for BigPanda, Operational Intelligence and Automation driven by AIOps.

Observability Pillars: Exploring Logs, Metrics and Traces

The ability to measure the internal states of a system by examining its outputs is called Observability. A system becomes 'observable' when it is possible to estimate the current state using only information from outputs, namely sensor data. You can use the data from Observability to identify and troubleshoot problems, optimize performance, and improve security. In the next few sections, we'll take a closer look at the three pillars of Observability: Metrics, Logs, and Traces.

Alternatives to SMS alerts

While SMS alerts are handy, they also tend to be tricky. Across 120+ countries, we continuously deal with compliances & regulations from Vendors, Government, and Phone carrier companies. Other alert channels similar to SMS are a lot less cumbersome with higher delivery rates. Let’s take a look at the available options to switch from SMS.