
The latest news and information on AIOps, alerting in complex systems, and related technologies.

The doctor is in: why domain agnostic AIOps is a necessity for diagnosis

Gartner recently identified two high-level categories of AIOps: domain-centric and domain-agnostic. Elik Eizenberg, CTO at BigPanda, explains the difference and why you would need the latter to gain an overall view and understanding of your IT Ops.

Automate and Virtualize the NOC: A Gannett/USA TODAY Network Case Study

Mission creep is a phenomenon that occurs after a project begins and gains momentum, but then gradually grows beyond the original, intended scope. One day you wake up and realize that, instead of an efficient, manageable project, you’ve got a monster on your hands. For enterprises in the midst of dynamic growth, IT infrastructure is often beset by mission creep. The incumbent organization acquires smaller operations, integrates their technology, and soon things are out of control.

How to Clear Up Alert Storms by 90%?

Alerts are notifications from AIOps monitoring tools that indicate an anomaly. IT teams receive these alerts on their monitoring dashboard, via email, or through enterprise collaboration tools such as Slack or Teams. Service level agreements expect IT teams to analyze every alert within a specific timeframe and take appropriate action.

How to Control Alert Fatigue?

Alerts are indispensable to any IT operations system today. Site reliability engineers (SREs) or ITOps executives set up several monitoring tools for their IT landscape. When there is a change, high-risk action, or outage in any of these systems, the monitoring tool triggers an automated alert. The alert may appear on the monitoring tool’s dashboard itself, via email, or in enterprise collaboration tools like Slack or Teams.

The Five Data Pillars of Effective Root-Cause Analysis

The most effective way to understand an incident, resolve it, and prevent it from occurring again is root-cause analysis. Simply put, root-cause analysis is the study performed by ITOps teams or site reliability engineers (SREs) to pinpoint the exact element or error that caused the unexpected behavior. Based on this, they plan remediation. Accurate and timely root-cause analysis can have a direct impact on the company’s top and bottom lines.

Monthly Moo Update | September 2021

This has been quite the summer to remember as we continue to witness our customers achieve remarkable efficiencies through automation. Examples include deep integrations with change pipelines to suppress alerts during maintenance windows, and correlating alerts to create incidents with dynamic and evolving descriptions that dramatically improve incident management processes.