
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

About two months ago, an incident at Grafana Labs was kicked off in typical fashion: A series of alerts were triggered, our on-call engineer acknowledged them on Slack, and the rest of the team quickly began hypothesizing about the potential culprit. But the way the incident was resolved was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

What Is a Data Pipeline

In today’s tech world, IT and security technologies are the functional equivalent of Pokémon. To gain the insights you need, you “gotta catch ’em all” by ingesting, correlating, and analyzing as much security data as possible. Data pipelines organize chaotic information flows into structured streams, ensuring that data is reliable, processed, and ready for use.

Agentic AI and the End of Traditional IT (w/ Robb Wilson)

In a wide-ranging conversation, Robb Wilson—CEO and co-founder of OneReach.ai and author of The Age of Invisible Machines—joins Tim and Tom to explore the rise of agentic AI and its seismic implications for IT, organizations, and society. Robb breaks down the concept of agent runtimes, why conversational interfaces matter more than ever, and how adaptive, self-orchestrating systems will reshape work far beyond today’s service models.

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA): a transformative leap forward for engineering and operations teams, included in your existing subscription at no additional charge. We are paving the way for a new era of observability, moving beyond passive, reactive monitoring to a world of proactive, AI-driven observability.

What is Network Observability vs. Network Monitoring?

Network observability may be seen as a newer term in the world of networking, but it has become critical for managing modern distributed networks. As networks grow more complex with cloud services, remote workers, and distributed applications, traditional network monitoring approaches no longer provide sufficient visibility into network health and performance.

Synthetic Monitoring for Internal Applications: SAP, ERP & More

Modern IT teams know the story by heart: uptime dashboards look green, the public website is fast, yet somewhere inside the corporate network, the finance team can’t submit purchase orders and the factory floor’s ERP terminals are frozen. What broke isn’t the internet—it’s the internal backbone. These internal systems—SAP, Oracle, Microsoft Dynamics, homegrown ERPs, HR and payroll platforms—keep the business running.

Google Workspace outage on November 12: How StatusGator detected it first

On November 12, 2025, users around the world faced difficulty accessing Google Workspace products including Google Drive, Google Docs, Google Sheets, and Google Slides. While the outage did not impact every user, it was widespread and disruptive. StatusGator detected the incident early using real user data and issued an Early Warning Signal long before Google officially acknowledged the issue.

The Hidden Bottleneck in Latency: GetYourGuide's Database Performance Journey

Fast front-end and back-end code alone won’t guarantee low end-to-end latency, because hidden bottlenecks in the database can undermine even the best engineering efforts. In this session, Oleksii Serhiienko, Senior Site Reliability Engineer at GetYourGuide, will share how his team put database performance at the center of their monitoring strategy. He will highlight how they identified and fixed slow queries, uncovered load balancing issues that drove significant cost savings, and built monitoring practices that improved both reliability and investigation workflows.

From Error to Fix: AI-Powered Debugging with Sentry and GitHub

This session will focus on the agent-based features of Sentry for debugging an issue in a web application. We'll walk through a broken issue and show how tools like Sentry Seer and the GitHub repo integration make it easy to determine the root cause by bringing the context from Sentry and the code in GitHub together, and how the Sentry MCP makes it easy to pull all that context down into GitHub Copilot to fix the issue locally.