The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

Nov 17, 2025 By Ryan Perry In Grafana

About two months ago, an incident at Grafana Labs was kicked off in typical fashion: A series of alerts was triggered, our on-call engineer acknowledged them on Slack, and the rest of the team quickly began hypothesizing about the potential culprit. But the way the incident was resolved was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

What Is a Data Pipeline

Nov 17, 2025 By Jeff Darrington In Graylog

In today’s tech world, IT and security technologies are the functional equivalent of Pokemon. To gain the insights you need, you “gotta catch ’em all” by ingesting, correlating, and analyzing as much security data as possible. Data pipelines organize chaotic information flows into structured streams, ensuring that data is reliable, processed, and ready for use.

Agentic AI and the End of Traditional IT (w/ Robb Wilson)

Nov 17, 2025 By Nexthink In Nexthink

In a wide-ranging conversation, Robb Wilson—CEO and co-founder of OneReach.ai and author of The Age of Invisible Machines—joins Tim and Tom to explore the rise of agentic AI and its seismic implications for IT, organizations, and society. Robb breaks down the concept of agent runtimes, why conversational interfaces matter more than ever, and how adaptive, self-orchestrating systems will reshape work far beyond today’s service models.

View Video

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

Nov 17, 2025 By Mezmo In Mezmo

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA), a transformative leap forward for engineering and operations teams. The agent is included in your existing subscription at no additional charge. We are paving the way for a new era of observability, moving beyond passive, reactive monitoring to a world of proactive AI-driven observability.

November 2025 Product Updates

Nov 17, 2025 By Leo Baecker In Hyperping

November delivers our most requested feature—multiple API keys—alongside significant performance improvements and enhanced monitoring capabilities. Here's everything new this month.

What is Network Observability vs. Network Monitoring?

Nov 16, 2025 By Alyssa Lamberti In Obkio

Network observability may be seen as a newer term in the world of networking, but it has become critical for managing modern distributed networks. As networks grow more complex with cloud services, remote workers, and distributed applications, traditional network monitoring approaches no longer provide sufficient visibility into network health and performance.

Synthetic Monitoring for Internal Applications: SAP, ERP & More

Nov 15, 2025 By Dotcom-Monitor In Dotcom-Monitor

Modern IT teams know the story by heart: uptime dashboards look green, the public website is fast, yet somewhere inside the corporate network, the finance team can’t submit purchase orders and the factory floor’s ERP terminals are frozen. What broke isn’t the internet—it’s the internal backbone. These internal systems—SAP, Oracle, Microsoft Dynamics, homegrown ERPs, HR and payroll platforms—keep the business running.

Google Workspace outage on November 12: How StatusGator detected it first

Nov 14, 2025 By Colin Bartlett In StatusGator

On November 12, 2025, users around the world faced difficulty accessing Google Workspace products including Google Drive, Google Docs, Google Sheets, and Google Slides. While the outage did not impact every user, it was widespread and disruptive. StatusGator detected the incident early using real user data and issued an Early Warning Signal long before Google officially acknowledged the issue.

The Hidden Bottleneck in Latency: GetYourGuide's Database Performance Journey

Nov 14, 2025 By Datadog In Datadog

Fast front-end and back-end code alone won’t guarantee low end-to-end latency, because hidden bottlenecks in the database can undermine even the best engineering efforts. In this session, Oleksii Serhiienko, Senior Site Reliability Engineer at GetYourGuide, will share how his team put database performance at the center of their monitoring strategy. He will highlight how they identified and fixed slow queries, uncovered load-balancing issues that drove significant cost savings, and built monitoring practices that improved both reliability and investigation workflows.

View Video

From Error to Fix: AI-Powered Debugging with Sentry and GitHub

Nov 14, 2025 By Sentry In Sentry

This session focuses on Sentry's agent-based features for debugging an issue in a web application. We'll walk through the issue and show how tools like Sentry Seer and the GitHub repo integration make it easy to determine the root cause by bringing together the context in Sentry and the code in GitHub, and how the Sentry MCP makes it easy to pull all that context down into GitHub Copilot to fix it locally.

View Video