Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

May the Logs Be With You: Graylog 7.1 Is Here

A long time ago, in a SOC far, far away…analysts were drowning in alerts, chasing context across fragmented screens, and watching real threats slip past detection gaps. Today, the Rebellion fights back. This isn’t a release built around a single marquee feature. It’s the result of our team listening to you on the front lines with an ear for removing the friction that makes your jobs harder than they need to be.

This Month in Datadog - April 2026

In the latest episode of This Month in Datadog, Jeremy shares how to run autonomous Cloud SIEM investigations, remediate vulnerabilities with auto-generated fixes, and use natural language to explore Datadog. Later, Sumedha Mehta spotlights the Datadog MCP Server, which gives AI agents real-time access to Datadog’s observability data. Then, Chetan Sharma walks through Datadog Experiments, which measures how product changes impact the user journey.

Monitor and optimize Supabase query performance with Datadog Database Monitoring

Built on Postgres, Supabase is an open source, all-in-one backend platform for developers who want to ship applications without managing infrastructure. This makes it especially popular with frontend developers and vibe coders who may have little to no database expertise. Datadog's Supabase integration provides high-level infrastructure metrics, but developers also need query-level visibility to easily diagnose, optimize, and trace performance issues back to their source.

Taming Log Noise With the OpenTelemetry Collector's Drain Processor

Do you receive 50 million log lines per day and struggle to see what actually matters? Health checks, heartbeat pings, connection pool messages—they all drown out the errors and anomalies you're trying to find. Most teams deal with this by writing filter rules to drop the noisy patterns. But those rules are manual, per-pattern, and brittle. A new deployment changes a log format and the filter misses it. A new service starts logging a chatty startup sequence nobody thought to exclude.

NVIDIA DCGM Collector: Deep GPU Monitoring for Data Center and AI Infrastructure

GPU infrastructure is expensive and increasingly central to production workloads. Whether you’re running ML training jobs, inference serving, video transcoding, or HPC workloads, understanding what your GPUs are actually doing, and what’s going wrong when performance degrades, is not optional.

Obkio Microsoft Teams Monitoring vs. Microsoft Teams Admin Center

Most IT teams rely on Microsoft Teams Admin Center as their default monitoring tool to find and fix Microsoft Teams issues, but there's a gap between what it shows and what actually causes call quality problems. Teams Admin Center gives you Microsoft's perspective on what happened after an MS Teams call ended. It doesn't tell you what was happening on your network, on your users' devices, or in the five minutes before the complaints started coming in.

What's New with Progress WhatsUp Gold 2026.0

Progress WhatsUp Gold 2026.0 helps IT teams improve network visibility, strengthen security and work more efficiently. In this recorded webinar, explore what’s included in this free upgrade for customers with an active service agreement, including: Learn how Progress WhatsUp Gold 2026.0 can deliver proactive visibility with trusted security across your IT infrastructure.

April 2026: IsDown Users Saved 16.5 Hours with Early Outage Detection

In April 2026, IsDown's early detection system gave users a 3.6-hour head start on a major outage — plenty of time to implement workarounds before the vendor even acknowledged the problem. Across 45 early detections, our users saved a collective 16.5 hours by knowing about outages an average of 22 minutes before official status pages were updated.

Real-Time Database Monitoring: Solving Database Latency with Zero-Code eBPF Tracing

In high-throughput database environments, a latency spike is rarely a simple story. Modern data layers are distributed, stateful, and constantly changing as shards move, nodes rebalance, caches warm, queries evolve, and connections churn. In practice, spikes usually come from one of three places: For many SRE and Platform teams, the real challenge is disconnected tooling. As one engineering lead recently shared during a technical workshop: “It’s all disconnected.