
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Cribl and Cloudflare give you full network visibility with real-time telemetry

Glenn Block explains how the new Cloudflare source and R2 destination in Cribl Stream let you ingest WAF, DNS, and Zero Trust logs for full visibility and real-time intelligence. Better security, better performance, and lower cost for modern IT and security teams.

9 Third-Party Risk Monitoring Tools That Actually Cut Vendor Assessment Time

Nearly one in three cyber breaches now start with a supplier, McKinsey found in 2024. A single vendor review cycle often spans 3 to 5 weeks due to manual evidence chasing, according to Forrester's 2024 State of Third-Party Risk Report. And a May 2025 Gartner brief warns that this "perfect storm" of attacks, supply-chain shocks, and new regulations is forcing boards to modernize third-party risk, and fast.

Patterns for Deploying OpenTelemetry Collector at Scale

So, you've embraced OpenTelemetry, and it's been great. Pat, Pat. That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the future is getting bigger. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you're wondering if that single collector instance is now a bottleneck waiting to happen.

Datadog Bits AI SRE: Your new teammate for on-call shifts

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Support for Oracle Cloud Infrastructure (OCI) is now live in Datadog Cloud Cost Management. In this short demo, you’ll learn how to get granular visibility into OCI cost and usage by service, compartment, tag, and resource tier; uncover savings opportunities by combining cost data with observability metrics like CPU, memory, and storage utilization; and set up anomaly monitors and budgets to avoid cost overruns, especially for high-risk workloads like AI and GPU training.

New Feature: Filter HTTP Pings by Keywords

Healthchecks.io can now classify HTTP pings from clients as start, success, or failure signals not only by URL suffixes (no suffix, /start, /fail, /{exit-status}) but also by looking for specific keywords or phrases in the HTTP request body. The content filtering feature was already available for email pings, and now it has been extended to HTTP pings as well.
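
As a rough illustration of the two classification routes described above, the following Python sketch builds ping URLs using the suffix convention (no suffix, /start, /fail, /{exit-status}) and models keyword matching on a request body. Everything here is an assumption made for illustration: the hc-ping.com base URL and the placeholder check ID do not come from the announcement, the function names are hypothetical, and the real service's matching rules (case sensitivity, precedence when several keyword lists match, behavior when nothing matches) may differ from this local model.

```python
from typing import Iterable, Optional

# Assumption: base URL of the ping endpoint; not stated in the text above.
PING_BASE = "https://hc-ping.com"


def ping_url(check_id: str, signal: str = "success",
             exit_status: Optional[int] = None) -> str:
    """Build a ping URL using the URL-suffix convention.

    No suffix signals success, /start signals a start, /fail signals a
    failure, and /{exit-status} reports a numeric exit status.
    """
    url = f"{PING_BASE}/{check_id}"
    if exit_status is not None:
        return f"{url}/{exit_status}"
    suffixes = {"success": "", "start": "/start", "failure": "/fail"}
    return url + suffixes[signal]


def classify_body(body: str,
                  start_keywords: Iterable[str] = (),
                  success_keywords: Iterable[str] = (),
                  failure_keywords: Iterable[str] = ()) -> Optional[str]:
    """Hypothetical local model of keyword-based classification.

    Assumptions: plain case-sensitive substring matching, failure
    keywords checked first, and None returned when nothing matches.
    """
    ordered = (
        ("failure", failure_keywords),
        ("start", start_keywords),
        ("success", success_keywords),
    )
    for signal, keywords in ordered:
        if any(keyword in body for keyword in keywords):
            return signal
    return None


if __name__ == "__main__":
    check_id = "your-check-uuid"  # placeholder, not a real check
    print(ping_url(check_id))
    print(ping_url(check_id, "start"))
    print(ping_url(check_id, exit_status=2))
    print(classify_body("backup finished: ERROR disk full",
                        success_keywords=["finished"],
                        failure_keywords=["ERROR"]))
```

In this sketch, a body that contains both a success keyword and a failure keyword is treated as a failure, which is a conservative choice for monitoring rather than documented behavior.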

Contextual, in-product guidance for every Grafana user: A closer look at Interactive Learning

As developer advocates at Grafana Labs, we’re always looking for new ways to help our users better understand and learn observability. You might remember our previous project that brought learning to life through an adventure-style game, and now we’re really excited to share something else we’ve been working on: Interactive Learning, a new way to get the technical help you need directly in Grafana.

Part 1: What If Data Wasn't Just the Fuel for AI but the Foundation of Everything It Knows?

Every breakthrough begins with a question. What if we looked beyond today’s tools, buzzwords, and hype and examined the design principles shaping tomorrow’s intelligent enterprises? The What If series explores those inflection points: moments where technology meets human judgment, where automation meets accountability, and where AI begins to resemble something more like understanding than output.