
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Don't fly blind... monitor from your users' perspective.

Most monitoring strategies focus only on what happens inside your applications, but that’s not what your users experience. From your backend to the cloud, through third-party APIs, DNS, CDNs, ISPs, and finally to the user’s device, every link in the chain matters. Without that visibility, you're flying blind when something breaks in your Internet Stack. Catchpoint’s 3,000+ intelligent agents across 100+ countries deliver true end-to-end visibility, capturing every hop, every variable, and every moment of user impact.

Evals are just tests, so why aren't engineers writing them?

You’ve shipped an AI feature. Prompts are tuned, models wired up, everything looks solid in local testing. But in production, things fall apart—responses are inconsistent, quality drops, weird edge cases appear out of nowhere. You set up evals to improve quality and consistency. You use Langfuse, Braintrust, Promptfoo—whatever fits. You start running your evals, tracking regressions, fixing issues, and confidence goes up as a result. Things improve.

New in APM

Datadog’s Latency Investigator for APM, now in Preview, automatically investigates hypotheses in the background, comparing historical traces and correlating change tracking, DBM, and profiling signals. This helps teams quickly isolate root causes and understand impact without combing through raw telemetry data. You can go from detection to resolution in a single workflow and generate a pull request to apply a recommended fix, all without leaving Datadog.

AI Agents Console: Monitor the behavior and interactions of any AI agent in your stack

With Datadog's AI Agents Console, you can monitor the behavior and interactions of any AI agent that’s part of your enterprise stack, whether that’s a computer-use agent like OpenAI’s Operator, an IDE agent like Cursor, a DevOps agent like GitHub Copilot, an enterprise business agent like Agentforce, or your internally built agents. You'll have full visibility into every agent's actions, insights into the security and performance of your agents, analytics on user engagement, and measurable business value from every agent, all in a centralized location.

Multi Factor Authentication for Synthetic Monitoring for AVD

Today, I’ll cover some of the basics of monitoring Multi-Factor Authentication and why ensuring MFA is implemented is essential, particularly in environments where remote access is possible. I’ll cover some recent, specific case studies where a lack of MFA has led to security breaches and the mechanisms the bad actors used.

Splunk Expands Data Management Capabilities To Include Ingest Monitoring

Managing data ingestion at scale is no easy task. As organizations onboard hundreds or even thousands of data sources into the Splunk platform for security, observability, and other business-critical use cases, it becomes increasingly complex to ensure data is consistently available and onboarded efficiently.

How we're killing YAML fatigue with our new K8s integration process

Kubernetes adoption has grown rapidly, with more than 84% of surveyed users evaluating or actively using it in some way. It has become the go-to platform for container orchestration. As we grow the Coralogix platform, we continuously go back and improve flows that we believe will have a high impact on our user base.

Diagnosing Wi-Fi failures that traditional tools miss: a case study

A global airline experienced persistent Google Meet connectivity issues with no apparent network infrastructure faults. While their APM tool offered visibility into network paths, it didn’t surface any local anomalies. Catchpoint’s endpoint monitoring, however, revealed performance degradation specifically on Wi-Fi Channel 44 (5 GHz band), where signal strength dropped to -80 dBm, well below the optimal range of -30 to -50 dBm.

Real-Time Flight Telemetry Monitoring with InfluxDB 3 Enterprise

When Microsoft Flight Simulator 2024 generates telemetry data at 30-60 FPS, capturing and processing that stream in real time becomes a fascinating engineering challenge. We built a complete telemetry pipeline that reads over 90 flight parameters through FSUIPC, streams them to InfluxDB 3 Enterprise, and displays them in real-time dashboards that respond in under 5 milliseconds.