Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Streamline Incident Management with the New Netdata-ServiceNow Integration

When a critical alert fires at 2 AM, the last thing your on-call engineer should be doing is manual administrative work. Yet, for many teams, that’s exactly what happens. You see the alert in your monitoring tool, then you have to switch contexts, open a new browser tab, log into your ITSM platform, and manually create an incident—all while your systems are failing.

Show Me the AI: Rethinking How AI Fits Into Network Operations

Over the last couple of years, nearly every network and infrastructure observability platform has added the word “AI” to its messaging. Some have introduced helpful capabilities. Others have simply added a chatbot on top of the same dashboards that have existed for a decade. In many ways, the term has started to lose meaning. But inside network operations, the conversation hasn’t disappeared. It has simply become more blunt.

Service Observability, Service Operations and Service Orchestration: Unifying Visibility and Action Across the Enterprise

For large enterprises, the health and resilience of Business Services define customer experience and business reputation. Yet as technology estates grow in complexity, fragmented toolsets and siloed teams make it difficult to maintain service availability and prevent incidents before they impact the business and, ultimately, customers.

What is APM? Understanding application performance monitoring

The rapid advancement of technology has revolutionised the way businesses operate and engage with their customers. A delay of even a few seconds can lead to significant drop-offs in engagement and conversions. According to Google's findings, "just a 100-millisecond lag can reduce revenue by 1%, and a half-second delay can cause a 20% drop in search engine traffic".

Top tips for staying focused in a notification-heavy world

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’re tackling one of modern work’s biggest challenges: staying focused in a world overflowing with notifications. Focus has become an uncommon ability in today’s hyper-connected world. Every ping, pop-up, and alert demands our attention, pulling us away from focused work and substantive thinking.

Bits AI SRE, Flex Frozen, and GPU Monitoring | DASH 2025

Get a first look at Datadog’s biggest product reveals from DASH 2025. Meet Bits AI SRE, your 24/7 autonomous AI Site Reliability Engineer, Flex Frozen for up to 7 years of managed log retention, and GPU Monitoring for full visibility into your AI workloads. Experience the future of observability in action.

Connecting the dots: Solving IT asset visibility with Dataprime

In large tech organizations, keeping track of every laptop, desktop, and endpoint is one of the IT department’s toughest challenges. Each device needs to be accounted for, properly assigned, and compliant with the organization’s policies, all while teams, offices, and contractors constantly change.

How to Visualize Time Series Data with InfluxDB 3 & Apache Superset

Learn how to visualize time series data from InfluxDB 3 Core using the popular open source tool Apache Superset. This tutorial walks you through setting up both systems with Docker, writing sample IoT data, and creating your first visualization. For more information about Apache Superset, this article may be helpful.

How Prometheus Exporters Work With OpenTelemetry

Running distributed systems means you need clear visibility into how your services behave. Prometheus has been the standard for metrics for a long time, and OpenTelemetry is now giving teams a more consistent way to collect telemetry across their stack. In many setups, you'll have both: existing Prometheus instrumentation that's already in place, and new components instrumented with OpenTelemetry.

Key learnings from the State of Containers and Serverless report

We recently released the 2025 State of Containers and Serverless report, which examines cloud usage data from tens of thousands of Datadog customers. The study shows adoption trends across container orchestration platforms and serverless offerings, and it explores how organizations use those resources to optimize workloads for efficiency, cost, and simplicity.