Operations | Monitoring | ITSM | DevOps | Cloud

Tracing asynchronous systems in your event-driven architecture: When to use parent-child vs. span links

Asynchronous communication patterns are commonly used in distributed systems, especially in those that rely on events or messages to coordinate activity. Rather than responding to direct API calls like in a traditional request-response architecture, services in an asynchronous system produce, route, or consume events and messages independently.

How to build reliable and accurate synthetic tests for your mobile apps

Mobile applications offer increased flexibility to both users and developers. Users can access content on a wide range of devices, operating systems, and network types, while developers can leverage touch screens and orientation-based layouts to create more responsive features. However, all of these factors create new testing challenges. To ensure a good user experience (UX), developers have to test their apps across many device models and platforms, which can become costly and time-consuming.

Prevent cloud misconfigurations from reaching production with Datadog IaC Security

Modern infrastructure is built and deployed faster than ever, but increased speed can elevate risk. Developers who work on cloud-native applications often use infrastructure as code (IaC) to define cloud resources in configuration files, which are then shared across teams and deployed automatically. Although this approach is efficient, undetected misconfigurations in IaC can quickly introduce security risks into production environments.

A guide to cloud unit economics

As you analyze your organization's cloud spending, you'll often find that stakeholders have different perceptions of what that spending brings you. This is especially true when overall costs are rising and it's hard to distinguish waste from valuable investments in growth. But when finance, engineering, and product teams can all connect cloud spending to specific business outcomes, you gain the ability to make data-driven decisions about how to maximize the value of that spending.

Patterns for safe and efficient cache purging in CI/CD pipelines

"There are only two hard things in Computer Science: cache invalidation and naming things."—Phil Karlton In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the environmental impact (regarding energy usage) of repeated builds.

This Month in Datadog - July 2025

In July’s episode of This Month in Datadog, we’re doing things differently by spotlighting the people behind the products you rely on. Jeremy is joined by Tristan Ratchford to discuss saving time and effort when you’re on call with Bits AI SRE, and by Kevin Hu to explore gaining visibility into datasets across the entire data lifecycle with Data Observability.

Datadog Disaster Recovery mitigates cloud provider outages

A loss in infrastructure and applications observability can leave SRE and DevOps teams without insight into the real-time state of their production systems, causing them to temporarily pause code deployments and limit their ability to troubleshoot issues or respond to critical alerts. In modern cloud environments, where services are distributed and deeply interconnected, this lack of visibility can escalate quickly.

Bring high-performance observability to secure Kubernetes environments with Datadog's new CSI driver

In Kubernetes environments, applications often communicate with the Datadog Agent to send telemetry data such as custom metrics via DogStatsD or traces through Datadog APM. How this communication takes place depends on the communication mode set on the Datadog Cluster Agent's Admission Controller. With the sockets option, communication takes place through local inter-process communication via Unix domain sockets (UDS), whereas the service and default hostip options rely on network communication.

This Month in Datadog: Bits AI SRE, Datadog Data Observability, and more

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit Cloud Monitoring as a Service | Datadog. This month, we chat with two guests about Bits AI SRE and Datadog Data Observability.

AI Agents Console: Monitor the behavior and interactions of any AI agent in your stack

With Datadog's AI Agents Console, you can monitor the behavior and interactions of any AI agent that’s a part of your enterprise stack, whether that’s a computer use agent like OpenAI’s Operator, IDE agent like Cursor, DevOps agent like Github Copilot, enterprise business agent like Agentforce, or your internally built agents. You'll have full visibility into every agent's actions, insights into the security and performance of your agents, analytics on user engagement, and measurable business value from every agent, all in a centralized location.