
Datadog GPU Monitoring: Optimize and troubleshoot AI infrastructure

Nov 18, 2025 By Datadog In Datadog

With Datadog GPU Monitoring, engineering and ML teams can monitor GPU fleet health across cloud, on-prem, and GPU-as-a-Service platforms like CoreWeave and Lambda Labs. Real-time insights into allocation, utilization, and failure patterns make it easy to spot bottlenecks, eliminate idle GPU spend, and resolve provisioning gaps. By tying usage metrics directly to cost and surfacing hardware and networking issues that affect performance, Datadog helps teams make fast, cost-efficient decisions to keep AI workloads running reliably at scale.

Bringing Observability to Data

Nov 14, 2025 By Datadog In Datadog

While observability practices have evolved in recent years, they have largely focused on application services and infrastructure. Yet it is data that powers our applications, businesses, and AI models. When data issues occur, the consequences can be far-reaching, from poor product experiences to billing errors to misinformed AI outcomes. In this session, Jonathan Morin, Group Product Manager at Datadog, shares real-world examples of incidents and explains how data observability can address them, helping teams detect issues earlier, reduce costly downtime, and restore trust in their data.

The Hidden Bottleneck in Latency: GetYourGuide's Database Performance Journey

Nov 14, 2025 By Datadog In Datadog

Fast front-end and back-end code alone won’t guarantee low end-to-end latency, as hidden bottlenecks in the database can undermine even the best engineering efforts. In this session, Oleksii Serhiienko, Senior Site Reliability Engineer at GetYourGuide, shares how his team put database performance at the center of their monitoring strategy. He highlights how they identified and fixed slow queries, uncovered load balancing issues whose resolution drove significant cost savings, and built monitoring practices that improved both reliability and investigation workflows.

Use Grok parsing to extract fields from logs | Datadog Tips & Tricks

Nov 12, 2025 By Datadog In Datadog

When your logs don’t follow a standard format, it can be difficult to extract valuable information, like key-value pairs and nested JSON objects. Grok parsing lets you define flexible patterns that match unstructured log data so you can extract specific fields to query, filter, and visualize. In this video, you’ll learn how to use Grok parsing to extract fields from your logs. By refining your Grok parsers, you can make your logs more useful for analytics, dashboards, or alerts, and get even more value from them.
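
The core idea behind Grok is that named, reusable sub-patterns expand into a regular expression with named capture groups, so each match becomes a labeled field. The Python sketch below illustrates that idea under stated assumptions. It is not Datadog's parser. The matcher names (`word`, `integer`, `notSpace`, `ip`, `data`), their simplified regexes, and the helpers `compile_grok` and `parse` are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical matcher library for illustration only; the names loosely mirror
# common Grok matchers, but these regexes are simplified assumptions.
MATCHERS: Dict[str, str] = {
    "word": r"\w+",
    "integer": r"[+-]?\d+",
    "notSpace": r"\S+",
    "ip": r"\d{1,3}(?:\.\d{1,3}){3}",
    "data": r".*?",
}

# Finds tokens of the form %{matcher:field} inside a parsing rule.
TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def compile_grok(pattern: str) -> "re.Pattern":
    """Expand %{matcher:field} tokens into named capture groups.

    Literal text between tokens is escaped so it matches verbatim.
    """
    parts = []
    pos = 0
    for token in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:token.start()]))
        matcher, field = token.group(1), token.group(2)
        if matcher not in MATCHERS:
            raise ValueError(f"unknown matcher: {matcher}")
        parts.append(f"(?P<{field}>{MATCHERS[matcher]})")
        pos = token.end()
    parts.append(re.escape(pattern[pos:]))
    # Anchor both ends so the whole log line must match the rule.
    return re.compile("^" + "".join(parts) + "$")


def parse(pattern: str, line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None if the line does not match."""
    match = compile_grok(pattern).match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    rule = "%{ip:client} %{word:method} %{notSpace:path} %{integer:status}"
    # {'client': '10.0.0.1', 'method': 'GET', 'path': '/api/users', 'status': '200'}
    print(parse(rule, "10.0.0.1 GET /api/users 200"))
```

This sketch returns every field as a string. A fuller implementation would also convert typed matches such as `integer` to numbers and would support field names that this simplified token syntax rejects, such as dotted attribute paths.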

Sync your Backstage catalog with Datadog IDP

Nov 11, 2025 By Mark Avery In Datadog

Backstage is a popular open source framework for building internal developer portals (IDPs) used by organizations to aggregate service metadata and create a single source of truth for their software developers. However, data stored in the Backstage Software Catalog can quickly become siloed and inaccessible from monitoring tools such as Datadog.

Eliminate unnecessary costs in your Amazon S3 buckets with Datadog Storage Management

Nov 10, 2025 By Mahashree Rajendran In Datadog

Cloud object storage powers a wide range of workloads, from AI training datasets to customer-facing media libraries. As your data grows into the petabyte scale, managing storage costs and ensuring reliability requires fine-grained visibility. You need answers to questions like: Which specific teams, services, workloads, or datasets are driving spend? Which data is cold and should be archived? What fixes will have the biggest impact on cost and performance?

Observability and FedRAMP in Action: The VA's Mission to Deliver Reliable Digital Service

Nov 10, 2025 By Greg Reeder In Datadog

Ensuring digital services remain accessible, reliable, and secure is a high priority for any organization operating at scale. For the Department of Veterans Affairs (VA), this focus is central to its mission of providing quality care to veterans, their families, and caregivers. Often described as “the largest IT shop in the United States,” the VA manages 2.7 million pieces of equipment across a vast network of interconnected systems.

How feedback loops power progressive software delivery

Nov 10, 2025 By Candace Shamieh In Datadog

Modern engineering teams face competing priorities. Developers are expected to deliver new features faster than ever, but users expect rock-solid reliability with every release. Shipping quickly can feel like you’re gambling with user trust. If you move too fast, you risk outages, but if you move too slowly, innovation stalls.

Import Snowflake, Salesforce, ServiceNow, and Databricks metadata into Datadog with Reference Tables

Nov 6, 2025 By Jinwu Liu In Datadog

Engineering, operations, and security teams can struggle to make sense of their telemetry data in isolation. Logs, metrics, and events show what is happening but often lack critical metadata like who owns what, where it's coming from, or indicators of attack. These gaps in visibility slow down incident response, complicate cost control, and make business or security analytics much harder.

Catch and remediate ECS issues faster with default monitors and the ECS Explorer

Nov 6, 2025 By Sumedha Mehta In Datadog

Organizations that run applications on Amazon Elastic Container Service (Amazon ECS) often juggle signals across container and task metrics, logs, and events while they hunt for the change or condition that broke a deployment. This work adds operational overhead and extends incident timelines as teams switch between tools and manually correlate symptoms.
