January 2024

Monitor processes running on AWS Fargate with Datadog

Serverless platforms like AWS Fargate enable teams to focus on delivering value to customers by freeing up time otherwise spent managing infrastructure and operations. However, maintaining a deep level of observability into applications running on these fully managed platforms remains challenging.

Go memory metrics demystified

For engineers in charge of supporting Go applications, diagnosing and resolving memory issues such as OOM kills or memory leaks can be a daunting task. Practical, easy-to-understand information about Go memory metrics is hard to come by, so it’s often challenging to reconcile your system metrics, such as process resident set size (RSS), with the metrics provided by the older runtime.MemStats API, the newer runtime/metrics package, or profiling data.
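To make the two runtime APIs concrete, here is a minimal sketch that reads roughly comparable values from runtime.MemStats and runtime/metrics in the same process. The helper readRuntimeMetrics is hypothetical, and pairing MemStats.Sys with /memory/classes/total:bytes and MemStats.HeapAlloc with /memory/classes/heap/objects:bytes is an assumption based on how the two APIs describe those values. The two snapshots are taken at slightly different moments, and neither figure should be expected to equal process RSS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// readRuntimeMetrics is a hypothetical helper that samples the named
// runtime/metrics values and returns those reported as uint64.
// Unsupported names come back with KindBad and are skipped.
func readRuntimeMetrics(names ...string) map[string]uint64 {
	samples := make([]metrics.Sample, len(names))
	for i, name := range names {
		samples[i].Name = name
	}
	metrics.Read(samples)

	out := make(map[string]uint64, len(names))
	for _, s := range samples {
		if s.Value.Kind() == metrics.KindUint64 {
			out[s.Name] = s.Value.Uint64()
		}
	}
	return out
}

func main() {
	// Older API: ReadMemStats fills a snapshot struct (it briefly stops the world).
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// Newer API: runtime/metrics (Go 1.16+), read by metric name.
	m := readRuntimeMetrics(
		"/memory/classes/total:bytes",
		"/memory/classes/heap/objects:bytes",
	)

	// Assumed rough pairings; values differ slightly because of sampling time.
	fmt.Printf("MemStats.Sys       = %d bytes\n", ms.Sys)
	fmt.Printf("total:bytes        = %d bytes\n", m["/memory/classes/total:bytes"])
	fmt.Printf("MemStats.HeapAlloc = %d bytes\n", ms.HeapAlloc)
	fmt.Printf("heap/objects:bytes = %d bytes\n", m["/memory/classes/heap/objects:bytes"])
}
```

Reading by name keeps the comparison explicit: when a value from either API diverges from RSS, you can look up exactly which runtime memory class it covers.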

Troubleshoot streaming data pipelines directly from APM with Datadog Data Streams Monitoring

When monitoring applications with streaming data pipelines, there are additional complexities to consider that are not present in traditional batch-processing systems. Whether you’re using streaming data pipelines to power a digital trading platform, capture sensor data from an IoT device, or recommend news articles to users, it can be challenging to identify the root cause of delays when you’re dealing with distributed systems, real-time data, and the dynamic nature of events.

Streamline Azure container monitoring with the Datadog AKS cluster extension

Azure Kubernetes Service (AKS) enables you to easily deploy and manage containerized applications in Azure while leveraging Microsoft resources such as development tools, security features, and more. As with any Kubernetes service, the sheer volume of containers being orchestrated makes monitoring AKS cluster health challenging, which can slow response times to critical incidents and create bottlenecks around long-term optimizations.

Monitor BigQuery with Datadog

BigQuery is Google Cloud Platform’s fully managed serverless data warehouse. It enables data analysis and storage at petabyte scale while eliminating the overhead of managing infrastructure. As a managed service, BigQuery autoscales and provisions compute resources and storage as needed, which reduces operational work but also limits your visibility into performance. BigQuery users face other visibility challenges as well.

Monitor Oracle managed databases with Datadog DBM

Datadog Database Monitoring (DBM), which provides host-level and query performance metrics and insights for PostgreSQL, MySQL, and SQL Server, is now available for Oracle. Oracle is one of the most common database types, and now teams that operate Oracle databases can use Datadog to monitor these resources alongside telemetry from across their environments.

Datadog on Design Systems

Over the last five years, the Datadog platform has grown. To complement our core infrastructure monitoring product, we added Application Performance Monitoring, Log Management, Synthetic Monitoring, Real User Monitoring, and more. For an enterprise software platform to be successful, the whole has to be greater than the sum of its parts. In Datadog’s case, this means users must be able to connect different types of data, pivot seamlessly from one context to another, and follow the thread of an investigation wherever it might lead.

Transform Your Customer Experience with DevOps Collaboration

Learn how end-to-end monitoring and observability enable enterprises to break down team silos, deliver industry-leading experiences for their customers, and achieve business benefits such as improved business resilience, by identifying and resolving IT risks before they result in service outages for customers, and increased competitive standing, by using DevOps and shift-left best practices to accelerate software releases.

Scaling Down Kubernetes Clusters

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes and hundreds of thousands of pods. This infrastructure is used by a wide variety of engineering teams at Datadog, with different feature and capacity needs.

Provisioning and Autoscaling

Simplify customer support with Datadog's integrations for Zendesk

Zendesk provides support teams with an integrated solution for processing all types of customer inquiries and feedback. But as organizations scale, support tickets multiply, making it increasingly difficult to parse all of your customers’ feedback and time-consuming to investigate issues. Customers often report issues without providing the detailed context needed for troubleshooting, creating unclear and indirect paths to remediation.

Paving the Road for Proactive Reliability

Expedia Group has thousands of engineers and microservices. Heterogeneity in the infrastructure and technologies adopted over the years created inefficiencies and called for a set of automated best practices for its engineering teams. Over the past two years, using a data-driven approach, the team has built a set of platforms that help engineering teams adopt good reliability practices, including chaos engineering, release safety, and automatic failover between cloud regions. In this talk, Kaushik Patel and Nikos Katirtzis will cover the platforms they’ve built, including how they used data to drive their investment decisions.

Detect Java code-level issues with Seagence and Datadog

In Java applications, concurrency issues can be difficult to reproduce and debug. Because work is scheduled nondeterministically across threads, the conditions that led to an error in one execution of the program may not trigger the same issue the next time around. Exceptions that are silently handled, also known as swallowed exceptions, can also be challenging to debug because they typically leave no trace in the logs.

Quickly remediate issues in your Azure applications with Datadog Workflow Automation

Datadog Workflow Automation speeds up incident response and remediation for DevOps, SRE, and security teams by enabling them to automatically run predefined task sequences whenever specific alerts or security signals are triggered. Following the feature’s initial release in 2023, Datadog is excited to announce a significant expansion of Workflow Automation with Azure actions, which let engineers create automated workflows for their Azure resources for the first time.