Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Understanding AWS Lambda proactive initialization

AJ Stuyvenberg is a Staff Engineer at Datadog and an AWS Serverless Hero. A version of this post was originally published on his blog. In AWS Lambda, a cold start occurs when a function is invoked and an idle, initialized sandbox is not ready to receive the request. Features like Provisioned Concurrency and SnapStart are designed to reduce cold starts by pre-initializing execution environments.
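
As a rough illustration of the cold start definition above, a function can record whether an invocation is the first one served by its sandbox. This is a minimal sketch, not the method from the post: it assumes a Python runtime, where module-level code runs once per sandbox initialization, and the names `handler`, `_is_cold_start`, and `_INIT_TIME` are hypothetical.

```python
import time

# Module-level code runs once, when the sandbox is initialized.
_INIT_TIME = time.monotonic()
_is_cold_start = True  # hypothetical flag; flipped after the first invocation


def handler(event, context):
    """Report whether this invocation is the first one served by this sandbox."""
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False
    return {
        "cold_start": cold,
        # Time elapsed between sandbox initialization and this invocation.
        "seconds_since_init": time.monotonic() - _INIT_TIME,
    }
```

Only the first call in a given sandbox reports `cold_start` as true; later calls reuse the already initialized environment.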

Monitor your NVIDIA GPUs with Datadog

NVIDIA is well known for its computing advancements across a broad range of industries and has become the clear leader in the artificial intelligence (AI) space. Due to their high-performance capabilities, NVIDIA’s discrete graphics processing units (GPUs) now account for approximately 80 percent of the market share for production-level AI, gaming, graphics rendering, and other complex data processing tasks.

Query unsampled logs in real time with Live Search

With thousands of logs generated every minute from your infrastructure, applications, services, and devices, retaining this copious amount of data for active search and analysis can be cost-prohibitive. Because log volumes continue to grow rapidly as operations scale, it’s common for organizations to implement log management strategies and store only a limited number of logs to minimize costs.

I use GitHub Actions for Datadog's Service Catalog, and you should, too

Today’s guest blog is by Mike Stemle, a software engineer and Principal Architect for the Arc XP division of the Washington Post. In his role, Mike focuses on AppSec and large-scale architecture. Anybody who works with me knows that I love the Datadog Service Catalog.

Best practices for monitoring static web applications

Static sites are currently a popular solution for many lightweight web applications, such as corporate sites, blogs, job listings, and documentation repositories. In static web architecture, pages are generated and pre-rendered at build time from markup files, and usually cached in a content delivery network (CDN) for efficient delivery. This saves teams the effort and cost of server management while enabling fast page load times.

Monitor runtime metrics from OTel-instrumented apps with Datadog APM

OpenTelemetry (OTel) is an open source, vendor-neutral observability framework that supplies APIs, SDKs, and tools for the instrumentation of applications and services. As part of our ongoing commitment to OTel, we are excited to announce support for the ingestion and visualization of runtime metrics from OTel-instrumented applications in Java, .NET, and Go.

Datadog named Leader in 2023 Gartner Magic Quadrant for APM and Observability

We are thrilled to announce that, for the third consecutive year, Datadog has been named a Leader in the 2023 Gartner® Magic Quadrant™ for APM and Observability. We believe that this placement reflects Datadog’s continued commitment to understanding our customers’ most complex challenges and building products and services that give them the visibility they need into their applications.

Monitor Windows event logs with Datadog

Whenever an event occurs on your Windows machine, the operating system records an event log that includes details about the nature of the event (e.g., critical runtime error) or security identifiers (for audit events). Windows event logs not only record system and application activity but also user actions and background processes, making them an invaluable tool for monitoring the security and health of your systems.

Best practices for monitoring CDN logs

By storing copies of your content in geographically distributed servers, content delivery networks (CDNs) enable you to extend the reach of your app without sacrificing performance. CDNs lessen the demand on individual web hosts by increasing the number and regional spread of servers that are able to respond to incoming requests for cached content. As a result, they can deliver web content faster and provide a better experience for your end users.

Troubleshoot with Kubernetes events

When Kubernetes components like nodes, pods, or containers change state—for example, if a pod transitions from pending to running—they automatically generate objects called events to document the change. Events provide key information about the health and status of your clusters—for example, they inform you if container creations are failing, or if pods are being rescheduled again and again. Monitoring these events can help you troubleshoot issues affecting your infrastructure.
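
To make the idea of spotting repeated failures concrete, here is a small sketch that tallies warning events per object and reason. It assumes the events are already loaded as dictionaries shaped like the items returned by `kubectl get events -o json` (with `type`, `reason`, `involvedObject`, and an optional `count`); the function name `summarize_warnings` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_warnings(events):
    """Count Warning events, keyed by (object kind, object name, reason)."""
    counts = Counter()
    for ev in events:
        if ev.get("type") != "Warning":
            continue  # skip Normal events such as Scheduled or Pulled
        obj = ev.get("involvedObject", {})
        key = (obj.get("kind"), obj.get("name"), ev.get("reason"))
        # An event may carry a count of repeated occurrences; default to 1.
        counts[key] += ev.get("count", 1)
    return counts


# Hypothetical sample data for illustration only.
sample_events = [
    {"type": "Normal", "reason": "Scheduled",
     "involvedObject": {"kind": "Pod", "name": "web-1"}, "count": 1},
    {"type": "Warning", "reason": "BackOff",
     "involvedObject": {"kind": "Pod", "name": "web-1"}, "count": 3},
    {"type": "Warning", "reason": "BackOff",
     "involvedObject": {"kind": "Pod", "name": "web-1"}, "count": 2},
    {"type": "Warning", "reason": "FailedScheduling",
     "involvedObject": {"kind": "Pod", "name": "web-2"}},
]

for (kind, name, reason), n in summarize_warnings(sample_events).most_common():
    print(f"{kind}/{name}: {reason} x{n}")
```

Sorting by count puts the objects that fail again and again at the top, which is usually where troubleshooting should start.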