November 2023

Introducing CoTerm, your collaborative terminal for pair programming and debugging

For too long, engineers have had to piece together an unwieldy combination of tools to collaboratively debug and resolve incidents while pair programming in real time. These activities normally require developers to work individually through a terminal, but the patchwork solutions that allow teams to work together in terminals all have significant drawbacks.

Monitor Amazon S3 Express One Zone with Datadog

Amazon Simple Storage Service (S3) now offers a high-performance storage class, S3 Express One Zone, that delivers consistent single-digit millisecond data access for your most latency-sensitive applications. Designed for your most frequently accessed datasets, S3 Express One Zone replicates and stores your data within a single AWS Availability Zone, scales to process millions of requests per minute, and uses hardware and software optimized for low latency.

Govern your infrastructure resources with the Datadog Resource Catalog

As an administrator of an expanding, highly distributed infrastructure, you may be responsible for overseeing thousands of on-premises and cloud resources from multiple providers, governed under dozens of accounts by a complex nest of RBAC rules. To query all these resources for purposes such as compliance audits and access management, you may be required to write custom scripts and painstakingly sift through data across disparate tools.

Enhance your visibility into OTel-instrumented apps in AWS Lambda

Enabling auto-instrumentation for your Lambda functions provides detailed insights into the performance and security of your serverless applications. Developers often also use custom instrumentation to fine-tune visibility and further tailor telemetry to their business needs. However, different teams within your organization might use a variety of instrumentation libraries, and achieving more granular visibility can come at the expense of data portability and interoperability.

Monitor and improve your CI/CD on AWS CodePipeline with Datadog CI Visibility

CI/CD services such as AWS CodePipeline enable developers to automate and accelerate the process of building, testing, and deploying code. But with the speed, scale, and complexity of the modern software development life cycle, even small performance regressions or increases in failure rates in your CI system can quickly snowball, slowing or even halting releases and causing cost overruns.

Enhance your troubleshooting workflow with Container Images in Datadog Container Monitoring

Containers are powerful tools for scaling and deploying your applications, but with so many components pulled from different sources, there’s a greater potential for issues within them to go undetected. As a result, you need to monitor every layer of your containerized environments for vulnerabilities and performance problems—from your application to your container images.

Build custom monitoring and remediation tools with the Datadog App Builder

When you’re responding to an issue with your application in the heat of on-call, you need reliable, well-maintained tooling that’s painless to use. Otherwise, the time you’ll spend combing through monitoring data for context, connecting to hosts and other infrastructure resources, and pivoting between consoles for various managed services can add up quickly and slow your response.

Visualize AWS Step Functions with the State Machine Map

AWS Step Functions allows you to coordinate activity from hundreds of services—including AWS Lambda, Amazon EKS, and Amazon API Gateway—to build and orchestrate serverless workflows. With Step Functions, you organize work into workflows known as state machines, in which each state defines a task or decision and specifies the next state in the workflow.
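To make the state machine idea concrete, here is a minimal sketch in Python of a workflow in which each state is either a task or a decision and names the state that follows it. It is a toy interpreter, not the Step Functions service or its SDK. The keys `StartAt`, `States`, `Type`, `Next`, and `End` mirror Amazon States Language, while `Handler`, `Variable`, `IfTrue`, `IfFalse`, and the order-handling functions are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Data = Dict[str, Any]

# Hypothetical task handlers for an illustrative order workflow.
def check_stock(data: Data) -> Data:
    return {**data, "in_stock": data["quantity"] <= 5}

def ship(data: Data) -> Data:
    return {**data, "status": "shipped"}

def backorder(data: Data) -> Data:
    return {**data, "status": "backordered"}

# Each state defines a task or a decision and specifies the next state.
# "Handler", "Variable", "IfTrue", and "IfFalse" are assumptions of this
# sketch; real Amazon States Language uses "Resource" and "Choices" instead.
STATE_MACHINE: Dict[str, Any] = {
    "StartAt": "CheckStock",
    "States": {
        "CheckStock": {"Type": "Task", "Handler": check_stock, "Next": "InStock?"},
        "InStock?": {
            "Type": "Choice",
            "Variable": "in_stock",
            "IfTrue": "Ship",
            "IfFalse": "Backorder",
        },
        "Ship": {"Type": "Task", "Handler": ship, "End": True},
        "Backorder": {"Type": "Task", "Handler": backorder, "End": True},
    },
}

def run(machine: Dict[str, Any], data: Data) -> Tuple[Data, List[str]]:
    """Walk the machine from StartAt until a state marked End is reached."""
    name = machine["StartAt"]
    visited: List[str] = []
    while True:
        state = machine["States"][name]
        visited.append(name)
        if state["Type"] == "Task":
            data = state["Handler"](data)
            if state.get("End"):
                return data, visited
            name = state["Next"]
        elif state["Type"] == "Choice":
            # A decision state only picks the next state; it does no work.
            name = state["IfTrue"] if data[state["Variable"]] else state["IfFalse"]
        else:
            raise ValueError(f"unsupported state type: {state['Type']}")

if __name__ == "__main__":
    result, path = run(STATE_MACHINE, {"quantity": 3})
    print(path, result["status"])
```

The list of visited states returned by `run` is the kind of execution path that a map-style visualization of a state machine would trace.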

Monitor Amazon Bedrock with Datadog

Amazon Bedrock is a fully managed service that offers foundation models (FMs) built by leading AI companies, such as AI21 Labs, Meta, and Amazon, along with other tools for building generative AI applications. After enabling access to validation and training data stored in Amazon S3, customers can fine-tune their FMs to invoke tasks such as text generation, content creation, and chatbot Q&A, all without provisioning or managing any infrastructure.

Monitor the state of your Tailscale private network with Datadog

Tailscale is a modern remote access solution that allows customers to easily scale, segment, and manage a private network as their business grows. It enables encrypted point-to-point connections using the open source WireGuard protocol, so that devices on your private network can only communicate with each other.

Secure and monitor infrastructure networking with Buoyant Enterprise for Linkerd in the Datadog Marketplace

As organizations adopt Kubernetes, they face gaps in security, reliability, and observability such as unencrypted communication, lack of multi-cluster support, and missing reliability features like circuit breaking. Buoyant Cloud is the dashboarding and automated monitoring component of Buoyant Enterprise for Linkerd, which helps organizations secure and monitor communication between Kubernetes workloads.

Centrally govern and remotely manage Datadog Agents at scale with Fleet Automation

As customers scale to thousands of hosts and deploy increasingly complex applications, it can be difficult to ensure that every host is configured to give you the visibility you need to monitor your infrastructure and applications. To ensure visibility across a growing number of hosts, you need to know that your observability strategy is implemented uniformly across your entire fleet of Datadog Agents installed on these hosts.

Chasing the Rainbow: Towards Unified Service Metrics

As Zendesk migrated from a monolithic application to an ecosystem of hundreds of services, its need for fully unified and standardized observability became a chief concern. In this talk, Senior Principal Engineer Daniel Schierbeck shares how adopting a service mesh has helped Zendesk manage its growing number of services while standardizing its observability. He also explains how Zendesk’s approach to monitoring service interactions has evolved as it adopted Datadog metrics and Datadog APM.

Datadog acquires Actiondesk

Datadog customers have an abundance of observability data at their fingertips. Using this data effectively requires having the right visualizations and analysis tools. For some teams, the powerful functionality of spreadsheets is critical to their ability to make data-driven forecasting and business decisions. That’s why we are pleased to announce that Actiondesk—a spreadsheet-powered connection to your live data—is joining Datadog.

How Mercado Libre scales its AWS microservices without losing visibility

Learn how Mercado Libre acts more quickly, strategically, and proactively thanks to Datadog’s centralized platform and context-rich alerting. Mercado Libre hosts the largest online commerce and payments ecosystem in Latin America, which means thousands of dollars can be lost if some of its critical applications stop working for even 1 minute. Senior Technical Manager Juliano Martins and software expert Marcelo Quadros share a few reasons why they chose Datadog as the observability platform for their AWS environment: the power of our infrastructure monitoring solution, extensive range of integrations, strong reputation in the market, and more.

Formalize your organization's best practices with custom Scorecards in Datadog

The Datadog Service Catalog is a centralized hub of information around the performance, reliability, security, efficiency, and ownership of your distributed services. By using the Service Catalog, teams can eliminate knowledge silos and realize seamless DevSecOps workflows.

How we manage incidents at Datadog

Incidents put systems and organizations to the test. They pose particular challenges at scale: in complex distributed environments overseen by many different teams, managing incidents requires extensive structure and planning. But incidents, by definition, break structures and foil plans. As a result, they demand carefully orchestrated yet highly flexible forms of response. This post will provide a look into how we manage incidents at Datadog. We’ll cover our entire process.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

How Uber Freight Powers Intelligent Logistics with Datadog

Thiyagarajan Anandan of Uber Freight shares how he and his team have created a center of excellence for monitoring and DevOps culture. Uber Freight, a division of Uber, delivers an end-to-end enterprise suite of Relational Logistics to advance supply chains and move the world’s goods. With more than 1,000 shippers across $18B in freight under management (FUM), it’s critical for Uber Freight to provide 99.99% uptime for its shippers and customers. Since migrating to the Datadog platform, Uber Freight has, for the first time, unlocked the full breadth and depth of its systems, thereby significantly decreasing MTTR/MTTD and delivering an improved customer experience.