December 2023

Improve your shift-left observability with the Datadog Service Catalog

Your applications are only as powerful as they are iterable. To keep up with their rapidly changing production environments, your teams need reliable CI/CD systems that implement best practices—including build and test automation, flaky test management, and deployment management. By optimizing their CI/CD pipelines, your teams can build their apps more efficiently, deploy them more safely, and catch bugs and security vulnerabilities before they make it to production.

How Toyota is using Datadog and AI/ML to invent new ways for humans to be more mobile #datadog

Toyota is best known for making great cars and trucks, and as a leader in technology and mobility, they are on a mission to build a better future where everyone has the freedom to move. By partnering with Datadog, Toyota is taking advantage of the latest AI/ML to innovate and invent new ways for humans to be more mobile, while future-proofing Toyota’s tech stack.

Investigate your log processing with the Datadog Log Pipeline Scanner

Large-scale organizations typically collect and manage millions of logs a day from various services. Within these orgs, many different teams may set up processing pipelines to modify and enrich logs for security monitoring, compliance audits, and DevOps. Datadog Log Pipelines let you ingest logs from your entire stack, parse and enrich them with contextual information, add tags for usage attribution, generate metrics, and quickly identify log anomalies.

Scaling Up, One Network Bottleneck at a Time #shorts #datadog

Processing data at scale involves moving packets through a network—but what happens when that network isn't cooperative? Anatole Beuzon, a Software Engineer at Datadog, discusses how he investigated and resolved network issues in Datadog’s larger data-processing apps and how you can apply these same methods to your own production workloads.

Monitor Ray applications and clusters with Datadog

Ray is an open source compute framework that simplifies the scaling of AI and Python workloads for on-premise and cloud clusters. Ray integrates with popular libraries, data stores, and tools within the machine learning (ML) ecosystem, including Scikit-learn, PyTorch, and TensorFlow. This gives developers the flexibility to scale complex AI applications without making changes to their existing workflows or AI stack.

Track service provider outages with IsDown and Datadog

When your apps and infrastructure rely on dozens of third-party providers for key functionality, it’s important to closely track their outages. If a service you rely on goes down, you need to move quickly to limit the outage’s impact on your users. IsDown provides a detailed status page aggregator and uptime monitoring for all your third-party dependencies.

Building an Internal Development Platform (IDP): A Journey of Innovation and Growth #shorts

As your organization grows, the increased number of engineers and services can put a strain on your infrastructure and ops teams. As Latin America’s largest online commerce and payments ecosystem, MercadoLibre needed to solve this scaling challenge. So we embarked on a mission to build an Internal Development Platform (IDP). We’ll highlight our transformative journey and how the IDP grew to manage over 26,000 microservices, while delivering a highly productive environment to MercadoLibre’s 12,000+ developers. In this session, you’ll learn about the challenges and solutions required to successfully build your own IDP.

Monitor your chaos engineering experiments with Steadybit's offering in the Datadog Marketplace

Steadybit is a software reliability platform that uses chaos engineering and fault injection to help organizations improve the stability and performance of their applications. By allowing customers to simulate turbulent scenarios in a controlled environment, Steadybit enables you to identify and mitigate potential system issues to reduce downtime and improve resilience.

FinOps and Cloud Cost Optimization #shorts #datadog #cloudservices

As companies scale, it’s become increasingly important to keep cloud cost management and optimization top of mind. In this talk, Yuval Yogev from Sygnia walks you through Sygnia’s optimization journey of cutting their total cloud costs in half. Yogev also shares insights into how you can optimize your own organization’s cloud usage and spend.

A deep dive into CPU requests and limits in Kubernetes

In a previous blog post, we explained how containers’ CPU and memory requests can affect how they are scheduled. We also introduced some of the effects CPU and memory limits can have on applications, assuming that CPU limits were enforced by the Completely Fair Scheduler (CFS) quota. In this post, we are going to dive a bit deeper into CPU and share some general recommendations for specifying CPU requests and limits.
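To make the relationship between CPU limits and the CFS quota concrete, here is a minimal sketch of the arithmetic. It assumes the commonly used default CFS period of 100 ms and the cgroup v1 model, in which a limit becomes a quota per period and a request becomes a relative share weight. The function names (`parse_cpu`, `cfs_quota_us`, `cpu_shares`) are hypothetical helpers written for illustration and are not part of any Kubernetes or Datadog API.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms, in microseconds
SHARES_PER_CPU = 1024    # cgroup v1 share weight that corresponds to one full CPU


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m", "2", or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cfs_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may consume per CFS period."""
    return parse_cpu(limit) * period_us // 1000


def cpu_shares(request: str) -> int:
    """Relative weight derived from the CPU request (floor of 2, the cgroup v1 minimum)."""
    return max(2, parse_cpu(request) * SHARES_PER_CPU // 1000)


if __name__ == "__main__":
    # A limit of 500m allows 50 ms of CPU time in every 100 ms period.
    print(cfs_quota_us("500m"))
    # A request of 250m receives a quarter of the weight of a full CPU.
    print(cpu_shares("250m"))
```

Under these assumptions, a container that uses up its quota before the period ends is throttled until the next period begins, which is why a limit that looks generous on average can still add latency to bursty workloads.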

CTO Fireside Chat #cto #asana #datadog #leadership #ml #ai #shorts

Building large-scale technical systems is hard, but building and scaling high-performing technical organizations is even more difficult. In this session, Datadog Co-founder and CTO Alexis Lê-Quôc will sit down with Prashant Pandey, Head of Engineering at Asana, to discuss their approach to engineering leadership. They’ll share the hard-learned lessons from their long careers to help you cultivate better technical teams, covering topics such as staying in tune with new technologies, enabling innovation, shipping modern ML and AI-based features, and scaling teams.

Highlights from AWS re:Invent 2023

Whether or not you made the journey to this year’s re:Invent, there’s always a variety of great announcements lost amid an action-packed week of keynotes, breakouts, expo hall demos, and networking sessions. No need to worry—we’re always happy to be a big part of the re:Invent experience and share our observations with you.

Datadog on Kubernetes Node Management #datadog #kubernetes #observability #infrastructure #shorts

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes, or hundreds of thousands of pods. This infrastructure is used by a wide variety of engineering teams at Datadog, with different feature and capacity needs.

re:Invent Recap Livestream

Did you miss this year’s re:Invent? Or maybe you were onsite but too busy deep diving on certifications, new products, and networking. Don’t worry: the Datadog team is streaming right to your home on December 5th to recap all of the highlights from the event. Join Andrew Krug from Datadog’s Technical Community and a host of AWS guests live to hear about exciting announcements from AWS re:Invent 2023, Datadog’s latest product launches, and a rundown of the best on-demand sessions that you’ll want to tune into.

How Toyota Connected uses Datadog Workflow Automation to reduce time to resolution #datadog #shorts

Hear from Toyota Connected’s DevOps Engineers about how Datadog Workflow Automation helps them easily automate their infrastructure tasks, thereby reducing the time needed to resolve incidents and disruptions.