Datadog on Kubernetes Autoscaling

Datadog

Feb 9, 2024

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes and hundreds of thousands of pods. This infrastructure is shared by a wide variety of engineering teams at Datadog, whose feature and capacity needs differ and may change over time.

How do we make sure our applications have the compute resources they need at any given time? How do we ensure that our cloud costs reflect that compute need and that we are not wasting resources? What metrics do we use to drive those autoscaling events?

In this session, Ara Pulido, Staff Developer Advocate, chatted with Charly Fontaine, Engineering Manager, and Corentin Chary, Senior Staff Software Engineer, about Datadog's autoscaling strategies in Kubernetes, both vertical and horizontal, including the metrics teams use to drive their autoscaling events.

Links referenced in the episode:

Watermark Pod Autoscaler: https://github.com/DataDog/watermarkpodautoscaler
Kubernetes Cluster Autoscaler: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
Horizontal Pod Autoscaling: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
Vertical Pod Autoscaler: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
Datadog Careers Page: https://careers.datadoghq.com/

Chapter marks:

00:00 Welcome

05:34 Kubernetes Cluster Autoscaler

10:16 Horizontal Pod Autoscaling

18:05 Watermark Pod Autoscaling

23:32 Best practices for Horizontal Scaling

35:05 Vertical Pod Autoscaling

50:56 Q&A