Datadog on Building Responsive UX

Datadog product designers and frontend developers have been working together on a new, better UX for creating dashboards, which is one of the most important parts of using Datadog. A central part of this effort was building a new layout engine. This project differed from the usual feature work, so the collaboration cycle between our developers and designers had to change so that we could work more closely and more quickly to design, build, and test constraints and new ideas in the browser.

Dynamically control your custom metrics volume with Metrics without Limits

Sending custom metrics to Datadog allows you to monitor important data specific to your business and applications, such as latency, dollars per customer, items bought, or trips taken. And tags are key to being able to slice and dice these custom metrics to quickly find the information you need. But collecting enough custom metrics to have complete visibility can be cost prohibitive. For example, you might run microservices instrumented across thousands of containers.

Dash 2021 Keynote

The Datadog team delivers the annual Dash keynote. At Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights. And we introduced Datadog Observability Pipelines, which run on your infrastructure and put you in control of your observability data, from how it’s processed to where it’s sent.

Panel: Improving Monitoring & Reliability with Chaos Engineering - Dash 2021 (Datadog, Gremlin, Pismo)

Monitoring and observability are critical for knowing how your systems are behaving, but how do you create the feedback loops to shift from reactively monitoring for incidents to proactively preventing them? In this roundtable discussion, Mauricio Galdieri, Software Architect at Pismo.io, and Kolton Andrus, CEO and co-founder of Gremlin, join Tay Nishimura, Site Reliability Engineer on the Chaos Engineering team at Datadog, to chat about monitoring, Chaos Engineering, and using them together to build more reliable systems.

Scaling HashiCorp's Cloud Platform - Dash 2021 (HashiCorp)

Identifying bottlenecks during times of high load is critical to building a scalable software platform. Stress testing is one way to simulate high load on a system, and it allows you to proactively capture potential bottlenecks before they impact customers. Once a solution is implemented to address the bottleneck, you need a way to measure success and find a new limit. See how HashiCorp Cloud Platform (HCP) developed a stress testing framework that relies heavily on Datadog’s custom metric capabilities, in combination with some out-of-the-box integrations, to give HCP engineers a comprehensive view of their platform, and how they used these insights to scale their concurrent data-plane provisioning by 300%.

Panel: Handling Incident Response - Dash 2021 (Datadog, PagerDuty)

When customer-impacting downtime happens, it’s crucial that responders are prepared and can resolve these issues as quickly as possible. Knowing the right tools to use, wherever you are working from, helps you put a well-defined strategy in place so that you can come together as a team, work the problem, and get to a solution quickly. In this roundtable discussion, PagerDuty and Datadog engineers chat about incident response and how we use all the tools at our disposal to respond quickly and effectively.

Roundtable: The Complexities of Cloud Migration - Dash 2021 (Datadog, LaunchDarkly, StockX)

Often when completing a migration project, your organisation has to straddle two systems. You’re fighting habits and changing attitudes while also attempting to complete a high-risk operation. Every software team will have to complete a migration at some stage. Whether it’s to improve scalability and performance, or to transition from an on-prem to a cloud solution, you’ll need a deep understanding of your current environment to create a strategy that minimises downtime for your team.

Monitor NS1 with Datadog

NS1 is an intelligent DNS and traffic management platform that helps optimize the performance of your network infrastructure and speed application delivery to your end users. Since even a small increase in service latency can lead to churn and revenue loss, it’s critical to remove any inefficiencies embedded in basic network functions. NS1 helps ensure high performance for name resolution and routing through support for the edns0-client-subnet (ECS) DNS extension and for Filter Chain technology.

Metrics for Apache Kafka with Datadog and Aiven | Ryan Martin (Aiven)

Using managed services is all very well, but how do you get the data you need from the different services into Datadog so you can see it all in one place? This session will walk through the configuration for bringing your Aiven-managed Apache Kafka service metrics into your Datadog explorer. You’ll see how to filter the metrics to focus on specific topics or consumer groups, and how to use the Aiven client to create a repeatable, scriptable setup. This session is recommended for anyone living in the as-a-Service world who cares about data and is interested in using metrics to optimize their Kafka clusters.

Monitoring Open Source Success in Arduino | Silvano Cerza (Arduino)

Arduino is an open-source hardware and software company, project, and user community that designs and manufactures single-board microcontrollers and microcontroller kits for building digital devices. In the course of developing software downloaded and used by millions around the world, we have found it vitally important to be aware of the quality and performance of our software.