
June 2020

Observability at The Edge with Fastly and Datadog

You use CDNs because they allow you to serve content as quickly and reliably as possible. But how well are your systems performing? How securely are you moving data—and how do you know which parts of your environment are slowing you down? Learn how to improve end user experiences, accelerate development, and take full advantage of edge computing in this joint webinar.

Driving Service Reliability Through Autoscaling Workloads on OpenShift

In this webinar, Ara Pulido, Technical Evangelist at Datadog, will demonstrate how to autoscale your application workloads on OpenShift. You will learn frameworks for identifying your workloads' key work and resource metrics, as well as how to use them to drive horizontal and vertical pod autoscaling so that you can maximize efficiency while ensuring service reliability.
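To make the work-metric idea concrete, here is a minimal sketch of the replica calculation the Kubernetes Horizontal Pod Autoscaler performs; the requests-per-second target and current load are invented values, not figures from the webinar.

```python
import math

def desired_replicas(current_replicas, current_metric_value, target_metric_value):
    """Approximation of the HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)."""
    return math.ceil(current_replicas * current_metric_value / target_metric_value)

# Hypothetical work metric: each pod should handle about 50 requests/second,
# but the deployment currently sees an average of 75 requests/second per pod.
print(desired_replicas(current_replicas=4, current_metric_value=75, target_metric_value=50))  # -> 6
```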

Best practices for managing your SLOs with Datadog

Collaboration and communication are critical to the successful implementation of service level objectives. Development and operational teams need to evaluate the impact of their work against established service reliability targets in order to improve their end user experience. Datadog simplifies cross-team collaboration by enabling everyone in your organization to track, manage, and monitor the status of all of their SLOs and error budgets in one place.

Service level objectives 101: Establishing effective SLOs

In recent years, organizations have increasingly adopted service level objectives, or SLOs, as a fundamental part of their site reliability engineering (SRE) practice. Best practices around SLOs have been pioneered by Google—the Google SRE book and a webinar that we jointly hosted with Google both provide great introductions to this concept. In essence, SLOs are rooted in the idea that service reliability and user happiness go hand in hand.
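As a small worked example of the reliability targets the article discusses, the error budget implied by an SLO is simple arithmetic; the 99.9% target and 30-day window below are illustrative, not values from the post.

```python
# Hypothetical example: a 99.9% availability SLO measured over a 30-day window.
slo_target = 0.999
window_minutes = 30 * 24 * 60            # 43,200 minutes in the window

error_budget = (1 - slo_target) * window_minutes
print(f"Error budget: {error_budget:.1f} minutes of downtime")   # ~43.2 minutes

# If 10 minutes of downtime have already occurred this window:
consumed = 10
print(f"Budget remaining: {error_budget - consumed:.1f} minutes "
      f"({(1 - consumed / error_budget):.0%})")
```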

Monitor HiveMQ with Datadog

HiveMQ is an open source MQTT-compliant broker for enterprise-scale IoT environments that lets you reliably and securely transfer data between connected devices and downstream applications and services. With HiveMQ, you can provision horizontally scalable broker clusters in order to achieve maximum message throughput and prevent single points of failure.
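The publish/subscribe flow a HiveMQ broker handles is easy to sketch with the paho-mqtt Python client (this assumes the paho-mqtt 1.x client API; the broker hostname, topic, and payload are placeholders):

```python
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"          # placeholder: your HiveMQ broker or cluster endpoint
TOPIC = "factory/line-1/temperature"   # placeholder topic

def on_message(client, userdata, message):
    # Downstream applications receive device data published to the topic.
    print(f"{message.topic}: {message.payload.decode()}")

client = mqtt.Client()                 # paho-mqtt 2.x additionally takes a callback_api_version argument
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC, qos=1)
client.publish(TOPIC, payload="21.7", qos=1)   # a connected device would publish readings like this
client.loop_forever()
```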

Best practices for creating end-to-end tests

Browser (or UI) tests are a key part of end-to-end (E2E) testing. They are critical for monitoring key application workflows—such as creating a new account or adding items to a cart—and ensuring that customers using your application don't run into broken functionality. But browser tests can be difficult to create and maintain. They take time to implement, and configurations for executing tests become more complex as your infrastructure grows.
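Datadog Browser Tests are recorded in the UI rather than coded, but for comparison, a hand-scripted version of the same add-to-cart check, written here with Selenium, looks roughly like this; the URL and element IDs are made up.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://shop.example.com")              # placeholder application URL
    driver.find_element(By.ID, "add-to-cart").click()   # hypothetical element IDs
    driver.find_element(By.ID, "cart").click()
    assert "1 item" in driver.find_element(By.ID, "cart-count").text
finally:
    driver.quit()
```

Keeping selectors and assertions like these in sync with a changing UI is exactly the maintenance burden described above.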

Datadog Application Performance Monitoring

Datadog APM provides deep visibility into application performance and code efficiency, so you can monitor and optimize your stack at any scale and provide the best digital experience for your users. APM and distributed tracing are fully integrated with the rest of Datadog, giving you rich context for troubleshooting issues in real time.
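As a brief illustration of how traces reach APM from Python code, the ddtrace library lets you wrap a function or block in a custom span; the service and resource names below are examples, not part of the product announcement.

```python
from ddtrace import tracer

@tracer.wrap(service="checkout", resource="charge_card")   # example names
def charge_card(order):
    # ... call the payment provider ...
    return True

# Or trace an arbitrary block of code:
with tracer.trace("cart.checkout", service="checkout"):
    charge_card({"id": 123})
```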

How to categorize logs for more effective monitoring

Logs provide a wealth of information that is invaluable for use cases like root cause analysis and audits. However, you typically don’t need to view the granular details of every log, particularly in dynamic environments that generate large volumes of them. Instead, it’s generally more useful to perform analytics on your logs in aggregate.
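One common way to support that kind of aggregate analysis is to emit logs as structured JSON with category-style attributes; this is a minimal sketch using Python's standard logging module, and the field names are invented for illustration.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # Structured fields make it easy to group and count logs by category later.
        return json.dumps({
            "message": record.getMessage(),
            "status": record.levelname.lower(),
            "service": "checkout",                               # hypothetical attribute
            "category": getattr(record, "category", "uncategorized"),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("payment accepted", extra={"category": "payment"})
logger.warning("retrying upstream call", extra={"category": "dependency"})
```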

Monitor RethinkDB with Datadog

RethinkDB is a document-oriented database that enables clients to listen for updates in real time using streams called changefeeds. RethinkDB was built for easy sharding and replication, and its query language integrates with popular programming languages, with no need for clients to parse commands from strings. The open source project began in 2012, and joined the Linux Foundation in 2017.
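A changefeed subscription with the official Python driver looks roughly like this; the host, database, and table names are placeholders.

```python
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015)   # placeholder connection details

# Stream every change made to the table as it happens.
feed = r.db("test").table("scores").changes().run(conn)
for change in feed:
    print(change["old_val"], "->", change["new_val"])
```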

Test file uploads and downloads with Datadog Browser Tests

Understanding how your users experience your application is critical—downtime, broken features, and slow page loads can lead to customer churn and lost revenue. Last year, we introduced Datadog Browser Tests, which enable you to simulate key user journeys and validate that users are able to complete business-critical transactions.

Monitor Carbon Black Defense logs with Datadog

Creating security policies for the devices connected to your network is critical to ensuring that company data is safe. This is especially true as companies adopt a bring-your-own-device model and allow more personal phones, tablets, and laptops to connect to internal services. These devices, or endpoints, introduce unique vulnerabilities that can expose sensitive data if they are not monitored.

Introducing our AWS 1-click integration

Datadog’s AWS integration brings you deep visibility into key AWS services like EC2 and Lambda. We’re excited to announce that we’ve simplified the process for installing the AWS integration. If you’re not already monitoring AWS with Datadog, or if you need to monitor additional AWS accounts, our 1-click integration lets you get started in minutes.

Datadog on Kubernetes

When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn't expect to find so many roadblocks, but ingesting trillions of datapoints per day in a reliable fashion requires pushing the limits of cloud computing. Creating and managing dozens of clusters, each with thousands of nodes and operating in several clouds, was a challenging but rewarding learning experience. In this episode, Ara Pulido, Developer Advocate, will chat with Laurent Bernaille, Staff Engineer at Datadog and part of the team that created Datadog's Kubernetes platform. We'll cover the challenges we encountered while creating and scaling Datadog's Kubernetes platform and how we overcame them.

Using Log Patterns to Create Log Exclusion Filters | Datadog Tips & Tricks

In part 2 of this two-part series, you'll learn how to use Log Patterns to quickly create log exclusion filters and reduce the number of low-value logs you are indexing. Datadog's Logging without Limits™ feature allows you to selectively determine which logs to index after ingesting all of your logs. Meanwhile, the Log Patterns feature can quickly isolate groups of low-value logs.
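Log Patterns groups similar messages by masking their variable parts; a toy version of that grouping, shown here only to illustrate why it surfaces high-volume, low-value logs, might look like the following (the sample log lines are invented).

```python
import re
from collections import Counter

def to_pattern(message):
    # Mask numeric tokens so similar messages collapse into one pattern.
    return re.sub(r"\d+", "*", message)

logs = [
    "GET /healthz 200 3ms",
    "GET /healthz 200 2ms",
    "GET /checkout 500 812ms",
    "GET /healthz 200 4ms",
]

counts = Counter(to_pattern(line) for line in logs)
for pattern, count in counts.most_common():
    print(count, pattern)

# High-count, low-value patterns (like the health checks above) are good
# candidates for an exclusion filter.
```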

How to Generate Metrics from Logs | Datadog Tips & Tricks

In this video, you’ll learn how to generate metrics from log event attributes to filter your logs more effectively and begin monitoring, graphing, and alerting on the new metric immediately. Generating metrics from logs is a powerful way to monitor attributes that are parsed from your logs.
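The video covers Datadog's log-based metrics feature in the UI; as a rough code analogue only, you could extract an attribute while processing a log event and submit it through DogStatsD with the datadog Python library (the metric names, tags, and sample log line are made up).

```python
import json
from datadog import initialize, statsd

initialize(statsd_host="localhost", statsd_port=8125)

log_line = '{"service": "checkout", "status": "error", "duration_ms": 384}'
event = json.loads(log_line)

# Turn parsed log attributes into metrics you can graph and alert on.
statsd.increment("logs.checkout.events", tags=[f"status:{event['status']}"])
statsd.histogram("logs.checkout.duration_ms", event["duration_ms"],
                 tags=[f"service:{event['service']}"])
```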

Datadog on Kafka

As a company, Datadog ingests trillions of data points per day. Kafka is the messaging persistence layer underlying many of our high-traffic services. Consequently, our Kafka usage is quite high: double-digit gigabytes per second of bandwidth and a need for petabytes of high-performance storage, even for relatively short retention windows. In this episode, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They'll share their strategy for scaling Kafka, how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters. You'll leave with lessons learned while scaling persistent storage on modern orchestrated infrastructure, and actionable insights you can apply at your organization.