Track the status of all your SLOs in Datadog

Service level objectives, or SLOs, are a key part of the site reliability engineering toolkit. SLOs provide a framework for defining clear targets around application performance, which ultimately helps teams provide a consistent customer experience, balance feature development with platform stability, and improve communication with internal and external users.

Best practices for managing your SLOs with Datadog

Collaboration and communication are critical to the successful implementation of service level objectives. Development and operations teams need to evaluate the impact of their work against established service reliability targets in order to improve the end-user experience. Datadog simplifies cross-team collaboration by enabling everyone in your organization to track, manage, and monitor the status of all of their SLOs and error budgets in one place.

Troubleshoot infrastructure faster with Recent Changes

Infrastructure changes often trigger incidents, but troubleshooting these incidents is challenging when responders have to navigate through multiple tools to correlate telemetry with configuration changes. This lack of unified observability leads to longer mean time to resolution (MTTR), greater operational stress, and ultimately, negative business outcomes.

Troubleshoot infrastructure issues faster with Resource Changes

Infrastructure changes often trigger incidents, but troubleshooting these incidents is challenging when responders have to navigate through multiple tools to correlate telemetry with configuration changes. This lack of unified observability leads to longer mean time to resolution (MTTR), greater operational stress, and ultimately, negative business outcomes.

Diagnose runtime and code inefficiencies in production by using Continuous Profiler's timeline view

When you face issues like reduced throughput or latency spikes in your production applications, determining the cause isn’t always straightforward. These kinds of performance problems might not arise for simple reasons such as under-provisioned resources; often, the root of the problem lies deep within an application’s runtime execution.

Troubleshoot and optimize data processing workloads with Data Jobs Monitoring

Data is central to any business: it powers mission-critical applications, informs business decisions, and supports the growing adoption of AI/ML models. As a result, data volumes are only increasing, and teams rely on engines like Apache Spark and managed platforms like Databricks or Amazon EMR to process this data at scale.

Remediate Google Cloud issues with new actions in Workflow Automation and App Builder

Datadog Actions help you respond to alerts and manage your infrastructure directly from within Datadog. You can do this by creating workflows that automate end-to-end processes or by using App Builder to build resource management tools and self-serve developer platforms. With more than 550 available actions, Datadog Actions offer capabilities such as creating Jira tickets, resizing autoscaling groups, and triggering GitHub pipelines.

Build custom monitoring and remediation tools with Datadog App Builder

When you’re responding to an issue with your application in the heat of on-call, you need reliable, well-maintained tooling that’s painless to use. Otherwise, the time you’ll spend combing through monitoring data for context, connecting to hosts and other infrastructure resources, and pivoting between consoles for various managed services can add up quickly and slow your response.

Focus on code that matters with source code previews in Continuous Profiler

The use of code profiling to troubleshoot application performance can appear daunting to the uninitiated, and many software engineers even assume that this domain is reserved for niche specialists. But here at Datadog, one of the key goals for our Continuous Profiler product has been to take this seemingly intimidating practice of code profiling and make it more accessible to engineers at all levels.