June 2021

Mute Datadog alerts for planned downtime

We’re happy to announce the release of new muting features for Datadog monitors. Previously, monitor muting was binary: all-or-nothing. Scoped monitor muting now allows teams to eliminate unnecessary alerting during scheduled maintenance, testing, auto scaling events, and instance reboots, so they can filter out expected events and quickly pinpoint critical issues in their infrastructure.

Best practices for shift-left testing

There are several different testing methods you can use as part of your development process to ensure you build high-quality applications. Shift-left testing is one approach that has become popular with agile teams because it enables them to move the testing phase to earlier stages of the development life cycle, which is a primary goal for agile development. Shift-left testing has a few advantages over traditional methods.

Real-time distributed tracing for Go and Java Lambda Functions

Serverless applications streamline development by allowing you to focus on writing and deploying code rather than managing and provisioning infrastructure. To help you monitor the performance of your serverless applications, last year we released distributed tracing for AWS Lambda to provide comprehensive visibility across your serverless applications.

Automate remediation of threats detected by Datadog Security Monitoring

When it comes to security threats, a few minutes of additional response time can make the difference between a minor nuisance and a major problem. Datadog Security Monitoring enables you to easily triage and alert on threats as they occur. In this post, we’ll look at how you can use Datadog’s webhooks integration to automate responses to common threats Datadog might detect across your environments.

Monitor ActiveMQ Artemis and Classic with Datadog

ActiveMQ is a message broker that uses standard protocols to route messages between disparate services. ActiveMQ currently offers two versions, Classic and Artemis, that it plans to merge into a single version in the future. Both versions provide high throughput, support synchronous and asynchronous messaging, and allow you to connect loosely coupled services written in different languages.

Monitor Databricks with Datadog

Databricks is an orchestration platform for Apache Spark. Users can manage clusters and deploy Spark applications for highly performant data storage and processing. By hosting Databricks on AWS, Azure, or Google Cloud Platform, you can easily provision Spark clusters to run heavy workloads. And, with Databricks’s web-based workspace, teams can use interactive notebooks to share datasets and collaborate on analytics, machine learning, and streaming in the cloud.

Manage incidents on the go with the Datadog mobile app

The Datadog mobile app enables you to check your alerts and dashboards from anywhere, so you can triage issues—and stay up to date—regardless of whether you have access to a laptop. You can now be even more productive when responding to issues while away from your keyboard by declaring incidents and notifying responders directly from your mobile device.

Monitor Salesforce logs with Datadog

Visibility into your Salesforce environment is crucial for keeping your data secure and ensuring a seamless user experience. That’s why we are excited to announce that Datadog can now collect Salesforce event logs directly from your Real-Time Event Monitoring stream, giving you deep insights into the security and operational performance of your Salesforce environment.

Monitor your cloud architecture and app dependencies with Datadog NPM

Migrating your on-prem infrastructure to the cloud offers a host of benefits, including scalability, mobility, security, and cost reduction. When it comes to cloud network monitoring, tracking the availability and performance of the cloud services your applications rely on becomes even more important. However, moving from self-managed infrastructure to third party–managed services introduces a number of challenges.

Monitor AWS control plane API usage metrics in Datadog

AWS Service Quotas helps you manage limits on the number of resources or API operations that are possible for a given AWS service. Hitting such limits can cause operational disruptions, such as being rate limited on the critical APIs that your applications rely on or being unable to provision additional AWS resources.

Streamline incident management with BigPanda's offering in the Datadog Marketplace

BigPanda is a domain-agnostic AIOps platform that helps organizations detect and resolve incidents in their complex IT environments. By unifying and correlating data from monitoring, change, and topology tools, BigPanda enables teams to quickly pinpoint the root cause of issues and prevent costly outages.

Datadog on Chaos Engineering

As you scale your applications, remaining resilient to underlying network failures, resource constraints introduced by other applications, or spikes in traffic can become exponentially more complex, even with very thorough testing and processes. Chaos engineering is a discipline that encourages experimenting in production and injecting controlled failures into the system to understand how the system will react in such conditions and to improve its reliability.

Planning Center: Simplifying observability and reducing MTTR in a serverless world, with Datadog

Justin Bodeutsch, Systems Administrator at Planning Center, discusses how Datadog’s alerting, log management, serverless, and infrastructure monitoring tools have simplified internal processes and been instrumental in minimizing MTTR across the business.

Google Cloud, Vodafone and Datadog SRE Panel Webinar

Since originating at Google, site reliability engineering (SRE) has enabled countless teams to effectively manage large-scale systems, improve the stability of complex services, and automate operational tasks using software. In this SRE panel, Yuri Grinshteyn (Customer Reliability Engineer, Google) will speak about the core principles of SRE and how the culture is practiced at Google. He will be joined by Llywelyn Griffith-Swain (SRE Manager, Vodafone), who will share Vodafone’s story of adopting SRE, lessons learned, and their best practices for maintaining the cultural shift across teams.