
August 2023

Monitor Google Cloud Vertex AI with Datadog

Vertex AI is Google’s platform offering AI and machine learning computing as a service—enabling users to train and deploy machine learning (ML) models and AI applications in the cloud. In June 2023, Google added generative AI support to Vertex AI, so users can test, tune, and deploy Google’s large language models (LLMs) for use in their applications.
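
As a rough illustration of what monitoring a Vertex AI workload can look like in practice, the sketch below wraps a prediction call against a Vertex AI endpoint and emits latency and error counts through DogStatsD. It assumes the google-cloud-aiplatform and datadog Python packages, a local Datadog Agent with DogStatsD enabled, and a hypothetical project, region, endpoint ID, and metric names; it is not Datadog's official Vertex AI integration.

```python
# Minimal sketch (not Datadog's official integration): wrap a Vertex AI
# prediction call and emit latency/error metrics through DogStatsD.
# Assumes the `google-cloud-aiplatform` and `datadog` packages, a running
# Datadog Agent with DogStatsD enabled, and hypothetical IDs/metric names.
import time

from datadog import initialize, statsd
from google.cloud import aiplatform

initialize(statsd_host="127.0.0.1", statsd_port=8125)

aiplatform.init(project="my-gcp-project", location="us-central1")  # hypothetical project
endpoint = aiplatform.Endpoint(
    "projects/my-gcp-project/locations/us-central1/endpoints/1234567890"  # hypothetical endpoint ID
)

def predict_with_metrics(instances):
    start = time.monotonic()
    try:
        response = endpoint.predict(instances=instances)
        statsd.increment("vertexai.prediction.success")  # custom metric names, chosen for this sketch
        return response
    except Exception:
        statsd.increment("vertexai.prediction.error")
        raise
    finally:
        statsd.histogram("vertexai.prediction.latency", time.monotonic() - start)
```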

This Month in Datadog: DASH 2023 Recap, featuring Bits AI, Single-Step APM Instrumentation, and more

Datadog is constantly elevating its approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. This month, we’re recapping DASH 2023.

Datadog On Mobile Software Development

Understanding the health and user experience of your mobile application is critical to avoiding user frustration, understanding application crashes, and reducing the mean time to resolution for bugs. To help with that task, Datadog has a mobile monitoring solution that allows developers to better understand and improve their applications. But what should you take into account when building mobile observability SDKs? How can we gather the right telemetry without affecting the underlying application?

Visualize service ownership and application boundaries in the Service Map

The complexity of microservice architectures can make it hard to determine where an application’s dependencies begin and end and who manages which ones. This can pose a variety of challenges both in the course of day-to-day operations and during incidents. Lacking a clear picture of the ownership and interplay of your services can impede accountability and cause application development, incident investigations, and onboarding processes to become prolonged and haphazard.

Generative AI and Observability Automation - Sajid Mehmood & Michael Gerstenhaber

One of the biggest challenges in observability is separating the signal from the noise. As artificial intelligence (AI) tools become more powerful and accessible, they have generated a lot of buzz around the role of AI in the performance and reliability of our technical systems and the teams that build and operate them. In this fireside chat, Michael Gerstenhaber (Datadog VP of Product) and Sajid Mehmood (Datadog VP of Engineering) will sift through the hype to chat about what generative AI and large language models (LLMs) will really mean for the future of observability and how they can benefit your teams today.

Right Size, Right Performance, Right Time

It’s been said that “premature optimization is the root of all evil.” On the other hand, many engineers have also had to work with software riddled with so much technical debt and inefficiency that optimization is practically impossible and a complete rewrite is required. So when is the right time? In this panel session, we’ll talk with engineering leaders and architects about their approach to software optimization, when to do it, and how to design systems that scale and stay performant.

CTO Fireside Chat

Building large-scale technical systems is hard, but building and scaling high-performing technical organizations is even more difficult. In this session, Datadog Co-founder and CTO Alexis Lê-Quôc will sit down with Prashant Pandey, Head of Engineering at Asana, to discuss their approach to engineering leadership. They’ll share the hard-learned lessons from their long careers to help you cultivate better technical teams, covering topics such as staying in tune with new technologies, enabling innovation, shipping modern ML- and AI-based features, and scaling teams.

Efficiency and Effectiveness

With unlimited money, most technology problems become easy to solve. But how do you design, build, and operate large-scale, performant systems without breaking the bank? In this session, Chandru Subramanian (Director of Engineering, Runtime Efficiency at Datadog) and Neil Innes (Sr. Engineering Manager, DevOps at FanDuel) will discuss how they balance efficiency and effectiveness to save money while also meeting key goals.

The Darkside of GraphQL

GraphQL is a query language for APIs that provides a powerful and efficient way to query and manipulate data. As powerful and versatile as GraphQL is, its downside is that it can be vulnerable to certain security threats. In this presentation, we will discuss the security vulnerabilities associated with GraphQL, from the basics to more advanced threats, and how to best protect against them. After this presentation, attendees will have a better understanding of security vulnerabilities in GraphQL, as well as an understanding of the steps needed to protect against them.
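
One concrete example of the kind of threat covered here: deeply nested queries can be abused for denial of service, and limiting query depth is a common defense. The sketch below is a deliberately naive illustration of that idea; production servers should enforce depth or complexity limits on the parsed query (or via built-in server settings) rather than by counting braces.

```python
# Deliberately naive illustration of query-depth limiting, one common GraphQL
# defense against abusive, deeply nested queries. Real servers should enforce
# this on the parsed AST or via built-in depth/complexity limits.
MAX_DEPTH = 5

def max_query_depth(query: str) -> int:
    depth = deepest = 0
    for ch in query:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest

def reject_if_too_deep(query: str) -> None:
    if max_query_depth(query) > MAX_DEPTH:
        raise ValueError("query exceeds maximum allowed depth")

# A nested query that would be rejected under this limit.
nested = "{ viewer { friends { friends { friends { friends { posts { id } } } } } } }"
print(max_query_depth(nested))  # 7
try:
    reject_if_too_deep(nested)
except ValueError as err:
    print(err)
```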

Innovating with Faster, Safer Experimentation

Experimentation is the key to innovation. But experiments come with risks, not just of failure, but of wasted time, effort, and money. I’ll share the experimental approach that NTT DOCOMO, Japan’s largest wireless provider, takes to build digital products that customers love. I’ll also present examples from experiments we performed on NTT DOCOMO’s Smart-life website that improved the user experience and significantly increased conversion rates. In this session, you’ll learn how to reduce the risk of experiments and iterate faster to improve your services.

Container Security Fundamentals - Linux Namespaces (Part 4): The User Namespace

In this video, we continue our examination of Linux namespaces by looking at how the user namespace can be used to decouple the user ID inside a container from the user ID on the host, allowing a container to run as the root user without the risks of being root on the host. To learn more, read our blog on Datadog’s Security Labs site.
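
For a hands-on feel of the mechanism, the sketch below compares the UID a process sees on the host with the UID it sees inside a freshly created user namespace where the current user is mapped to root. It assumes a Linux host with util-linux's unshare command available and unprivileged user namespaces enabled.

```python
# Sketch: compare the UID on the host with the UID seen inside a new user
# namespace where the current user is mapped to root. Assumes a Linux host
# with util-linux's `unshare` and unprivileged user namespaces enabled.
import os
import subprocess

host_uid = os.getuid()

# --map-root-user creates a user namespace and writes a uid_map that maps
# the current (unprivileged) user to UID 0 inside the namespace.
ns_uid = subprocess.run(
    ["unshare", "--user", "--map-root-user", "id", "-u"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(f"UID on the host:          {host_uid}")
print(f"UID inside the namespace: {ns_uid}")  # prints 0, i.e. "root" only within the namespace
```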

Key questions to ask when setting SLOs

Many organizations rely on service level objectives (SLOs) to help them gauge the reliability of their products. By setting SLOs that define clear and measurable reliability targets, businesses can ensure they are delivering positive end-user experiences to their customers. Clearly defined SLOs also make it much easier for businesses to understand what tradeoffs they may have to make in order to deliver those specific experiences.
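
To make the tradeoff concrete, here is a small worked example of the arithmetic behind an availability SLO and its error budget; the target and request counts are illustrative only.

```python
# Worked example of the arithmetic behind an availability SLO.
# Numbers are illustrative, not a recommendation.
slo_target = 0.999            # 99.9% of requests succeed over the SLO window
total_requests = 10_000_000   # requests observed in the window
failed_requests = 4_200       # requests that violated the SLI

error_budget = (1 - slo_target) * total_requests   # failures you can "afford": 10,000
budget_consumed = failed_requests / error_budget   # fraction of the budget spent

print(f"Error budget:    {error_budget:,.0f} requests")
print(f"Budget consumed: {budget_consumed:.0%}")   # 42%
print(f"SLO met so far:  {failed_requests <= error_budget}")
```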

Key metrics for CoreDNS monitoring

CoreDNS is an open source DNS server that can resolve requests for internet domain names and provide service discovery within a Kubernetes cluster. CoreDNS is the default DNS provider in Kubernetes as of v1.13. Though it can be used independently of Kubernetes, this series will focus on its role in providing Kubernetes service discovery, which simplifies cluster networking by enabling clients to access services using DNS names rather than IP addresses.
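
To make the service-discovery role concrete: from inside a pod, a service is typically reachable at a name like <service>.<namespace>.svc.cluster.local, which CoreDNS resolves to the service's cluster IP. The stdlib-only sketch below resolves such a name; the service and namespace are hypothetical, and the lookup only succeeds from within a cluster.

```python
# Sketch: resolve a Kubernetes service name the way a client inside the
# cluster would, relying on CoreDNS for service discovery. The service and
# namespace names are hypothetical; this only resolves from inside a pod.
import socket

SERVICE_DNS_NAME = "my-service.my-namespace.svc.cluster.local"

try:
    cluster_ip = socket.gethostbyname(SERVICE_DNS_NAME)
    print(f"{SERVICE_DNS_NAME} -> {cluster_ip}")
except socket.gaierror as exc:
    # Outside the cluster (or without CoreDNS), the name simply won't resolve.
    print(f"Could not resolve {SERVICE_DNS_NAME}: {exc}")
```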

SRE in Transition: From Startup to Enterprise

"Startups are defined by “ship or die”. As a result, SRE teams at a startup should be focused on enabling product engineers to ship features as quickly as possible. As your startup transitions from “we’ll run out of money in the next 18 months” to “we have more than 1000 engineers”, how should the SRE organization evolve and provide the best value through that transition (including booting one up if you don’t have one)? I will discuss specific ways the organization needs to evolve to meet this challenge, how the SRE org can advocate for and support this change (both in direct actions and in “influence”), and how the overhang of startup technical and cultural debt can make this shift more challenging (but also more necessary).

From On-call to Non-call: Resolving Incidents Before They Even Happen

Artificial intelligence has captured the attention of the world, with tools like ChatGPT and large language models (LLMs) driving the conversation. But you don’t need to wait for the future or new features powered by LLMs to start working smarter—the tech industry has been investing in intelligent, automated tools for years and they’re ready for production now. In this talk, you’ll learn how the engineering teams at Toyota Connected use tools like Datadog Watchdog, Anomaly Detection, and Workflows to make our lives easier and keep our platform stable.

From Solution to Startup

Before Datadog was a widely adopted SaaS platform, it was a tool developed to solve our founders’ own monitoring needs. As technology-oriented people, we often build solutions for our own problems, then discover those problems are widespread. But how do you know when your solution should be something more? In this panel session, we’ll talk with tech startup founders to hear their stories and advice for turning tools into businesses.

How to monitor CoreDNS with Datadog

In Part 1 of this series, we introduced you to the key metrics you should be monitoring to ensure that you get optimal performance from CoreDNS running in your Kubernetes clusters. In Part 2, we showed you some tools you can use to monitor CoreDNS. In this post, we’ll show you how you can use Datadog to monitor metrics, logs, and traces from CoreDNS alongside telemetry from the rest of your cluster, including the infrastructure it runs on.

Tools for collecting metrics and logs from CoreDNS

In Part 1 of this series, we looked at key metrics you should monitor to understand the performance of your CoreDNS servers. In this post, we’ll show you how to collect and visualize these metrics. We’ll also explore how CoreDNS logging works and show you how to collect CoreDNS logs to get even deeper visibility into your Deployment.
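
As a taste of the kind of collection this post covers, CoreDNS exposes Prometheus-format metrics on port 9153 when its prometheus plugin is enabled. The minimal sketch below scrapes that endpoint and prints its request counters; the pod IP is a placeholder, and metric names such as coredns_dns_requests_total can vary between CoreDNS versions.

```python
# Minimal sketch: scrape CoreDNS's Prometheus endpoint (port 9153 when the
# `prometheus` plugin is enabled) and print its request counters. The host
# below is a placeholder, and metric names can vary across CoreDNS versions.
import urllib.request

METRICS_URL = "http://10.0.0.10:9153/metrics"  # placeholder CoreDNS pod IP

with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
    body = resp.read().decode("utf-8")

for line in body.splitlines():
    # Counters like coredns_dns_requests_total are labeled by zone, protocol, and type.
    if line.startswith("coredns_dns_requests_total"):
        print(line)
```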

Send your logs to multiple destinations with Datadog's managed Log Pipelines and Observability Pipelines

As your infrastructure and applications scale, so does the volume of your observability data. Managing a growing suite of tooling while balancing the need to mitigate costs, avoid vendor lock-in, and maintain data quality across an organization is becoming increasingly complex. With a variety of installed agents, log forwarders, and storage tools, the mechanisms you use to collect, transform, and route data should be able to evolve and adjust to your growth and meet the unique needs of your team.

Integration roundup: Monitoring your AI stack

Integrating AI, including large language models (LLMs), into your applications enables you to build powerful tools for data analysis, intelligent search, and text and image generation. There are a number of tools you can use to leverage AI and scale it according to your business needs, and specialized technologies such as vector databases, development platforms, and discrete GPUs are often necessary to run many models. As a result, optimizing your system for AI often leads to upgrading your entire stack.

Enhance code reliability with Datadog Quality Gates

Maintaining the quality of your code becomes increasingly difficult as your organization grows. Engineering teams need to release code quickly while still finding a way to enforce best practices, catch security vulnerabilities, and prevent flaky tests. To address this challenge, Datadog is pleased to introduce Quality Gates, a feature that automatically halts code merges when they fail to satisfy your configured quality checks.
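
As a generic illustration of the gating concept (not Datadog's API or configuration), the sketch below shows a CI step that blocks a merge by exiting non-zero when hypothetical quality checks fail.

```python
# Generic illustration of the "quality gate" idea (not Datadog's API): a CI
# step that blocks a merge by exiting non-zero when configured checks fail.
# The thresholds and results dict are hypothetical inputs from earlier CI steps.
import sys

GATES = {
    "new_flaky_tests": 0,           # allow no new flaky tests
    "critical_vulnerabilities": 0,  # allow no critical vulnerabilities
    "min_test_coverage": 0.80,      # require at least 80% coverage
}

results = {"new_flaky_tests": 2, "critical_vulnerabilities": 0, "test_coverage": 0.84}

failures = []
if results["new_flaky_tests"] > GATES["new_flaky_tests"]:
    failures.append("new flaky tests detected")
if results["critical_vulnerabilities"] > GATES["critical_vulnerabilities"]:
    failures.append("critical vulnerabilities found")
if results["test_coverage"] < GATES["min_test_coverage"]:
    failures.append("test coverage below threshold")

if failures:
    print("Quality gate failed: " + "; ".join(failures))
    sys.exit(1)  # a non-zero exit blocks the merge in most CI systems
print("Quality gate passed")
```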

Easily test and monitor your mobile applications with Datadog Mobile Application Testing

Effective mobile application testing that meets all the requirements of modern quality assurance can be challenging. Not only do teams need to create tests that cover a range of different device types, operating system versions, and user interactions—including swipes, gestures, touches, and more—they also have to maintain the infrastructure and device fleets necessary to run these tests.

Store and analyze high-volume logs efficiently with Flex Logs

The volume of logs that organizations collect from all over their systems is growing exponentially. Sources range from distributed infrastructure to data pipelines and APIs, and different types of logs demand different treatment. As a result, logs have become increasingly difficult to manage. Organizations must reconcile conflicting needs for long-term retention, rapid access, and cost-effective storage.

DASH 2023: Guide to Datadog's newest announcements

This year at DASH, we announced new products and features that enable your teams to get complete visibility into their AI ecosystem, utilize LLMs for efficient troubleshooting, take full control of petabytes of observability data, optimize cloud costs, and more. With Datadog’s new AI integrations, you can easily monitor every layer of your AI stack. And Bits AI, our new DevOps copilot, helps speed up the detection and resolution of issues across your environment.

Quickstart network investigations with NPM's story-centric UX

Datadog Network Performance Monitoring (NPM) gives you visibility into all the communication that takes place between the network components in your environment, including hosts, processes, containers, clusters, zones, regions, and VPCs. As organizations scale, and as their networks grow in complexity, the massive volume of network data to be monitored can become overwhelming. Knowing precisely what network data to surface to resolve issues within these larger environments can be a challenge.