Operations | Monitoring | ITSM | DevOps | Cloud

Introducing Datadog Agent Builder: Build agentic workflows for alert response and remediation

Building automated workflows that adapt to real-world complexity can be a challenge. As systems scale and scenarios multiply, teams often end up hardcoding endless logic branches just to handle every potential outcome. That’s why we’re introducing Datadog Agent Builder, a powerful new tool that lets you create custom AI agents that are fully hosted by Datadog.

Optimizing Ruby performance: Observations from thousands of real-world services

Over the past three decades, Ruby has assumed a pivotal role in the modern web stack and become a fixture in the tool kits of countless DevOps and platform teams. Today, it is a driving force in contemporary application development, testing, automation, and CI/CD. For this blog post, we used data from our always-on continuous profiling of more than 3,000 real-world services from hundreds of organizations to track trends in Ruby usage and performance.

How to Speed Up Incident Response With Guided Remediation

Most teams picture incident response as a linear sprint from alert to resolution. A notification appears, an analyst pivots across screens, a decision gets made, and the workflow moves on. It works, but it is mechanical, tiring, and fragile. Graylog 7.0 aims for something more impactful. Guided remediation gives analysts clarity during the moments when pressure rises and context usually scatters. It takes raw detection data and turns it into a clear path forward. No theatrics.

How Bitbucket powers compliance and code quality at scale

Bitbucket Cloud is more than a code hosting platform. We’re an enterprise partner, helping teams code together at scale with security, compliance, and flexibility at every step. As part of the Atlassian Cloud platform serving more than 300,000 organizations around the world, we’re continuing to build the next generation of Bitbucket Cloud as your trusted cloud vendor, whether you’re a global bank, healthcare provider, or a fast-scaling tech company.

Introducing Kentik AI Advisor: The Future of Network Intelligence

Introducing Kentik AI Advisor, a powerful new AI designed to deeply understand your network, reason through complex issues, and deliver clear, actionable guidance for designing, operating, and protecting your networks. By autonomously querying Kentik’s rich telemetry and tools, it explains what’s happening, why it matters, and what to do next — from troubleshooting and capacity planning to cost optimization and risk mitigation.

Prioritize errors and create tickets using Rollbar's MCP Server

Production errors can feel overwhelming. Your Rollbar dashboard is filling up with alerts, your team is scrambling to understand what needs immediate attention, and critical revenue-impacting issues might be buried among less urgent problems. Sound familiar? In this post, I'll walk you through a workflow that transforms production error chaos into organized, prioritized action items. We'll cover everything from analyzing Rollbar errors to creating properly linked Linear tickets.

Cloudflare outage: another wake-up call for resilience planning

Another day, another massive Internet disruption, and this time it’s Cloudflare taking huge parts of the Internet offline. This incident is not an anomaly. It is part of a recurring pattern that has become standard in digital infrastructure. We have reached an inflection point in digital operations. Outages at major cloud and content delivery network (CDN) providers are now expected. The only real uncertainty is when it will happen next.

Introducing webvitals.com: Find out what's slowing down your site

Developers don’t need another “run this tool, stare at a number, and feel bad about it” website. So we built something different. WebVitals helps you analyze, optimize, and ship faster websites, all in one place. Built by the same folks who obsess over stack traces and slow queries, it connects the dots between performance metrics and what’s actually slowing your users down. In one place, you can.

KubeCon Atlanta Signals Key Shift: From Cloud Cost To Value Engineering

After three days of demos, sessions, and hallway conversations at KubeCon Atlanta, one thing became clear to CloudZero CTO Erik Peterson: the cloud-native world is shifting from cost control to value engineering. Teams aren’t just fighting bills anymore. They’re fighting complexity, GPU scarcity, Kubernetes sprawl, and pressure from the business to justify every dollar of technical investment. And this year’s KubeCon attendees? They were ready for those conversations.