
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Build custom apps in seconds with conversational AI in App Builder

Using a drag-and-drop interface, engineering teams can create apps that support troubleshooting, improve day-to-day operations, and offer self-service access without leaving Datadog. With the new conversational AI feature, teams can turn an idea into a working app in seconds. Watch the video to see how it works.

Fleet Management: Manage your telemetry collectors at scale

In this video, we introduce Fleet Management and how it helps teams control their telemetry estate as it scales. See how you can centrally manage collectors and agents, standardize configurations across environments, and roll out updates confidently, reducing operational effort and risk.

Trace-connected structured logging with LogTape and Sentry

As our applications grow from simple side projects into complex distributed systems with many users, the “old way” of console.log debugging isn’t going to hold up. To build truly observable systems, we have to transition from simple text logs to structured, queryable, trace-connected events.
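To make the contrast with plain text logs concrete, here is a minimal, library-agnostic sketch in Python. It does not use the LogTape or Sentry APIs. The helper names (make_event, emit, events_for_trace) and the field names (trace_id, user_id, reason) are hypothetical and chosen only for illustration. Each log line is a JSON object that carries a trace ID, so events from one request can be filtered out of a mixed stream.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(level, message, trace_id, **fields):
    """Build one structured log event as a plain dict (hypothetical helper)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "trace_id": trace_id,  # ties the event to one request or trace
        **fields,              # arbitrary queryable attributes
    }


def emit(event):
    """Serialize the event as one JSON line, print it, and return the line."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


def events_for_trace(lines, trace_id):
    """Parse JSON log lines and keep only the events belonging to one trace."""
    return [e for e in map(json.loads, lines) if e["trace_id"] == trace_id]


# One simulated request emits two events under the same trace ID;
# an unrelated event gets a different trace ID.
trace_id = uuid.uuid4().hex
lines = [
    emit(make_event("info", "checkout started", trace_id, user_id=42)),
    emit(make_event("error", "payment declined", trace_id,
                    user_id=42, reason="card_expired")),
    emit(make_event("info", "health check", uuid.uuid4().hex)),
]

# Querying by trace ID recovers the full story of the failing request.
matched = events_for_trace(lines, trace_id)
```

Because every event is a keyed record rather than free text, a filter on trace_id or reason is a field lookup rather than a string search. A real setup would typically take the trace ID from the tracing system instead of generating it locally.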

How to visualize your 3CX contact center phone system with Grafana

Note: this post was co-authored by Nicholas Borg, 3CX Product Manager. 3CX provides a robust, flexible IP PBX platform used by organizations of all sizes to power their contact centers. It offers detailed call activity, agent performance metrics, and operational insights — all of which become even more powerful when visualized.

Grafana dashboards: tips for optimizing query performance

Even with a powerful database or visualization layer, performance can suffer if queries aren’t optimized or system settings aren’t tuned. The new Mimir Query Engine in Grafana Cloud improves query efficiency, but there are still best practices you can follow to keep dashboards fast and responsive—whether your data source is hosted in Grafana Cloud or running on-premises.

Bring faster visibility into AWS Lambda functions with remote instrumentation

Comprehensive observability is critical for running performant, reliable, and secure serverless workloads. However, configuring and maintaining that visibility across hundreds or thousands of serverless functions can be difficult to scale and sustain. Developers across teams often manage serverless functions using different infrastructure as code (IaC) frameworks, as well as different review, deployment, and update processes.

A Bright Outlook: Building Operational Resilience for the Year Ahead

As we step into a new year, one truth stands firm in financial services: resilience isn’t optional – it’s expected. Markets fluctuate, regulations evolve, and technology accelerates. Amid this complexity, IT leaders carry the responsibility of ensuring that operations don’t just survive disruption, they thrive through it.

New Year, New Telemetry: Resolve to Stop Breaking Dashboards

It's 2026. Your New Year's resolution was to finally migrate to OpenTelemetry. But you're staring at dozens of dashboards that depend on your current data format, and that migration deadline is looming... Sound familiar? If you're an SRE or Platform Engineer facing a top-down OTel mandate, you're not alone. The challenge isn't just about adopting a new standard—it's about doing so without disrupting the observability systems your team depends on every day.

How to Ensure AI-Generated Code is Reliable with Runtime Context

TL;DR: AI coding assistants have sped up code delivery but created a validation gap. Historical telemetry and static analysis cannot predict the behavior of unfamiliar, high-volume code. Lightrun’s Runtime Context MCP closes that gap, allowing AI assistants to verify behavior before code breaks and to resolve issues in real time.