Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

As agentic AI applications are used more broadly in production, they introduce new operational models, combining multi-step reasoning, tool execution, and autonomous decision-making into a single workflow. SRE teams need visibility into how these agents behave, where they fail, and how they perform over time.

Grafana Assistant: Why you can trust our agent, and yourself, in an era of AI hallucinations

Let’s be real: AI can hallucinate. And in observability, that feels risky. No one wants an assistant that sends your SREs chasing ghosts. At best, that burns expensive engineering time. At worst, it slows incident response in production and pushes teams toward the wrong remediation path. So here’s the big question: What makes Grafana Assistant different, and why should you trust it? Let’s start by acknowledging the fear. AI hallucinations are a real issue.

How Prometheus Remote Write v2 can help cut network egress costs by as much as 50%

Back in 2021, Grafana Labs CTO Tom Wilkie (then VP of Products) spoke at PromCon about the need for improvements in Prometheus' remote write capabilities. “We use between 10 and 20 bytes per sample to send via remote write, and Prometheus only uses 1 or 2 bytes per sample on the local disk so there’s big, big room for improvement,” Wilkie said at the time.

Grafana 12, from the founder's perspective: design, scale, and the next chapter

Sometimes the most interesting engineering stories don’t start with a roadmap or a release plan—they start with personal taste. A preference for good design. A frustration with clunky tools. A desire to see everything in one place.

Tempo 2.10 release: new TraceQL features, LLM-optimized API responses, vParquet5, and more

Tempo 2.10 has arrived, delivering TraceQL enhancements, improved cardinality management for the metrics-generator, vParquet5, and more. You can continue reading and check out the video below to learn more about these and other new features. The Tempo 2.10 release notes and changelog provide more in-depth details and include all of the changes that came with this release.

Business intelligence plugins for Grafana: what's next

Volkov Labs has been a longtime partner to Grafana Labs, with co-founder Mikhail Volkov contributing to Grafana in the early stages of the OSS project. On Sept. 26, the Florida-based company, which recently created a suite of business intelligence (BI) plugins for Grafana, announced it had been acquired. In light of the news, Grafana Labs committed to taking over the maintenance and development of the popular BI plugin suite.

Building a synthetic monitoring solution for Jaeger with Grafana k6

Wilfried Roset is an engineering manager who leads an SRE team and is a Grafana Champion. He currently works at OVHcloud, where he focuses on prioritizing sustainability, resilience, and industrialization to guarantee customer satisfaction. As an SRE Engineering Manager and a Grafana Champion, I believe a resilient and sustainable cloud experience begins with strong observability.

ChatOps that actually works: Grafana Cloud, Slack, and AI-powered observability

Context switching isn’t just inefficient—under pressure, it’s exhausting. It slows decision-making, increases the risk of mistakes, and makes even experienced engineers feel like they’re always a step behind the system they’re responsible for. At Grafana Labs, we want to build tools that meet you where you are. That's why we embedded Grafana Assistant, our context-aware AI assistant, directly in Grafana Cloud.

React 19 is coming to Grafana: what plugin developers need to know

As part of the upcoming Grafana 13 release in April, we will be updating to React 19, the latest major version of the frontend library for building user interfaces. Grafana uses React as the core technology for its frontend UI and its vibrant ecosystem of plugins. This update ensures we stay aligned with the broader React ecosystem, and allows us to take advantage of ongoing performance enhancements and new functionality provided by React APIs.

Fleet Management and Terraform: Use cases and best practices for managing collectors in Grafana Cloud

Earlier this year we launched Grafana Cloud Fleet Management to address the pain that comes with managing scores of telemetry collectors across departments and environments. We've been excited to see how organizations are using it to manage collectors at scale, but we've also heard from users who aren't sure how Fleet Management fits with their existing infrastructure-as-code tooling. The good news is Fleet Management is designed specifically to complement—not replace—tools like Terraform.