Monthly Archive

Full-stack observability in Grafana Cloud: How to investigate issues across services and infrastructure

Jun 30, 2026 By Victor Padilla In Grafana

Many times, the hardest part of troubleshooting isn’t fixing the actual problem. It’s figuring out where to start. As engineers, it’s easy to lose count of how many times we’ve opened logs, then 10 metrics tabs, and another 10 tabs with trace queries, only to end up back in the logs trying to find a root cause.

Read Post

What Customers Are Doing With AI and Honeycomb

Jun 30, 2026 By Rox Williams In Honeycomb

At O11yCon, we talked to engineering teams across the industry, and the numbers are starting to get genuinely wild: Mixpanel DevOps Engineer Eddie Bracho told us their engineering team is generating 50% more PRs than before AI came into the mix (sorry). That kind of velocity is exciting, but it's also a pressure test for every part of your stack that isn't writing code, including your observability practice. Here's what we're hearing from customers about how that's playing out.

Read Post

Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

Jun 30, 2026 By Michael Bevilacqua-Linn In Datadog

Coding agents like Claude Code, Cursor, and Codex CLI handle the coding parts of building an AI application well. The harder work comes after: understanding why a response went wrong, building eval sets that reflect real production behavior, and keeping up with an application that changes faster than any one-off script can. Teams spend 60–80% of their time on evaluation and error analysis, and much of that work needs to be redone every time the stack shifts.

Read Post

New Feature: Automatic Snapshots When Latency Spikes

Jun 30, 2026 By Roi Bar In Lightrun

We’ve released an exciting new Lightrun capability: set a duration threshold on your Tic & Toc or Method Duration metrics, and Lightrun will automatically capture a snapshot whenever execution exceeds it. It takes moments to configure, and gives engineers the runtime context they need to understand why unexpected slow executions are occurring.

Read Post

The hard part of AI root cause analysis is no longer the model

Jun 30, 2026 By Nikolay Sivko In Coroot

Every few weeks someone tells me root cause analysis is a solved problem now: pipe your telemetry into an LLM, let it tell you what broke. I wish it were that easy. After years on this, I think "can AI do RCA?" is the wrong question, because doing RCA with an LLM is really two separate jobs, and the answer is different for each. They break in completely different ways, so it's worth pulling them apart.

Read Post

Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide

Jun 29, 2026 By Dan Juengst In Honeycomb

AI agents are nondeterministic, multi-step, and opaque. When one fails in production, "the model said something weird" is the cheapest, most useless line in your incident postmortem. To debug agents the way they actually run, you need telemetry that captures all of it, in order, with enough context to reconstruct what happened. The OpenTelemetry GenAI Semantic Conventions give you a vendor-neutral way to do exactly that.

Read Post

Why Observability Isn't Enough for AI Coding Agents

Jun 29, 2026 By Lightrun Team In Lightrun

Observability platforms collect pre-instrumented logs, metrics, and distributed traces to monitor production systems and surface failures to human engineers. The adoption of AI into engineering has led observability providers to offer those same signals to agents. This is often packaged as AI observability, but the signals themselves were designed around a human investigation loop. AI coding agents work faster, consume data differently, and need feedback as they work rather than after deployment.

Read Post

What Is Agentic Observability? The Complete Guide for Enterprise Engineering Teams

Jun 29, 2026 By Libi Michelson In logz.io

TL;DR Agentic observability uses AI agents to autonomously investigate incidents, identify root causes, and take action in production environments. Unlike traditional monitoring (which alerts and waits) or AIOps (which assists human analysis), agentic platforms conduct the investigation themselves. Key capabilities include autonomous incident triage, evidence-backed root cause analysis, alert noise reduction, and governed remediation.

Read Post

Fleet observability: how to monitor thousands of edge Linux devices

Jun 28, 2026 By Netdata Team In netdata

It feels less like managing devices and more like remote babysitting. You check the dashboard, everything is green, and then a customer in the field tells you a device has been down for two days. At a handful of servers, the rare failure is an event.

Read Post

From query to action: Introducing SQL alerting in Cloud Monitoring Observability Analytics

Jun 27, 2026 By Joy Wang In Google Operations

Cloud Monitoring Observability Analytics lets you create alerts from (and get alerted about) analytical SQL queries of logs and traces.

Read Post

Runtime Aware PR Review: Validate Changes in Live Production

Jun 26, 2026 By Lightrun Team In Lightrun

Runtime PR review means validating a code change against live variable state, real execution paths, and downstream service behavior before the merge decision, not after a checkout regression exposes what the diff missed. As AI coding agents ship PRs faster than any reviewer can mentally simulate execution, static analysis and CI leave a structural gap that only runtime evidence can close. This article explains what that gap looks like, why it recurs, and how to close it with runtime-context code review.

Read Post

Grafana + Uptrace: Reuse Your Dashboards in Seconds

Jun 26, 2026 By Uptrace In Uptrace

In this tutorial you'll learn how to use Uptrace and Grafana together. Uptrace exposes a Prometheus-compatible HTTP endpoint, so you can add it as a data source in Grafana and reuse your existing dashboards without changing metric names or rewriting queries.

View Video

Rethinking Public Sector Observability: From Infrastructure Health to Mission Continuity

Jun 26, 2026 By Teia Jensen In LogicMonitor

Public sector reliability is not a green dashboard. It’s whether people can complete the service when it matters.

Read Post

Full Stack Observability vs Monitoring: Key Differences

Jun 25, 2026 By Chandni Verma In eG Innovations

Traditional monitoring tracks system health by collecting data such as metrics and logs. This data is checked to see whether a system is behaving as expected, and alerts are raised if errors or anomalous values are found. This works well in stable, predictable environments, but modern IT systems are far more complex and dynamic. In distributed architectures like microservices and cloud-native platforms, predefined alerts usually aren’t enough to explain why a failure is happening.

Read Post

What's New in Network Observability for Summer 2026

Jun 25, 2026 By Sean Armstrong In Broadcom

As a network engineer, you likely face two persistent operational challenges every day: When you have to manually track device lifecycles on spreadsheets or spend your scheduled maintenance periods troubleshooting software upgrades, you lose the time you need to proactively ensure network performance. Over the past six months, we have continued to enhance Network Observability by Broadcom. These latest enhancements directly address the operational challenges outlined above.

Read Post

No Baaaa-d Data: A Hoppy Hour of Discovery with Cribl

Jun 25, 2026 By Cribl In Cribl

This is the "Cribl: The Good Bits" version of a webinar we gave recently, which combined the fermented educational joy of a beer tasting, led by Gabe Callahan, with the only-slightly-less-intoxicating demos of the Cribl platform, led by Principal Technical Marketing Engineer Leon Adato.

View Video

On Release Days We Wear Teal ep04

Jun 24, 2026 By Cribl In Cribl

In this episode, Leon explores some of the new features, functions, updates, and improvements in release 4.18 (from last month) and 4.18.2.

View Video

The Four Pillars of AI Observability in 90 Seconds

Jun 24, 2026 By Splunk In Splunk

AI applications can behave unpredictably, potentially leading to errors such as hallucinations or data leaks, even when classic monitoring indicates a successful response. To monitor AI systems effectively, teams should focus on four key areas. Implementing these pillars can enhance trust in AI deployments, help manage costs, and identify safety issues before they impact users.

View Video

Observability on Windows, before eBPF is production-ready

Jun 23, 2026 By Nikolay Sivko In Coroot

No large enterprise runs a single stack. A shiny new Kubernetes cluster sits right next to a Windows Server box that has quietly run the billing system for a decade without missing a beat. Both keep the business running. Both deserve the same visibility. Linux runs most server workloads, and Coroot grew up there. Our open-source node-agent uses eBPF to collect metrics, logs, traces, and profiles, with no code changes. But "most" is not "all".

Read Post

Using Evaluation Frameworks with Agent Observability

Jun 22, 2026 By Jennifer Mickel In Datadog

AI teams have invested heavily in evaluation frameworks, yet getting those frameworks beyond local experimentation remains challenging. Teams using open source libraries like DeepEval and Pydantic Evals gain flexibility and research-grounded metrics, but operationalizing those evaluations still requires brittle custom integration code that doesn’t scale.

Read Post

Observability Self Hosted 2026.2 | Routing Summary Dashboard

Jun 22, 2026 By solarwindsinc In SolarWinds

Connect with SolarWinds.

View Video

Monitoring vs. observability: The future of IT operations in 2026

Jun 19, 2026 By Kaviya Radhakrishnan In ManageEngine

For years, monitoring was the gold standard of infrastructure management. Dashboards. Thresholds. Alerts. If everything on the dashboard was green, you didn't need to worry. If something turned red, you responded. It was a model built on predictability, and for a long time, it worked. But modern infrastructure is no longer predictable.

Read Post

George Luong Shares How 200 Slack Engineers Use Honeycomb

Jun 19, 2026 By Honeycomb In Honeycomb

At Slack, between 100 and 200 users per day use Honeycomb for client observability, tracing, instrumentation, analysis of performance, frontend issues, investigating incidents, or just looking into production issues.

View Video

The Second Edition of Observability Engineering Is Here

Jun 18, 2026 By Charity Majors In Honeycomb

IT’S HERE it’s here it’s here it’s here!!!! The second edition of Observability Engineering is available for download, and since Honeycomb is the sponsor, you can now download it from our website (the dead tree version will take another month). This is a strange time to be writing a book.

Read Post

Agent Timeline Is Now Generally Available

Jun 18, 2026 By Dan Juengst In Honeycomb

A few weeks ago I wrote about a customer’s refund request that stopped halfway through at 11:47 p.m. on a Tuesday night. That post walked through the 40 minutes it took to work out what happened when an agentic application had a problem: a tool retried against a rate-limited payments API, the error responses filled up the context window, and the agent gave up. The whole reason we built Agent Timeline was to turn that 40 minutes into five. To reduce MTTR. To solve the problem and get back to sleep.

Read Post

Working as a remote engineer at Cribl | Building the AI Platform for Telemetry

Jun 18, 2026 By Cribl In Cribl

Learn what it’s like to work as an engineer at Cribl, a remote-first company building the AI platform for IT and security data. In this recruiting video, Cribl’s engineering and support leaders share how fully distributed teams collaborate, solve hard data problems, and grow their careers while working from around the world. You’ll hear from managers and leaders in site reliability engineering, security incubation, and technical support about.

View Video

Multi Cloud Observability - Selector

Jun 18, 2026 By Selector In Selector

Unify cloud, network, and infrastructure telemetry into a single shared intelligence layer to get to root cause faster.

View Video

Observability for a Privacy-first AI Wearable | Grafana Everywhere

Jun 18, 2026 By Grafana In Grafana

Trust is everything when AI gets personal. Golden Grot Award winner and NeoSapien co-founder and CEO Dhananjay Yadav shares how his team uses Grafana Assistant to ensure the privacy-first AI wearable delivers a seamless, reliable experience without compromising its mission. Because when AI moves closer to our everyday lives, teams need to know what’s happening — and users need to trust that it’s working as intended.

View Video

From event correlation to autonomous IT: Why observability isn't enough anymore

Jun 17, 2026 By Sangavi D In ManageEngine

Most IT war rooms have plenty of data, but not enough time or clarity to find the real answer. Dashboards are crowded, alerts keep piling up, and the real issue gets lost in all the noise. Ever dealt with this situation? You’re not alone, and there’s a simpler way to deal with it. OpManager Nexus closes this gap by moving beyond visibility to help teams actually diagnose and fix problems faster.

Read Post

Why AI observability is a critical ITOps priority

Jun 17, 2026 By Ismath Mohideen In LogicMonitor

AI has shifted from experimental pilots to everyday business operations. Customers are interacting with AI-powered applications. Engineering teams are building with LLMs, GPUs, APIs, and automation at a much faster pace. That adds to the visibility strain on already overburdened ITOps teams.

Read Post

Datadog Data Observability: Be the first to know when data fails

Jun 17, 2026 By Datadog In Datadog

Bad data doesn't announce itself. Datadog Data Observability gives you unified visibility across your entire data stack—from source systems and pipelines to dashboards and AI applications—so you catch silent failures before they cascade. Detect data quality and pipeline issues before stakeholders do, pinpoint root causes with end-to-end lineage, and reduce pipeline costs with job, cluster, and query recommendations.

View Video

Building trustworthy agentic AI workflows for high-stakes enterprise environments

Jun 17, 2026 By OpsMatters In OpsMatters

Wilson Chan, CEO and Founder of Permutable, explores how enterprises can build trustworthy agentic AI workflows with observability, source traceability, human oversight, audit trails and governed autonomy.

Read Post

Un-observable AI is Un-trustworthy AI

Jun 16, 2026 By Annie Freeman In Coralogix

Recently, someone talked Chipotle’s customer support agent into reversing a linked list – a task completely unrelated to burritos in any way. Screenshots circulated, people laughed, but underneath the joke sat a sharper question. If a production support agent will do that on a public channel, what else will it do that nobody is screenshotting? The bug is funny. The trust gap behind it is not.

Read Post

Why CI/CD Pipelines Miss Runtime Failures

Jun 16, 2026 By Lightrun Team In Lightrun

A CI/CD pipeline does four things: it builds code, runs tests against mocked dependencies, lints for style violations, and scans for known vulnerability patterns. What it cannot do is validate how that code behaves under real users, real service responses, and real runtime constraints that staging was never configured to reproduce. That entire class of failure clears every gate cleanly and surfaces only in production.

Read Post

Kubernetes Monitoring: Datadog Alert to Lightrun Root Cause

Jun 15, 2026 By Lightrun Team In Lightrun

Datadog Kubernetes monitoring tells an SRE team what failed, which pod failed, and when. It does so within seconds of the alert firing. The investigation then stalls at the same point every time: nothing in the dashboard layer can prove why a specific request behaved the way it did inside a running JVM at the moment of failure. Variable values, feature flag evaluations, and code branches are never captured.

Read Post

Observability: Are You Measuring What Actually Matters?

Jun 15, 2026 By Colin Burke In Honeycomb

Observability has always been important, and much like any core capability in your business, its value needs to be understood. For years, the value of observability was predictable. It was uptime, error rates, MTTR, and likely tool consolidation. That was enough to show progress. These are foundational, table-stakes metrics, and they still matter, but they aren’t enough.

Read Post

Why Your Agentic Workflow Succeeds and Still Gets It Wrong

Jun 12, 2026 By Lightrun Team In Lightrun

Agentic workflows are reshaping how engineering teams operate, fetching context, synthesizing decisions, and shipping results across systems without human intervention. But the same design that makes them powerful adds risk in production. Agents do not crash when they hit bad data; they synthesize around it, substituting a stale value, an empty page, or a missing field for the result they were supposed to capture.

Read Post

13 Best Observability Tools in 2026 [Top-Picked]

Jun 12, 2026 In Motadata

How many tools does your team open before anyone can say why production is slow? If the answer is more than two, you are paying for that gap in engineering hours every week. We understand the frustration. So we did the research work for you to help you pick the best observability tools.

Read Post

The Next Evolution of Infrastructure Observability

Jun 11, 2026 By Kristy Slimmer In Galileo

Operational visibility is becoming increasingly important as infrastructure teams are asked to support AI initiatives, automation goals, cost accountability, modernization efforts, and growing operational complexity at the same time. Most are expected to do it without expanding headcount, introducing additional risk, or rebuilding the environment from scratch. Those expectations are changing the role of infrastructure operations.

Read Post

Monitoring Protocols Compared - Which Standard for What

Jun 11, 2026 By Lionel Porcheron In Bleemeo

Modern applications are distributed, ephemeral and built from a dozen moving parts. To keep them reliable, you need real visibility: not just “is the server up?”, but “how is this request behaving, right now, across every component it touches?”. The good news is that the observability world has converged on a handful of open standards.

Read Post

Nathen Harvey: Scale Brilliance, Not Bottlenecks: Building Platforms for the AI-First World

Jun 10, 2026 By Honeycomb In Honeycomb

Watch Nathen Harvey's full talk at O11yCon 2026, Honeycomb's observability conference, and enjoy Christine Yen's intro as well.

View Video

Canvas, MCP, and Claude: How Liz Fixed Three Bugs During a Conference

Jun 10, 2026 By Honeycomb In Honeycomb

In this demo, Liz and Kale talk through a slow query that Liz couldn't get out of her head. During a conference, she set out to solve it... and ended up finding two more bugs to fix with Honeycomb MCP and Honeycomb Canvas.

View Video

Graviton5 in Production at Honeycomb: Per-service Results From the m8g to m9g Migration

Jun 10, 2026 By Liz Fong-Jones In Honeycomb

This is the fourth installment in the Graviton retrospective series we've been writing since 2021. The methodology is the same one I always reach for: hold the workload constant, run both generations on the same Kubernetes namespace concurrently, and let the per-pod numbers speak.

Read Post

Discovering Entities in SolarWinds Observability Self-Hosted

Jun 9, 2026 By solarwindsinc In SolarWinds

Resource Links SolarWinds Observability Self-Hosted version 2026.2 Blog Post: SolarWinds Observability Self-Hosted version 2026.2 Release Notes.

View Video

Nishi Bhonsle of Salesforce at O11yCon: Speaker Highlight Reel

Jun 9, 2026 By Honeycomb In Honeycomb

In her talk at O11yCon 2026, Nishi Bhonsle of Salesforce talked about,, and provided some great examples of how Honeycomb has helped Salesforce issues in seconds. Here's a 4-minute highlight reel.

View Video

What is SRE Observability and Key Pillars You Should Know?

Jun 8, 2026 By Arpit Sharma In Motadata

What happens when a critical service slows down, but nothing is technically “broken”? Most teams have monitoring in place. They know when something goes down. But when performance drops or issues spread across services, finding the real cause becomes slow and unclear. Engineering teams end up switching between dashboards, logs, and alerts just to understand what changed. This delays response and increases pressure on on-call teams. This is where SRE observability becomes essential.

Read Post

It Can Only Goodhart Happen

Jun 8, 2026 By Austin Parker In Honeycomb

“When a measure becomes a target, it ceases to be a good measure.” (Charles Goodhart, 1975) You’ve probably read this quote in relation to any number of things over the years: people complaining about arbitrary metrics like PRs merged, lines of code produced, and now, token usage. But is the era of tokenmaxxing over before it even began? The rise and death of token leaderboards at companies like Amazon seem to have taken place in less than three months!

Read Post

Running the OpenTelemetry Collector as a Lambda

Jun 8, 2026 By Jessica Kerr (Jessitron) In Honeycomb

The OpenTelemetry Collector is usually deployed as a long-running process: a sidecar, a DaemonSet, an EC2 instance, a docker container on my computer. It sits there listening for telemetry. That's fine when I want to send telemetry all day, but not when telemetry is rare. Like right now, when I have an agent defined on AgentCore, and it runs a few times a week maybe. Or my website that hardly sees any traffic. Can I run the OpenTelemetry Collector as a Lambda function?

Read Post

MCP Servers Are Becoming a Core Interface Layer in Data Observability and Data Quality

Jun 6, 2026 By OpsMatters In OpsMatters

Data observability has traditionally been built around human workflows. When data breaks, engineers are alerted, open dashboards, inspect lineage graphs, and manually trace the issue across pipelines. The system is designed for human investigation and interpretation. That model is now being challenged by the rise of AI agents in data operations. As organizations begin embedding AI into analytics, engineering, and decision-making workflows, observability is no longer just about explaining what happened - it must also enable systems to understand and act on it.

Read Post

Why Engineers Don't Trust Autonomous AI - 4th Annual Observability Survey | Grafana Labs

Jun 5, 2026 By Grafana In Grafana

The 2026 Observability Survey from Grafana Labs heard from over 1,300 engineers and leaders across 76 countries on the real-world role of AI in observability. The data reveals a sharp distinction between intelligence and autonomy — and a critical blind spot most teams have.

View Video

How APM fits into the modern observability stack

Jun 4, 2026 By Kirubanandan Rammohan In ManageEngine

Most engineering teams don't have a data problem. They have an interpretation problem. Prometheus is running, logs are shipping to the aggregator, dashboards are green, and then a latency spike hits and the root cause takes 45 minutes to isolate. The data was there but the answer wasn't. That gap is where application performance monitoring (APM) operates. This article explores what APM adds to a modern observability stack, why relying on standalone tools leaves critical blind spots, and how teams can unify infrastructure data with application context for a complete operational picture.

Read Post

Why Observability Is Essential for Platform Engineers?

Jun 4, 2026 By Mohana Ayeswariya J In Atatus

Observability is how platform teams stop being the answer to every question and start building platforms that answer those questions themselves. This article explains specifically how observability enables platform engineers to support development teams better: reducing ticket volume, cutting MTTR, enabling SLO ownership, and making microservice debugging something devs can do without escalating to you.

Read Post

AI Observability Deep Dive Demo | Grafana Cloud

Jun 4, 2026 By Grafana In Grafana

Grafana AI Observability is our new database and platform for observing AI Agents. Over the past year at Grafana Labs, we built Agents and needed a way to understand how they perform, what they cost, what their error rate and time to first token are, and how they behave. Grafana Staff Engineer Ivana Hučková provides a deep dive demo on how Grafana AI Observability connects our experience building Agents with our experience building observability systems.

View Video

Observability for Healthcare Systems | Grafana Everywhere

Jun 4, 2026 By Grafana In Grafana

Grafana Assistant is going places you might not expect — including healthcare. Golden Grot winner Oren Lion from TeleTracking reveals how Grafana Cloud supports their systems that help keep patient care moving — and how Assistant enables teams to get from “what happened?” to “here’s why” faster. From moon landings to patient care, Grafana is everywhere. Congratulations to Oren, Chris Johnson, Mark Munson, and the entire TeleTracking team on winning this year's Golden Grot Award for Pioneering AI in Observability!

View Video

How to debug REST Collector APIs with Cribl REST Collector Diagnostics

Jun 4, 2026 By Cribl In Cribl

This video introduces the new REST Collector Diagnostics feature in Cribl, which helps you troubleshoot API collection issues faster. It’s designed for observability and data engineers who use REST Collector to pull data from external APIs and need deeper visibility into HTTP requests, responses, and errors.

View Video

Claude Code Observability at Scale: How We Did It With Bindplane

Jun 4, 2026 By Chelsea Wright & Adnan Rahic In ObservIQ

At Bindplane, we iterate fast. One of the most important tools we've adopted across our organization is Claude Code. It helps every team here build solutions to complex problems with both speed and precision. But speed without visibility is a liability. We needed a reliable way to monitor and audit how Claude Code was being used across our team. Luckily, we build the best platform on the market for data in motion.

Read Post

Cribl Search Pack for Zscaler: Setup & security dashboard walkthrough

Jun 3, 2026 By Cribl In Cribl

Learn how to install and configure the Cribl Search Pack for Zscaler, then walk through prebuilt dashboards for your Zscaler security logs. This video is for security engineers, Zscaler administrators, and SOC/observability teams using Cribl Search to monitor and investigate Zscaler activity. In this walkthrough, you’ll see: If you need a reminder or want to share feedback on the pack, you can always refer to the README bundled with the pack or reach out to the Cribl team.

View Video

How Support Uses Honeycomb to Debug Honeycomb

Jun 2, 2026 By Sara Cave In Honeycomb

You'd think that working at an observability company means everyone knows exactly where to find everything in the data. It doesn't. Especially not on the support team. We're the ones who get the tickets. We're in the telemetry every day trying to figure out what went wrong for a customer, and we do that by pointing Honeycomb at itself. Here's how that actually works, and how it's changed.

Read Post

Splunk Observability at Cisco Live: Agentic Observability for the AI Era

Jun 2, 2026 By Cale Hilts In Splunk

Observability has always been about seeing clearly under pressure. But the pressure has changed. Applications are more distributed. Kubernetes environments keep expanding. Digital experiences depend on services, APIs, networks, third-party providers, and now AI models and agents that can make decisions faster than a human team can review every signal.

Read Post

The Observability Journey: Getty Images and Cribl

Jun 2, 2026 By Cribl In Cribl

I recently sat down with Simon Overbey and Lovepreet Singh - the Engineering Manager and systems engineer (respectively) at Getty Images to talk about their experiences implementing Cribl. After getting a rundown of the pre-Cribl environment (described above) I asked to jump straight to the end, the net benefits. If the "before" was a terrifying tidal wave of cost and complexity, what did the "after" look like?

View Video

How to Build Real-Time Supply Chain Observability

Jun 2, 2026 By OpsMatters In OpsMatters

"One missing pallet." That's how a warehouse supervisor in New Jersey described the start of a week-long supply chain mess back in 2024. One pallet. Then came delayed trucks, angry retailers, overtime pay, and a customer threatening to walk. In logistics, small gaps don't stay small for long. And the uncomfortable part is that most teams are already working hard. The issue isn't effort. It's alignment. The data exists in most organizations; it just doesn't show the same reality at the same time. Which leaves a basic question surprisingly hard to answer: what's actually happening right now?

Read Post
