Operations | Monitoring | ITSM | DevOps | Cloud

8 Years of Building Obkio: From Network Monitoring to Observability & Network Diagnostics

In 2016, Obkio was just an idea, but it was an idea born from a real problem. Before writing a single line of code, we conducted a market audit to understand why Network Performance Monitoring solutions weren't more mature. We interviewed banks, manufacturing companies, and service providers, and the answer was unanimous: the NPM tools on the market were too complex, and most businesses simply didn't have the internal resources to dedicate full-time to managing them.

Data Observability, AI Guard, Feature Flags, Ambassador program, and more | This Month in Datadog

See how you can ensure trust across the data life cycle in February’s episode of This Month in Datadog. Join us for a spotlight of Datadog Data Observability, which enables you to detect data quality and pipeline issues early, as well as remediate those issues with end-to-end lineage. Plus, we cover: Protecting agentic AI applications from real-time threats with Datadog AI Guard Staying up to date and reducing steps to collaborate with five new Incident Management releases Releasing software with confidence using Datadog Feature Flags.

The rise of agentic AI in production: Can observability systems run themselves?

Sometimes the biggest shifts in technology aren’t about collecting more data — they’re about who (or what) gets to act on it. In this episode of “Grafana’s Big Tent” podcast, host Tom Wilkie, Grafana Labs CTO, is joined by Spiros Xanthos, Founder & CEO of Resolve AI, Manoj Acharya, VP of Engineering for Observability at Grafana Labs, and Cyril Tovena, Principal Engineer on the Grafana Assistant team, to discuss agentic AI in observability.

From RCA to Autonomous Ops: The Future of AI in Observability | Big Tent S3E7

SREs are famously skeptical of AI — so how do you convince them to trust agents in production? In this episode of Grafana’s Big Tent, Tom Wilkie talks with Spiros Xanthos (Resolve AI), Manoj Acharya (Grafana Labs), and Cyril Tovena (Grafana Assistant team) about agent-first observability. They unpack knowledge graphs, LLM reasoning, autonomous debugging, pricing models, and the “Claude Code moment” for observability. Is autonomous production ops closer than we think?

Observability Self-Hosted 2026.1 - Routing Insights

SolarWinds Evangelist Chrystal Taylor introduces the new routing insights feature in Observability Self-Hosted 2026.1. This first phase enhancement enriches routing table information with detailed context, including forwarding interface names, VRF data, next hop IPs, and timestamps. The update unifies BGP, OSPF, and EIGRP neighbors in a single dashboard, providing visibility into peer identity, flap counts, health status, and admin states.

Colsubsidio transforms business process monitoring with Elastic Observability

Colsubsidio is one of the largest and most representative family compensation funds in Colombia. The organization manages and delivers essential social services to millions of users through a broad network spanning health, education, subsidies, recreation, tourism, credit, housing, pharmacies, retail supply, culture, and labor welfare.

Incident Report: Exercises, Cleanups, and Evacuations

Every year, Honeycomb runs disaster recovery scenarios in multiple environments, including in production. Although each of our instances runs in a single region, on at least three Availability Zones (AZs), we have multiple plans for partial regional failures, and particularly, zonal failures. One of these tests was run on December 5th, and after its successful completion came its cleanup steps.

Observability Self-Hosted 2026.1 - Server Configuration Comparisons

In this video, SolarWinds Evangelist Chrystal Taylor introduces server configuration comparisons, a new feature in Observability Self-Hosted 2026.1 and Server Configuration Monitor 2026.1. The key highlight is the ability to compare server configurations side by side, enabling users to identify differences in configuration files between nodes or against a defined ideal state. This new functionality aims to help users monitor configuration drift.

Observability Self-Hosted 2026.1 - Additional Cloud Support

SolarWinds Evangelist Chrystal Taylor demonstrates the new cloud entity support features in Observability Self-Hosted version 2026.1. The update adds monitoring capabilities for MySQL and PostgreSQL databases on Google Cloud Platform, GCP load balancers, Azure functions, AWS Elastic Kubernetes Service, and AWS Lambda functions. She provides a guided walkthrough of the dashboard interface, showing how users can monitor various metrics including database performance, network traffic, latency, function execution counts, system usage, and costs across different cloud platforms.

Using Core Web Vitals in Honeycomb Frontend Telemetry

Google's Core Web Vitals (CWVs) measurements have been used by web administrators and SREs to review frontend application performance metrics, and have been factored into Google's page rankings since 2021. They are also used in Google Analytics, which crawls websites and evaluates performance metrics over a period of multiple days, and with various frontends (desktop web, mobile web, etc.) to establish how well a website performs in production.

A 4-Month Bug Fixed in <10 Minutes with Olly

In today’s highly interconnected systems, the subtle relationships between services are rarely obvious. Modern, complex architectures generate telemetry that functions less as “flashing signs” and more as faint “breadcrumbs” to be followed across a vast network of signals. In 2025, about two-thirds of outages involved third-party systems like cloud platforms and APIs.

The Next Era of Observability: Founders' Reflections - Additional Q&A

What happens when the people who helped define observability take a hard look at AI? That’s what Honeycomb co-founders Christine Yen (CEO) and Charity Majors (CTO) dug into during this webinar, starting with the early days of observability (back when it wasn’t even a category yet).

Kiro Can Now Use Lightrun via MCP

AI code assistants transformed how software is written. They did not transform how it fails. Today, we’re announcing a new MCP integration between Lightrun and Kiro. Kiro now gains live runtime visibility through the Lightrun MCP, grounding AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It helps teams move from specification to production faster by turning intent into working code.

How to Make AI-Generated Code Reliable with Runtime Context

AI coding assistants like Cursor and Claude Code are driving massive productivity gains, yet they have introduced a critical validation gap in the software delivery lifecycle. While these tools excel at generating syntax, they lack visibility into live production environments. This article explains how Runtime Context, the missing nervous system of AI development, secures production by moving from probabilistic guessing to deterministic, live code validation.

Teaching AI How to Refinery

At the beginning of February, we released v3.1 of Refinery, our advanced, tail-based sampling solution. The new version comes with more performance enhancements, bug fixes, and a few new pieces of telemetry. In tandem with the 3.1 release, we also released a new tool for our MCP server which helps your AIs understand Refinery, and how Honeycomb handles sampling.

Introducing "Explain Flame Graph": Stop Fighting Fires and Start Explaining Them

In a modern observability deployment, it’s simple to get data that helps you understand where your system is failing. However, when we try to understand why, the answer is often buried beneath a mound of stack traces. For many developers, attempting to interpret a flame graph by manually calculating self-time (the resources consumed by the function itself) versus child-frame latency (the time spent waiting on called sub-functions) is both confusing and time-consuming.

Sovereign observability: How UAE data residency powers resilient digital economies

Cloud observability is a must for IT teams operating in modern digital economies. It allows administrators to see inside complex systems, understand how each component behaves under real conditions, and act before users or regulators feel the impact. In simple terms, observability transforms digital infrastructure from a black box into a transparent, accountable, and resilient system.

Uptrace Errors & Logs Tutorial: Capture Stacktraces, Context, and Traces in One Place

Every error tells a story — and Uptrace helps you see the full picture. In this tutorial, you’ll learn how to use Uptrace to capture errors, logs, stacktraces, and request context in a single observability platform. See how errors automatically link to traces, understand exactly what happened, and debug issues faster with rich attributes, user data, and performance impact. What you’ll learn: Understand not just *what broke*, but *who it affected and why* — and fix problems with confidence using Uptrace.

Uptrace Tutorial: Dashboards, Percentiles, Heatmaps & OpenTelemetry Metrics

Learn how to use *Uptrace* to measure what truly matters in your applications using percentiles, heatmaps, and histograms—then turn that data into dashboards that answer questions before they’re even asked. In this tutorial, you’ll discover how to: Whether you’re setting up observability for the first time or replacing expensive monitoring tools, this guide shows how Uptrace helps you understand performance, reliability, and user experience — all in one place.

End-to-End Tracing with Uptrace: Follow Any Request Across Your Entire System

Stop guessing where requests slow down. With Uptrace, you can follow any request across your entire system and instantly see performance bottlenecks, errors, and latency sources. This video covers: Build real observability, not just dashboards.

Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

Christine and I started Honeycomb in 2016, which means it’s been ten years. Christine, a developer, and I, an operations engineer, were both profoundly unhappy with the state of the art in monitoring and logging tools. The tools we had used at Facebook didn’t spray our signals around to a bunch of siloed-off pillars. They consolidated as much context as possible so we could properly explore it, the way every other non-software engineering team already takes for granted.

ilert now supports a native WhaTap integration

ilert now supports a native WhaTap integration, connecting AI-native observability with AI-first incident management in a seamless workflow. This integration allows DevOps, SRE, and IT teams to move instantly from detection to resolution – cutting through alert noise, improving coordination, and dramatically reducing MTTR in even the most complex IT environments.

The Architecture Shift Powering Network Observability

If you work in network operations, you know that the only constant is the increasing complexity of the infrastructure you manage. The days of installing a monolithic software package on a single bare-metal server and letting it hum along for years are largely behind you. The software industry has largely shifted toward cloud-native architectures, microservices, and containerization. While these shifts promise agility and scalability, they also introduce significant operational complexity.

Kubernetes Network Observability: Comparing Calico, Cilium, Retina, and Netobserv

Calico, Cilium, Retina, and Netobserv: Which Observability Tool is Right for Your Kubernetes Cluster? Network observability is a tale as old as the OSI model itself and anyone who has managed a network or even a Kubernetes cluster knows the feeling: a service suddenly can’t reach its dependency, a pod is mysteriously offline, and the Slack alerts start rolling in. Investigating network connectivity issues in these complex, distributed environments can be incredibly time consuming.

Why distributed observability is straining and what new research reveals

Distributed systems quietly run much of today's digital world. People expect these systems to work reliably across regions and time zones for everything from money transfers to streaming platforms and AI-driven workloads. As organisations use more microservices, containers, and event-driven architectures, observability has become the main way for teams to understand what is happening in production.

Heartbeat behind the metrics | Muraleedharan on support, scale, and seeing the product in the wild

What does observability look like when you’re responsible for customers at scale? In this episode of Heartbeat Behind the Metrics, Muraleedharan Sadhasivam, Head of Customer Success, talks about his 15-year journey at ManageEngine and the perspective you only get from being close to customers every day. He shares why custom dashboards matter so much, and why AppLogs is a feature he wishes more users explored to complete the MELT story. From querying logs to turning them into alerts and dashboards, he explains how real insights start when data is brought together.

How we built Grafana Assistant - a conversation about AI development for observability

This conversation with Grafana Labs engineers, Mat Ryer, Cyril Tovena and Sven Großmann, dives deep into the engineering behind Grafana Assistant, exploring how agentic AI is transforming the observability landscape. From hackathon origins to sophisticated backend agents, the team shares candid lessons on building, scaling, and refining AI tools for engineers.

Kiro Can Now Reason With Lightrun's Live Runtime Context

AI code generation is fast. Making it reliable requires runtime context. Today, Kiro gains live runtime visibility with the Lightrun MCP. This grounds AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It moves from specification to production with speed and structure, helping teams turn intent into working code. But until now, like every AI coding assistant, Kiro had a major blind spot.

How Honeycomb Supercharges OpenTelemetry for AI

It has become common knowledge that the nature of software development has changed as AI-code generation and agent-based features gain adoption. In perhaps a more subtle shift, the fundamentals of software instrumentation are changing too. As OpenTelemetry becomes the standard instrumentation layer across enterprises, with thousands of developers (many from Honeycomb) actively contributing to it, the nature of the telemetry data captured itself is evolving to meet the growing demand for rich context.

Top 9 Observability Tools for AI-Assisted Development & Deployment

AI-assisted development is rapidly becoming the default way software is built. Code generation, AI copilots, agentic pull requests, and automated refactoring are now embedded directly into engineering workflows. While this shift dramatically increases delivery speed, it also introduces a new operational reality: production systems are changing faster than humans can fully reason about them. This is where observability becomes mission-critical.

Observability trends for 2026 (Part 2): GenAI and OpenTelemetry reshape the landscape

Over the course of my 20 years as a developer, SRE, and now observability product leader, software has typically progressed at a good pace. But now, the emergence of two transformative technologies are fundamentally reshaping enterprise observability: generative AI (GenAI) and OpenTelemetry (OTel). We surveyed over 500 IT decision-makers for a new report:The Landscape of Observability in 2026: Balancing Cost and Innovation.

30+ Top Observability Tools to Monitor Websites and Applications [2026 Updated]

By incorporating observability tools into your stack, you can better understand how your complex infrastructure operates, reduce downtime, and empower developers to identify and fix problems quickly. However, it now takes considerably more work, time, and money to build the best observability tools for your infrastructure and applications. According to a Splunk survey, over half of the firms polled employ eight or more observability tools.

OpenTelemetry Instrumentation Best Practices for Microservices Observability

OpenTelemetry instrumentation is the foundation of modern microservices observability, but getting it right in production requires more than just enabling auto-instrumentation. This guide covers production-tested OpenTelemetry best practices that help engineering teams achieve reliable distributed tracing, control observability costs, and extract maximum value from their telemetry data.

Observability vs Monitoring: Getting a Full Picture of the Environment

Driving down the highway, you usually glance intermittently at your speedometer to ensure that you stay within the speed limit, or whatever window above the speed limit you’re willing to drive. While monitoring your speed mitigates the risk of a ticket, you still need to look out for various threats on the road, like cars going through stop signs. By observing your surroundings, you take in real-time information that can help prevent a crash.