Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Observabilty for complex systems and related technologies.

Incident Report: Exercises, Cleanups, and Evacuations

Every year, Honeycomb runs disaster recovery scenarios in multiple environments, including in production. Although each of our instances runs in a single region, on at least three Availability Zones (AZs), we have multiple plans for partial regional failures, and particularly, zonal failures. One of these tests was run on December 5th, and after its successful completion came its cleanup steps.

Observability Self-Hosted 2026.1 - Server Configuration Comparisons

In this video, SolarWinds Evangelist Chrystal Taylor introduces server configuration comparisons, a new feature in Observability Self-Hosted 2026.1 and Server Configuration Monitor 2026.1. The key highlight is the ability to compare server configurations side by side, enabling users to identify differences in configuration files between nodes or against a defined ideal state. This new functionality aims to help users monitor configuration drift.

Observability Self-Hosted 2026.1 - Additional Cloud Support

SolarWinds Evangelist Chrystal Taylor demonstrates the new cloud entity support features in Observability Self-Hosted version 2026.1. The update adds monitoring capabilities for MySQL and PostgreSQL databases on Google Cloud Platform, GCP load balancers, Azure functions, AWS Elastic Kubernetes Service, and AWS Lambda functions. She provides a guided walkthrough of the dashboard interface, showing how users can monitor various metrics including database performance, network traffic, latency, function execution counts, system usage, and costs across different cloud platforms.

A 4-Month Bug Fixed in <10 Minutes with Olly

In today’s highly interconnected systems, the subtle relationships between services are rarely obvious. Modern, complex architectures generate telemetry that functions less as “flashing signs” and more as faint “breadcrumbs” to be followed across a vast network of signals. In 2025, about two-thirds of outages involved third-party systems like cloud platforms and APIs.

Using Core Web Vitals in Honeycomb Frontend Telemetry

Google's Core Web Vitals (CWVs) measurements have been used by web administrators and SREs to review frontend application performance metrics, and have been factored into Google's page rankings since 2021. They are also used in Google Analytics, which crawls websites and evaluates performance metrics over a period of multiple days, and with various frontends (desktop web, mobile web, etc.) to establish how well a website performs in production.

The Next Era of Observability: Founders' Reflections - Additional Q&A

What happens when the people who helped define observability take a hard look at AI? That’s what Honeycomb co-founders Christine Yen (CEO) and Charity Majors (CTO) dug into during this webinar, starting with the early days of observability (back when it wasn’t even a category yet).

Kiro Can Now Use Lightrun via MCP

AI code assistants transformed how software is written. They did not transform how it fails. Today, we’re announcing a new MCP integration between Lightrun and Kiro. Kiro now gains live runtime visibility through the Lightrun MCP, grounding AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It helps teams move from specification to production faster by turning intent into working code.

How to Make AI-Generated Code Reliable with Runtime Context

AI coding assistants like Cursor and Claude Code are driving massive productivity gains, yet they have introduced a critical validation gap in the software delivery lifecycle. While these tools excel at generating syntax, they lack visibility into live production environments. This article explains how Runtime Context, the missing nervous system of AI development, secures production by moving from probabilistic guessing to deterministic, live code validation.

Teaching AI How to Refinery

At the beginning of February, we released v3.1 of Refinery, our advanced, tail-based sampling solution. The new version comes with more performance enhancements, bug fixes, and a few new pieces of telemetry. In tandem with the 3.1 release, we also released a new tool for our MCP server which helps your AIs understand Refinery, and how Honeycomb handles sampling.

Introducing "Explain Flame Graph": Stop Fighting Fires and Start Explaining Them

In a modern observability deployment, it’s simple to get data that helps you understand where your system is failing. However, when we try to understand why, the answer is often buried beneath a mound of stack traces. For many developers, attempting to interpret a flame graph by manually calculating self-time (the resources consumed by the function itself) versus child-frame latency (the time spent waiting on called sub-functions) is both confusing and time-consuming.