
Operational Intelligence - the new horizon of observability

Monitoring your systems isn't enough anymore. Neither is “asking questions about your system”. Operational Intelligence embraces observability to proactively deliver business insights, support decision-making, and accelerate innovation. As the observability market grows and more products enter the space, the meaning of the term observability itself seems to become ever more nebulous.

A Quick Guide To Kubernetes Observability

Many companies are rapidly adopting cloud-native computing services, like containers, microservices, and serverless computing. Unlike monolithic applications, these technologies rely on distributed architectures. Whether you are running them in the cloud, on-premises, or both, distributed systems can consist of thousands or even millions of processes and components. The challenge now is making the inner workings of these complex systems visible, controllable, and improvable.

The Open Source Observability Podcast - EP #1: ClickHouse, Data Lakes, and AWS S3 with Joshua Lee

In this episode, we dive into some of Josh's favourite databases and telemetry sources for observability. Listen to learn which open source software could be worth adding to your toolstack! Joshua Lee is a Developer Advocate at Altinity, where he applies his observability and engineering background to ClickHouse use cases and creates educational content to support the open source community. He has over 15 years of experience in leading software projects for a broad scope of industries.

From Detection to Resolution: How Selector + Itential Deliver AI-Driven Observability and Automated Recovery

Every second counts when it comes to detecting, diagnosing, and resolving network incidents, yet many teams still find themselves stuck in reactive mode, drowning in alerts, manually writing scripts, and managing tickets across disconnected systems. This is where Selector and Itential come in: together, they deliver a powerful, enterprise-ready solution that closes the loop between detection and action.

Can AI/ML Guide Observability? Tech Talk #6

This talk will examine the application of Artificial Intelligence and Machine Learning in observability. It will cover how AI/ML is being used to monitor systems, detect anomalies, and extract insights from telemetry data. The session will also cover integrating AI/ML into observability pipelines and improving both analytical capabilities and system performance.

Observability Across Asia-Pacific: What's Holding Teams Back? | 2025 Observability Survey Analysis

What’s holding back observability maturity in Asia-Pacific? Grafana Labs cofounder Anthony Woods shares key takeaways from the largest global observability survey. Learn how SaaS, budget concerns, and org structure are shaping the future of observability in Asia-Pacific (APAC). Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50 GB of logs, 50 GB of traces, and more.

Understanding APM and Distributed Tracing in the Observability Stack

To keep modern applications running smoothly, you need more than just basic monitoring. APM (Application Performance Monitoring) gives you a broad overview, tracking metrics like latency, errors, and system health. Distributed Tracing, on the other hand, shows the full journey of each request across services, helping you pinpoint the root cause of slowdowns or failures.
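To make the APM/tracing distinction concrete, here is a minimal sketch of what a trace adds over an aggregate latency number: given the spans of a single request, it finds the hop that contributes the most exclusive time. The `Span` class and the `self_times`/`slowest_hop` helpers are hypothetical (this is not the OpenTelemetry API), and the self-time calculation assumes a span's children run one after another.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical, simplified span record (not the OpenTelemetry SDK type).
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the request
    service: str
    name: str
    duration_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Exclusive time per span: its duration minus its direct children's durations.

    Assumes children run sequentially; concurrent children would be
    double-subtracted, so a real tool would merge overlapping intervals instead.
    """
    child_total: Dict[str, float] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def slowest_hop(spans: List[Span]) -> Span:
    """Return the span that contributes the most exclusive time to the request."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# One request as it crossed four services (illustrative numbers).
trace = [
    Span("a", None, "gateway", "GET /checkout", 420.0),
    Span("b", "a", "cart", "load_cart", 60.0),
    Span("c", "a", "payments", "charge_card", 310.0),
    Span("d", "c", "payments-db", "SELECT orders", 280.0),
]

hop = slowest_hop(trace)
print(hop.service, hop.name)  # payments-db SELECT orders
```

An APM-style view of this request would report that `GET /checkout` took 420 ms; the trace shows that most of that time sits in one database call two hops away from the entry point.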

Master Your AWS Cloud Environment With Observability

In many cases, cloud and on-premises environments exist side by side. The only way to maintain visibility into such intricate hybrid ecosystems is with a sophisticated end-to-end observability solution. Here are the key factors in choosing a comprehensive observability solution to help you master your AWS cloud and on-prem environments.

Observability 2.0: Seeing More, Knowing More, Fixing More

The era of scattered monitoring tools and fragmented operational visibility is over. As hybrid and multi-cloud environments have become the norm rather than the exception, traditional observability approaches—siloed metrics, isolated logs, and disconnected traces—can no longer keep pace with the complexity of modern IT infrastructure. Organizations today need more than just monitoring.

Observability Without Tradeoffs: Introducing Powerful New Honeycomb Telemetry Pipeline Features

Every day, enterprise companies generate terabytes of observability data while engineering teams are under pressure to cut costs. One of the easiest ways to reduce observability bills is through sampling: intentionally sending only a representative portion of telemetry data, rather than the full volume, to your observability tool. But turning down the dial is risky.
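A minimal sketch of the sampling idea, assuming a deterministic keep/drop rule based on the trace ID plus an always-keep rule for errors. This is not how Honeycomb's pipeline works internally; the function names and event dictionaries are hypothetical. Recording the sample rate on each kept event is what lets a backend re-weight counts, which is one way to limit the risk of turning down the dial.

```python
import hashlib
from typing import Dict, List


def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Keep roughly 1 in `sample_rate` traces, decided deterministically per trace ID.

    Hashing the ID (rather than calling random()) means every service that sees
    the same trace makes the same decision, so kept traces stay complete.
    """
    if sample_rate <= 1:
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % sample_rate == 0


def sample(events: List[Dict], sample_rate: int) -> List[Dict]:
    """Drop most events, but always keep errors; tag each kept event with its rate."""
    kept = []
    for event in events:
        if event.get("error"):
            kept.append({**event, "sample_rate": 1})  # errors are never sampled away
        elif keep_trace(event["trace_id"], sample_rate):
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimated_total(kept: List[Dict]) -> int:
    """Each kept event stands in for `sample_rate` original events."""
    return sum(event["sample_rate"] for event in kept)


# Hypothetical traffic: 10,000 events, every 50th one an error.
events = [{"trace_id": f"trace-{i}", "error": i % 50 == 0} for i in range(10_000)]
kept = sample(events, sample_rate=10)
print(len(kept), estimated_total(kept))
```

With these inputs the sketch keeps all 200 errors plus roughly a tenth of the remaining events, while the re-weighted estimate should land close to the original 10,000.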

Grafana Cloud: Manage the AWS Observability app as code with Terraform

Imagine setting up your AWS configuration in Grafana Cloud by hand and clicking through menus. When you only have a few services, it’s not a big deal. But as you add more and more, keeping track of every little change becomes a headache. It’s easy to make mistakes, and before you know it, things can get out of sync and your monitoring becomes unreliable.

From dashboard soup to observability lasagna: Building better layers

Let's be honest: observability can suck. Ever feel like you're swimming in dashboard soup? You know the feeling: tons of single-use dashboards, building new ones during every incident only to lose them in the chaos, and spending ages creating visualizations that no one ever looks at again. Even with all the right tools, something still feels off.

Fireside Chat: Observability Lessons and Practices from a Fortune 500 Leader

Join SAP CX's Martin Norato Auer, VP of Observability, and Catchpoint’s Nick Homan as we explore SAP CX’s journey from fragmented alert management to a scalable, standardized observability model. In this candid fireside chat, Martin shares how his team overcame alert fatigue, integrated observability with automation and BI, and scaled their practices across multiple SAP CX products with APM & Internet Performance Monitoring (IPM).

Slash Observability Costs Without Sacrificing Reliability: The OTEL + PagerDuty Advantage

In a time when budgets are tight but reliability still needs to be high, observability is under the spotlight. Monitoring and observability tools are some of the most expensive parts of a tech stack, often eating up the bulk of the budget. Luckily, organizations can reduce costs with strategies such as adopting open-source solutions like OpenTelemetry (OTEL), which provides a flexible, open standard for data collection without the price tag of proprietary tooling.

Harnessing Network Observability to Speed the Telco-to-Techco Transition

For telecommunications firms (telcos), the race is on. If these organizations are to rise to meet their top challenges and growth objectives, transformation is a must. Those who make this move most rapidly will be best positioned for sustained success. Today, telcos face several significant challenges that are creating fundamental disruption, and they need to transform to contend with these shifts.

Brand-Driven Observability: Crafting Monitoring That Reflects Your Product Identity

In the fast-paced world of modern IT operations, observability has become a crucial pillar in ensuring the health, reliability, and performance of complex systems. As organizations scale their infrastructures and embrace distributed architectures, monitoring systems have evolved beyond simple uptime checks to holistic observability platforms. However, in this technical landscape, one often overlooked element is the role of branding in observability design.

Highlight reel: Futureproof Your AI Investment With Observability

Artificial intelligence is changing the way modern systems are built, and how teams are expected to operate them. But as AI-driven complexity grows, so too does the need for deep, reliable, and fast visibility into what’s really happening inside our systems. In this timely and thought-provoking session, Christine Yen, CEO and Co-founder of Honeycomb, explores how practices must evolve to keep pace.

Honeycomb Observability Day London: A Jam-Packed Day of Great Talks

On May 15th, 2025, Honeycomb hosted Observability Day (or O11yDay) in the London financial district. The skies were clear, the weather was wonderful, and we had a huge turnout, from our networking breakfast to the happy hour at the end of the day.

Tales From the Trench: Building With LLMs and Honeycomb

AI discourse these days is all over the place. Depending on who you talk to, AIs are either absolute flash-in-the-pan junk or the best thing since sliced bread. I want to cut through the noise, though, and see for myself what someone can do out here on the bleeding edge. So I’m setting myself a challenge, with a few ground rules: write a usable, and useful, application with Claude Code, from soup to nuts. With our ground rules established, let’s figure out our app!

Announcing Qovery Observability: the simplest way to understand your application

We are thrilled to announce the next major milestone in our platform vision: Qovery Observability! Qovery Observability is our new product, designed to give you the fastest path to a crystal-clear, unified understanding of your application and infrastructure. It's fully managed with zero lock-in, and you keep the data. Devs love it, and no DevOps is needed. Coming soon!

Defining SLA/SLO-Driven Monitoring Requirements in 2025

SLA/SLO-driven monitoring aligns your observability strategy with business objectives by defining measurable service targets and implementing monitoring systems that track progress toward those goals. Service Level Agreements (SLAs) represent commitments to users, while Service Level Objectives (SLOs) are internal targets that ensure you meet those commitments with a safety buffer. In 2025, organizations running distributed systems need monitoring that goes beyond basic uptime checks.
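To illustrate how an SLO acts as a safety buffer under an SLA, here is a small sketch with assumed numbers: an external SLA of 99.9% availability and a stricter internal SLO of 99.95% over the same window. The `slo_status` helper and the request counts are hypothetical; the arithmetic is the usual error-budget calculation, where allowed failures = (1 − target) × requests.

```python
from dataclasses import dataclass


@dataclass
class TargetStatus:
    availability: float      # fraction of requests that succeeded in the window
    budget_total: float      # failures the target allows in the window
    budget_remaining: float  # allowed failures not yet spent (negative = target missed)
    burn_rate: float         # observed error rate / allowed error rate (1.0 = on the edge)


def slo_status(target: float, total_requests: int, failed_requests: int) -> TargetStatus:
    """Error-budget arithmetic for an availability target such as 0.999."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    if total_requests <= 0:
        raise ValueError("need at least one request in the window")
    allowed_error_rate = 1.0 - target
    error_rate = failed_requests / total_requests
    budget_total = allowed_error_rate * total_requests
    return TargetStatus(
        availability=1.0 - error_rate,
        budget_total=budget_total,
        budget_remaining=budget_total - failed_requests,
        burn_rate=error_rate / allowed_error_rate,
    )


# Hypothetical window: 1,000,000 requests, 700 of which failed.
sla = slo_status(0.999, 1_000_000, 700)   # external commitment
slo = slo_status(0.9995, 1_000_000, 700)  # stricter internal target

print(f"SLA budget left: {sla.budget_remaining:.0f}, burn rate {sla.burn_rate:.2f}")
# SLA budget left: 300, burn rate 0.70
print(f"SLO budget left: {slo.budget_remaining:.0f}, burn rate {slo.burn_rate:.2f}")
# SLO budget left: -200, burn rate 1.40
```

In this example the internal SLO is already breached (burn rate above 1) while the SLA still has budget left; that gap is the safety buffer, and it gives teams time to act before the external commitment is at risk.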

OpenTelemetry for Go: measuring the overhead

Everything comes at a cost — and observability is no exception. When we add metrics, logging, or distributed tracing to our applications, it helps us understand what’s going on with performance and key UX metrics like success rate and latency. But what’s the cost? I’m not talking about the price of observability tools here, I mean the instrumentation overhead.

Observability trends in Japan: Insights from Grafana Labs' latest survey

Japanese organizations are focused on controlling costs and limiting complexity—and they might be getting ready to broaden their adoption at just the right time, according to analysis of a micro survey on observability recently conducted by Grafana Labs. Observability is an evolving space in Japan, and this is the first time Grafana Labs has run a Japanese version of our annual Observability Survey.

How to reduce Cloud Costs (with Open Source!)

We strongly believe that simple observability should be an innovation everyone can afford to benefit from, which is why Coroot is open source and includes cost monitoring for Azure, GCP, AWS, or your own custom settings. Using eBPF, Coroot automatically tracks how each deployment impacts your cloud costs, so you can easily roll back changes when necessary and avoid a Lovecraftian monthly bill.

The Network-First Advantage: How Fabrix.ai Redefines Observability from the Ground Up

Modern enterprises today often find themselves in a peculiar predicament: they are drowning in a deluge of telemetry data (logs, metrics, and traces), yet paradoxically remain blind to what truly matters. Despite making substantial investments in observability tools, teams frequently find themselves reacting to incidents rather than proactively preventing them, while alerts flood dashboards, often devoid of critical context.

Ensure trust across the entire data life cycle with Datadog Data Observability

As data systems grow more complex and data becomes even more business-critical, teams struggle to detect and resolve issues that impact data quality, reliability, and, ultimately, trust. Engineers have to rely on manual checks and ad hoc SQL queries to catch data quality issues—often after teams relying on the data have noticed something has gone wrong.

Lunar-level observability: How Firefly Aerospace used Grafana to monitor its historic moon landing

On March 2, 2025, Firefly Aerospace made history. The company — a space services firm that offers safe, reliable, and economical access to space — completed the first fully successful lunar landing by a commercial provider with its Blue Ghost Mission 1. But behind the headlines and highlight reels was a team of dedicated engineers, years of preparation, and a mission control center outfitted with Grafana dashboards.

Smarter Telemetry Pipelines: The Key to Cutting Datadog Costs and Observability Chaos

Log volume is exploding, costs are rising, and most teams are stuck duct-taping together short-term fixes. During our webinar, "Optimizing Log Management in Datadog: Cut Costs Without Losing Insights," we discuss how DevOps and engineering leaders are navigating the growing pains of observability, especially in environments where tools like Datadog are mission-critical but challenging to manage. Here’s a recap of the key takeaways.

It's The End Of Observability As We Know It (And I Feel Fine)

In a really broad sense, the history of observability tools over the past couple of decades has been about a pretty simple concept: how do we make terabytes of heterogeneous telemetry data comprehensible to human beings? New Relic did this for the Rails revolution, Datadog did it for the rise of AWS, and Honeycomb led the way for OpenTelemetry.

The One Where We Show You Copilot Editor

Copilot Editor is like an AI-powered Rosetta Stone for telemetry. It helps Cribl users take raw, messy telemetry data and turn it into standardized, analytics-ready formats. The most important piece? It puts YOU in control. Our human-in-the-loop design means that users have full control over and visibility into what’s happening with their critical data, preventing AI-induced mistakes. Watch this fun demo with the AI product team to see Copilot Editor's true value for the average Cribl user!

Top Features of Splunk Observability Cloud for Engineers

In this video, we’ll walk you through a demonstration of Splunk Observability Cloud’s key capabilities. You’ll see how you can monitor Kubernetes cluster health in Infrastructure Monitoring, and alert on your services’ health using AutoDetect Detectors and Alerts. We’ll then take a look at traces and metrics in APM, and use Related Content to find correlated log entries of error traces. Then we’ll use AlwaysOn Profiling to troubleshoot long-duration traces for our service.

MCP = Observability + Code, a Real-life Example

Our bot is hitting an error. We can see it in the distributed trace. Here, see what happened when we noticed it: Austin fired up Claude Code (hooked up to Honeycomb with its MCP tool) and got it to find the error, fix it, deploy, and check that the fix worked. It got a little overconfident at first, but the ending is happy. IRL this took 22 minutes; the video speeds up the AI agent interactions and cuts out waiting. This video includes Austin Parker, Jessica Kerr, and Ken Rimple.

Beyond Shift Left: Engineering Leaders Increase Speed and Resilience With Observability

We recently had the privilege of hosting several industry experts and technology executives across platform strategy, SRE, and engineering enablement for breakfast at our Observability Day in London. We noted that they’re all facing the same fundamental tension: deliver faster, scale smarter, stay resilient, and somehow get ahead of what’s coming next. But how do you move fast without breaking things? And how do you prove the value of the things you don’t break?

Top 5 Observability Tools DevOps Teams Should Know

Observability and monitoring are the cornerstones of resilient, high-performing applications. Nearly every IT or software engineering leader we come into contact with emphasizes the importance of being able to understand and diagnose what is going on with their applications at all times. Having clear and concise visibility into your applications is no longer optional.

Working with GPUs on Kubernetes and making them observable

GPUs are everywhere, powering LLM inference, model training, video processing, and more. Kubernetes is often where these workloads run. But using GPUs in Kubernetes isn’t as simple as using CPUs. You need the right setup. You need efficient scheduling. And, most importantly, you need visibility. This post walks through how to run GPU workloads on Kubernetes, how to virtualize them efficiently, and how Coroot helps you monitor everything with zero instrumentation or config.

Inside the Wins: Real Stories of Transforming Azure Observability into Business Value

Azure environments are growing fast, and so are the challenges of monitoring them at scale. In this blog, part of our Azure Monitoring series, we look at how real ITOps and CloudOps teams are moving beyond Azure Monitor to achieve hybrid visibility, faster troubleshooting, and better business outcomes. These real-life customer stories show what’s possible when observability becomes operational. Want the full picture? Explore the rest of the series.

Real-Time Observability with ClickHouse, Coroot, and GlassFlow

Coroot is excited to feature an editorial from GlassFlow for our first Open Source Spotlight. We hope to improve the workflow of our global community of SREs and DevOps professionals by sharing exciting projects like GlassFlow, which make innovation accessible for everyone through the freedom of open source. If you have an open source or open core project you’d like to see on our blog next, send us a message!

How to Improve Uptime and Achieve Root Cause Analysis (with Open Source!)

Observability doesn’t begin and end at telemetry or your ELK stack: most open source or vendor tools require configuration and dashboard customization, and may not actually pinpoint the data you need to mitigate system risks. Coroot was designed to solve the problem of time-consuming root cause analysis: it handles the full observability journey, from collecting telemetry to turning it into actionable insights. We also strongly believe that simple observability should be an innovation everyone can afford to benefit from, which is why our software is open source.

A Developer's Framework for Selecting the Right Tracing Vendor

Distributed tracing tracks requests as they flow through microservices, revealing bottlenecks, failures, and performance patterns. Without proper tracing, debugging production issues becomes guesswork, especially in complex architectures with dozens of services. Modern applications can generate millions of traces daily. The right vendor helps you extract actionable insights without drowning in data or breaking your budget.

Peacetime Observability: Spotting Risks Before They Become Incidents

Most of the time, nothing’s broken. Traffic’s flowing, alerts are quiet, and everything seems fine. That’s peacetime, when no one’s getting paged. Coroot helps in both peacetime and wartime. When things go wrong, it guides you to the root cause fast. But during peacetime, it helps you spot risks early, clean up inefficiencies, and prevent those incidents from happening in the first place.

Why database observability is key to successful cloud data platform adoption

Data is the lifeblood of businesses the world over, from the smallest startup to the largest enterprise. Making sure that it’s available when you need it, secured for authorized use, and recoverable from faults is vital to operating data platforms, no matter where your business is on its cloud journey. This can only be achieved by putting the right data into the hands of the right people, in a timely way, to make the right decisions about how to manage that platform effectively.

Monitoring Backstage with OpenTelemetry: Closing the observability blind spot

‘One small step for a man, but a huge leap for developers’ (me, when I realised how to observe my Backstage with OpenTelemetry). Backstage is often the “portal” through which we manage all our other systems, but who watches the watcher? Recently, we gave a KubeCon talk highlighting that monitoring Backstage itself is critical. When Backstage isn’t observable, it becomes a blind spot in your infrastructure.