
Stop Leaking PII in Your Telemetry with Cribl Guard

Sensitive data sneaks into destinations more often than teams realize. In this clip, we capture live events, spot emails and login tokens slipping through, and fix it instantly with Cribl Guard. A few clicks, a commit and deploy, and Guard redacts the data in real time. No complex configs. No regex nightmares. Just fast protection that keeps your telemetry clean and your security tight.

Pastries with SREs: Holding onto extra observability data and desserts

In this episode of Pastries with SREs, we dig into why you should keep all of your observability data, even if you don’t need it quite yet. With enriched logs and flexible, cost-effective storage, you can stop worrying about what you might need later and start answering questions with confidence, no matter when they arise.

How to Reduce Your Cloud Costs with Coroot

Cloud costs often grow quietly until they suddenly command everyone’s attention. Gartner estimates that companies overspend on cloud services by up to 70 percent, mostly because they lack clear visibility into where the money is actually being spent. Cloud invoices speak the language of infrastructure: nodes, instance types, regions, volumes, and egress. Engineering teams speak the language of services, deployments, and code.
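To make that translation from infrastructure to services concrete, here is a minimal sketch of one common allocation approach: splitting each node's cost across services in proportion to the CPU their pods request. This is not necessarily how Coroot computes costs; the `Pod` type, node names, and prices are hypothetical, and real allocators typically also account for memory, idle capacity, and egress.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pod:
    service: str        # owning service (hypothetical, e.g. taken from a deployment label)
    node: str           # node the pod is scheduled on
    cpu_request: float  # requested CPU cores


def allocate_node_costs(node_costs: dict[str, float], pods: list[Pod]) -> dict[str, float]:
    """Split each node's cost across services in proportion to their pods' CPU requests.

    Nodes whose pods request no CPU (or that run no pods) contribute nothing,
    so their cost stays unattributed.
    """
    # Total CPU requested on each node.
    requested = defaultdict(float)
    for pod in pods:
        requested[pod.node] += pod.cpu_request

    # Each pod's share of its node's cost is credited to its service.
    per_service = defaultdict(float)
    for pod in pods:
        total = requested[pod.node]
        if total > 0:
            per_service[pod.service] += node_costs[pod.node] * pod.cpu_request / total
    return dict(per_service)
```

For example, with `node-a` costing 100 and `node-b` costing 60, a `checkout` pod and a `search` pod requesting 2 cores each on `node-a`, plus a `search` pod (1 core) and a `payments` pod (2 cores) on `node-b`, the invoice reads as 50 for `checkout`, 70 for `search`, and 40 for `payments`.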

AI and DevOps in 2025: How Autonomous Engineering Will Transform Software Operations and Reliability

DevOps started as a way to break down barriers between development and operations, but by 2025 the movement has shifted into something far more ambitious. Instead of simply speeding up releases or tightening workflows, companies are now adopting autonomous engineering systems: AI-powered tools that don't just support DevOps practices but actually carry them out.

Side-by-Side Variable Comparison for Snapshot Debugging

When you’re debugging a tricky issue in a distributed system, “what changed?” is often the most important question. You add logs, you capture data, you redeploy, and suddenly your browser is full of open tabs, copied JSON blobs, and screenshots of log lines. Comparing behavior between two requests, two users, or two releases turns into a manual, error-prone chore. Lightrun Snapshots were built to fix the data collection side of that story.

What's Special About MCP?

AI agents can interact with the world using tools. Those tools can be generic or specific. The most general ones, like “run a bash command” and “read and write files”, are built into the agent. More specific ones are provided through Model Context Protocol (MCP) servers. Every tool provided to the agent comes with instructions sent as part of the context.
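To illustrate what “instructions sent as part of the context” can look like, here is a sketch of a single tool definition shaped like an entry in an MCP `tools/list` result (a `name`, a natural-language `description`, and a JSON Schema `inputSchema`). The `query_logs` tool is hypothetical, and the way `render_tool_context` flattens tools into text is an assumption; each agent client serializes tools in its own way.

```python
import json

# Hypothetical tool, shaped like one entry of an MCP "tools/list" result.
QUERY_LOGS_TOOL = {
    "name": "query_logs",
    "description": "Search application logs for a service within a time range.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "service": {"type": "string"},
            "since_minutes": {"type": "integer"},
        },
        "required": ["service"],
    },
}


def render_tool_context(tools: list[dict]) -> str:
    """Flatten tool definitions into text, roughly as an agent might place them in
    the model's context. Every tool added makes this text, and the context, longer."""
    return "\n\n".join(
        f"Tool: {t['name']}\n{t['description']}\nInput schema: {json.dumps(t['inputSchema'])}"
        for t in tools
    )
```

The design point is that tool descriptions are not free: each one occupies context the model must read on every turn, which is why very specific tools are usually loaded only from the MCP servers a task actually needs.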

How to Monitor Java Applications on Windows with SolarWinds Observability | APM Setup Guide

This video provides a step-by-step walkthrough for configuring monitoring for Java applications running on Windows using SolarWinds Observability. The demonstration covers the complete process, from adding a new service to instrumenting the application with the Java APM library and verifying connectivity. This guide is designed for developers, DevOps engineers, and system administrators who need to instrument Java applications on Windows for performance monitoring, distributed tracing, and full-stack observability.

Top 7 Observability Platforms That Auto-Discover Services

You can use an observability platform that automatically discovers your services and provides ready-to-use dashboards with minimal setup. If you're running a system where microservices come and go, containers shift around, or serverless functions scale up quickly, this kind of experience saves you a lot of time. You gain visibility as soon as something goes live, without requiring any additional steps on your part. In this blog, we talk about the top seven platforms that offer these capabilities.

Data Observability: Build confidence in the data life cycle

Datadog Data Observability provides a complete solution with quality checks (e.g., volume, row changes, freshness), custom SQL-based monitors, anomaly detection, column-level lineage across systems like Snowflake and Tableau, full pipeline visibility, and targeted alerts when data issues arise.
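Freshness and volume checks like these reduce to simple comparisons. Below is a minimal sketch, assuming last-update timestamps and daily row counts have already been collected; the function names and the one-hour and three-sigma thresholds are illustrative assumptions, and this is not Datadog's API or its anomaly-detection algorithm.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, pstdev
from typing import Optional


def is_stale(last_updated: datetime, max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """Freshness check: True if the table has not been updated within max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated > max_age


def volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Volume check: True if today's row count is more than z_threshold
    population standard deviations away from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return today != mu  # perfectly flat history: any change is notable
    return abs(today - mu) / sigma > z_threshold
```

For a table that normally lands around 1,000 rows a day, a load of 400 rows is flagged, while 1,005 rows is not.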

Use OpenTelemetry with Observability Pipelines for vendor-neutral log collection and cost control

Today, many DevOps and security teams operate in a world of complex, hybrid, or multi-vendor environments. As more teams look to avoid lock-in by adopting open standards, OpenTelemetry (OTel) is quickly gaining adoption as the primary open source method for DevOps and security teams to instrument and aggregate their telemetry data. However, OTel alone may lack the advanced processing functions, native volume control rules, and hybrid environment support that large organizations need.

AI Observability: How to Keep LLMs, RAG, and Agents Reliable in Production

AI observability closes the gap between “something’s wrong” and “here’s what to fix.” If you run AI in production, you might have felt the whiplash. Yesterday, your LLM answered in 300 milliseconds (ms). Today p99 crawls, costs spike, and nobody’s sure if the culprit is model behavior, data freshness, or GPUs stuck at the ceiling. Dashboards light up, but they don’t tell you which issue puts customers at risk. That’s the gap AI observability closes.

AI Isn't Here to Replace Your Dashboard... Yet

Non-deterministic UIs are the future and will replace your dashboards, but they’re not here yet. So until then, we’re stuck with conversational interfaces. In an effort to describe what I think the future of UIs looks like, I wrote about how you (and I) have been designing dashboards wrong. The core insight was that we've been designing for static representations of data that sit on a TV in the office, when the actual use case is someone at a desk using them to debug an issue.

Search Telemetry Without Limits in a Multi Cloud and AI World

Cribl Search gives you one lens across all your telemetry data, no matter where it lives. Instead of forcing teams to move data into one system or jump between tools, you get a familiar pipe-based query experience with dashboarding and alerting built in. Storage and query processing stay separate, so you decide where your data lives while your users get fast, simple access in one place.

Why Gaining Control of Your Telemetry Data Is a Game Changer

Disconnected pipelines. Unknown data sources. Costs that do not add up. Many teams struggle to answer a simple question: what data do we have, and where is it going? In this clip, a Cribl customer explains how bringing all telemetry data together changed everything. With Cribl, their team can finally see what they collect, where it flows, and what it costs. That clarity unlocked smarter reduction, better routing decisions, and major optimization across security and observability workflows.

From Data Lake to Lakehouse: Why Cribl Is Preparing for the Agentic AI Era

Customers asked for a simpler way to store and access telemetry data, and Cribl delivered. First came Cribl Lake: cost-effective data storage, flexible access, and identity-based authorization instead of infrastructure-based access rules. It offered a simple way to retain data at rest and run slow, inexpensive analytics when needed. But the story did not end there.

Canvas Is Now GA: AI-Guided Observability for Modern Teams

When we introduced Canvas in beta, our goal was to reimagine how teams explore and collaborate around their observability data without requiring manual querying. Canvas has quickly become the AI-guided workspace that helps teams transform raw telemetry into meaningful, shared understanding faster than ever before. And today, we’re thrilled to announce that Canvas is now Generally Available (GA) for all Honeycomb users.

The "Meh-trics" Reloaded: Why I Was 100% Wrong About Metrics (and Also 100% Right)

Okay, I'm going to say something that would make 2016 Charity want to throw her laptop across the room: we're making a major investment in metrics at Honeycomb. I know, I know. "But Charity, you literally called them ‘shit salad!’" I did. Also "nerfed dimensions." I said they would "fucking kneecap you." For most of the past decade, I've been social media’s most reliable anti-metrics evangelist. Have I repented? No.

Enhancements to Honeycomb Telemetry Pipeline Deliver Greater Visibility, Smarter Control, and Lower Costs

In July, we introduced powerful new Honeycomb Telemetry Pipeline features that helped teams take control of their observability data with safe sampling, flexible rehydration, and a visual pipeline builder. Since then, we’ve built on that foundation. Today, we’re introducing the latest enhancements to Honeycomb Telemetry Pipeline, which give teams deeper visibility into pipeline health, more efficient access to archived telemetry data, and reduced operational complexity.

Introducing Honeycomb Private Cloud

More and more enterprises are shifting toward private cloud and hybrid deployments for control, data residency, and security. At the same time, observability is no longer a “nice to have” tool. It's mission-critical for teams driving rapid change across cloud-native, multi-service architectures. Leaders are realizing they need deep visibility and rapid debugging everywhere their systems run.

How to Onboard AWS & Azure Hosts in SolarWinds Observability

Connecting your cloud infrastructure has never been easier. In this quick walkthrough, you’ll see how SolarWinds Observability natively integrates with AWS and Azure to onboard virtual machines and supported managed services. Select your hyperscaler, click “Add Data”, choose “Hosts”, and follow the steps to connect your cloud environment via API. Whether you're running AWS EC2, Azure VMs, or other managed services, SolarWinds helps you get visibility in minutes.

Node.js Performance Monitoring Guide

Node.js applications power millions of APIs, microservices, and real-time systems. But without proper monitoring, performance issues, memory leaks, and errors can go undetected until they impact users. This guide explains how to monitor Node.js applications in production, what metrics to track, and which tools deliver the best results.

AIOps: Laying a Strong Foundation with Full-Stack Observability

It is fair to say that AIOps is much more than just a catchy tagline; it is now a fundamental part of how every enterprise manages a modern, cloud-native, distributed architecture. As AIOps adoption widens and organizations expand, the volume of logs, metrics, and traces becomes too much for role-based tracking and monitoring tools. This is where full-stack observability tools are needed: they provide the data that AIOps engines rely on for predictive, proactive detection of performance issues.

Observability needs more than tools. It needs the right data.

Good observability starts with good data. In this clip, we hear how Cribl gives teams real control over their data pipelines so they can collect, enrich, and route telemetry from any source to the right destination. It is not just about more dashboards or another platform. It is about building an observability ecosystem that connects IT, security, and the business through cleaner data and smarter AIOps. Tool rationalization and AI-driven pipelines are not future goals. They are happening right now.

How to Achieve Deep Network Visibility with SolarWinds Observability SaaS

Looking for a faster way to discover every device on your network? This video walks through how SolarWinds Observability automatically scans and classifies network gear, including routers, switches, access points, firewalls, and SD-WAN devices, in seconds. This is the easiest way to get full network visibility without scripts, config files, or manual inventory work.

How to Monitor .NET Applications on Linux with SolarWinds Observability | Step-by-Step Setup

This video provides a step-by-step walkthrough for configuring monitoring for .NET applications running on Linux using SolarWinds Observability. The demonstration covers the full setup process, from adding a new service to verifying the APM library connection. This guide is intended for developers, system administrators, and DevOps engineers who need to quickly and reliably instrument .NET applications on Linux for performance monitoring and observability.

How to Monitor RabbitMQ

A queue quietly fills up overnight. Memory hits the configured watermark and RabbitMQ blocks all publishers. Your entire message pipeline freezes, and you discover the problem when users start complaining. This scenario repeats across thousands of production systems because teams don't monitor RabbitMQ properly. The broker exposes comprehensive metrics, but most engineers don't know which ones predict failures or how to track them.
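As a starting point for catching that scenario before publishers are blocked, here is a minimal sketch that compares each node's memory use against its high watermark. It assumes the management plugin is enabled and that `nodes` is the parsed JSON from its `GET /api/nodes` endpoint (fields `name`, `mem_used`, `mem_limit`, `mem_alarm`); fetching that JSON with credentials is left out, and the 80% early-warning ratio is an assumption, not a RabbitMQ default.

```python
def memory_alerts(nodes: list[dict], warn_ratio: float = 0.8) -> list[tuple[str, float, bool]]:
    """Return (node name, mem_used / mem_limit, alarm active) for every node that has
    reached warn_ratio of its memory high watermark or has already raised the memory alarm.

    mem_limit is the node's memory high watermark in bytes; once mem_used reaches it,
    RabbitMQ raises the memory alarm and blocks publishing connections.
    """
    alerts = []
    for node in nodes:
        limit = node.get("mem_limit", 0)
        if limit <= 0:
            continue  # memory metrics not reported for this node
        ratio = node.get("mem_used", 0) / limit
        alarm = bool(node.get("mem_alarm", False))
        if alarm or ratio >= warn_ratio:
            alerts.append((node["name"], round(ratio, 2), alarm))
    return alerts
```

Alerting on the ratio rather than only on `mem_alarm` is the design choice that matters here: by the time the alarm fires, publishers are already blocked.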

What is Network Observability vs. Network Monitoring?

Network observability may be seen as a newer term in the world of networking, but it has become critical for managing modern distributed networks. As networks grow more complex with cloud services, remote workers, and distributed applications, traditional network monitoring approaches no longer provide sufficient visibility into network health and performance.

Bringing Observability to Data

While observability practices have evolved in recent years, they have largely focused on application services and infrastructure. Yet it is data that powers our applications, businesses, and AI models. When data issues occur, the consequences can be far-reaching, from poor product experiences to billing errors to misinformed AI outcomes. In this session, Jonathan Morin, Group Product Manager at Datadog, shares real-world examples of incidents and explains how data observability can address them, helping teams detect issues earlier, reduce costly downtime, and restore trust in their data.

Expanding Access, Not Risk: Using the Read-Only Role in Honeycomb Teams

Observability works best when everyone who needs visibility can get it without the risk of unintentional changes. Honeycomb’s role-based access control system helps teams strike that balance with a selection of Owner, Member, and Read-Only member roles. This control gives teams more flexibility in how they share access across their organization, helping you scale visibility safely without sacrificing control.

Elastic named a Leader in the IDC MarketScape: Worldwide Observability Platforms 2025 Vendor Assessment

We're proud to share that Elastic has been named a Leader in the IDC MarketScape: Worldwide Observability Platforms 2025 Vendor Assessment (doc, November 2025). We believe this recognition validates our ongoing mission: to deliver an observability platform that is open, extensible, and AI-driven to power full-stack observability that unifies operational and business data at scale, allowing SRE teams to detect and resolve problems faster.

Customer panel: Transforming IT & security

In an era where telemetry data grows at a 28% compound rate while budgets remain flat, traditional IT and Security approaches are facing unprecedented pressure. Join our distinguished customer panel as they share their transformative journeys with Cribl's data engine solutions. Our panelists will discuss how Cribl's vendor-neutral portfolio has enabled them to regain control over their data infrastructure, achieving both immediate operational improvements and strategic long-term advantages.
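To see why a 28% compound growth rate collides with a flat budget, a short worked calculation helps. It assumes the rate holds steadily year over year and that cost scales linearly with volume; both are simplifying assumptions, not figures from the panel.

```latex
V_n = V_0 \,(1 + r)^n, \qquad r = 0.28

\frac{V_5}{V_0} = 1.28^5 \approx 3.44

n_{\text{double}} = \frac{\ln 2}{\ln 1.28} \approx \frac{0.693}{0.247} \approx 2.8 \ \text{years}
```

Under those assumptions, volume doubles roughly every 2.8 years, and after five years a flat budget covers only about 1/3.44, or roughly 29%, of the data at today's unit cost.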

APM vs Observability: What comes next?

Remember how I said that blog was going to be my last entry on the topic of "APM vs Observability?" Well, it turns out I had a little more to say. I'd like to spend a few moments talking about the future of APM and Observability. I think it comes down to two major initiatives: AI and OpenTelemetry. (NOTE: in this section, I'm using the word "observability" to refer to the discipline of monitoring and observability as a whole, rather than any specific tool, technique, or vendor-based solution.)

Understand, diagnose, and optimize SQL queries: Introducing Grafana Cloud Database Observability

It’s widely acknowledged that most application performance problems stem not from the application itself, but from the underlying database. Slow or inefficient database queries are often the primary cause of these issues, acting as the biggest driver of application performance incidents. If you’ve been troubleshooting slow API calls or sluggish services, chances are the root cause resides in your database layer.

OpenTelemetry Java Agent for Spring Boot: Complete Setup Guide

The OpenTelemetry Java Agent provides zero-code instrumentation for Spring Boot applications through bytecode manipulation. This guide covers setup, configuration, auto-instrumentation capabilities, and production deployment strategies for implementing distributed tracing and observability.

How OpenTelemetry can enhance observability in distributed systems: Practical examples

Observability has become one of the fundamental elements of performance and reliability as modern applications move toward cloud-native architectures, microservices, and multi-cloud. Traditional monitoring techniques often fall short in such dynamic, distributed environments. That’s where OpenTelemetry (OTel), an open-source observability framework, comes into the picture.

Conquer Complexity, Accelerate Resolution with the AI Troubleshooting Agent in Splunk Observability Cloud

The digital landscape has transformed dramatically, and with it, the demands on our systems have grown exponentially. Traditional monitoring tools struggle to provide sufficient insight into complex, distributed, cloud-native environments. Observability is the answer, moving beyond merely knowing "what" is happening to understanding "why" it's happening, and its impact on user experience and business outcomes.

If it Wanted to, it Would: The Bitter Lesson for LLM Users

There’s a viral saying folks use about flaky crushes, spouses, and forgetful friends: "if he wanted to, he would." The idea is straightforward: when someone cares, they make the effort. As it turns out, the same principle applies surprisingly well to AI. Systems, like people, have things they "want" to do. Each model has patterns of reasoning and synthesis it performs naturally.

The Hidden Bottlenecks in AI Infrastructure (and How to Fix Them)

Artificial intelligence has entered an era where infrastructure is the real moat. Teams spend millions on GPUs, yet models still stall, latency spikes unpredictably, and throughput flatlines at 20% of what spec sheets promise. These hidden bottlenecks lurk far beneath the surface: in power grids, network fabrics, memory bandwidth, orchestration layers, and even governance policies. In this guide, we uncover where AI infrastructure actually breaks, what the emerging data and research reveal, and how Clarifai's reasoning and orchestration stack helps eliminate these unseen friction points.

Making Observability AI-Native with the Logz.io MCP Server

Now available: secure, real-time access to your observability data via Logz.io’s Model Context Protocol (MCP) Server. The Logz.io MCP Server brings your logs, metrics, and telemetry data into the Model Context Protocol (MCP), an emerging open standard that lets AI systems query real data securely and contextually, in real time. That means any MCP-compatible client, such as Claude Desktop, Cursor, or your own AI agent, can now connect directly to your Logz.io environment.

Messaging Infrastructure Is Still in the Dark: The Observability Illusion Costing Millions

In today’s always-on digital world, even the best messaging platforms—like Apache Kafka and Apache ActiveMQ—can become blind spots that undermine resilience. This article exposes the “observability illusion” many organizations face, showing how limited visibility and manual processes lead to outages, high costs, and constant firefighting. Learn how meshIQ transforms reactive operations into proactive engineering through unified observability, automation, and self-service.

Improve Observability in Your CI/CD Pipeline

The backbone of modern software development is automation, and at the heart of that lies the CI/CD pipeline. It’s what turns code into deployable software, delivering changes to users faster, safer, and more predictably. In simple terms, a CI/CD pipeline automates everything from the moment developers push code to when it reaches production. It integrates, tests, builds, and deploys software continuously, ensuring faster releases with fewer human errors.

What the RFC?! Making sense of syslog before you migrate

Syslog: it's everywhere, it’s ancient, and let’s be honest — it rarely shows up the way the RFC says it should. Before you cut over to Cribl Stream, it pays to understand exactly what you're dealing with and why it matters. In this talk, we’ll demystify the syslog format (yes, the actual RFC 3164 and 5424 stuff), look at what happens when data goes rogue, and explore how Cribl can help bring order to the chaos.

The Modern SOC: Transforming security operations with AI and automation

Security teams are dealing with massive data growth, siloed tools, and constant alert fatigue. All of this makes it harder to detect and respond to threats. AI has become a key part of the solution, but its effectiveness depends on having access to complete, high-quality data. In this session, Palo Alto Networks and Deloitte will explore how AI and automation are redefining the modern Security Operations Center (SOC). Learn how leading organizations are leveraging intelligent workflows, automated threat detection, and machine learning to accelerate response times, reduce analyst fatigue, and strengthen overall security posture.

SIEM Migration in 68 Days

In this session, we will discuss how the University of Pittsburgh was able to modernize their data processing strategy, migrate to a new SIEM solution, and avoid ballooning SIEM costs all within 68 days from the first install of a Cribl product. We will showcase how we were able to use Cribl's software to easily handle the following scenarios: 100% agent replacement and consolidation using Cribl Stream Workers and Edge.

Observability and FedRAMP in Action: The VA's Mission to Deliver Reliable Digital Service

Ensuring digital services remain accessible, reliable, and secure is a high priority for any organization operating at scale. For the Department of Veterans Affairs (VA), this focus is central to its mission of providing quality care to veterans, their families, and caregivers. Often described as “the largest IT shop in the United States,” the VA manages 2.7 million pieces of equipment across a vast network of interconnected systems.

Unify Observability, Surface Business Impact, and Solve Problems Using AI Agents with Latest Splunk Observability Innovations

In September at .conf25, we announced how Splunk is shaping the future of digital resilience in the age of AI. Agentic AI is rewriting what it takes to build a leading observability practice. As vibe coding gains steam, applications will be built with less human involvement. At the same time, the rise of AI agents demands specialized telemetry to ensure models are performing as intended, aligned to their business purpose and cost.

Splunk Advances the OpenTelemetry Project with Its Latest Donation, the OpenTelemetry Injector

Splunk is very excited to be sponsoring KubeCon North America once again, kicking off this week in Atlanta, GA. As many know, Splunk is one of the top contributors to the OpenTelemetry project. We’re happy to have sent many of the Splunkers who serve as project maintainers and contributors to lead SIG meetings and engage with the greater community in the OpenTelemetry Observatory, sponsored by Splunk.

Preparing for cloud failures: Monitoring strategies for distributed hybrid infrastructure

When AWS experienced its recent outage, the ripple effect was immediate. Critical workloads slowed, dashboards went blank, and many teams realized multi-cloud isn't automatically resilient. Cloud-level failures are inevitable due to the interdependent components and complex IT architecture. The recent AWS disruption reminded many teams that the cloud isn't a magic uptime guarantee. Even the most mature providers can, and do, experience large-scale service interruptions.

AI Agents Observability with OpenTelemetry and the VictoriaMetrics Stack

Nowadays, AI agents are becoming more and more popular and are often deployed as part of production systems. However, this rapid adoption brings unique observability challenges that require flexible solutions. On the one hand, AI agents are fundamentally just like any other software services that produce the same classic observability signals we’re familiar with: metrics, logs, and traces.

From Observability to Network Intelligence: How Kentik Built the Foundation for Networks That Think

The age of dashboards is ending, as observability has only created more noise for network teams to sift through. Kentik SVP of Product, Mav Turner, lays out why true network intelligence requires a clean, contextual data foundation to finally create a network that thinks.

Top Observability Tools for 2026: The Definitive Guide

As we move toward 2026, observability is evolving from an engineering luxury to an operational necessity. Modern applications span microservices, containers, APIs, and data pipelines, and when something breaks, users expect instant recovery. That urgency is fueling rapid market growth. According to Market.us, the Global Data Observability Market is projected to reach several billion dollars by 2033, growing at a CAGR exceeding 20% between 2024 and 2033.

From Telemetry to Truth: Why Observability Must Be Service-Centric

Modern enterprises depend on systems that appear calm: dashboards glow, availability reads steady, and metrics suggest composure. But the signals only tell part of the story. Conversion softens at the margins, regional sign-in times drift, a compliance report misses an expected field. The puzzle isn’t visibility; it’s meaning. Components describe status; services carry outcomes.

Observability vs. Monitoring: What's the Difference?

Modern systems are complex, distributed, and fast-changing, so keeping them reliable requires more than watching dashboards. Observability vs. Monitoring explains how teams gain the deep insight needed to detect, diagnose, and resolve issues. Monitoring collects predefined metrics and alerts you to known problems, while observability provides rich, contextual telemetry to investigate unknown failures.

Coffee and Claude: How Honeycomb MCP Makes AI Work for You

If you caught our recent Introducing Honeycomb MCP: Your AI Agent’s New Superpower webinar, you know it was a lively mix of big ideas, demos, and a few laughs about the messy, fast-moving world of AI. Hosted by Austin Parker, Morgante Pell, and James Bland from AWS, the conversation explored how Honeycomb’s new Model Context Protocol (MCP) server is changing the way developers and AI agents interact with data.

Observability vs. Monitoring: Key Differences Explained (2026 Guide)

People often confuse monitoring and observability, using the terms interchangeably in DevOps. However, they represent two distinct yet complementary concepts that play a crucial role in ensuring application reliability and performance. As modern applications evolve, over 90% of new digital services are built using microservices and cloud-native architectures. Traditional monitoring alone can’t provide full visibility into distributed systems.