Sponsored Post

Monitoring Cloud Foundry in SAP Business Technology Platform (BTP)

Cloud Foundry is possibly the most popular environment on SAP Business Technology Platform. When customers build applications with the SAP Cloud Application Programming Model (CAP) to extend SAP S/4HANA solutions and keep a clean core, they typically deploy those applications to Cloud Foundry. Once the applications go into productive use, they become business-critical, which creates a need for observability into both the applications and the platform. Monitoring Cloud Foundry is now an essential requirement for SAP operations teams.

The Android Developer's Journey into Hardware Observability

In this article, I walk through what the growth of internal observability tooling for an AOSP device might look like, and the pitfalls you might encounter as you scale from 1s to 10s to 1000s of Android devices in the field. It is based on my conversations with AOSP developers and teams, and on my own experience as an Android app developer working on AOSP hardware.

Agentless monitoring for cloud VMs: Simplify scaling and observability

Managing cloud infrastructure is challenging enough without the added burden of deploying and maintaining monitoring agents. What if there were a simpler, more efficient way to monitor your virtual machines (VMs)? In the first part of this series, we looked at the (link) and presented a better solution: agentless monitoring. Agentless monitoring is an efficient approach to observability that eliminates the need to install and manage software agents on each monitored device.

OpenTelemetry Metrics Explained: A Guide for Engineers

OpenTelemetry (often abbreviated as OTel) is the gold standard observability framework, allowing users to collect, process, and export telemetry data from their systems. OpenTelemetry’s framework is organized into distinct signals, each covering one aspect of observability. Among these signals, metrics are crucial for helping engineers understand their systems.

Lakehouse Demo

Cribl Lakehouse is the first lakehouse built for the unpredictable nature of telemetry data. Unlike traditional solutions for structured data, it eliminates schema complexity and manual transformation while delivering elastic scalability, automated, cost-optimized tiered storage, and federated queries across diverse datasets. IT and security teams can effortlessly store and analyze massive volumes of evolving telemetry data in real time, without data engineering expertise, and unlock the full value of their data through a unified management experience.

The One Where We Meet Cribl Copilot

We’re kicking off our new live weekly product demo series, streaming on YouTube, X, and LinkedIn! Each week, we’ll dive into the latest features and hidden gems from the Cribl Suite of tools to help you unlock the full potential of your telemetry data. For our first session, we’re thrilled to welcome Nikhil Mungel, the visionary behind Cribl Copilot. This AI-powered assistant is designed to instantly surface answers from the documentation and to build pipelines from just a simple request.

How to Build Observability into Chaos Engineering

If you've ever deployed a distributed system at scale, you know things break—often in ways you never expected. That’s where Chaos Engineering comes in. But running chaos experiments without robust observability is like debugging blindfolded. This guide will walk you through how observability empowers Chaos Engineering, ensuring that your experiments yield meaningful insights instead of just causing chaos for chaos’ sake.

OpenTelemetry Is Not "Three Pillars"

OpenTelemetry is a big, big project. It’s so big, in fact, that it can be hard to know what part you’re talking about when you’re talking about it! One particular critique I’ve seen going around recently, though, is about how OpenTelemetry is just ‘three pillars’ all over again. Reader, this could not be further from the truth, and I want to spend some time on why.

Optimizing Observability Data Volume and Cost with AI

Struggling with high observability costs? In this video, Jade Lassery breaks down the challenges of managing excessive data and skyrocketing expenses. She introduces the Logz.io AI agent, a powerful solution designed to optimize data usage, reduce unnecessary costs, and improve efficiency. Learn how to take control of your observability spending while maintaining high performance. Watch now to discover smarter data management strategies!

Increase control and reduce noise in your AWS logs using Datadog Observability Pipelines

Today’s SRE and security operations center (SOC) teams often find themselves overwhelmed by the sheer volume and variety of logs generated by critical AWS services such as VPC Flow Logs, AWS WAF, and Amazon CloudFront. While these logs can be valuable for detecting and investigating security threats, as well as troubleshooting issues in your environment, managing them at scale can be challenging and costly.

OpenTelemetry: The Future of Observability with Advanced Tracing and Metrics

Hey there! Oscar here. After spending countless hours wrestling with various monitoring tools and proprietary solutions, I wanted to share my thoughts on what I believe is revolutionizing observability in distributed systems: OpenTelemetry (OTel).

Integrating OpenTelemetry with Grafana for Better Observability

Modern application observability is essential for ensuring system performance, diagnosing issues, and optimizing user experiences. OpenTelemetry (OTel) and Grafana serve as two key components in achieving end-to-end visibility. While OpenTelemetry focuses on instrumenting applications to collect telemetry data, Grafana specializes in visualizing this data, making it actionable and insightful.

Drilldown apps: An improved queryless experience for faster insights into your observability data

See how we're improving the apps to help you quickly get insights into your logs, metrics, traces, and profiles, and find out why we changed the name from Explore apps to Drilldown. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more. We also have plans for every use case.

Enhance Network Performance Management With Next-Gen AIOps: Configuring Integration of DX Spectrum With DX Operational Observability

To unlock the power of observability and advanced analytics of AIOps, teams need to collect exceptional monitoring data, establish connections and correlations between the data, and understand context with the help of robust and current topological maps. Because modern networks often span on-premises, cloud, and hybrid infrastructures, monitoring their performance and troubleshooting issues can be difficult. These complex infrastructures often lead to observability gaps for network teams.

KubeCon 2024 | Interviews with Observability Experts | Observability Insights with Aunsh Chaudhari

In this interview from KubeCon 2024, I sit down with Aunsh Chaudhari, a Product Manager at Splunk, to discuss the biggest trends shaping observability today. With a background in software development and hands-on experience with observability tools, Aunsh shares insights on OpenTelemetry adoption, cost optimization strategies, and the shift toward unified observability. We also touch on emerging topics like AI in observability and the challenges of scaling observability in modern environments.

Slicing Up, and Iterating on, SLOs

One of the main pieces of advice about Service Level Objectives (SLOs) is that they should focus on the user experience. Invariably, this leads to people further down the stack asking, “But how do I make my work fit the users?”—to which the answer is to redefine what we mean by “user.” In the end, a user is anyone who uses whatever it is you’re measuring.

Stronger together: (Agentic) AIOps and observability are the keys to IT resilience

Every new layer of infrastructure piles onto an already fragile web of interconnected challenges, making it painfully clear: traditional monitoring can’t keep up. You’re drowning in alerts, buried in data, and yet somehow still flying blind when real issues arise. More notifications don’t mean more insight, and more data doesn’t guarantee better decisions.

Wiring Up a Next.js Self-Hosted Application to Honeycomb

Are you attempting to connect Honeycomb to a standalone (not hosted with Vercel) Next.js application? Most of the Next.js OpenTelemetry samples in the wild show how to connect Next.js to Vercel’s observability solution when hosting on their platform. But what if you’re hosting your own standalone Next.js server on Node.js? This blog post will get you started ingesting your Next.js application’s telemetry into Honeycomb.

Preempting Problems in a Sociotechnical System

Here at Honeycomb, we emphasize that organizations are sociotechnical systems. At a high level, that means that “wet-brained” people and the stuff they do is irreducible to “dry-brained” computations. That cashes out as the inability to ultimately remove or replace people in organizations with computers, in spite of what artificial general intelligence (AGI) ideologues would have you believe.

The ROI of Developer-First Observability: Why It's a Game Changer

In today’s fast-paced software landscape, downtime is costly, debugging is time-consuming, and developers are under constant pressure to resolve issues quickly. Observability tools have traditionally been built for operations and SRE teams, focusing on post-mortem analysis rather than proactive debugging. Giving developers real-time insight into live applications, so they can fix issues without disrupting the software lifecycle, can be a game changer for a number of reasons.

Understanding the Observability Data Lifecycle: From Data Ingestion to Automated Actions

Modern IT estates are increasingly complex, generating vast amounts of data, some of it critical and actionable but much of it mere noise. Extracting meaningful insights from that data to ensure optimal system health and IT performance is beyond what humans can do manually. This is where observability, enhanced by AI and automation, becomes essential.

Right Data, Right Now: Why Timely, Actionable Network Observability is Essential

For teams in many organizations, the work of IT and network management keeps getting more difficult. A recent EMA survey offers some findings that clearly illustrate this point. When respondents were asked which networking skills are the most difficult to find, several roles received a response of 30% or more, including network security, network monitoring and troubleshooting, and data center networking.

Ensuring Optimal Kubernetes Cluster Health with Calico Observability

Have you ever wondered how to navigate the complexities of managing Kubernetes clusters effectively? Observability is the key, and Elasticsearch plays a pivotal role in storing and analyzing the critical data that keeps your systems running smoothly.

Solve Problems Faster with New, Smarter AI and Integrations in Splunk Observability

As businesses scale across hybrid and multi-cloud environments and integrate AI-powered technologies, complexity grows — and with it, the risk of performance degradation and cost of downtime. To avoid facing customer-impacting IT issues, organizations need better ways to correlate data across environments, detect anomalies before they escalate, and resolve incidents more efficiently. That’s where Splunk and Cisco come in.

Datadog Vs. New Relic: Comparing Observability Tools In 2025

Datadog and New Relic didn’t become two of today’s leading observability platforms by accident. Unlike traditional monitoring tools, both are built from the ground up to be cloud-native. This design is crucial for tracking system health across hybrid cloud infrastructure, modern applications, and microservices/containerized architectures. Both platforms also offer more flexible pricing models than the traditional subscription-based pricing you’ll see elsewhere.

Stop Logging the Request Body!

With more and more people adopting OpenTelemetry, and specifically the tracing signal, I’ve seen an uptick in people wanting to add the entire request and response body as a span attribute. This isn’t ideal, just as it wasn’t ideal when people logged the body as text. In this blog post, I’ll explain why this is a bad idea, what the pitfalls are, and, more importantly, what you should do instead.

From Datadog to Grafana Cloud: Why companies migrate and how it changes business for the better

“Impossibly expensive.” “Generic database metrics.” “Exceeding limits.” “No transparency.” These are the words our customers use to explain why they looked for a Datadog alternative and migrated onto Grafana Labs’ observability solutions. Grafana Cloud provided the scalability that LexisNexis Risk Solutions needed to migrate acquired companies into a unified observability platform. “We’ve had migrations from Datadog.

Coralogix Releases eBPF Observability for K8s Workloads

There are several big barriers to an effective tracing strategy. Modern applications require complex code instrumentation, legacy applications might not be easy to alter, and that’s assuming every engineering team can be engaged to make the necessary changes. eBPF and OpenTelemetry flip this entire problem on its head, and Coralogix is one of the first major observability platforms to leverage this functionality to provide an unobtrusive, low-risk overview of your system.

How Azure Observability Optimizes Performance and Monitoring

Observability in Azure isn’t just about tracking metrics—it’s about truly understanding how your cloud infrastructure, applications, and services are performing. It helps you spot issues before they become problems, optimize performance, and ensure security. In this guide, we’ll break down Azure Observability in a way that’s easy to follow, covering key concepts, best practices, and some useful tricks to give you an edge.

Frontend Monitoring: Deliver Seamless and Performant User Experiences

88% of online consumers are less likely to return to a site after a bad user experience. This means that addressing frontend issues such as slow load times, broken features, and unresponsive elements is crucial. Frontend monitoring helps development and IT teams proactively catch and resolve these issues to improve their user experience.

Why observability needs FinOps, and vice versa: the Vantage integration with Grafana Cloud

Ben Schaechter is co-founder & CEO of Vantage, a cloud cost management platform that provides actionable insights for every engineer. Observability tools have changed the way we monitor infrastructure and applications, as teams get complete visibility into performance across complex, multi-cloud environments. But as all that infrastructure scales, costs rise with it, and organizations are left to ask: Where are my costs going—and why?

Beyond monitoring: The power of observability

The demand for seamless user experiences and robust system reliability is at an all-time high, and businesses are racing to meet these expectations. But as system complexity increases, traditional monitoring tools are falling short. Observability offers a paradigm shift. It goes beyond tracking metrics and provides deep insights to understand the “why” behind system behavior by parsing and contextualizing unstructured data.

Why Observability 2.0 Is Such a Gamechanger

One of the hardest parts of my job is to get people to appreciate just how much of a difference Honeycomb/observability 2.0 is compared to their current way of working. It’s not just a small step up or a linear improvement. Rather, it’s an entire step change in the way that you write, deploy, and operate software for your customers.

How to Optimize Costs and Strengthen IT with Teneo's Deep Observability

Teneo understands that it can be hard to balance cost and depth of observability in today’s fast-paced digital landscape, where organizations must manage increasingly complex IT infrastructures while keeping costs under control. Achieving this balance requires a new approach, which is why we developed our Open Observability platform, a critical component of Teneo’s StreamlineX framework.

Kubernetes Monitoring and Alerting Made Easy with Splunk Observability Cloud and OpenTelemetry

In this video, I'll show you how to quickly set up monitoring and alerting for your Kubernetes clusters using Splunk Observability Cloud. We’ll start by deploying the Splunk OpenTelemetry Collector using Helm, and then use the Kubernetes Navigator inside Splunk Observability Cloud to view the health of our cluster and the applications it’s hosting. I’ll demonstrate AutoDetect detectors and alerts by intentionally triggering an issue in the cluster and walk through the alerting process. We’ll review the alerts in Splunk Observability Cloud and then resolve the issue in the cluster.

Keeping Spending in Check: Observability's Positive Impact on Cost Management

Tool sprawl within organizations doesn’t just create a fragmented user experience; it poses a real threat to enterprises’ bottom lines. Consider these statistics: This fragmentation significantly limits worker productivity. IT leaders spend hundreds of hours trying to manage multiple tools, map their environments, and maintain aging systems that are either outdated or simply no longer necessary.

Kentik - Cloud Observability

Kentik Cloud provides comprehensive visibility across all major public clouds, offering seamless insight into cloud-to-on-prem network paths and the public internet routes connecting them. It helps you identify latency, loss, jitter, and application-specific traffic, and provides deep visibility into cloud networking constructs like ACLs to spot security issues. With powerful analytics, Kentik Cloud enables you to visualize intra-cloud traffic, identify idle resources for optimization, and use historical data to uncover trends and seasonal patterns, helping ensure optimal cloud performance and cost efficiency.

Streamlining Telemetry with Apica's Fleet Management Solution: A Deep Dive

In the rapidly evolving IT environment, observability at scale has become a critical challenge for organizations aiming to maintain operational excellence. The proliferation of telemetry collection agents across diverse infrastructures often increases complexity, resource strain, and configuration inconsistencies.

Booking.com's Journey to Enhanced Observability

Since its early startup beginnings in Amsterdam, Booking.com has redefined the travel industry, establishing itself as a premier platform for millions of travelers worldwide. With over 28 million accommodation listings and a staggering 1.5 million room nights booked every day, Booking.com operates on a scale that demands a robust and constantly monitored infrastructure.