
Monitoring database exposure on Kubernetes and VMs

This week, security researchers at Wiz published a report about an internal database at DeepSeek being exposed to the internet. This kind of security risk is surprisingly common and can affect any company. The only way to prevent it is through continuous monitoring. But in modern infrastructures, services can be exposed in many different ways, making detection tricky. At Coroot, we realized that the telemetry data we already collect can help identify these risks — without requiring any extra setup.

Catching Up With Fender: How Frontend Observability Powers Better User Experiences

For years, Fender Musical Instruments has been synonymous with iconic guitars and amplifiers. But in recent years, the company has expanded its legacy into the digital realm, offering tools like Fender Play, an innovative learning platform for aspiring musicians. Behind this digital evolution lies a focus on delivering exceptional user experiences for its consumer-facing applications—a mission supported by Honeycomb for Frontend Observability.

Realizing the business value of OpenTelemetry-native observability

Transform your organization's observability strategy with open standards and simplified data collection. Modern organizations face an unprecedented observability challenge. As systems grow more complex and distributed, traditional monitoring approaches are struggling to keep pace. With data volumes doubling every two years and systems spanning multiple clouds and technologies, organizations need a new approach to maintain visibility into their operations.

Using AI for Troubleshooting: OpenAI vs DeepSeek

AI is now a go-to tool for everything from writing to coding. Modern LLMs are so powerful that, with the right prompt and a few adjustments, they can handle tasks almost effortlessly. At Coroot, we’ve been experimenting with AI for observability. Our goal is to make it useful in the final stage of troubleshooting—when we’ve already identified which service is causing issues, like Postgres, but finding the exact root cause is still tricky due to the many possible scenarios.

The power of cloud native observability

Unstructured data clouding your observability goals? Learn why monitoring alone cannot solve business-critical performance issues as Sr. Director of Technical Marketing Adam White explains how combining structured and unstructured data with real-time analytics unlocks dynamic insights into root cause analysis and performance management in the cloud.

Why Data Tiering is Critical for Modern Security and Observability Teams

In today's digital landscape, security and observability teams face an unprecedented challenge: managing massive volumes of data while maintaining both performance and cost-effectiveness. As organizations generate more data than ever before, the traditional approach of storing everything in high-performance, expensive systems is becoming unsustainable. How will your team change the way it manages and uses telemetry data across the enterprise?

Learn How Network Observability Can Help Your Organization to Be DORA Compliant

We recently worked on an RFP for a customer whose primary driver was compliance with the new Digital Operational Resilience Act (DORA) regulations. The project aimed to make financial services more reliable and secure, protecting both consumers and the technology provider. Helping with this RFP was a rewarding learning experience due to this effort’s high priority and the key challenges faced by this organization.

Top 10 Modern Observability Best Practices

In the realm of modern software development, observability is no longer an optional add-on; it is a mission-critical capability. Just as control theory revolutionized industrial systems and quality assurance redefined manufacturing processes, observability is transforming software systems and their development processes, in many ways inspired by those brick-and-mortar industries. This post explores the best practices in modern observability to help you leverage its full potential.

Restructuring How We Think About Alerts

Back in "Alerts Are Fundamentally Messy," I made the point that the events we monitor are often fuzzy and uncertain. Distinguishing valid events from invalid ones requires context, and since context doesn't tend to exist within a metric, humans go around validating alerts to add it. As such, humans are part of the alerting loop, and alerts can be framed as devices used to redirect our attention. In this post, I want to take this concept a bit further.

The Future and The Floor: Framing Investments for Growth

There are a limited number of investments that a team can make in any given year and it can be daunting to choose the “right” ones. In R&D, there is always more to do. There is always more to research, design, build, fix, maintain, and improve. Spread across multiple domains, the possibilities multiply: we’re spoiled for choice—and, while inspiring, the breadth of possible investment areas can be overwhelming.

Top 5 Obstacles to Observability in 2025

I’ve spent over 25 years in tech product marketing and customer support, working with pioneering companies like Dell/EMC, Apple, Keeper Security, and now, SolarWinds. In my current role, I’ve had the privilege of helping organizations of all sizes achieve comprehensive observability in their IT environments. I’ve also witnessed firsthand the challenges that can arise on this journey.

AI in Observability: Mapping Root Causes with Precision

Explore how AI is transforming observability by mapping system connections and uncovering root causes with precision. The Logz.io AI Agent analyzes logs, metrics, and service dependencies to provide actionable insights without the need to sift through overwhelming amounts of data.

Coroot v1.7: Monitoring ClickHouse and Zookeeper with eBPF

At Coroot, we started using eBPF to give users insights into their system performance without needing them to change code or redeploy services. This approach not only makes setup easier but also ensures full visibility, even for third-party and legacy services. To truly achieve this, though, the tool needs to support a wide range of application protocols. Coroot has long supported popular ones like HTTP, gRPC, Postgres, MySQL, Redis, Memcached, MongoDB, Kafka, and Cassandra.

KubeCon 2024 | Interviews with Observability Experts | Observability Insights with Josh Lee

Join me at KubeCon 2024 as I sit down with Josh Lee, Developer Advocate at Altinity, to discuss the latest trends, challenges, and insights in observability. In this interview, we cover key topics such as OpenTelemetry adoption (including the Open Agent Management Protocol), data sovereignty, standardization through semantic conventions, and the need to unify observability tooling across organizations.

Fast-Track Kubernetes Observability with Logz.io and OpenTelemetry: A quick getting started guide

In formal terms, OpenTelemetry is an open source framework for instrumenting, generating, collecting, and exporting telemetry data for applications, services, and infrastructure. It provides vendor-neutral tools, SDKs, and APIs for handling telemetry such as traces, metrics, and logs and sending it to any observability backend, including both open source and commercial tools.

Top Dynatrace Competitors and Alternatives for Modern Observability in 2025

Observability tools are crucial for maintaining the seamless performance and reliability of systems. Dynatrace has been one of the leading solutions for monitoring and observability over the past few years. However, there are many alternatives that provide similar features, often at more accessible price points and with unique capabilities. In this article, we will explore the best Dynatrace alternatives for 2025 to help you find the right fit for your organization.

SolarWinds Network and Infrastructure Observability

SolarWinds observability helps IT teams gain complete visibility across on-prem and cloud environments. Monitor everything from physical servers to AWS, Azure, and Kubernetes with real-time insights and traffic flow analysis. Quickly identify and resolve issues to optimize performance, simplify workflows, and reduce downtime. Get the unified visibility you need with SolarWinds—wherever you need IT.

New Relic Cost Optimization: 9 Surefire Ways To Cut Your Observability Costs

New Relic has established itself as a top observability platform with full-stack monitoring. Unifying all telemetry data — metrics, events, logs, and traces — into one platform delivers deep performance insights and enables faster troubleshooting without juggling multiple tools. Also, New Relic prioritizes developers with tools like CodeStream, integrating error details and telemetry directly into the IDE.

AIOps: Prove It!

I’ve read a steadily increasing stream of articles about using AI in SRE, and I have yet to find one that inspires my trust. Each article makes impressive claims about the capabilities of AI and the way it can be applied to SRE tasks, but the vast majority are light on details. AI tools, and especially LLMs, are growing incredibly quickly, and I feel that these tools have a ton of potential.

Understanding Observability, Monitoring, and Telemetry Differences

In the area of IT infrastructure management, three terms often surface: observability, monitoring, and telemetry. These concepts, while interconnected, each play a unique role in maintaining system health and performance. Observability, monitoring, and telemetry form the backbone of any robust IT environment. Yet, their differences and interrelations can sometimes blur, leading to confusion. This article aims to demystify these terms, providing clarity on their distinct roles and how they work together.

Open source log management tools in 2025

Log management tools provide visibility into the performance and behavior of systems, applications, networks, and infrastructure components. By collecting and analyzing logs, you can monitor for anomalies, track trends, and identify potential issues before they escalate. Choosing the right log management solution requires careful consideration of several factors to ensure that it meets your specific needs and goals. Here are the most popular open source log management tools to help you choose.

Microservices Aren't the Goal: What we Check Before Splitting a Monolith

Most "we should move to microservices" conversations start as architecture debates, but they're almost always driven by operational pain. Releases feel fragile. Incidents take longer to diagnose. Scaling one busy area means scaling everything. Coordination costs grow faster than the product. Over time, we've learned to treat microservices as a tool that you pick to remove a specific constraint, not as a badge of maturity. The most useful starting question is blunt: what outcome is the current architecture blocking today, and is distribution really the cheapest way to unlock it?

Top 6 Open-Source Jaeger Alternatives [comparison 2025]

Jaeger, a renowned distributed tracing system, has been a trusted companion for developers and operations teams seeking to unravel the complexities of microservices architectures. However, as the landscape continues to evolve, the time has come to explore Jaeger alternatives that offer distinct features and advantages.

The Evolution of Observability: From StatsD to OpenTelemetry and Beyond

Observability has evolved from simple system monitoring to a comprehensive discipline, blending metrics, logs, and traces into unified insights. Today, it is the backbone of modern infrastructure management and application performance optimization. As we move forward, the integration of AI and security into observability platforms is shaping the future, making them more proactive, intelligent, and robust.

Golang Monitoring using OpenTelemetry

When it comes to monitoring Golang applications, there are various tools and practices you can use to gain insight into your application's performance, resource usage, and potential issues. Using OpenTelemetry in your Go applications gives you visibility into the behavior, performance, and resource utilization of your distributed systems, helping you troubleshoot issues, optimize performance, and improve the overall reliability of your software.

Chaos testing a Postgres cluster managed by CloudNativePG

As more organizations move their databases to cloud-native environments, effectively managing and monitoring these systems becomes crucial. According to Coroot’s anonymous usage statistics, 64% of projects use PostgreSQL, making it the most popular RDBMS among our users, compared to 14% using MySQL. This is not surprising since it is also the most widely used open-source database worldwide.

Implementing High-Cardinality Instrumentation in Frontend Apps

As the Product Manager for Honeycomb's new frontend product, Honeycomb for Frontend Observability, I've had the joy this past year of speaking to dozens of frontend engineering teams about observability. Many frontend teams come from worlds where they either rely on QA and customer reports to identify issues in production, or they use real user monitoring (RUM) and error monitoring tools to catch the most egregious issues.

The importance of understanding and observing an application's middle-tier components

Just as the filling makes the sandwich, an application's performance is closely tied to how effectively its middle-tier components function. While the front end is what users see and interact with (the UI), and the back end deals with data storage, the middle tier forms the vital core where the real magic happens: processing, logic implementation, and enforcement of business rules.

How to Use Static Thresholds for Effective Alerts in Splunk Observability Cloud

In this video, we explore the concept of static thresholds, a foundational tool in your observability alerting solution, and demonstrate them in Splunk Observability Cloud. We'll configure a static threshold for AWS EC2 memory utilization and look at additional threshold settings such as trigger sensitivity and duration. By the end of this video, you'll have the knowledge to effectively incorporate static thresholds into your observability strategy.
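To make the duration idea concrete, here is a minimal, tool-agnostic sketch in Python. It is not Splunk's implementation or API: it simply shows a static threshold that fires only after the metric has stayed above the limit for a set number of consecutive data points. The function name `evaluate_static_threshold`, the sample values, and the assumption of evenly spaced samples are all hypothetical.

```python
from typing import List, Sequence


def evaluate_static_threshold(
    samples: Sequence[float],
    threshold: float,
    required_points: int,
) -> List[bool]:
    """Return, for each sample, whether the (hypothetical) alert is firing.

    The alert fires only once the metric has stayed above `threshold` for
    `required_points` consecutive samples, a stand-in for a "duration"
    setting that keeps one-off spikes from paging anyone.
    """
    firing: List[bool] = []
    streak = 0  # consecutive samples above the threshold so far
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        firing.append(streak >= required_points)
    return firing


if __name__ == "__main__":
    # Hypothetical EC2 memory utilization (%), one sample per minute.
    memory_pct = [72, 91, 88, 93, 95, 97, 80]
    print(evaluate_static_threshold(memory_pct, threshold=90, required_points=3))
    # prints [False, False, False, False, False, True, False]
```

The lone spike to 91% in the second minute never fires; only the run of three readings above 90% does. Lowering `required_points` makes the alert more sensitive, at the cost of more noise.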

Datadog vs Prometheus [2025 comparison]

Datadog and Prometheus are both popular monitoring solutions used to collect and analyze metrics and monitor system performance, but Prometheus is open source and Datadog is proprietary. Datadog provides a unified platform for monitoring, troubleshooting, and optimizing modern cloud-native applications and infrastructure. Prometheus is the most popular tool for monitoring time series metrics. So, how do you choose between Datadog and Prometheus?

How Telemetry Pipelines Save Your Budget

This is an updated version of an earlier blog post, revised to reflect current definitions of a telemetry pipeline and additional capabilities available in Mezmo. Our recent blog post about observability pipelines highlighted how they centralize telemetry data and make it actionable. A key benefit of telemetry pipelines is that users don't have to compare data sets manually or rely on batch processing to derive insights; they can analyze the data directly while it is in motion.

Enrich your on-call experience with observability data at your fingertips by using Datadog On-Call

The stress, sudden disruptions, and high stakes of resolving issues while on call are among the most challenging aspects of an engineer's job. Many organizations, from startups to large enterprises, still struggle with their on-call experience, which leads to longer resolution times and lower employee retention rates. Constant context switching, managing multiple tools, and racing against time to resolve issues can cause frustration, burnout, and inefficiency.

The Most Important Developer Productivity Metric

We love to talk about the value of observability in accelerating feedback loops by enabling teams to understand what changes they need to make to software. But a barrier that often holds teams back from completing the feedback loop is how long it takes to actually get feedback on code under development, or push code into production.

Optimize Observability and Cut Costs Without Losing Insights | What is Adaptive Telemetry? | Grafana

Managing telemetry can quickly spiral out of control, leading to ballooning costs and overwhelming data volumes. But what if you could save time, reduce costs, and maintain the critical insights your team relies on? In this video, learn how Adaptive Telemetry can help you do exactly that.

Top 13 Splunk Alternatives in 2025: From Open Source to Enterprise Solutions

Splunk is a powerful tool for data analysis and monitoring, but its high costs and complex implementation can be challenging for many organizations. Here are 13 proven Splunk alternatives that provide robust monitoring capabilities, comprehensive data analysis, and more cost-effective solutions for organizations of all sizes.

Essential Observability with Coroot

There is a phenomenal number of observability tools on the market, coming in all shapes and sizes and offering many approaches to solve what seems to be an endless number of problems. They can also be overwhelming to use, hard to set up, and expensive to run, especially if you go with SaaS-based market leaders like Datadog.

Faster Fixes, Happier Customers: Gearset Leverages Honeycomb for Success

Gearset has been revolutionizing Salesforce DevOps since its founding in 2015. The Cambridge-based team set out with a clear mission: to make Salesforce deployments simpler, faster, and more reliable for every team. Today, Gearset’s powerful product suite is trusted by over 2,500 companies worldwide to deploy metadata, automate CI/CD pipelines, seed sandboxes, and secure critical customer data.

Python Observability: A Complete Guide

Observability is a critical element of modern software development, giving developers visibility across complex and distributed systems. It lets them monitor, understand, and debug their applications effectively, and make better use of existing resources for more efficient lifecycle management and iteration. In the context of Python, observability helps improve and maintain the performance, reliability, and stability of an application. In this guide, we're going to look at the key aspects of building and deploying Python observability, why it matters, and the tools available to implement it.

Why Observability Needs AI: Revolutionizing Monitoring for Modern Complex Systems

In this insightful talk, Asaf Yigal, Co-founder and VP of Product at Logz.io, shares the turning point in observability: addressing the growing complexity of modern environments with AI-driven solutions. From Kubernetes to multi-cloud infrastructures, traditional observability tools fall short in solving complex problems. Discover how Logz.io leverages artificial intelligence to simplify monitoring, enhance troubleshooting, and revolutionize how companies tackle observability challenges. Learn why smarter, AI-powered tools are the future of observability.

Introducing GenAI for Observability: Root Cause Analysis Made Easy

Discover how Logz.io is transforming observability with GenAI, enabling you to troubleshoot complex problems and optimize cloud configurations effortlessly. In this video, we showcase how GenAI leverages your data to perform advanced root cause analysis, automating the process of identifying and resolving exceptions in modern, complex environments. Learn how GenAI analyzes deployment changes, workload patterns, and configuration updates to provide a detailed report in under a minute. Say goodbye to manual troubleshooting and hello to smarter, AI-powered insights.

The Future of Observability: Embracing Change with AI-Driven Insights

Discover how AI is revolutionizing observability and transforming the way we work. In this insightful talk, we explore the parallels between the adoption of Google search and the shift toward natural language-driven observability. Learn why outdated methods like manual graphs, alerts, and extensive data storage are becoming obsolete. It’s time to embrace change, ask questions naturally, and get the answers you need—effortlessly.

Guide to Data Observability

The way we manage, qualify, and utilize our data is constantly tested. With the amount of information we have at our disposal, managing and ensuring data quality has become a strategic lever for companies striving for excellence. How can we ensure our data management is flawless and the data quality on which we base our decisions is optimal? This is where data observability becomes an essential component.

Using GitHub Copilot to Speed Up Your Development Workflow

As a software engineer, I'm always evaluating tools and technologies that can optimize my workflow. Developer productivity isn't just about writing more code; it's about reducing friction, whether that's context-switching, making repetitive edits, or understanding unfamiliar parts of a codebase. That's where GitHub Copilot comes in: it turns tasks that once felt monotonous or time-consuming into faster, more intuitive processes.

Getting started with Coroot: Concepts and Terminology

When you build software, its terminology, its concepts, and the relationships between them are obvious to you; when you start using software built by someone else, they may not be. In this blog post, I have tried to cover the most important Coroot concepts and terminology. Reading it will hopefully help you understand Coroot much better if you're just getting started with it.

Beyond the hype: Is a 10x leap in efficiency possible with AIOps in IT observability?

Now that AI has revolutionized IT forever, what are its implications for IT observability? Typically, IT operations, SRE, and DevOps professionals use IT observability to gain a holistic view of their IT infrastructure, and in that pursuit they have used AIOps in several ways. Now, AI has helped IT observability with better anomaly detection, faster root cause analysis, and proactive identification of opportunities to scale IT dynamically to ensure uptime, performance, and security.

Is Datadog Worth the Price? An In-Depth Cost Analysis

Datadog has established itself as one of the leading solutions for monitoring, logging, and analytics. But with the increasing number of alternatives available, many businesses are asking, "Is Datadog worth the price?" This article breaks down Datadog's pricing structure, the value of its features, and compares it to competitive alternatives. By the end, you'll have a clear understanding of whether Datadog is the right fit for your business.

Structured Logging Best Practices: Implementation Guide with Examples

In structured logging, log messages are broken down into key-value pairs, making it easier to search, filter, and analyze logs. This is in contrast to traditional logging, which usually consists of unstructured text that is difficult to parse and analyze.
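As a minimal sketch using only Python's standard library (one possible approach, not taken from the guide itself), a custom `logging.Formatter` can emit each record as a single JSON object, so fields passed through `extra` become searchable keys rather than text buried in the message. The `JsonFormatter` class, the `checkout` logger name, and the `order_id`/`amount_cents` fields are illustrative assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object of key-value pairs."""

    # Attributes every LogRecord carries by default; anything else on a
    # record was supplied via `extra=` and is treated as a structured field.
    _RESERVED = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in vars(record).items():
            if key not in self._RESERVED:
                payload[key] = value
        # default=str keeps non-JSON-serializable values from raising.
        return json.dumps(payload, default=str)


if __name__ == "__main__":
    logger = logging.getLogger("checkout")
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("payment processed", extra={"order_id": "A-1001", "amount_cents": 4999})
    # {"level": "INFO", "logger": "checkout", "message": "payment processed", "order_id": "A-1001", "amount_cents": 4999}
```

Because each line is a self-describing object, a log backend can filter on `order_id` directly instead of parsing free text with regular expressions.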

What's That Collector Doing?

The Collector is one of many tools that the OpenTelemetry project provides to end users for their observability journey. It is a powerful mechanism for collecting telemetry in your infrastructure and a key component of a telemetry pipeline. The Collector helps you better understand what your systems are doing, but who watches the Collector? Let's look at how we can understand the Collector through all the signals it emits.

The three pillars of observability

Do you feel you’re always playing catch-up with incidents? If so, you’re not alone. As IT environments become more complex, alerts keep piling up, and finding the root cause feels like searching for a needle in a haystack. And ITOps and incident responders are left scratching their heads and wondering: what went wrong? It can be frustrating when you don’t have end-to-end visibility into your systems. This is where observability comes in.

Observability Insights From KubeCon 2024 - Summary

In this video, I'm breaking down the biggest themes and key takeaways from KubeCon 2024's observability sessions, from OpenTelemetry's growing role as the standard for telemetry data to how AI and continuous profiling are shaping the future of proactive, scalable, and cost-effective observability. If you missed KubeCon 2024 or want to stay on top of observability trends, this recap will get you up to speed in just a few minutes.

Application Performance Monitoring (APM) Guide for DevOps Teams in 2025

In today's rapidly evolving technology landscape, Application Performance Monitoring (APM) has become a critical component for DevOps teams striving to maintain high-performing, reliable applications. This comprehensive guide explores everything modern DevOps teams need to know about implementing and optimizing their APM strategy.

Monitoring Windows Servers With the OpenTelemetry Collector

This post was written by Martin Thwaites and Vivian Lobo. The OpenTelemetry Collector is an exceptional solution for proxying and enhancing telemetry, but it's also great for generating telemetry from the machines it runs on. In this post, we'll go through a basic, opinionated setup of using the OpenTelemetry Collector to extract metrics and logs from a Windows server.

Simplifying Java One Liners (Lambda Expressions) Debugging with Lightrun

In Java programming, lambda expressions or Java one-liners have become widely adopted practices for writing concise and expressive code. These compact, anonymous functions introduce functional programming concepts to Java, streamlining operations on collections, simplifying data manipulation, and enhancing code readability. Introduced in Java 8, lambda expressions are designed to represent blocks of executable code.

Top 10 DigitalOcean Alternatives to Consider in 2025

The 2025 cloud computing landscape presents a diverse array of options beyond DigitalOcean's familiar waters. As businesses outgrow basic cloud solutions, they're discovering platforms that better match their evolving needs. From startups seeking cost-effective scaling to enterprises demanding robust security features, today's cloud providers offer specialized solutions for every use case.