Debug Logs and Analyze Trends with Log Data Rehydration

Everyone in your organization needs logs to perform the critical functions of their job. Developers need them to debug their applications, security engineers need them to respond to incidents, and support engineers need them to help customers troubleshoot issues. These varied use cases create a shared requirement for enriched log data, which often includes access to insights from outside typical retention windows.

Datadog: The Good, The Bad, The Costly

When things break, logs are often the first place you turn to figure out what's going on, which is why Datadog makes it easy to find them. The ability to pivot between traces, metrics, and logs in one place speeds up investigations and helps teams move faster during incidents. That level of correlation is a big reason so many teams rely on Datadog.

Mezmo Recognized with 25 G2 Awards for Spring 2025

We’re thrilled to share that Mezmo has been recognized by G2 with 25 badges across four key categories: Enterprise Monitoring, Log Monitoring, Log Analysis, and Cloud Infrastructure Monitoring. These awards are more than just a celebration of our platform—they’re a reflection of you, our customers. Your feedback, support, and insights push us to build better solutions and deliver the highest standards of performance and service.

Reducing Telemetry Toil with Rapid Pipelining

Intellyx BrainBlog by Jason English for Mezmo. “Bubble bubble, toil and trouble” describes the mysterious process of mixing together log data and metrics from multiple sources as they enter an observability data pipeline. Customers demand high-performance, functionality-rich digital experiences with near-instantaneous response times.

Deployment Tracking with Mezmo Live Streaming Tail

You've deployed a new feature into production. You've done your unit testing, fixed lots of bugs, your code is awesome. Now it's time for hundreds/thousands/millions of users to break...err...use your feature. You're diligent about tracking usage in real-time, and getting customer feedback when something goes wrong. You track the performance and response time impacts on the server. All is good...except...that feature isn't quite working for a specific group of users. Now what?

Petabyte Scale, Gigabyte Costs: Mezmo's Evolution from ElasticSearch to Quickwit

At Mezmo, we handle an enormous volume of telemetry data for our customers and ourselves, requiring a robust and efficient search and analytics backend. For years, ElasticSearch served us well, but as our infrastructure grew to a multi-cluster, multi-petabyte scale, we started to see the cracks—rising costs, performance bottlenecks, and scalability concerns. We needed a change, one that would make our system more cost-effective while maintaining speed and reliability.

How Telemetry Pipelines Save Your Budget

This is an updated version of an earlier blog post, revised to reflect current definitions of a telemetry pipeline and additional capabilities available in Mezmo. Our recent blog post about observability pipelines highlighted how they centralize telemetry data and make it actionable. A key benefit of telemetry pipelines is that users don't have to compare data sets manually or rely on batch processing to derive insights; instead, insights can be derived directly while the data is in motion.

AWS re:Invent '24: Generative AI Observability, Platform Engineering, and 99.9995% Availability

I attended the Amazon Web Services re:Invent conference. This is AWS's annual user conference, which takes over most of Las Vegas for a week. There’s a lot to do and take in: customer stories galore, new tech, different use cases to learn about, and all the walking. But you’re here to hear what I learned, so I’ve broken it down into sections. Enjoy!

From Gartner IOCS 2024 Conference: AI, Observability Data, and Telemetry Pipelines

Last week, I attended one of the last conferences of the year with team Mezmo: the Gartner IT Infrastructure, Operations & Cloud Strategies Conference in Las Vegas. Not surprisingly, there were over 20 sessions covering observability and how it is becoming increasingly critical in the new, complex distributed computing environment. Of course, many sessions, including all the keynotes, addressed the advent and impact of AI on IT operations and observability.