September 2021

Understanding Lambda Sleep Cycles With CONCURRENCY

It started with a simple question: Why did one query take 10 seconds, while another almost identical query took 5? At Honeycomb, we use AWS Lambda to accelerate our query processing. It mostly works well, but it can be hard to understand, which led us to wonder: What was really going on inside this box called Lambda? These questions kicked off the development of CONCURRENCY, a new aggregate in the Query Builder that lets us look at how many spans are active at once.
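To make "how many spans are active at once" concrete, here is a minimal sketch of counting peak span concurrency with a sweep over start and end events. It is not Honeycomb's implementation of CONCURRENCY; the `(start, end)` tuple representation and the `max_concurrency` name are assumptions made only for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical span representation: (start_time, end_time) in seconds.
Span = Tuple[float, float]


def max_concurrency(spans: Iterable[Span]) -> int:
    """Return the largest number of spans active at the same moment."""
    events: List[Tuple[float, int]] = []
    for start, end in spans:
        events.append((start, 1))   # a span becomes active
        events.append((end, -1))    # a span finishes

    # Sort by time; at equal timestamps, ends (-1) sort before starts (+1),
    # so a span that ends exactly when another begins does not overlap it.
    events.sort(key=lambda event: (event[0], event[1]))

    active = 0
    peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    spans = [(0, 10), (2, 5), (4, 8), (10, 12)]
    print(max_concurrency(spans))
```

In this sample, three spans overlap between seconds 4 and 5, so the peak is 3. Treating touching spans as non-overlapping is a design choice of this sketch, not something the excerpt specifies.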

Log Observability and Log Analytics

Logs play a key role in understanding your system’s performance and health. Good logging practice is also vital to power an observability platform across your system. Monitoring, in general, involves the collection and analysis of logs and other system metrics. Log analysis involves deriving insights from logs, which then feeds into observability. Observability, as we’ve said before, is really the gold standard for knowing everything about your system.

What Is Distributed Tracing?

Modern software development is evolving rapidly, and while the latest innovations allow companies to grow through greater efficiency, there is a cost. Modern architectures are incredibly complex, which can make it challenging to diagnose and rectify performance issues. Once these issues affect customer experience, the consequences can be costly. So, what is the solution? Observability — which provides a visible overview of the big picture.

What Is Data Observability and Why Do You Need It?

The word observability has its root in control theory. R.E. Kálmán in 1960 defined it as a measure of how well you can infer the internal states of a system from knowledge of its external outputs. Observability is such a powerful concept because it allows you to understand the internal state of a system without the complexity of the inner workings. In other words, you can figure out what’s going on just by looking at the output.

Micro Lesson: Introduction to Observability Solution

This video describes what observability is, why we need observability, and how it is different from monitoring. The video also explains how Sumo Logic's Observability Solution helps in all the stages of the incident remediation process to ensure the production apps are functioning reliably.

Extending Observability to App Infrastructure

We know organizations today rely on software applications to drive their digital transformation, providing customers with the tools, features, and experience that end users have come to expect when they transact, work, and communicate, to name a few activities. Ensuring a great user experience, however, means making sure the various elements that make up a usable application are running smoothly and reliably.

Observability: The 5-Year Retrospective

Two years ago, I wrote a long retrospective of observability for its third anniversary. It includes a history of instrumentation and telemetry, a detailed explanation of the technical spec, and why the whole “three pillars” thing is nonsense. At the time, it was what was needed to steer conversations away from silly rabbit holes about data types and back to what matters: how we understand our systems.

Why LogDNA Received the EMA Top 3 Award for Observability Platforms

We’re honored to be included in Enterprise Management Associates’ EMA Top 3 Award for Observability Platforms. This award recognizes software products that help enterprises reach their digital transformation goals by optimizing product quality, time to market, cost, and ability to innovate—all the things we’re passionate about at LogDNA.

Intro to distributed tracing with Tempo, OpenTelemetry, and Grafana Cloud

I’ve spent most of my career working with tech in various forms, and for the last ten years or so, I’ve focused a lot on building, maintaining, and operating robust, reliable systems. This has led me to put a lot of time into researching, evaluating, and implementing different solutions for automatic failure detection, monitoring, and more recently, observability. Before we get started: What is observability?

Unexpected Parallels Between Yoga and Observability

Yoga is to ideal human health what observability is to an application’s ideal functioning. It is well established that observability is a critical factor for the successful implementation and maintenance of cloud-native, serverless, cloud-agnostic, and microservices-based applications. Well-established observability helps DevOps and development teams cross the boundaries of complex systems and get complete visibility into their functioning.

Getting Started with OpenTelemetry and VMware Tanzu Observability

Modern application architectures are complex, typically consisting of hundreds of distributed microservices implemented in different languages and by different teams. As a developer, SRE, or DevOps engineer, you are responsible for the reliability and performance of these complex systems. But while you might have metrics that will help you debug when there’s an issue, metrics alone can’t help you narrow down and ultimately identify the root cause.

The More You Monitor: What Are the Three Pillars of Observability?

A common way to discuss observability is to break it down into three types of telemetry: metrics, traces, and logs. These three data types are often referred to as the three pillars of observability. In this episode of The More You Monitor, Product Manager Chris Sternberg breaks down the three pillars of observability and how they can help you gain better control and visibility of your infrastructure, applications, and networks. It’s important to remember that although these pillars are key to achieving observability, they are only the telemetry and not the end result.

The Confident Commit | ep. 11 Observability and CI/CD: meaningful measurement with Charity Majors

Rob sits down with Charity Majors to discuss the journey to creating Honeycomb, business building practices, and the importance of proper CI/CD and monitoring. Charity gives us the latest insights on observability and the necessities for engineering team success. What metrics are meaningful for your team to measure? Which ones are not? Tune in today to find out. Watch, learn, and leave us a comment with your thoughts, questions, or ideas for future podcast episodes.

How Refinery Helps With Sampling Complex Event Data

Sampling is the practice of extracting a subset of data from a dataset to draw conclusions about that larger dataset. It’s far from a perfect solution, but when it’s implemented with Refinery, Honeycomb’s trace-aware sampling proxy, sampling can help you manage very high volumes of complex event data.
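As a rough illustration of what "trace-aware" sampling means, the sketch below makes the keep-or-drop decision from a hash of the trace ID, so every span in a trace shares the same fate, and it attaches the sample rate to kept events so totals can be estimated later. This is not Refinery's implementation; the event dictionaries, field names, and function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical event shape: a dict with at least a "trace_id" key.
Event = Dict[str, object]


def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Deterministically keep roughly 1 in `sample_rate` traces."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % sample_rate == 0


def sample_events(events: Iterable[Event], sample_rate: int) -> List[Event]:
    """Keep whole traces and record the rate each kept event represents."""
    kept: List[Event] = []
    for event in events:
        if keep_trace(str(event["trace_id"]), sample_rate):
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimated_total(kept: Iterable[Event]) -> int:
    """Estimate the original event count: each kept event stands for `sample_rate` events."""
    return sum(int(event["sample_rate"]) for event in kept)


if __name__ == "__main__":
    # 50 traces with 3 spans each (made-up data for the sketch).
    events = [{"trace_id": f"trace-{t}", "span": s} for t in range(50) for s in range(3)]
    kept = sample_events(events, sample_rate=4)
    print(len(kept), estimated_total(kept))
```

Because the decision depends only on the trace ID, no trace is ever kept partially, which is the property that separates trace-aware sampling from sampling each event independently. The estimate is approximate, which is why sampling is "far from a perfect solution."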

Elastic named EMA Top 3 Award winner in Automatic End-to-End Observability

We are excited to announce that Elastic Observability has earned the Enterprise Management Associates Top 3 Award for Observability in 2021, a recognition of our commitment to empowering customers with products and features that advance digital transformation and solve real-life problems. This award is driven by EMA’s exhaustive, quantitative research into the top challenges and use cases facing developers, DevOps, SREs, IT professionals, and business professionals.

Catchpoint Co-Founders Q&A: What Better Way To Celebrate Our 13th Birthday?

To celebrate our 13th birthday today, I sat down with Catchpoint's co-founders and my friends, Mehdi Daoudi, Chief Executive Officer, Drit Suljoti, Chief Product and Technology Officer, and J. Scotte Barkan, Chief Technology Officer (dialing in from Long Island after a long week of patch fixes), for an informal chat. We looked back to the days when they all met at DoubleClick prior to the three of them (along with Veronica Ellis, now a Principal Engineer at Eventbrite) founding Catchpoint.

No more searching for a needle in a haystack: A world where Elastic & StackState team up

Meeting the goal of delivering great performance and reliability in the face of our ever-changing, increasingly autonomous IT environments is fundamentally challenged by a data problem. Sure, there’s lots of it - logs, metrics, and APM traces - but it is exceedingly hard to extract actionable information when there are so many fast moving parts.

De Watergroep and Devoteam build Elastic Observability pipeline to deliver water to millions

De Watergroep is responsible for the supply of water to more than 3 million customers and hundreds of companies in Belgium. An organisation operating in the public sector, De Watergroep's main goal is to continuously ensure the availability of high-quality drinking water. De Watergroep also is constantly engaged in technological innovation, focusing on keeping distribution costs low, and making maintenance more cost efficient.

Tracing AWS Lambdas with OpenTelemetry and Elastic Observability

OpenTelemetry represents an effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries. Recently, OpenTelemetry became a CNCF incubating project, but it already enjoys quite a significant community and vendor support. OpenTelemetry defines itself as “an observability framework for cloud-native software”, although it should be able to cover more than what we know as “cloud-native software”.

Metrics now generally available in Honeycomb

Starting today, Honeycomb Metrics is generally available to all Enterprise customers. You’ve adopted our event-based observability practices, in part to overcome the debugging roadblocks you hit when using custom metrics to identify application issues. But metrics do still provide value at the systems level. Now, you can easily see and use your metrics data alongside your event data in Honeycomb—all in one interface.

OpenTelemetry - Defining Observability Industry Standards

Plenty of blogs have answered the very Google-able question, “What is OpenTelemetry?” To keep it short and sweet, OpenTelemetry is a collaborative effort across the observability space to create industry-wide standards that will benefit all cloud service providers and observability customers. Technically speaking, OpenTelemetry is a collection of APIs, SDKs, exporters, and collectors.

The Confident Commit ep. 10 | Observability improving speed and reliability with Ben Sigelman

Rob sits down with Lightstep CEO Ben Sigelman to discuss observability and how it connects with delivering change with confidence. Get answers to questions like: Watch, learn, and leave us a comment with your thoughts, questions, or ideas for future podcast episodes. And don't forget to Like and Subscribe to The Confident Commit Podcast playlist for alerts to new episodes published biweekly. The Confident Commit: A podcast for developers, engineering managers, and business leaders alike to join in the conversation on how to deliver software better and faster.

An Introduction to Distributed Tracing

There’s no strict definition of a distributed system. But generally speaking, if you have reached a point where you’re running more than five interdependent services at once, that means you’re running a distributed system. It also means you are more than likely experiencing difficulties when troubleshooting using traditional debugging tools. Unfortunately, pulling up multiple tools, each built for a monolithic world, doesn’t help pinpoint the problem.