You know that observability plays a crucial role in helping to manage today’s distributed, cloud-native, microservices-based applications. But you may be surprised to learn that – despite its close association with modern applications – observability as a concept was born more than a half-century ago. Its origins stretch all the way back to the late 1950s, long before anyone was talking about microservices and the cloud.
Cisco continues investment in its Full-Stack Observability strategy with intent to acquire Opsani.
Should we build and run the full observability stack in pre-prod? How much realism vs. waiting for prod? Answer: Yes. You absolutely want observability in pre-production environments—local, dev, performance test, staging, CI, everywhere.
The year is over, and ‘observability’ has been one of the buzzwords that held everyone’s attention throughout it, for good reason. Organizations want to leave no stone unturned in maintaining performance and offering robust services, moving from ‘monitoring’ practices to ‘observability’, ‘telemetry’, and visibility capabilities. So let’s get into the meaning of each term and understand why they are vital for business growth.
A new year is a chance for a fresh start, and a great opportunity to think about the monitoring and observability platform you’re using for your applications. If you’ve been using a legacy monitoring system, you’ve probably heard about observability all over the ‘net and want to figure out if this is really something you need to care about.
Red Hat OpenShift is an enterprise Kubernetes platform that provides users with a unified cloud experience wherever it’s deployed. VMware Tanzu Observability by Wavefront offers observability and analytics for multi-cloud Kubernetes environments. Now these two products work even better together.
With more than 1.5M room nights booked per day, Booking.com requires a solid infrastructure that’s constantly monitored. And indeed, Booking.com now has a footprint of 50,000+ physical servers running across four data centers and six additional points of presence. The sheer size of this server fleet makes it viable for Booking.com to have dedicated teams specializing in looking only at the reliability of those servers.
Today’s systems are more distributed, dynamic, and complex than ever before – plus, users have higher expectations. Also, the historical reliance on an operations team to monitor, triage, and/or resolve issues has become untenable as the number of services has increased. This means that many of the tools that were well-suited before might no longer be adequate.
You need not fear a long-lived streaming workload. A few simple tricks can transform a request that may not terminate for hours or days into something you can get regular health and status updates on. In fact, we run one of those continuous processing services ourselves: Beagle, our Service Level Objective stream processor, which we’ve instrumented in this fashion.
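One way to get regular status updates out of a loop that never ends is to emit a summary record per time window instead of waiting for the work to finish. The sketch below is only an illustration of that general idea, not a description of how Beagle is actually instrumented. The names `handle`, `process_with_heartbeats`, `emit`, and the record fields are all hypothetical.

```python
import time
from typing import Callable, Iterable


def handle(event: int) -> None:
    """Hypothetical per-event work; negative values stand in for bad input."""
    if event < 0:
        raise ValueError("bad event")


def process_with_heartbeats(
    events: Iterable[int],
    emit: Callable[[dict], None],
    interval_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Consume a stream, emitting one summary record per time window."""
    window_start = clock()
    processed = 0
    errors = 0
    for event in events:
        try:
            handle(event)
            processed += 1
        except ValueError:
            errors += 1
        now = clock()
        if now - window_start >= interval_s:
            # Close the current window and start a fresh one.
            emit({"processed": processed, "errors": errors,
                  "window_s": now - window_start})
            window_start, processed, errors = now, 0, 0
    # Flush whatever accumulated in the final, partial window.
    emit({"processed": processed, "errors": errors,
          "window_s": clock() - window_start})
```

In this sketch the window is only checked when an event arrives, so a stream that goes quiet also stops reporting. A real service would likely want a timer-driven flush as well.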
Unlike traditional IT Ops, the role of the SRE isn’t simply focused on finding and solving technical problems. The big win for today’s SREs is supporting the organization’s strategic innovation initiatives. With the appropriate observability capabilities, it’s possible to quantify the value that software infrastructure contributes to this innovation effort.
What’s the first thing most people do when they’re unhappy with a business? Take to social media to complain about it. Observing those comments – otherwise known as “user sentiment observability” – gives you a heads-up when problems become big enough to impact user experience. How can you monitor that voice of the customer? And why is it important to do so? Let’s take a deeper look at the issues.
We know that you value collaboration. That’s why we share incident reviews and learnings—because we believe the entire community benefits by working together transparently. In the spirit of working better together, we invited ecosystem partners from ApolloGraph, Cloudflare, LaunchDarkly, and PagerDuty to present at Honeycomb Developer Week, a three-day event filled with snackable, time-efficient learning sessions to help you uplevel your observability skills.
TL;DR: Use auto-instrumentation from OpenTelemetry. Traces will happen. Then your code can use global library functions to customize those traces with your specific important data.
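As a rough illustration of the second half of that TL;DR, the sketch below adds custom fields to whatever span is currently active, using the global `trace.get_current_span()` lookup from the OpenTelemetry Python API. It assumes the `opentelemetry-api` package is installed and that auto-instrumentation has already started a span for the request. The function name and the `app.*` attribute keys are made up for the example.

```python
# Assumption: opentelemetry-api is installed. The fallback only keeps this
# sketch importable in environments where it is not.
try:
    from opentelemetry import trace
except ImportError:
    trace = None


def record_order_details(order_id: str, item_count: int) -> dict:
    """Attach hypothetical business fields to the currently active span."""
    attributes = {"app.order_id": order_id, "app.item_count": item_count}
    if trace is not None:
        # Global lookup: no span object has to be passed into this function.
        span = trace.get_current_span()
        for key, value in attributes.items():
            span.set_attribute(key, value)
    return attributes
```

When no span is active, `get_current_span()` returns a non-recording span and the `set_attribute` calls have no effect, so a helper like this is safe to call from code paths that are not traced.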
Every few years, the tech world either rebrands an old term or tries to find a way to use old technology to create new advancements. It is easy to assume that observability is just another such rebrand, yet it is distinct from its predecessors.
When we talk about observability, we tend to focus first and foremost on the metrics, logs, and traces that you can collect from applications – such as request rates, error rates, and request duration. Infrastructure-level metrics, like CPU and memory utilization, might factor into the discussion as well. Here’s a third category of critical observability insights that teams tend to overlook: the network.
Like any great technology, the interest in and adoption of Kubernetes (an excellent way to orchestrate your workloads, by the way) took off as cloud native and containerization grew in popularity. With that came a lot of confusion. Everyone was using Kubernetes to move their workloads, but as they went through their journey to deployment, they weren’t thinking about security until they got to production.
In the past, we’ve written about what instrumentation is and the insights it provides. Instrumenting your code generates telemetry that shows you how your system is performing, and whether your system is healthy. As at most other companies, at Honeycomb we don’t write all of the code that runs in our systems.
The term Site Reliability Engineer (SRE) first appeared at Google in the early 2000s. In Google’s 2016 SRE Book, Benjamin Treynor Sloss wrote that, generally speaking, “an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).” This means that the SRE teams at Google decide how a system should run in production as well as how to make it run that way.
A developer's viewpoint is distinct. When you are responsible for many parts of a system, it can be difficult to keep track of operations and pinpoint the fault that is causing the software to malfunction. What if you could detect the issue ahead of time and fix it as soon as possible? Observability makes this possible. Let's take a closer look at it in this blog.
Oh goody, I’m so tickled to get this one. *rubs hands gleefully* Funny story, back in 2016–2017 we thought we were building Honeycomb primarily for DB use cases. The use cases are that killer. I’ve never seen another tool do the kinds of things you can do on the fly with Honeycomb and databases.
You’ve heard of observability, which has fast become one of the IT industry’s buzzwords du jour. But what about actionability, or the ability to translate observability into meaningful action? The latter term may not be a trending buzzword (not yet) – indeed, “actionability” perhaps sounds almost boring – but it’s just as essential as observability in managing complex, cloud-native environments.
We are excited to announce the launch of Speedscale CLI, a free observability tool that inspects, detects and maps API calls on local applications or containers. The offering underscores the importance of continued and proactive API testing to quickly detect and debug defects within a shifting array of upstream and downstream interdependencies.
What comes first – observability or AIOps? Can you achieve observability without AIOps? Do you need AIOps if you already have an observability solution in place? These are all questions that any team considering AIOps will want to answer in order to determine the real-world value that AIOps tools stand to offer.
Happy New Year 2022! In 2021, Exoprise’s critical focus was on improving its product for monitoring digital experiences and mobilizing internal teams to improve customer adoption and SaaS/network experiences everywhere. As Covid continues to dominate the world, IT and business teams are increasingly looking for solutions like Exoprise Digital Experience Monitoring (DEM) to ensure end-users are productive with a seamless work-from-home experience.
It’s harder to understand and operate production systems in 2021 than it was in 2001. Why is that? Shouldn’t we have gotten better at this in the past two decades? There are valid reasons why it’s harder: The architecture of our systems has gotten a lot more sophisticated and complex over the past 20 years. We’re not running monoliths on a few beefy servers these days.