April 2023

From Monolithic to Microservices: Code Instrumentation Trends

Software architectures are greatly influenced by the size and scale of the software applications. With growing size, the code base becomes complex. With scale, the deployment becomes challenging. The result: debugging becomes an increasingly time-consuming process for developers.

Monitoring: The Rise of Data Observability

There’s an increasingly high cost to poor data quality today. Poor customer data costs companies six percent of their total sales, as per a UK Royal Mail study. And, as per IBM, bad data costs U.S. businesses $3.1 trillion per year. As companies transform into data-driven businesses, we witness a sharp interest in data observability.

Observable Frontends: the State of OpenTelemetry in the Browser

The modern standard for observability in backend systems is: distributed traces with OpenTelemetry, plus dynamic aggregations over these events. This works very well in the world of web servers. But what about the web client? This post describes the state of OpenTelemetry support for React web clients, as of early April 2023.

Observability: A Complete Guide

As technology advances, so does the need for software engineers and DevOps teams to understand the precise inner workings of the systems they create. In 2023, observability is quickly becoming a key factor in gaining success for many businesses. The study states that many businesses are in different stages of adopting observability into their arsenal of tools, and that the need for these practices is on the rise.

Why Enterprise Observability Is a Better Approach to Unified Monitoring

To say modern enterprises are built on the cloud is an understatement. According to Gartner, enterprise spending on cloud applications surged to 57.7% in 2022, and investments into public cloud services hit a respectable 41%. These numbers are only predicted to grow further in 2025 as enterprises accelerate cloud rollouts and move into a cloud microservices architecture to realize greater agility, productivity, and competitiveness.

Should Every Incident Get a Retro?

At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague Ben Hartshorne asked a fascinating question, which I’ll paraphrase here: That caught me by surprise. We had a great discussion, and it made me consider approaches I hadn’t before.

Cost-Cutting Strategies and Smart Tooling Choices to Maximize Your Vendor Budget

Tech debt. Vendor redundancy. System fragmentation. Startups and cloud-born companies are looking at vendors for cost-cutting opportunities. But how do you balance vendor costs and value when those resources and tools bring efficiencies as high as the monthly bills? In this session, Charity Majors and Gergely Orosz share advice on managing spend in a vendor-dependent world.

Monitoring service performance: An overview of SLA calculation for Elastic Observability

Elastic Stack provides many valuable insights for different users. Developers are interested in low-level metrics and debugging information. SREs are interested in seeing everything at once and identifying where the root cause is. Managers want reports that tell them how good service performance is and if the service level agreement (SLA) is met. In this post, we’ll focus on the service perspective and provide an overview of calculating an SLA.

Lightrun Launches New .NET Production Troubleshooting Solution: Revolutionizing Runtime Debugging

Lightrun, the leading Developer Observability Platform for production environments, announced today that it has extended its support to include C# on its plugins for JetBrains Rider, VSCode, and VSCode.dev. With this new runtime support, .NET developers can troubleshoot their apps against .NET Framework 4.6.1+, .NET Core 2.0+, and .NET 5.0+ technologies.

How the All-In Comprehensive Design Fits into the Cribl Stream Reference Architecture

Join Cribl's Ed Bailey and Ahmed Kira as they provide more detail about the Cribl Stream Reference Architecture, which is designed to help observability admins achieve faster and more valuable stream deployment. During this live stream discussion, Ed and Ahmed will explain the guidelines for deploying the comprehensive reference architecture to meet the needs of large customers with diverse, high-volume data flows. They will also share different use cases and discuss the pros and cons of using the comprehensive reference architecture.

The Sun's Setting on Cisco Prime Infrastructure, Rising on SolarWinds Hybrid Cloud Observability

Cisco recently announced its plan to End of Life (EOL) Cisco Prime Infrastructure. While they’re offering an alternative solution with this announcement, Cisco DNA Center, support for multi-vendor environments appears to be decreasing.

Alerting on the User Experience

When your alerts cover systems owned by different teams, who should be on call? We get this question a lot when talking about SLOs. We believe that great SLOs measure things that are close to the user experience. However, it becomes difficult to set up alerting on that SLO, because in any sufficiently complex system, the SLO is going to measure the interaction between multiple services owned by different teams.

Honeycomb's Deployment Protection Rule for GitHub Actions

Today, GitHub announced the public beta of Deployment Protection Rules for GitHub Actions for GitHub Enterprise users. In support of that launch, we’ve partnered with GitHub to create the Honeycomb Deployment Protection Rule (available as a GitHub App). This rule lets you run Honeycomb queries so that you can get real-time performance feedback from your services before deciding whether to prevent deployment of your code to a specific environment.

Observability overload: Insights into the rise of tools, data sources, and environments in use today

With countless observability tools, data sources, and environments to juggle, the organizations that deploy and manage today’s distributed applications often face an uphill battle to gain visibility into their application performance. That was a key takeaway from the Grafana Labs Observability Survey 2023, which incorporated input from more than 250 industry practitioners who are all too familiar with these complexities.

How an Observability Pipeline Can Help With Cloud Migration

Do you want to confidently move workloads to the cloud without dropping or losing data? Of course, everyone does. But easier said than done. Cloud migration is tricky. There’s so much to think through and so much to worry about — how can you reconfigure architectures and data flows to ensure parity and visibility? How do you know the data in transit is safe and secure? How can you get your job done without getting in trouble with procurement?

Beyond Observability and Tracing: Doing More With The Data We Have

Observability is a term that has been thrown around a lot in the past few years in the software development industry. Different people use it in different ways, but one thing that is clear is that it attempts to provide a solution to a real pain engineers are feeling. It is the pain of not knowing what is happening in the microservices architecture and how and why systems are behaving in production.

Elastic Common Schema and OpenTelemetry - A path to better observability and security with no vendor lock-in

At KubeCon Europe, it was announced that Elastic Common Schema (ECS) has been accepted by OpenTelemetry (OTel) as a contribution to the project. The goal is to achieve convergence of ECS and OpenTelemetry’s Semantic Conventions (SemConv) into a single open schema that is maintained by OpenTelemetry. This FAQ details Elastic’s contribution of Elastic Common Schema to OpenTelemetry, how it will help drive the industry to a common schema, and its impact on observability and security.

Lightstep from ServiceNow deepens commitment to OpenTelemetry project

At Lightstep, we’ve seen many organizations grapple with “cloud-native sticker shock” as they come to understand that these complex systems require sifting through massive amounts of data across architectures and proprietary solutions. In today’s macroeconomic environment, organizations are looking to reduce costs while driving innovation, especially when it comes to cloud-native applications.

The Three Pillars of Observability: Metrics, Logs and Traces

Metrics, Logs and Traces are often referred to as The Three Pillars of “Observability”. The term observability has been used in control theory to refer to how the state of a system can be inferred from the system’s external outputs. Applied to IT, observability is how the current state of an application can be assessed based on the data it generates. Applications and the IT components they use provide outputs in the form of metrics, events, logs and traces (MELT).

Optimize your CI/CD Pipeline with Coralogix Tagging

Continuous Integration/Continuous Delivery (CI/CD) has now become the de-facto standard for all engineering teams seeking to keep pace with the demands of the modern economy. At Coralogix, we operate some of the most advanced build and deploy pipelines in the world. We’ve baked that knowledge into our platform with a CI/CD Observability feature called Coralogix Tagging.

Achieving Great Dynamic Sampling with Refinery

Refinery, Honeycomb’s tail-based dynamic sampling proxy, often makes sampling feel like magic. This applies especially to dynamic sampling, because it ensures that interesting and unique traffic is kept, while tossing out nearly-identical “boring” traffic. But like any sufficiently advanced technology, it can feel a bit counterintuitive to wield correctly, at first. On Honeycomb’s Customer Architect team, we’re often asked to assist customers with their Refinery clusters.

What Is Observability? Everything a Beginner Needs to Know

Observability originated in the field of engineering and has recently gained popularity in the world of software development. Put simply, observability refers to the ability to understand the internal state of a system based on its external outputs. IBM defines it as follows: As systems have become more complex, often including remote elements in cloud-based systems, management of the systems and troubleshooting faults and downtime have also become more complex.

Flowmon: How to Choose a Network Observability Platform

Individual EU Member States are expected to transpose the NIS2 and RCE directives into national ... What are the key characteristics organisations should have to shift from network monitoring to network observability? The need is to have more of a platform approach. Let's see how to choose a Network Observability Platform to successfully manage the networks in highly distributed environments.

WhatsUp Gold: How to Choose a Network Observability Platform

Individual EU Member States are expected to transpose the NIS2 and RCE directives into national ... What are the key characteristics organisations should have to shift from network monitoring to network observability? The need is to have more of a platform approach. Let's see how to choose a Network Observability Platform to successfully manage the networks in highly distributed environments.

Observability for business decision making: bridging the gap between IT and the business

Hear from Corneile Britz, Observability and DevOps Specialist and Co-Founder of Boxfish, how observability can provide real-time data to support business decision-making and improve customer experience, enabling collaboration and trust between IT and business stakeholders.

Developing a culture of observability

In the race to attract and retain customers, businesses must deliver great customer experiences, release reliable products fast, and scrutinize costs to achieve consistent growth. That can either be a well-oiled machine or a tangle of disjointed communications and workflows that frustrate customers, employees, and management alike. By developing a culture of observability, you can have a framework that harmonizes the experience for everyone.

Does OpenTelemetry in .NET Cause Performance Degradation?

Contrary to Betteridge’s Law of Tabloid Headlines, the answer to the question, "does OpenTelemetry in .NET cause performance degradation?" is yes, but context is important. I get this question so often that I thought it was time to get some stats on it. I’ve heard comments like: I can only assume that these are based on previous versions, or things like OpenTracing / OpenCensus (the heritage frameworks that were the feeders for OpenTelemetry).

Why the visibility gap is holding your IT operations back

Depending on your business, MTTR stands for mean time to repair or mean time to recovery – but it can also mean resolution, resolve, or restore. No matter how you define it, the basic measurement is the same: it’s the time it takes from when something goes down to when it is back and fully functional. This includes everything from finding the problem to fixing it. For ITOps teams, keeping MTTR to an absolute minimum is crucial.

Enhancing cloud native application observability on AWS with business transaction insights

With business transaction insights in AppDynamics Cloud, you can turn cloud native chaos into business context. Here’s how. In any organization, technology plays a vital role in nearly every aspect of the business — from marketing to operations to human resources. But increasingly, its role in revenue generation is taking center stage. Profitability and growth are now in the hands of CTOs and CIOs.

Log Management in the Age of Observability

The explosive growth of interconnected data across distributed systems has disrupted traditional development, DevOps, and ITOps practices and forced many organizations to rethink their cloud strategies. Higher-velocity feature development and more responsive support requests involve developers throughout the delivery cycle and require them to monitor and observe application behavior before releasing it to production.

Expanding Our Vision: Unifying Client-Side Observability Data

In 2021, we started Request Metrics as a simple and developer-friendly service to measure and improve web performance. We built an incredible platform that distilled complex data down into simple reports and recommendations. Lots of teams around the world found valuable insights in Request Metrics that they couldn’t get anywhere else. But web performance data can be very unpredictable—the web slows down in all sorts of ways.

Cribl Reference Architecture Series: Scaling Effectively for a High Volume of Agents

Join Cribl’s Ed Bailey and Ahmed Kira in an insightful discussion about scaling your Cribl Stream architecture to accommodate a large number of agents. Managing high-volume agent data flows presents a unique set of challenges that must be addressed to ensure the reliable transmission of data from your endpoints to your analytics systems, meeting business resiliency requirements. Errors arising from agent scale and data volume can lead to difficult-to-diagnose and even more challenging-to-fix issues that tend to surface at the most inopportune times.
Sponsored Post

OpenTelemetry 101: A Non-Technical Guide to Starting Your Open Observability Journey

If you’re involved in IT Operations, you’ve probably heard of OpenTelemetry. It’s a hot topic in the observability industry, and for good reason. OpenTelemetry is a set of open-source tools and APIs that make it easy to collect telemetry data from your applications and infrastructure. This data can then be used to monitor your systems, troubleshoot problems, and improve performance.

Gain agility through observability

As companies navigate geopolitical challenges, macroeconomic headwinds, and the post-pandemic comedown, business leaders face intense pressure to drive software transformation, reduce costs, and compete faster in the cloud-transition era of “lift and shift.” Amid layoffs and a slowed pace of hiring, the demand for better tools, real-time insights, seamless experiences, and contextual analysis has skyrocketed.

Troubleshoot faster and modernize your apps with AWS Monitoring and Observability

As a company born in the Amazon Web Services (AWS) cloud, we understand that operating at cloud scale requires balancing security, compliance, and operational safety with your commitment to innovation, speed, and agility. From cost optimization at scale to operational resiliency to application modernization, we know you’re facing various challenges and need reliable solutions.

The Future of Observability is Bright as Honeycomb Announces $50M in Series D Funding

TL;DR—This is a fundraising post! Yes, even in this economy. Here at Honeycomb, we've always focused more on the problems we help our customers solve rather than playing the meta game of posturing in startup-land—so these fundraising blog posts are usually the least fun to write (and read, probably). But this one is a little different.

Revolutionize Your Observability Data with Cribl.Cloud - Streamline Your Infrastructure Hassle-Free!

Cribl.Cloud provides control over observability data without the hassle of running infrastructure. Cribl.Cloud quickly spins up all Cribl products (Stream, Edge, and Search) in just a few minutes. Teams can get working quickly and make their observability data valuable while Cribl handles scaling and security.
Sponsored Post

Airlines aiming to transform need modern Observability

The last decade has been nothing but a roller coaster ride for the airline industry. The pandemic has transformed it forever, and now it needs to reevaluate its digital transformation priorities on how to manage traveler expectations. Taking it a step further, travelers' buying behavior is changing further, as they will now want to book tickets while chatting with an AI interface. The transformation was already underway. In 2020, Google Cloud and Sabre announced a partnership to modernize Sabre. Recently, American Airlines announced their modern rebooking app launched in partnership with IBM. Lufthansa announced the industry's first continuous pricing tailored to suit individual customer attributes.

eBPF Explained: Why it's Important for Observability

eBPF is a powerful technical framework to see every interaction between an application and the Linux kernel it relies on. eBPF allows us to get granular visibility into network activity, resource utilization, file access, and much more. It has become a primary method for observability of our applications on premises and in the cloud. In this post, we’ll explore in-depth how eBPF works, its use cases, and how we can use it today specifically for container monitoring.

How to monitor Kafka and Confluent Cloud with Elastic Observability

The blog will take you through best practices to observe Kafka-based solutions implemented on Confluent Cloud with Elastic Observability. (To monitor Kafka brokers that are not in Confluent Cloud, I recommend checking out this blog.) We will instrument Kafka applications with Elastic APM, use the Confluent Cloud metrics endpoint to get data about brokers, and pull it all together with a unified Kafka and Confluent Cloud monitoring dashboard in Elastic Observability.

Quantum Entangled Observability

As the world of technology continues to evolve, the demand for cutting-edge solutions to monitor and optimize system performance has never been higher. Today, we’re excited to introduce a revolutionary new concept in observability: Quantum Entangled Observability (QEO). This ground-breaking method leverages the peculiar properties of quantum mechanics to provide unparalleled insights into your systems’ inner workings.