September 2022

The Future of Ops Is Platform Engineering

Two years ago I wrote a piece in The New Stack about the Future of Ops Careers. Towards the end, I described the second category as “operations engineering minus the infrastructure,” dedicated to evaluating and assembling a production stack of third-party platform providers, enabling software engineers to self-serve their services and own their own code in production. That second category I was describing now has a name. We call those teams "platform engineering."

Key Observability Scaling Requirements for Your Next Game Launch: Part III

So far in our series on scaling observability for game launches, we’ve discussed ways to 1) quickly analyze large volumes of telemetry data and 2) ensure high-quality telemetry data for more effective analysis at lower costs. These blogs outline best practices for scaling observability on game launch day, which is necessary to ensure high performance across all infrastructure components: no lag, no glitches, and no bugs.

Observability and Auto-Remediation

Organizations today are under pressure to stay ahead and maintain IT applications and infrastructure optimally. That means their IT teams are tasked with making sure that functions move along smoothly while minimizing downtime. To keep the lights on, enterprises add whatever domain-specific tools they need. However, these tools are often reactive, and not nearly robust enough to handle complex application topologies.

Exciting News About the Cribl Certified Observability Engineer Program!

At Cribl, we want to make it as easy as possible for anyone to learn about our products. Whether you’re a potential future customer, a new user at an existing customer, or a partner, we believe knowledge about our products should be free and easy to consume, convenient to access at any time, and available at a pace desired by the learner. We're excited to announce that we've issued our 1000th certification!

The Complex But Elegant Relationship Between AIOps and Observability

Digital transformation requires organizational evolution. Constant demand for rapid delivery of upgrades and new products forces change. Surely, the old days of managing monolithic applications housed in private servers are over. Applications consist of virtualized, containerized, and serverless code that’s networked via APIs across a hybrid infrastructure of public and private clouds.

What is an Observability Engineer?

What is an observability engineer? Is it your SIEM admin? How about your application performance monitoring admin? Neither? Both? Observability engineering is more than administering a tool. There is more to it than data onboarding, writing parsers, and getting data in. As an observability tool admin, you work with data producers and consumers to get data in a human-readable and searchable format from the source to the analytics system.

Getting Started with OpenTelemetry: Three Companies Check Into OTel Observability

Comprehensive observability starts with good instrumentation. OpenTelemetry, aka “OTel,” sets a unified standard, enabling you to instrument your applications once, then send that data to any backend observability tool of choice. OpenTelemetry’s standard for generating and ingesting telemetry data is slated to become as ubiquitous as current container orchestration standards. Because of this, development teams are increasingly adopting OpenTelemetry for their applications.

Part 6: Observability Maturity Model Summary

For decades, IT operations teams have relied on monitoring for insight into the availability and performance of their systems. But the shift to more advanced IT technologies and practices is driving the need for more than monitoring – and so observability evolved. With infrastructures and applications that span multiple dynamic, distributed and modular IT environments, organizations need a deeper, more precise understanding of everything that happens within these systems.

Demystifying Observability and Making it Work for You

This article is the final installment in a series that demystifies observability. The first three focused on the history of observability, dispelling myths around observability, and what observability is and what it can offer. In this last article of the series (Check out part 1), I want to offer a complete definition of observability.

Sense and Signals

Complex, distributed software systems are chatty things. Because there are many components interoperating amongst themselves and with things outside their bounds like users, those components and the systems themselves emit many information signals. It’s the goal of monitoring, logging, and observability (o11y) tools to help the systems’ “stewards,” those developers and operators tasked with maintaining and supporting them, make sense of those signals.

An Open Source Observability Platform | SigNoz

Cloud computing and containerization have brought many benefits to IT systems, like speed to market and on-demand scaling. But they have also increased operational complexity. Applications built on dynamic and distributed infrastructure are challenging to operate and maintain. A robust observability framework can help application owners stay on top of their software systems. In this article, we will introduce SigNoz, an open source observability platform.

The Complete Kubectl Cheat Sheet [PDF download]

Kubernetes is one of the most well-known open-source systems for automating and scaling containerized applications. Usually, you declare the state of the desired environment, and the system will work to keep that state stable. To make changes “on the fly,” you must engage with the Kubernetes API.

How to get complete CI/CD pipeline observability

It's not like it used to be back in the day! Before CI/CD, we were building on-premises, service-oriented products following system style architecture and we were able to map out the build system and end-to-end process in a PowerPoint or Visio document. Although time-consuming and inefficient, it was relatively straightforward and the build pipeline was unlikely to change drastically. But that's no longer the case.

Observability 101: a chat with Jude Bakeer

We recently sat down with Jude Bakeer, one of LogicMonitor’s Solutions Engineers, to talk about the future of IT and Observability. Part of Jude’s role requires her to talk to customers and enterprises every day. Over the years, she’s gathered unparalleled insights into key trends across these industries and segments – from ops teams to C-level executives.

Troubleshoot in less than 60 seconds with Grafana: Inside NOS's observability stack

It may seem like ancient history, but there was a time when telecommunications companies only had to worry about connecting customers over landlines. Today, their businesses depend on vast cellular networks to not only provide strong wireless phone coverage in countless locations, but also handle the demands of tablets, computers, and machine-to-machine communications.

The Difference Between Monitoring and Observability and Why It Matters

Organizations are adopting cloud native and multi-cloud architectures to drive innovation, achieve faster time to market, improve yield, and deliver exceptional experiences to their customers. However, for all the business benefits of modernizing, the process does not come without challenges.

Sponsored Post

The Importance of Observability for Site Reliability Engineers (SREs)

Site reliability engineers (SREs) play a crucial role in ensuring the reliability of systems. From creating software to improving system reliability in production, responding to incidents, and fixing issues, SREs are responsible for guaranteeing the health of applications. Observability helps support SREs, because an observable system allows them to identify and fix issues promptly, leaving SREs better equipped to fast-track development cycles.

Understanding the Observability Maturity Model

Based on research and conversations with enterprises from various industries, StackState created the Observability Maturity Model. This model defines the four stages of observability maturity. The ultimate destination is level four, Proactive Observability with AIOps. However, even moving from level one to two, or from level two to three, is a huge improvement in your ability to get essential insights into your IT environment.

Part 5: Proactive Observability With AIOps- Level 4

Level 4, Proactive Observability With AIOps, is the most advanced level of observability. At this stage, artificial intelligence for IT operations (AIOps) is added to the mix. AIOps, in the context of monitoring and observability, is about applying AI and machine learning (ML) to sort through mountains of data looking for patterns.

Key Observability Scaling Requirements for Your Next Game Launch: Part II

In Part I of our series outlining best practices for scaling observability, we reviewed the data analysis capabilities that can help engineers troubleshoot faster in high-pressure situations during a game launch. Nobody wants lag time or crashes in their game launch. Similarly, no one wants terminated sessions, or for gamer customers to log off and play a competitor’s game.

Beat the holiday rush with Elastic Observability

September is here, and that means many retailers have already begun preparing for the upcoming holiday season. One weekend in particular tends to be the real-life stress test that companies have come to develop a love-hate relationship with: Cyber Weekend. Or more specifically, Black Friday, Cyber Monday, and the weekend in between.

Harness Continuous Observability to Continuously Predict Deployment Risk

In my previous blog, I discussed how continuous observability can be used to deliver continuous reliability. We also discussed the problem of high change failure rates in most enterprises, and how teams fail to proactively address failure risk before changes go into production. This is because manual assessment of change risk is both labor intensive and time consuming, and often contributes to deployment and release delays.

How To: Connecting Azure Blob to Cribl Stream to Replay Observability Data

One of the core features of Cribl Stream is our Replay capability. We pride ourselves on giving customers choice and control over their data. The ability to archive data in cheap object storage and then reach back into that same object storage is one example of this. It’s safe to say that S3 and AWS have become synonymous with the term object storage. It’s like a modern-day Kleenex, or Band-Aid.

Key Observability Scaling Requirements for Your Next Game Launch: Part I

After months–or potentially, years–of hard work by teams across a gaming enterprise, when the day arrives for a game launch, the last thing your enterprise needs is slowdowns, glitches, outages or poor performance. It’s the death knell for any game, because for your avid gaming customers, there’s always something else (read: a game that isn’t yours) to check out.

What is AIOps? A beginner's guide

Artificial Intelligence for IT Operations (or AIOps for short) continues to be a hot topic among developers, SREs, and DevOps professionals. The case for AIOps is especially crucial given the expansive nature of today’s observability efforts across hybrid and multi-cloud environments. As with most observability platforms, it all starts with your telemetry data: metrics, logs, traces, and events.

Which is More Important: Observability or AIOps?

In the last half-decade, AIOps and observability have arguably been the hottest two topics in IT operations management. Gartner first mentioned AIOps—Artificial Intelligence for IT Operations—in 2016, defining it as using big data and machine learning to automate IT operations processes, such as event correlation, anomaly detection and causality determination.

8 reasons why network observability is critical for DDoS detection and mitigation

Distributed denial-of-service (DDoS) attacks have been a continuous threat since the advent of the commercial internet. The struggle between security experts and DDoS attackers is an asymmetrical war where $30 attacks can cost companies millions of dollars in downtime and breaches of contract. They can also be a smokescreen for something worse, such as the infiltration of malware.

Running the OpenTelemetry Collector in Azure Container Apps

In this post, we’ll look at how to host the OpenTelemetry Collector in Azure Container Apps. There are a few gotchas with how it’s deployed, so hopefully this will stop you from hitting the same issues. If you don’t care about the details and just want to run a script, I’ve created one here.

Database Decision-Making for Observability, from Simple to Complex

A goal of open-source observability is unifying several different signals to provide the observability everyone wants. It’s always interesting to speak to people on this journey about how they try to provide it through open-source projects and the challenges they can face. I was thrilled to host Pranay Prateek on the most recent episode of the OpenObservability Talks podcast.

Sending NGINX Logs to Honeycomb is Darn Easy

Written by Andrew Puch and Brian Langbecker. You use NGINX as a proxy for your application, and you want to leverage your favorite features in Honeycomb to help make sense of the traffic data. Have no fear: Honeycomb is more than capable and ready to help! Things you will need: Before you start with the instructions, let’s discuss a lightweight tool called Honeytail. This utility will tail log files, parse the various formats, and send the data to Honeycomb.

Using Observability with Kubernetes to Automate Site Reliability Engineering

In this video, Anthony Evans, solution architect, explains how the StackState topology-powered observability platform can help SREs to automate site reliability, putting their organizations on the path to becoming a zero-downtime enterprise. See how StackState helps to unify and correlate data across your stack, visualize your entire IT environment, instantly pinpoint root cause, reduce alert storms and, with AIOps capabilities, even prevent problems proactively. It's all here!

Debugging Node.js HTTP Requests

HTTP is the backbone of all API-centric, modern web apps. APIs are the place where the core business logic of an application lives. As a result, developers spend a lot of time optimizing the API business logic. This article addresses a Node.js developer’s dilemma while debugging an HTTP API request. We take a sample Node.js/Express.js-based HTTP service to demonstrate a new way of debugging Node.js applications using the Lightrun observability platform.

Top 10 Logging Frameworks Across Various Programming Platforms

A logging framework is a software tool that helps developers output diagnostic information during the execution of a program. This information is used to debug the program or monitor its performance. There are many different logging frameworks available, ranging from simple logging libraries to full-fledged logging and observability platforms.

Honeycomb Announces Major Updates to PagerDuty Integration

Today, we’re announcing major new updates to Honeycomb’s PagerDuty integration. These updates put more of the information you need into PagerDuty notifications and allow for greater configurability. These enhancements are available to all users who leverage Honeycomb Triggers and Burn Alerts to send notifications via PagerDuty.

Replay Data from Azure Blob with Cribl Stream

One of the core features of Cribl Stream is the Replay capability. We pride ourselves on giving customers choice and control over their data. The ability to archive data in cheap object storage and then reach back into that same object storage is one example of this. It’s safe to say that S3 and AWS have become synonymous with the term object storage. It’s like a modern-day Kleenex, or Band-Aid. However, it’s important to remember that there are other, equally featured object storage options available. In this video, we’ll walk through an example of Replay with Azure Blob, and view logs within Humio.

Changes are Observability's Biggest Blind Spot

Classically, the space of observability lies within layers of information on a dashboard. It operates by using the fundamental trio of data — metrics, logs and traces — from each layer of the environment to assess the health of an IT infrastructure. However, a time component is critical, making the stack observable at any point in time. Gathering reliable data and insights into your IT infrastructure remains the primary role of observability tools and services.

What Is An Observability Data Pipeline?

Have you ever wondered how to get your organization's data into one place so you can easily monitor and troubleshoot your systems? If so, you're not alone. This is a common challenge faced by many organizations. The solution is an observability data pipeline. To better understand what this is and how it works, we've put together a brief overview.

Authors' Cut-No More Pipeline Blues: Accelerate CI/CD with Observability

It’s no secret that CI/CD pipelines make the lives of engineering and operations easier by accelerating the feedback loop for higher quality code and apps. They build code, run tests, and safely deploy new versions of your application. But just like any aspect of development, poor integration, invisible bottlenecks, and bugs can plague your pipelines. And debugging them? Well, it’s complicated.

Real World Insights - My Take on the Observability Maturity Model

A prelude to our upcoming six-part Observability Maturity Model Fundamentals blog series. By Lodewijk Bogaards. At StackState, we have spent eight years in the monitoring and observability spaces. During this time, we have spoken with countless DevOps engineers, architects, SREs, heads of IT operations and CTOs, and we have heard the same struggles over and over.

Get the Most Value from Your Observability Investment by Building for the Future

Technically speaking, observability offers visibility into the data being generated by your infrastructure devices, systems, and applications, but in reality, it offers only the opportunity to see what’s happening. There’s no guarantee that you’ll get what you want; you have to set things up in a way that makes it possible for you to get the insights you need.

Feature Focus: August 2022

It’s already September! Time flies by when you’re getting things done, and we’ve been a busy bunch of bees here at Honeycomb. 🐝 We’re excited that we’ve gotten to share some of those changes with you already, like our relaunched interactive sandbox and the beta release of our OpenTelemetry log support and Go distribution, but that’s just the tip of the iceberg.

Building a one-stop Open Source Observability Platform | OpenObservability Podcast

Pranay, one of the co-founders at SigNoz, was recently invited as a guest speaker by Jonah Kowall, CTO at Logz.io, on his OpenObservability Podcast. In the podcast, Pranay talks about the mission behind SigNoz: unifying traces, metrics, and logs in a single platform and interface. He also shares anecdotes about the evolution of SigNoz since its inception, the community adoption, and its contribution to SigNoz.

Ask Miss O11y: How Can I Convince My Organization to Invest in Instrumenting for Observability?

We recently hosted a Twitter Space, and a question came in regarding speaking to executives about instrumenting for observability. It’s a great topic we love expanding on. Here’s the answer we provided.

How Observability and AIOps Are Transforming the World

Artificial intelligence for IT operations (AIOps) is the application of artificial intelligence (AI) and associated technologies—like machine learning (ML) and natural language processing—for normal IT operations activities and endeavors. AIOps helps ITOps, DevOps, and site reliability engineer (SRE) teams work better by examining IT information and observability telemetry.