September 2023

Sponsored Post

3 Ways FinTechs Can Improve Cloud Observability at Scale

Financial technology (FinTech) companies today are shaping how consumers will save, spend, invest, and borrow in the economy of the future. But with that innovation comes a critical need for scalable cloud observability solutions that can support FinTech application performance, security, and compliance objectives through periods of exponential customer growth. In this blog, we explore why cloud observability is becoming increasingly vital for FinTech companies and three ways that FinTechs can improve cloud observability at scale.

Lightrun's Product Updates - Q3 2023

Throughout the third quarter of this year, Lightrun continued developing solutions and improvements focused on enhancing developer productivity. Its primary objectives were to improve troubleshooting for distributed workload applications, reduce mean time to resolution (MTTR) for complex issues, and optimize cloud computing costs. Read on for the main new features and key product enhancements released in Q3 2023!

Checkly Named a Cool Vendor in the 2023 Gartner Cool Vendors in Monitoring and Observability Report

Checkly announced its inclusion in the 2023 Gartner Cool Vendors in Monitoring and Observability report. Following its recent inclusion in two Gartner Hype Cycles, this third recognition affirms Checkly as an innovation leader in monitoring as code. Checkly provides synthetic monitoring as code, offering a faster, integrated, and more scalable approach to API and browser digital experience monitoring. This lets teams follow a unified process through the entire software development life cycle, from testing through staging to continuous monitoring of production environments.

Elastic SQL inputs: A generic solution for database metrics observability

Elastic® SQL inputs (the Metricbeat module and the input package) allow users to run SQL queries flexibly against many supported databases and ingest the resulting metrics into Elasticsearch®. This blog dives into the functionality of generic SQL and walks through use cases for advanced users who want to ingest custom metrics into Elastic® for database observability. It also introduces the new "fetch from all databases" capability, released in 8.10.
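The general idea behind a generic SQL input, running an arbitrary query and turning each result row into a metric event, can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not Elastic's implementation: the function name `query_to_metric_docs` and the document fields (`@timestamp`, `service`, `sql.metrics`) are hypothetical choices for this example, not the schema Metricbeat actually emits.

```python
import sqlite3
from datetime import datetime, timezone


def query_to_metric_docs(conn: sqlite3.Connection, query: str, service: str) -> list:
    """Run a SQL query and turn each result row into a metric document.

    The document layout below is a hypothetical shape for illustration,
    not Elastic's actual event schema.
    """
    cursor = conn.execute(query)
    # Column names come from the query itself, so any query works ("generic").
    columns = [desc[0] for desc in cursor.description]
    now = datetime.now(timezone.utc).isoformat()
    docs = []
    for row in cursor.fetchall():
        docs.append({
            "@timestamp": now,
            "service": service,
            "sql.metrics": dict(zip(columns, row)),
        })
    return docs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (status TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("paid", 10.0), ("paid", 5.5), ("refunded", 3.0)],
    )
    # A custom business metric: order count and revenue per status.
    docs = query_to_metric_docs(
        conn,
        "SELECT status, COUNT(*) AS order_count, SUM(amount) AS revenue "
        "FROM orders GROUP BY status ORDER BY status",
        service="sqlite",
    )
    for doc in docs:
        print(doc["sql.metrics"])
```

Because the column names are read from the cursor rather than hard-coded, the same function handles any query, which is what makes this kind of input reusable across custom metrics.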

What Is a Feature Flag? Best Practices and Use Cases

Do you want to build software faster and release it more often without the risks of negatively impacting your user experience? Imagine a world where there is not only less fear around testing and releasing in production, but one where it becomes routine. That is the world of feature flags. A feature flag lets you deliver different functionality to different users without maintaining feature branches and running different binary artifacts.
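To make the idea concrete, here is a minimal sketch of a percentage-rollout feature flag, assuming a simple in-process check rather than any particular feature-flag product. The names `is_enabled`, `bucket`, `checkout_page`, the `"new-checkout"` flag, and the `"qa-tester"` allow-list entry are all hypothetical. Hashing the flag name together with the user ID gives each user a stable bucket, so the same user keeps seeing the same variant while a single binary serves both code paths.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               allow_list=frozenset()) -> bool:
    """Return True if the flag is on for this user.

    Users on the allow list always get the feature; everyone else gets it
    when their bucket falls below the rollout percentage.
    """
    if user_id in allow_list:
        return True
    return bucket(flag_name, user_id) < rollout_percent


def checkout_page(user_id: str) -> str:
    # Same deployed artifact, two code paths: the flag picks one per user.
    if is_enabled("new-checkout", user_id, rollout_percent=20,
                  allow_list={"qa-tester"}):
        return "new checkout"
    return "old checkout"


if __name__ == "__main__":
    for user in ["qa-tester", "alice", "bob"]:
        print(user, "->", checkout_page(user))
```

Raising `rollout_percent` only ever adds users (anyone below 20 is also below 50), so a gradual rollout never flips an existing user back to the old path; setting it to 0 acts as a kill switch without a redeploy.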

Observability at Scale Needs Summary

The shift from traditional monitoring to observability is widespread, and necessary. It's the way we make sense of increasingly complex and distributed systems. But when we capture all this data at scale... what do we do with it all? If this data itself had inherent value, we’d all be rich. But in the real world data does not provide us value until we can act on what it tells us.

Rescue Struggling Pods from Scratch

Containers are an amazing technology. They provide huge benefits and create useful constraints for distributing software. Go software doesn't need a container in the same way as Ruby or Python software, where the container bundles the runtime and dependencies. For a statically compiled Go application, the container doesn't need much beyond the binary.

How to monitor SLOs with Grafana, Grafana Loki, Prometheus, and Pyrra: Inside the Daimler Truck observability stack

To manage the day-to-day operations of their vast connected-vehicle service, fleet managers at Daimler Truck use tb.lx, a digital product studio that delivers near real-time data along with valuable insights for their networks of trucks and buses around the world. Each connected vehicle uses the cTP, an installed piece of technology that generates a small mountain of telemetry data, including speed, GPS position, acceleration values, braking force, and more.

Circonus Passport: Automatically adapt your observability data collection strategy on the fly

Learn about Passport, the industry's first dynamic telemetry management solution. Passport significantly eases telemetry agent management, allowing you to adapt your data collection based on environmental signals. High CPU? Start collecting data at 10s granularity vs 60s. Collect more data when you need it, less when you don't to gain better visibility, speed MTTR, and reduce observability costs.
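The "High CPU? Collect at 10s instead of 60s" rule can be illustrated with a small policy function. This is a hypothetical sketch of signal-driven collection in general, not Passport's actual configuration or API: the `CollectionPolicy` class, `next_interval` function, and the 80%/60% CPU thresholds are assumptions. The two thresholds add hysteresis so the agent does not flap between modes when CPU hovers around a single cutoff.

```python
from dataclasses import dataclass


@dataclass
class CollectionPolicy:
    # Intervals mirror the 10s/60s example; both CPU thresholds are assumptions.
    normal_interval_s: int = 60
    detailed_interval_s: int = 10
    cpu_high_pct: float = 80.0       # switch to fine-grained collection at or above this
    cpu_recovered_pct: float = 60.0  # return to coarse collection below this


def next_interval(policy: CollectionPolicy, cpu_pct: float, current_interval_s: int) -> int:
    """Choose the collection interval for the next cycle from a CPU reading."""
    if cpu_pct >= policy.cpu_high_pct:
        return policy.detailed_interval_s
    if cpu_pct < policy.cpu_recovered_pct:
        return policy.normal_interval_s
    # Between the thresholds: keep whatever mode we are already in.
    return current_interval_s


if __name__ == "__main__":
    policy = CollectionPolicy()
    interval = policy.normal_interval_s
    for cpu in [40.0, 85.0, 70.0, 55.0]:
        interval = next_interval(policy, cpu, interval)
        print(f"cpu={cpu:.0f}% -> collect every {interval}s")
```

Collecting at the coarse interval most of the time and switching to the fine one only under stress is what trades observability cost against visibility.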

Putting Developers First: The Core Pillars of Dynamic Observability

Organizations today must embrace a modern observability approach to develop user-centric and reliable software. This isn’t just about tools; it’s about processes, mentality, and having developers actively involved throughout the software development lifecycle up to production release. In recent years, the concept of observability has gained prominence in the world of software development and operations.

Can you have a career in Node without knowing Observability?

“Isn’t Observability something for Ops to worry about?” I’ve heard this response more than once when talking about how developers should learn OpenTelemetry. I wanted to write this piece to show you how important and how easy it is to learn observability from day one as a coder.

Circonus Launches Open Beta for Passport, Ushering in a New Era of Flexible Observability

Sky-high observability costs or visibility gaps? This is the unfortunate trade-off many organizations have to make when it comes to determining how much telemetry data they should collect and send to their observability tools. Teams either collect more data than they need and pay the price, or they collect less and suffer visibility gaps. Today, this all changes.

Four ways full-stack observability drives organizational success

Learn how full-stack observability can benefit your organization with real-time visibility into all layers of your IT infrastructure. With digital environments growing more complex, customer expectations are at an all-time high — and IT teams are being asked to manage more with fewer resources while also being “more strategic.” Impossible, right? Well, it can be without full-stack observability.

LLMs Demand Observability-Driven Development

Our industry is in the early days of an explosion in software using LLMs, as well as (separately, but relatedly) a revolution in how engineers write and run code, thanks to generative AI. Many software engineers are encountering LLMs for the very first time, while many ML engineers are being exposed directly to production systems for the very first time.

Top 10 Mistakes People Make When Building Observability Dashboards

Observability dashboards are powerful tools that enable teams to visualize and monitor the performance, health, and behavior of their applications and infrastructure. However, building observability dashboards is not a straightforward task, and many organizations make common mistakes that hinder their ability to gain meaningful insights and respond to issues effectively.

Native OpenTelemetry support in Elastic Observability

OpenTelemetry is doing more than becoming the open ingestion standard for observability. As one of the major Cloud Native Computing Foundation (CNCF) projects, with as many commits as Kubernetes, it is gaining support from major ISVs and cloud providers. Many global companies from finance, insurance, tech, and other industries are starting to standardize on OpenTelemetry.

Data-driven Network Observability

The network may be the last thing most people think about, but it’s one of the most crucial components of application delivery. Here we discuss the importance of a data-driven approach to network observability. We unpack how Kentik’s approach to machine learning, big data, and a unified data repository can help network operations solve problems faster to ensure a reliable network with great application performance.

Solving Faster in the Cloud: Hybrid Infrastructure Observability with Kentik

How can you quickly discover misconfigured security groups, access control lists, or routing tables? We explore how practitioners serving distributed teams or customer workloads can tighten up policies, impact costs, and unblock their colleagues with cloud infrastructure observability that starts with the network.

Class is in Session with The Observability Professor!

Please join the Observability Professor, Perry Correll, and Ed Bailey as they kick off a series of live streams about the magic and challenges of observability. In this session, Perry and Ed will talk about the foundational aspects of observability and its value to an enterprise. In later sessions, they will talk about steps for getting better telemetry from your applications and logs, and how to use that data to help your business gain clear insights into application and customer behavior. It will be a fun and interesting discussion!

Elastic AI Assistant for Observability

Harness the power of generative AI to turn insights into actions. Powered by the Elasticsearch Relevance Engine™ (ESRE™), Elastic’s AI Assistant (in technical preview for Observability) transforms problem identification and resolution, replacing manual data chasing across silos with an interactive assistant that delivers accurate, context-aware remediation for SREs.

Correlation Does Not Equal Causation - Especially When It Comes to Observability [Part 1]

Observability has been tied up with causality from its origins in the mathematical realm of control theory in the early 1960s. A system (of any kind, hardware or software, natural or engineered) was deemed to be ‘observable’ if it generated self-descriptive data from which it was possible to infer how states of the system were causally related to one another.

One-Click Insights with Board Templates

Whether you’re a new Honeycomb user or a seasoned expert looking to uncover fresh insights, chances are you’ve sent tremendous amounts of data into Honeycomb already. The question is, now what? We have the answer: Board templates. Teams can now create Boards based on pre-built templates that generate visualizations with a single click.

Unlocking IT: Considerations for a Powerful Observability Strategy

In today's cloud-native landscapes, observability is more than a buzzword; it's a critical element for software development teams looking to master the complexities of modern environments like Kubernetes. There’s a multi-faceted nature to observability with all its various levels and dimensions — from basic metrics to comprehensive business insights. It’s complex and can continue indefinitely…if you let it.

Cisco Secure Application Delivers Business Risk Observability for Cloud Native Applications

Built on Cisco's Full-Stack Observability Platform, Cisco Secure Application provides organisations with intelligent business risk insights to help them better prioritise issues, respond in real-time to revenue-impacting security risks and reduce overall organisational risk profiles.

OpenTelemetry Gotchas: Phantom Spans

This guest post is written by Ian Duncan, Staff Engineer - Stability Team at Mercury. To view the original post, go to Ian's website. At work, we use OpenTelemetry extensively to trace execution of our Haskell codebase. We struggled for several months with a mysterious tracing issue in our production environment wherein unrelated web requests were being linked together in the same trace, but we could never see the root trace span.

Observability for the Public Sector: Greater Visibility for a More Resilient Digital Future

Observability continues to prove its worth. In The State of Observability 2023, the annual research report Splunk created in partnership with the Enterprise Strategy Group, we share the characteristics that set the observability leaders (those with a mature observability practice) apart from the rest.

A Journey to Observability: Following Your Data From Generation to Analysis

I’m launching a new Observability Series called the Observability Professor, and it is designed to cover some common topics and terms in a vendor agnostic way. That’s right, no marketing! So what’s special, what’s new, what’s it going to cover that everyone else in the industry missed? Background: There are endless amounts of blogs, papers, and books on Observability; what it is and what it offers.

How to Develop a Modern Monitoring & Observability Strategy for Businesses of Any Size

In the dynamic world of IT, the way we monitor systems has seen a remarkable evolution. Gone are the days when monitoring was limited to basic server checks or infrastructure health. With the rise of cloud-native applications, serverless architectures, and container orchestration platforms like Kubernetes, the digital landscape has become a multi-dimensional maze.

Hot Topic: Increasing Cost-Efficient Observability with Cold Tier

Even as the global economy shows signs of a rebound, today’s observability customers are more focused than ever on driving utmost value from their investments. This isn’t simply because economics have forced organizations to closely review overhead and drive out unnecessary costs; the reality is that observability has become one of the leading budget items for every cloud software organization, full stop.

How to use the Grafana Faro Web SDK with Grafana Cloud Frontend Observability to gain additional app insights

Frontend observability (or real user monitoring) is a critical, yet often overlooked, part of systems monitoring. Website and mobile app frontends are just as complex as, if not more complex than, the backend systems observability teams typically prioritize. They also represent the first interaction users have with our applications, so it’s important to have full visibility into that experience.

Streamlining Incident Investigation

Honeycomb Customer Success Manager Josh Levin explains how to troubleshoot production incidents using Honeycomb's telemetry data: metrics, traces, and logs. While these data types have separate interfaces, you can investigate seamlessly within Honeycomb. Josh highlights the key role of the "retriever" service in data ingestion and querying, and demonstrates cross-validating tracing data with metrics, which are stored in a separate dataset, to spot anomalies in pod deployments and resource usage. He also shows effective log filtering and searching for keywords like "update status."

The 12 Cats of Observability

On the surface, business-critical IT infrastructure and cats may not seem like they have a lot in common. But they’re way more alike than you might think. Our feline friends contain multitudes, as any cat parent will tell you. They’re complex and can sometimes drive you up a wall. But once they warm up to you—and you warm up to them—the joys and benefits of having them in your life outweigh just about everything. Sounds a lot like technology, right?

Simplify observability with the Grafana OpenTelemetry Starter and Spring Boot 3

To help simplify instrumenting Spring Boot applications with Grafana Cloud, we are excited to introduce the Grafana OpenTelemetry Starter, a project that connects the latest Micrometer enhancements from Spring Boot 3 with Grafana Cloud using OpenTelemetry. By using these tools, you will have logs, metrics, and traces in a single service — in the same easy way that you can use Prometheus with Spring Boot.

Deploying the OpenTelemetry Collector to Kubernetes with Helm

The OpenTelemetry Collector is a useful application to have in your stack. However, deploying it has always felt a little time-consuming: working out how to host the config, building the deployments, and so on. The good news is that the OpenTelemetry team also produces Helm charts for the Collector, and I’ve started using them. There are a few things to think about when using them, though, so I thought I’d go through them here.

Observability vs Monitoring: What's the Difference?

Observability and monitoring: These terms are often used interchangeably, but they represent different approaches to understanding and managing IT infrastructure. If you are new to these terms or are often confused between the two, this blog is for you! In this blog, we'll explore the key concepts of observability and monitoring, their evolution in IT operations, their differences and similarities, and their importance in modern infrastructure.

Incident Review: What Comes Up Must First Go Down

On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which no data could be processed or accessed. This outage is the most severe we’ve had since we had paying customers. In this review, we will cover the incident itself, and then we’ll zoom back out for an analysis of multiple contributing elements, our response, and the aftermath.
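As a quick check on the impact window quoted above, the outage ran from 13:40 to 14:48 UTC, which is 68 minutes. The short sketch below does the arithmetic with Python's standard `datetime` module; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# Outage window from the incident review, both timestamps in UTC.
start = datetime(2023, 7, 25, 13, 40, tzinfo=timezone.utc)
end = datetime(2023, 7, 25, 14, 48, tzinfo=timezone.utc)

duration = end - start
minutes = int(duration.total_seconds() // 60)
hours, rem_minutes = divmod(minutes, 60)
print(f"{minutes} minutes ({hours} h {rem_minutes} min)")  # 68 minutes (1 h 8 min)
```

Keeping both timestamps timezone-aware avoids off-by-an-hour mistakes when an incident timeline mixes UTC and local times.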