November 2023

Multi-Cluster Observability Part 2: Developing The Right Strategy

This is the second of a three-part blog series. Before reading this, be sure to check out Part 1. Benefiting from multi-cluster setups requires familiarity with their common variations. In your Kubernetes journey, you are highly likely to encounter the need to manage multiple clusters simultaneously.

How Embedded Device Observability Helps Latch Build Ultra-Reliable Products

Embedded developers have historically found it difficult to obtain high-quality data on the performance and health of their devices once deployed in the field. They've had to rely on customer reports and navigate through complex and time-consuming processes to effectively address any issues that arise. For companies like Latch, who care deeply about product reliability and quality, this isn’t good enough. Find out how they use Memfault to collect high-quality debugging and performance data from their devices in the field and use it to ensure their customers get the best possible product.

DevAlert 2.0 Now Available

DevAlert 2.0, now available from Percepio, is a major upgrade to our edge observability platform. The upgrade provides much-improved diagnostic capabilities, including core dumps for Arm Cortex-M devices. This allows remote inspection of crashes, errors, or security anomalies in full detail, including the function call stack, parameters, and variables, with source code display.

How Generative AI Makes Observability Accessible for Everyone

We are pleased to share a sneak peek of Query Assistant, our latest innovation that bridges the world of declarative querying with Generative AI. Leveraging our large language models (LLMs), Coralogix’s Query Assistant translates your natural language request for insights into data queries. This delivers deep visibility into all your data for everyone in your organization.

Observability Is About Confidence

Observability is important for understanding what’s happening in production. But carving out the time to add instrumentation to a codebase is daunting, and it is often treated as a task separate from writing features. This means that we end up instrumenting for observability long after a feature has shipped, usually when there’s a problem with it and we’ve lost all context. What if we instead treated observability similarly to how we treat tests?

Logit.io Unveils Exciting Enhancements: Integrating OpenSearch 2.10.0

We're thrilled to share an exciting update from Logit.io. As part of our ongoing commitment to providing cutting-edge observability solutions to our users, we've integrated OpenSearch 2.10.0 into our platform, bringing a host of advanced features to enhance your experience. Let's dive into what's new and how these changes can benefit your observability workflows.

Streamline your CD pipeline for Cisco Cloud Observability

How can you use a monitoring-as-code mechanism to start monitoring new workloads or to create new visualizations? In this demo, see how Cisco AppDynamics can integrate with Flux CD (Continuous Delivery), a GitOps Kubernetes operator that offers a simple and efficient interface for synchronizing manifests from GitHub repositories within CD workflows. See how easy it is to upgrade existing software with just a few lines of code, such as when instrumenting new workloads with the OpenTelemetry Agent or customizing a Grafana dashboard.

Using Honeycomb for LLM Application Development

Ever since we launched Query Assistant last June, we’ve learned a lot about working with—and improving—Large Language Models (LLMs) in production with Honeycomb. Today, we’re sharing those techniques so that you can use them to achieve better outputs from your own LLM applications. The techniques in this blog are a new Honeycomb use case. You can use them today. For free. With Honeycomb.

Gartner IOCS replay: Achieving unified observability with data mesh

The single pane of glass is perhaps the most enduring and elusive goal of enterprise IT operations teams. When we polled our customers a couple of years ago, 99% of the 184 respondents rated it as important to their business, with 64% indicating “extremely important”. The shared dream is to have: But unfortunately, the single pane of glass has become a bit of a myth.

What's the Difference between AIOps and Observability?

In the ever-evolving world of IT, monitoring application, service, and system performance and addressing issues in real time is crucial both to an organization’s customer experience and to its overall success. Two terms and approaches that have gained significant attention in recent years are AIOps and observability. While both relate to improving IT monitoring and management, they play distinct roles in enhancing operational efficiency.

Opinionated Observability with Ralph Meijer - Navigate Europe 23

Join Ralph Meijer from Netdata as he explores the concept of "Opinionated Observability" in our latest Navigate Europe 23 talk. Dive into the intricacies of transforming complex metrics into user-friendly visualizations and alerts. Ralph shares his professional journey and the challenges he faced in different roles, emphasizing the importance of understanding metric context and units. Discover how Netdata's innovative approach simplifies observability, offering pre-defined dashboards and alerts for efficient monitoring.

The role of observability in incident response

Observability has brought a new approach to IT infrastructure management, easing the workload on IT admins across the world and bringing more accuracy and efficiency. One of the clear beneficiaries of this evolution in IT infrastructure management is incident response. Incident response is the systematic process of identifying, analyzing, and mitigating security threats, breaches, or operational issues to minimize their impact on the continuity of business operations.

Micro Lesson: Monitoring and Troubleshooting with AWS Observability Solution

This video introduces Sumo Logic's AWS Observability solution, which is an all-in-one approach to give visibility into the important elements of the cloud infrastructure and assist in troubleshooting complex issues. This video further describes the features of the observability solution such as pre-built dashboards, prepackaged log searches, and the out-of-the-box alerts that help in monitoring and troubleshooting.

Elastic Observability monitors metrics for Google Cloud in just minutes

Developers and SREs choose to host their applications on Google Cloud Platform (GCP) for its reliability, speed, and ease of use. On Google Cloud, development teams are finding additional value in migrating to Kubernetes on GKE, leveraging the latest serverless options like Cloud Run, and improving traditional, tiered applications with managed services. Elastic Observability offers 16 out-of-the-box integrations for Google Cloud services with more on the way.

Defensive Instrumentation Benefits Everyone

A lot of the reasoning in published content assumes that the audience works in a modern, psychologically safe, agile sort of environment. It’s aspirational, so folks who aren’t in those environments may feel that the path there requires doing “the new thing” or using “the new tool.” If you write software and your employer hasn’t caught up to all the newest, best ways to work, I hope this pragmatic post helps you sleep better at night.

Multi-Cluster Observability Part 1: Building A Foundation

In the world of modern Kubernetes, things have come a long way from the days of a single cluster handling one app. Now it's common to see setups that span multiple clusters across different clouds. Initially, managing those clusters was a complicated operation with many moving parts. Tools such as SUSE Rancher, Red Hat OpenShift, and AWS EKS made managing multiple clusters somewhat easier.

Building scalable OSS observability with Mimir, Loki, Tempo, and Pyroscope | ObservabilityCON 2023

In this video, we cover the latest and greatest news about the scalability and performance of the open source telemetry backends that make up the Grafana LGTM Stack: Grafana Mimir for Prometheus metrics, Grafana Loki for logs, and Grafana Tempo for traces.

KubeCon 2023: Code, Culture, Community, and Kubernetes

KubeCon 2023 was more than just another conference to check off my list. It marked my first chance to work in the booth with my incredible Kentik colleagues, and it let me dive deep into the code, community, and culture of Kubernetes. It was also a moment when members of an underrepresented group met face-to-face and experienced an event that had previously not been an option for them.

What is CI/CD observability, and how are we paving the way for more observable pipelines?

Observability isn’t just about watching for errors or monitoring for basic health signals. Instead, it goes deeper so you can understand the “why” behind the behaviors within your system. CI/CD observability plays a key part in that. It’s about gaining an in-depth view of the entire pipeline of your continuous integration and deployment systems — looking at every code check-in, every test, every build, and every deployment.

Tame observability complexity: Understanding the observability tool landscape

Choosing, deploying, maintaining, and rationalizing observability and monitoring tools can be a constant challenge for ITOps, DevOps, and SRE teams. As teams monitor increasingly complex systems, the need for instrumentation that monitors those systems grows at the same rate, leading directly to a growing problem of observability data engineering, integration, and enrichment.

NXP + Memfault + Golioth: Bringing Observability and Device Management to IoT Devices

NXP, Golioth, and Memfault have collaborated to give IoT developers the same composable tooling that cloud developers are accustomed to with modern data architectures. With this partnership, NXP developers can leverage a single, secure connection for instant access to data routing, core dump analysis, and observability for rapid time-to-market and improved IoT device performance. In the webinar, our presenters cover:

Best practices to scale and modernize your observability strategy

ObservabilityCON 2023 took place in London this week, showcasing all the latest and greatest trends in open source observability. Following the opening keynote, the event featured a range of breakout sessions — led by both Grafana Labs experts and members of the Grafana OSS community — that explored observability best practices and lessons learned.

What Is Observability? Key Components and Best Practices

Software systems are increasingly complex. Applications can no longer simply be understood by examining their source code or relying on traditional monitoring methods. The interplay of distributed architectures, microservices, cloud-native environments, and massive data flows requires an increasingly critical approach: observability.

How the LGTM Stack changed the observability culture at Wise Payments

The observability team at Wise Payments, Europe’s leader in cross-border money transfers, had long provided the company’s developers access to a multitude of tools. But as costs and complexity increased, Ibukun Itimi, Engineering Lead for Observability, and Andrew Brown, Reliability Squad Lead, saw an opportunity to change not only the tools they were using, but also the observability culture.

Reach new heights in business excellence with full-stack observability

Organizations are constantly looking to grow and expand, which requires establishing strong foundations, especially for the IT infrastructure. The challenge in achieving this is to consistently push the limits of the IT infrastructure to deliver more business excellence. To ensure success, management operations should be fine-tuned, and this often requires improving tool sets, skillsets, and personnel.

10 Practical Machine Learning Use Cases in Observability - Navigate Europe 23

Dive into the world of machine learning and its practical applications in observability with Andrew Maguire from Netdata. Explore a variety of use cases, challenges, and considerations in implementing ML for enhanced monitoring and analytics. Learn about the potential benefits and the importance of human oversight in this insightful presentation.

How To Save Money On Your Observability Costs

In today's digital age, the complexity and scope of dynamic system architectures are expanding at an unprecedented rate. As a result, IT teams find themselves grappling with the challenge of monitoring and addressing conditions across multi-cloud environments. With the increasing complexities, IT operations, DevOps, and SRE teams are searching for enhanced observability within these multifaceted computing environments.

From Oops to Ops: SLOs Get Budget Rate Alerts

Having lived the Honeycomb ops life for a while, I can say that SLOs have been the bread and butter of our most critical and useful alerting. However, they had severe, long-standing limitations. In this post, I will describe these limitations and how our brand-new feature, budget rate alerts, addresses them. We usually don’t have SREs writing product announcements, but I’m so excited about this one that I said, “Screw it, I’m doing it!”
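
To make the idea of a budget rate alert concrete, here is a minimal sketch assuming the alert fires when a single evaluation window consumes at least a given percentage of the whole SLO period's error budget. The function names, the threshold, and the event counts are hypothetical illustrations, not Honeycomb's implementation or API.

```python
import math


def error_budget(slo_target: float, period_total_events: int) -> float:
    """Number of failed events the SLO tolerates over the whole SLO period."""
    return (1.0 - slo_target) * period_total_events


def budget_burned_pct(slo_target: float, period_total_events: int,
                      failed_in_window: int) -> float:
    """Percentage of the full-period error budget consumed inside one window."""
    budget = error_budget(slo_target, period_total_events)
    if budget <= 0:
        # A 100% target leaves no budget: any failure burns "infinitely" fast.
        return math.inf if failed_in_window else 0.0
    return 100.0 * failed_in_window / budget


def budget_rate_alert(slo_target: float, period_total_events: int,
                      failed_in_window: int, threshold_pct: float) -> bool:
    """Hypothetical rule: fire when one window burns >= threshold_pct of the budget."""
    return budget_burned_pct(slo_target, period_total_events,
                             failed_in_window) >= threshold_pct


# A 99.9% target over 1,000,000 events allows roughly 1,000 failures.
# 50 failures in one window burn about 5% of that budget.
print(budget_burned_pct(0.999, 1_000_000, 50))
print(budget_rate_alert(0.999, 1_000_000, 50, threshold_pct=2.0))
```

Unlike a plain "budget exhausted" alert, a rate-based rule like this can fire early, while most of the budget is still left, if it is suddenly being spent quickly.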

Managing observability spend with Grafana Cloud's Cost Management Hub

Learn how Grafana Cloud helps you analyze, manage, and optimize observability spend from a central location called the cost management hub. The move to cloud-native technologies like K8s and Prometheus has caused an unprecedented increase in telemetry data, sending observability bills skyrocketing. With Grafana Cloud and the central cost management hub, you can answer any cost-related question, with tools to inspect, attribute, optimize, and monitor your observability spend.

Cisco Cloud Observability on AWS: Deploying is easy with the AppDynamics add-on for Amazon EKS Blueprints with Terraform

Quickly deploy the Cisco AppDynamics Kubernetes® and App Service Monitoring solution for cloud native application observability using Helm charts and the Amazon EKS Blueprints for Terraform module. In this blog, I’ll show you how to do it in just minutes.

How Asserts.ai will make it even easier for Grafana Cloud users to understand their observability data

At Grafana Labs, our mission has always been to help our users and customers understand the behavior of their applications and services. Over the past two years, the biggest needs we’ve heard from our customers have been to make it easier to understand their observability data, to extend observability into the application layer, and to get deeper, contextualized analytics.

Announcing Application Observability in Grafana Cloud, with native support for OpenTelemetry and Prometheus

The Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics) offers the freedom and flexibility for monitoring application performance. But we’ve also heard from many of our users and customers that you need a solution that makes it easier and faster to get started with application monitoring.

How to Create Log-Based Metrics to Improve Application Observability

As a Site Reliability Engineer (SRE) or DevOps professional, you are well aware of the importance of observability in ensuring the smooth functioning and performance of your applications. Observing and monitoring your applications can help you identify and resolve issues in real time, resulting in increased reliability and an improved user experience. Logs play a crucial role in this process, as they provide detailed information about the activity and behavior of your applications.
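
As a rough illustration of turning logs into metrics, the sketch below parses access-log lines into a request counter keyed by path and status class, plus per-path latency samples. The log format, the regular expression, and the function name are hypothetical assumptions; real log pipelines define their own parsing rules.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical access-log format: "<METHOD> <path> <status> <duration>ms"
LOG_PATTERN = re.compile(
    r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<duration>\d+)ms$"
)


def log_based_metrics(lines: Iterable[str]) -> Tuple[Counter, Dict[str, List[int]], int]:
    """Derive a request counter and latency samples from raw log lines."""
    requests: Counter = Counter()          # (path, status class) -> count
    durations: Dict[str, List[int]] = {}   # path -> latencies in milliseconds
    unparsed = 0                           # lines that did not match the format
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            unparsed += 1
            continue
        status_class = match["status"][0] + "xx"   # e.g. "503" -> "5xx"
        requests[(match["path"], status_class)] += 1
        durations.setdefault(match["path"], []).append(int(match["duration"]))
    return requests, durations, unparsed


logs = [
    "GET /checkout 200 120ms",
    "GET /checkout 503 900ms",
    "POST /login 200 80ms",
    "garbage line",
]
counts, latencies, bad = log_based_metrics(logs)
print(counts, latencies, bad)
```

Counting by status class rather than exact status code keeps the number of distinct series small, which matters once the counts are exported as metrics.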

Selecting Observability and Security Solutions in Compliance with RBI: Fintech Challenges

Fintech, short for financial technology, encompasses the many firms and technologies that use innovation to improve and automate financial services and operations. Their goal is to make financial services more efficient, accessible, and user-friendly. Fintech entities span numerous sectors within the financial industry, such as online payments, lending, digital banking, investing, insurance, and more, all aimed at streamlining financial processes.

What Do Developers Need to Know About Kubernetes, Anyway?

Stop me if you’ve heard this one before: you just pushed and deployed your latest change to production, and it’s rolling out to your Kubernetes cluster. You sip your coffee as you wrap up some documentation when a ping in the ops channel catches your eye—a sales engineer is complaining that the demo environment is slow. Probably nothing to worry about, not like your changes had anything to do with that… but, minutes later, more alerts start to fire off.

How observability and AIOps work better together

If you’re juggling complex, cloud-based, containerized systems and aiming to meet high customer expectations, your old monitoring processes probably don’t cut it anymore. Increasing infrastructure complexity means you need to instrument more, log more, and monitor more. That leads to even more complexity. The answer is better observability, right? Yes and no. Observability and monitoring are critical, but they are only part of what you need for service awareness and availability.

More is More - A Case for Dynamic Observability

Dynamic observability is the concept that the amount of data collected should scale based on signals from your environment. Elastic infrastructure is not a new concept. Much of the internet is powered by services that provision more resources based on signals derived from metrics like CPU load, memory utilization, and queue depth. If we can use tools to right-size our infrastructure, why can’t we also use tools to right-size the amount of data we collect?
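
One way to picture "right-sizing" collected data is a sampling rate that rises with an environment signal. The sketch below assumes the signal is the observed error rate and interpolates linearly from a low baseline up to full collection; the parameter names and default values are hypothetical, chosen only to illustrate the idea.

```python
import random


def sample_rate(error_rate: float, base_rate: float = 0.01,
                max_rate: float = 1.0, error_threshold: float = 0.05) -> float:
    """Fraction of events to keep, scaled up as the error rate rises.

    Hypothetical policy: keep base_rate of events when healthy, ramp linearly
    toward max_rate, and keep everything once error_rate hits error_threshold.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be within [0, 1]")
    if error_rate >= error_threshold:
        return max_rate
    return base_rate + (max_rate - base_rate) * (error_rate / error_threshold)


def should_keep(error_rate: float, rng: random.Random) -> bool:
    """Probabilistic keep/drop decision for a single event."""
    return rng.random() < sample_rate(error_rate)


rng = random.Random(42)
for rate in (0.0, 0.025, 0.2):
    print(rate, sample_rate(rate), should_keep(rate, rng))
```

A linear ramp is the simplest possible policy; a real system might react to several signals at once or smooth the signal over time to avoid flapping.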

Resolve issues faster with Grafana Cloud Application Observability

Grafana Cloud Application Observability provides an out-of-the-box experience to monitor application performance and minimize MTTR. With its native support of the open standards OpenTelemetry and Prometheus, Application Observability unifies signals across the full stack, accelerating root cause analysis while removing proprietary formats and vendor lock-in. Watch this demo of how to use Application Observability in Grafana Cloud.

Zero-code application observability with Grafana Beyla and eBPF: demo

The eBPF-based OSS auto-instrumentation tool Grafana Beyla makes it easier to get started with application observability. Beyla provides RED (Rate, Errors, Duration) metrics through OpenTelemetry or Prometheus for your existing web services, whatever language they are written in. You don’t need to change a single line of application code or configuration; you only need to deploy Beyla on the same host as the service you want to monitor. Collecting monitoring data with eBPF auto-instrumentation has very low overhead and lets you capture data about your runtime that is impossible to obtain with manual code instrumentation. Watch this in-depth demo of how to use Grafana Beyla to get started with application observability.
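
To clarify what the three RED signals mean, here is a small sketch that computes them from a list of request records over a fixed window. It does not reflect how Beyla gathers data through eBPF. It assumes that only 5xx responses count as errors and uses a simple nearest-rank percentile; both are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Request:
    status: int          # HTTP status code
    duration_ms: float   # server-side latency in milliseconds


def nearest_rank(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted sequence."""
    if not sorted_values:
        return float("nan")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_metrics(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    return {
        "rate_per_s": total / window_seconds,              # R: requests per second
        "error_ratio": errors / total if total else 0.0,   # E: share of failures
        "p50_ms": nearest_rank(durations, 50),             # D: median latency
        "p95_ms": nearest_rank(durations, 95),             # D: tail latency
    }


# Ten requests observed over a 5-second window, one of them a 500.
reqs = [Request(200, float(ms)) for ms in range(10, 100, 10)] + [Request(500, 100.0)]
print(red_metrics(reqs, window_seconds=5.0))
```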

Control Prometheus cardinality and metrics cost with Adaptive Metrics

Adaptive Metrics is a cost management feature in Grafana Cloud that helps enterprises control Prometheus cardinality and reduce their observability spend by identifying and eliminating unused metrics. Grafana Cloud customers using Adaptive Metrics see a 20-50% reduction in their observability bill.
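
The sketch below illustrates the general idea of finding unused metrics and ranking them by cardinality. Each metric's cardinality is the number of distinct label sets, and metrics that no dashboard, alert, or query references are candidates for removal. It assumes you already have the series list and a set of referenced metric names, both hypothetical inputs here. It is not how Adaptive Metrics itself works.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Series = Tuple[str, Dict[str, str]]  # (metric name, label set)


def cardinality_by_metric(series: Iterable[Series]) -> Dict[str, int]:
    """Count distinct label sets (i.e. active series) per metric name."""
    label_sets: Dict[str, Set[frozenset]] = defaultdict(set)
    for name, labels in series:
        label_sets[name].add(frozenset(labels.items()))
    return {name: len(sets) for name, sets in label_sets.items()}


def unused_metrics(series: Iterable[Series], used_names: Set[str]) -> List[Tuple[str, int]]:
    """Metrics never referenced by queries, largest cardinality first."""
    cardinality = cardinality_by_metric(series)
    unused = [(name, n) for name, n in cardinality.items() if name not in used_names]
    return sorted(unused, key=lambda item: (-item[1], item[0]))


series = [
    ("http_requests_total", {"route": "/a", "code": "200"}),
    ("http_requests_total", {"route": "/a", "code": "500"}),
    ("http_requests_total", {"route": "/b", "code": "200"}),
    ("http_requests_total", {"route": "/b", "code": "200"}),  # duplicate sample
    ("go_gc_duration_seconds", {"quantile": "0.5"}),
    ("go_gc_duration_seconds", {"quantile": "0.99"}),
    ("process_cpu_seconds_total", {}),
]
# Hypothetical: names found in dashboards, alert rules, and query logs.
used = {"http_requests_total"}
print(unused_metrics(series, used))
```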

How Mercado Libre scales its AWS microservices without losing visibility

Learn how Mercado Libre acts more quickly, strategically, and proactively thanks to Datadog’s centralized platform and context-rich alerting. Mercado Libre hosts the largest online commerce and payments ecosystem in Latin America, which means thousands of dollars can be lost if some of its critical applications stop working for even one minute. Senior Technical Manager Juliano Martins and software expert Marcelo Quadros share a few reasons why they chose Datadog as the observability platform for their AWS environment: the power of our infrastructure monitoring solution, an extensive range of integrations, a strong reputation in the market, and more.

Simplify OpenTelemetry Pipelines with Headers Setter

In telemetry jargon, a pipeline is a directed acyclic graph (DAG) of nodes that carry emitted signals from an application to a backend. In an OpenTelemetry Collector, a pipeline collects signals through a set of receivers, runs them through processors, and then emits them through configured exporters. This blog post aims to simplify both kinds of pipeline by using an OpenTelemetry Collector extension called the Headers Setter.
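
The receiver → processor → exporter flow can be modeled as a toy Python sketch. This is a conceptual illustration only, not the Collector's code or configuration format. The exporter builds outgoing headers from request metadata carried with each signal. That mirrors, in spirit, what the Headers Setter extension does (mapping a header name to a context key). The `tenant` key and the `X-Scope-OrgID` header are example assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Signal:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Request metadata captured by the receiver (e.g. a tenant ID sent by the client).
    context: Dict[str, str] = field(default_factory=dict)


Receiver = Callable[[], List[Signal]]
Processor = Callable[[Signal], Optional[Signal]]  # returns None to drop the signal
Exporter = Callable[[Signal], None]


@dataclass
class Pipeline:
    receivers: List[Receiver]
    processors: List[Processor]
    exporters: List[Exporter]

    def _process(self, signal: Signal) -> Optional[Signal]:
        # Processors run in order; any of them may drop the signal.
        for process in self.processors:
            result = process(signal)
            if result is None:
                return None
            signal = result
        return signal

    def run(self) -> None:
        for receive in self.receivers:
            for signal in receive():
                processed = self._process(signal)
                if processed is not None:
                    # Fan out: every exporter gets every surviving signal.
                    for export in self.exporters:
                        export(processed)


def drop_health_checks(signal: Signal) -> Optional[Signal]:
    return None if signal.attributes.get("http.route") == "/health" else signal


def make_header_exporter(sent: list, header_from_context: Dict[str, str]) -> Exporter:
    """Attach outgoing headers taken from each signal's context (header -> context key)."""
    def export(signal: Signal) -> None:
        headers = {header: signal.context[key]
                   for header, key in header_from_context.items()
                   if key in signal.context}
        sent.append((headers, signal.name))  # stand-in for a network send
    return export


sent: list = []
pipeline = Pipeline(
    receivers=[lambda: [
        Signal("checkout", {"http.route": "/checkout"}, {"tenant": "acme"}),
        Signal("health", {"http.route": "/health"}, {"tenant": "acme"}),
    ]],
    processors=[drop_health_checks],
    exporters=[make_header_exporter(sent, {"X-Scope-OrgID": "tenant"})],
)
pipeline.run()
print(sent)
```

Because the header value comes from per-request context rather than a static setting, a single pipeline can forward data for many tenants without a separate exporter for each one.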

Observability Shifts Right

Observability first emerged as a focal point of interest in the DevOps community in the 2017 time frame. Aware that business was demanding highly adaptable digital environments, DevOps professionals realised that high adaptability required a new approach to IT architecture. Whereas historically, digital stacks were monolithic or, at best, coarsely grained, the new stacks would have to be highly modular, dynamic, ephemeral at the component level, and spread over multiple cloud-based services.

Tackling Staffing, Funding, and Data Challenges Head-On with TAQA

Join Ed Bailey and TAQA Group's Andrew Ochse as they discuss the diverse services that TAQA offers, look at the challenges with scaling and staffing, and explore in great detail the solutions to classic problems such as insufficient funding, poor data quality, and slow connections linking global sites to their Security Operations Center (SOC).

Why public sector needs AI-powered observability: Cost savings, ROI, and analyst efficiency

Elastic Observability customers saw 243% ROI and $1.2 million in savings over 3 years. For government and education organizations around the world, facilitating an efficient, reliable customer experience is essential when providing critical services and building trust with stakeholders. As technology infrastructure expands and the IT landscape becomes a complex mix of private cloud, public cloud, and air-gapped environments, seeing across all systems and data becomes both challenging and critical.

The Art of Event-Driven Observability with OpenTelemetry with Henrik Rexed - Navigate Europe 23

Join Henrik Rexed from Dynatrace in this comprehensive session as he unravels the complexities of event-driven observability using OpenTelemetry. Whether you're new to the field or looking to enhance your skills, this talk provides valuable insights, practical examples, and best practices to guide you on your journey to mastering observability.

Elastic Observability ES|QL Demo

Elevate Your Data Game with Elastic Observability and ES|QL! Discover the future of data querying with Elastic’s groundbreaking new feature: ES|QL! In this video you'll deep dive into how ES|QL revolutionizes the way you interact with complex, distributed data, ensuring seamless and efficient data analysis. Who Is This For? Whether you are a data analyst eager to optimize your query writing skills, or a business leader looking to democratize data insights across your organization, this video is tailor-made for you!

Quantifying the value of AI-powered observability

Organizations saw a 243% ROI and $1.2 million in savings over three years. In today’s complex and distributed IT environments, traditional monitoring falls short. Legacy tools often provide limited visibility across an organization’s tech stack, often at a high cost, resulting in selective monitoring. Many companies are therefore realizing the need for true, affordable end-to-end observability, which eliminates blind spots and improves visibility across their ecosystem.

SEC Charges on SolarWinds: A Wake-Up Call for Cybersecurity and Risk Management

Cribl’s Ed Bailey and Jackie McGuire look into the recent SEC charges against SolarWinds and its CISO concerning alleged fraud and internal control failures tied to known cybersecurity risks and vulnerabilities. These charges carry long-term implications for how corporations handle cybersecurity and risk management. Tune into the live stream for an engaging conversation, and come prepared with your questions and insights on the future of cybersecurity.

Demystifying Cloud and Cloud-Native Observability

In the ever-evolving and fast-changing landscape of cloud computing and modern software development, achieving 360-degree visibility into your critical business services, applications and infrastructure is essential. This is where observability comes into play. Observability, especially in a cloud-based or cloud-native environment, has become a critical aspect of maintaining and optimizing complex systems and services.

Application Observability on RKE2 With SUSE Rancher and StackState

Please join Jeroen van Erp, StackState's Product Manager, as he shows you how to achieve full observability of your SUSE Rancher managed clusters. He'll demonstrate StackState's Kubernetes troubleshooting capabilities for development teams. You can easily manage your Rancher clusters and gain visibility into all your Kubernetes resources by installing the StackState agent from the SUSE Rancher marketplace. Jeroen will walk you through the service overview, service dependency map, and powerful troubleshooting features that StackState offers.

From isolation to integration: Why siloed IT teams should leverage full-stack observability

Discover how full-stack observability brings siloed teams together for greater productivity, efficiency, and profitability. When application entities become increasingly distributed, so does the data they hold, and that’s a huge challenge for organizations managing and governing expanding, complex application environments.

[Webinar] End-to-end Azure observability: The complete essentials of Azure monitoring

As your business expands, you need to scale your infrastructure accordingly. And with the complexity of modern cloud infrastructures, it's crucial to have a comprehensive observability strategy in place. Discover ways to achieve operational excellence throughout your Azure infrastructure with our webinar.

Stop aiming for a 'perfect' monitoring and observability strategy - and start using AIOps

Change is the only constant in today’s continuously shifting IT landscape. Whether you’re adding new observability tools, retiring existing monitoring systems, establishing new business units, or onboarding IT systems from acquisitions, managing these non-stop changes can challenge even your expert ITOps team. Trying to get your monitoring house in order is a daunting task.

Observability for Sustainability

For the past 20 years, the various stakeholder communities that together constitute the IT industry have attempted to address sustainability. The original efforts grew out of the realisation that even as far back as 2005, the hardware and software that underlay the digital world were responsible for approximately 5% of overall energy consumption and that both the percentage and absolute amounts of energy required were growing in the double digits.

Introducing Honeycomb for Kubernetes: Bridging the Divide Between Applications and Infrastructure

In our continuous journey to support teams grappling with the complexities of Kubernetes environments, we’re thrilled to announce the launch of Honeycomb for Kubernetes, a dedicated solution designed to bridge the growing divide between infrastructure/platform teams and application developers. This is available to all plans (including Free!) at no additional cost.

What is Observability? Grafana for Beginners Ep. 1

When you are getting started with observability, the jargon and concepts used to explain observability may go straight over your head. Let’s take out the complexity and talk about observability in the simplest terms possible. Join Lisa Jung, a senior developer advocate at Grafana, to get your learning on with the Grafana for Beginners series. You will learn about concepts such as observability and DevOps and how Grafana can be used to observe your system as a part of your DevOps practice.

APM vs Tracing vs Observability

Application Performance Monitoring (APM), tracing, and observability are fundamental approaches to software development and system management. Each of these three concepts helps, in its own way, to ensure that your applications operate efficiently, smoothly, and reliably. Your organisation has more than likely already adopted one of these approaches, perhaps two, or even all three.