Operations | Monitoring | ITSM | DevOps | Cloud

January 2023

Your Data Just Got a Facelift: Introducing Honeycomb's Data Visualization Updates

Data visualizations take complex information and present it in a clean and easy-to-understand visual. Done right, they can allow quick insight through easy pattern and outlier recognition. Done wrong, they can confuse, obfuscate, and lead to wrong conclusions. Yikes! Over the past few months, we've been hard at work modernizing Honeycomb’s data visualizations to address consistency issues, confusing displays, and access to settings, and to improve their overall look and feel.

A beginner's guide to Kubernetes application monitoring

Application performance monitoring (APM) involves a mix of tools and practices to track specific performance metrics. Engineers use APM to monitor and maintain the health of their applications and ensure a better user experience. This is crucial to high-quality architecture, development, and operations, but it can be difficult to achieve in Kubernetes since the container orchestration system doesn’t provide an easy way to monitor application data like it does for other cluster components.

Distributed tracing in Kubernetes apps: What you need to know

Kubernetes makes it easier for businesses to automate software deployment and manage applications in the cloud at scale. However, if you’ve ever deployed a cloud native app, you know how difficult it can be to keep it healthy and predictable. DevOps teams and SREs often use distributed tracing to get the insights they need to learn about application health and performance.

Bad Observability

Observability has become a bit of a buzzword in the industry over the last few years. Exactly what "observability" means depends on who you ask, but most people would agree it's about both: There's plenty of content out there telling you how to implement observability, or what good looks like. But what about bad observability? What are some anti-patterns to watch out for?

Monitoring Kubernetes layers: Key metrics to know

Kubernetes monitoring can be difficult and complex. In order to determine the health of your project at every level, from the application to the operating system to the infrastructure, you need to monitor metrics in all the different layers and components — services, containers, pods, deployments, nodes, and clusters.

Five eye-catching Grafana visualizations used by Energy Sciences Network to monitor network data

ESnet (Energy Sciences Network) is a high-performance network backbone built to support scientific research. Funded by the U.S. Department of Energy and part of Lawrence Berkeley National Laboratory, ESnet provides fast, reliable connections between national laboratories, supercomputing facilities, and scientific instruments around the globe. Our mission is to allow scientists to collaborate and perform research without worrying about distance or location.

How to use Kubernetes events for effective alerting and monitoring

Kubernetes, a graduated project of the Cloud Native Computing Foundation (CNCF) ecosystem, is the most prominent and widely used container orchestration system. It’s used to manage and deploy containers in a wide range of environments, from IoT devices based on Raspberry Pis to enterprise environments consisting of millions of services.

Grafana vs. Power BI vs. SquaredUp

You’re part of a data-driven engineering team. You have a rich, complex, and dynamic set of tools but you’re struggling to discover and share insights from all that data. So, you're looking for a platform that will help unify it all. Naturally, you want to compare Grafana vs. Power BI - the big names. Plus, there's a new player on the block - SquaredUp.

How to monitor Kubernetes clusters with the Prometheus Operator

Kubernetes has become the preferred tool for DevOps engineers to deploy and manage containerized applications on one or multiple servers. These compute nodes are grouped into clusters, and their performance is crucial to the success of an application. If a Kubernetes cluster isn’t performing optimally, the application’s availability and performance will suffer, leading to unhappy users and even revenue loss.

How Grafana Labs unlocks the power of recruitment data with Grafana dashboards

As the recruitment team here at Grafana Labs, we used to struggle to get a comprehensive view of our recruitment data. We had multiple sources of information, but it was difficult to pool that information so we could see the big picture and identify trends and patterns that could help us hire the right talent in a highly competitive market.

Dashboard Fridays: Sample Kubernetes dashboard

Engineers need to understand the status of microservices run on EKS, like the health status of clusters and nodes, to avoid issues impacting business-critical microservices. Plus, you need to be able to keep an eye on EKS resources, including whether the Kubernetes cluster has auto-scaled (where enabled). Usually, viewing these metrics requires looking at each EKS cluster and node group individually in the AWS Console, or via another complex third-party dashboarding tool. The data is siloed and difficult to consolidate.

Dashboard Fridays: Sample Azure Monitor Dashboard

These Azure dashboards built in SquaredUp show some of the capabilities of SquaredUp’s Azure plugin. SquaredUp lets you easily create dashboards for your Azure resources, scoping a new tile with just a few clicks. The Azure plugin provides the ability to show metrics, alerts, and cost, as well as leverage KQL queries against Application Insights and Log Analytics workspaces - all from one plugin. When scoping a tile, you can also choose whether to group, aggregate, sort or filter the data.

Reduce mean time to hello world with OpenTelemetry, Grafana Mimir, Grafana Tempo, and Grafana: Inside Adobe's observability stack

How is Grafana like an invisibility cloak? At Adobe, it’s one of just four tools they’re using to build observability directly into their CI/CD pipeline, making it essentially invisible — but nonetheless impactful — to thousands of developers across the organization who use it in their day-to-day lives.

Azure Managed Grafana users can now upgrade to Grafana Enterprise

In November 2021, we announced a strategic partnership with Microsoft to develop a Microsoft Azure managed service that lets customers run Grafana natively within their Azure cloud platform. Azure Managed Grafana, which became generally available in August 2022, makes it simple for Azure customers to deploy secure and scalable Grafana instances and connect to open source, cloud, and third-party data sources for visualization and analysis.

4 New AWS Monitoring Dashboards for EC2, EBS, RDS and S3

This is just a quick blog to draw attention to some new and enhanced monitoring dashboards we have added to eG Enterprise in the upcoming release (v 7.2) to provide quick and powerful overviews of a range of AWS services. As with all our dashboards, color-coded overlays provide guided drilldown for help desk operators and administrators. If a component has an issue, an amber or red indicator is overlaid to allow the viewer to click through to further diagnostic information.

Watch: 5 tips for improving Grafana Loki query performance

Grafana Loki is designed to be cost effective and easy to operate for DevOps and SRE teams, but running queries in Loki can be confusing for those who are new to it. Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It doesn’t index the content of the logs, but rather a set of labels for each log stream.
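The label-only indexing idea can be illustrated with a toy sketch. This is not Loki's implementation or API; `LabelIndexedLogStore`, `push`, and `query` are hypothetical names invented for illustration. The sketch assumes each stream is identified by its set of labels, and that log text is only scanned after the labels have narrowed down which streams to read.

```python
from collections import defaultdict


class LabelIndexedLogStore:
    """Toy store (hypothetical): indexes stream labels, never the log text."""

    def __init__(self):
        # Each distinct label set identifies one stream of raw log lines.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        # Only the labels are used as the lookup key; the line is stored as-is.
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self.streams.items():
            # Step 1: select streams by label match (the "indexed" part).
            if wanted <= stream_labels:
                # Step 2: brute-force scan of the selected streams' text.
                results.extend(line for line in lines if contains in line)
        return results


store = LabelIndexedLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "GET /orders 500 timeout")
store.push({"app": "web", "env": "prod"}, "GET / 200")
print(store.query({"app": "api"}, contains="500"))
```

In a design like this, a narrow label selector leaves less text to scan, which is one plausible reason query tips for label-indexed systems tend to start with the selector.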

New Year's (observability) Resolutions

A new year has started and I've been pondering my hopes and dreams for the year to come. In the world of SRE, observability is the most prominent pillar of my work. So, I decided to drill into the topic of observability and what I'd like to see happen in the industry in 2023. Rather than focusing on any tool, technology, or methodology, I'll be exploring concepts that can be broadly applied in any organization.

How to forecast holiday data with Grafana Machine Learning in Grafana Cloud

A little over a year ago, we released Grafana Machine Learning, enabling Grafana Cloud Pro and Advanced users to easily view forecasts of their time series. We recently enhanced Grafana Machine Learning with Outlier Detection, which allows you to monitor a group of similar things, such as load-balanced pods in Kubernetes, and get alerted when something starts behaving differently than its peers.
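As a rough illustration of what "behaving differently than its peers" can mean, here is a minimal sketch that flags group members whose latest value sits far from the group median. It is not the algorithm Grafana Machine Learning uses; the function name `find_outliers`, the median-absolute-deviation scoring, the threshold of 3.0, and the sample pod values are all assumptions made for this example.

```python
from statistics import median


def find_outliers(latest_values, threshold=3.0):
    """Return the names whose value deviates strongly from the group median.

    latest_values: dict mapping a member name (e.g. a pod) to its latest value.
    threshold: how many median absolute deviations count as "different".
    """
    values = list(latest_values.values())
    center = median(values)
    # Median absolute deviation: a spread measure that one outlier cannot skew much.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the group sits exactly on the median; flag anything that differs.
        return [name for name, v in latest_values.items() if v != center]
    return [
        name
        for name, v in latest_values.items()
        if abs(v - center) / mad > threshold
    ]


# Hypothetical CPU usage for four load-balanced pods.
pods = {"pod-a": 0.50, "pod-b": 0.52, "pod-c": 0.49, "pod-d": 0.95}
print(find_outliers(pods))
```

Comparing each member against the group, rather than against a fixed limit, means a load spike that hits every pod equally would not be flagged, while a single pod drifting away from the rest would.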

Spot Eco: Introducing a faster, more flexible dashboard

With Eco’s automated reservation management, maximizing savings on your cloud bill and increasing your team’s bandwidth is easy. The new dashboard now includes responsive modular components and an improved commitment filter, so tracking savings and monitoring your environment is even easier.

How to monitor Kubernetes with Grafana and Prometheus: Inside Powder's observability stack

David Calvert is a site reliability engineer working remotely from the south of France. He’s currently focused on observability, reliability, and security aspects of cloud infrastructure. You can find him as dotdc on GitHub and @0xDC_ on Twitter. Over the past three years, I’ve built and operated Kubernetes clusters for two different companies — the first one on-premises, and the second on a public cloud platform for my current job at Powder.

How to use the Grafana Ansible collection to manage Grafana Agent across multiple Linux hosts

Anyone who is trying to set up monitoring for multiple machines knows how tough it can get to manage multiple Grafana Agents across them. To make things easier, we recently added the Grafana Agent role to the Grafana Ansible collection, which will help users manage the Agent across multiple Linux hosts.

4 billion logs, 120 TB of data: How Just Eat Takeaway.com uses Grafana Cloud to scale

In 2017, Just Eat Takeaway.com (JET) was transitioning from a scrappy startup to a surging scaleup. With a global customer base and workforce, the food delivery marketplace’s front line teams needed to scale the real-time monitoring of the platform. Their initial efforts looked like “NASA’s mission control with Grafana dashboards,” said Senior Technology Manager Alex Murray.

Phantom Metrics: Why Your Monitoring Dashboard May Be Lying to You

Whether you’re a DevOps engineer, an SRE, or just a data-driven individual, you’re probably addicted to dashboards and metrics. We look at our metrics to see how our system is doing, whether on the infrastructure, the application or the business level. We trust our metrics to show us the status of our system and where it misbehaves. But do our metrics show us what really happened? You’d be surprised how often it’s not the case.

How JPMorgan Chase uses Grafana and AI to monitor SLOs, SLIs, and more

For the team at JPMorgan Chase, the daily stakes of having a stable system are high. “We are in the business of making sure that trades are executed, and systems are stable and up and running for a positive client experience,” said Askari Imam, VP, Asset Wealth Management (Product and Integration Delivery).

Unreadable Metrics: Why You Can't Find Anything in Your Monitoring Dashboards

Dashboards are powerful tools for monitoring and troubleshooting your system. Too often, however, we run into an incident, jump to the dashboard, and find ourselves drowning in endless data, unable to find what we need. This can be caused not just by data overload, but also by too many or too few colors, inconsistent conventions, or a lack of visual cues.