
April 2021

What's new in Grafana Enterprise Metrics 1.3, our scalable, self-hosted Prometheus service

We built Grafana Enterprise Metrics (GEM) to empower centralized observability teams to provide a multi-tenanted, horizontally scalable Prometheus-as-a-Service experience for their end users. The GEM plugin for Grafana is a key piece of realizing this vision. It provides a point-and-click way for teams operating GEM to understand the state of their cluster and manage settings for each of the tenants within it.

Get instant Grafana dashboards for Prometheus metrics with the Elixir PromEx library

I have been using Grafana for almost four years now, and in that time it has become my go-to tool for my application observability needs. Especially now that Grafana allows you to also view logs and traces, you can easily have all three pillars of observability surfaced through Grafana. As a result, when I started working on the Elixir PromEx library, having Grafana be the end target for the metrics dashboards made perfect sense.

Benchmarking Grafana Enterprise Metrics for horizontally scaling Prometheus up to 500 million active series

Since we launched Grafana Enterprise Metrics (GEM), our self-hosted Prometheus service, last year, we’ve seen customers run it at great scale. We have clusters with more than 100 million metrics, and GEM’s new scalable compactor can handle an estimated 650 million active series. Still, we wanted to run performance tests that would more definitively show GEM’s horizontal scalability and allow us to get more accurate TCO estimates.

How PayIt, a secure cloud service provider for digital government, uses Grafana and Prometheus for observability at cloud native scale

A trip to the DMV — and a realization that there had to be a better, more modern way for the system to work — sparked the idea for PayIt, a secure cloud service provider for digital government that launched in 2013. The company’s mission is to help state, local, and government agencies reach their constituents better and more effectively, shifting the reliance from in-office payments to digital ones.

We've added first-class Windows support to Grafana Agent

The Grafana Agent team is happy to announce that Grafana Agent 0.14.0-rc2 includes improved Windows support. Until now, running Grafana Agent (our tool for gathering metrics, logs, and traces) on Windows was difficult and did not follow Windows best practices. In short, it was not a good Windows citizen. In the new release candidate, we’re making changes to improve the experience, based on feedback from GitHub issues, customer contacts, and our own experience.

Q&A with Grafana Labs CEO Raj Dutt about our licensing changes

When Grafana Labs CEO and co-founder Raj Dutt announced to the team that the company would be relicensing our core open source projects from Apache 2.0 to AGPLv3, he opened the floor for discussion and encouraged anyone who had further questions to reach out. We believe in honesty and transparency, so we collected hard questions from Grafanistas, and Raj answered them for this public Q&A. The time felt right. As I’ve said publicly before, I’ve been thinking about this topic for years.

Grafana, Loki, and Tempo will be relicensed to AGPLv3

Grafana Labs was founded in 2014 to build a sustainable business around the open source Grafana project, so that revenue from our commercial offerings could be re-invested in the technology and the community. Since then, we’ve expanded further in the open source world — creating Grafana Loki and Grafana Tempo and contributing heavily to projects such as Graphite, Prometheus, and Cortex — while building the Grafana Cloud and Grafana Enterprise Stack products for customers.

Introducing the new Open Distro for Elasticsearch plugin for Grafana, also available in Amazon Managed Service for Grafana

Back in December, Amazon Web Services (AWS) and Grafana Labs partnered to launch the Amazon Managed Service for Grafana in a preview to a limited set of customers. Amazon Managed Service for Grafana is a scalable managed offering that provides AWS customers a native way to run Grafana directly within AWS alongside all their other AWS services.

Easily monitor your Tencent Cloud services with the new Grafana plugin

Plugins help Grafana users get to value faster. With a few clicks, you can start tapping into the different data stores you and your business already leverage, and see them all in one place in your Grafana dashboard. I’m a huge fan of partner-developed plugins for a few reasons, with my favorite being subject matter expertise. Who better to develop your plugin than the team that knows the product inside out?

How to send traces to Grafana Cloud's Tempo service with OpenTelemetry Collector

As an open source company, we understand the value of open standards and interoperability. This holds true for Grafana Cloud and our managed Tempo service for traces, which is currently in beta. The Grafana Agent makes it easy to send traces to Grafana Cloud, but it is not required. In fact, Grafana Cloud’s Tempo service is exposed as a standards-compliant gRPC endpoint that conforms to the OpenTelemetry TraceService and uses HTTP Basic authentication.
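As a rough illustration of the HTTP Basic part only: a client would base64-encode `username:password` and attach the result as an `authorization` entry in the metadata sent with each gRPC call. This is a minimal sketch, not the post's own configuration. The function name and the credentials below are hypothetical placeholders, and the actual values Grafana Cloud expects are not stated here.

```python
import base64
from typing import List, Tuple


def basic_auth_metadata(username: str, password: str) -> List[Tuple[str, str]]:
    """Build gRPC-style call metadata carrying an HTTP Basic credential.

    Both arguments are placeholders for whatever credentials the
    endpoint issues; they are assumptions for this sketch.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    # gRPC metadata keys are lowercase, so the header is "authorization".
    return [("authorization", f"Basic {token}")]


# Hypothetical credentials, for illustration only.
metadata = basic_auth_metadata("example-user", "example-api-key")
```

A gRPC client library would then pass `metadata` along with each export call over a TLS channel, since Basic credentials are only encoded, not encrypted.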

How to troubleshoot remote write issues in Prometheus

Prometheus’s remote write system has a lot of tunable knobs, and in the event of an issue, it can be unclear which ones to adjust. In this post, we’ll discuss some metrics that can help you diagnose remote write issues and decide which configuration parameters you may want to try changing. First, let’s discuss how remote write is implemented. In the past, remote write would duplicate samples coming into Prometheus via scrape.

How we use metamonitoring Prometheus servers to monitor all other Prometheus servers at Grafana Labs

One of the big questions in monitoring can be summed up as: Who watches the watchers? If you rely on Prometheus for your monitoring, and your monitoring fails, how will you know? The answer is a concept known as metamonitoring. At Grafana Labs, a handful of geographically distributed metamonitoring Prometheus servers monitor all other Prometheus servers and each other cross-cluster, while their alerting chain is secured by a dead-man’s-switch-like mechanism.
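The general idea of a dead man's switch can be sketched as follows: the monitored system sends a regular heartbeat, and an independent watcher raises the alarm when heartbeats stop arriving. This is a generic sketch of that idea under assumed names (`DeadMansSwitch`, `heartbeat`, `is_tripped`), not a description of how Grafana Labs implements its alerting chain.

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Trips when no heartbeat has arrived within `timeout_seconds`."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_heartbeat: Optional[float] = None

    def heartbeat(self) -> None:
        """Record that the monitored alerting chain is still alive."""
        self._last_heartbeat = self._clock()

    def is_tripped(self) -> bool:
        """Return True if the heartbeat is missing or overdue."""
        if self._last_heartbeat is None:
            return True  # never heard from the monitored system
        elapsed = self._clock() - self._last_heartbeat
        return elapsed > self.timeout_seconds
```

The key design point is inversion: silence is treated as failure, so a broken monitoring pipeline cannot hide itself by failing to send alerts.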

Using Telegraf plugins to visualize industrial IoT data with the Grafana Cloud Hosted Prometheus service

One of the biggest challenges with data visualization for complicated software systems is getting quick access to the underlying data and connecting it to some form of cloud-hosted solution. Traditionally, this has required quite a bit of middleware and upfront setup with additional tooling.

You should know about... these useful Prometheus alerting rules

Setting up Prometheus to scrape your targets for metrics is usually just one part of your larger observability strategy. The other piece of the equation is figuring out what you want your metrics to tell you, and when and how often you should hear about it. Thankfully, Prometheus makes it really easy for you to define alerting rules using PromQL, so you know when things are going north, south, or in no direction at all.