Each year of the SRE Report, there’s a trend or anti-pattern that leaps out and makes us pause and reflect. Last year, for example, we found a huge drop in global toil levels. With the whole world working from home for a full year, it made sense that global toil levels would drop, right? But this year, despite the great reopening underway, toil levels dropped even further. It's a paradox, and one that will no doubt require its own scrutiny.
Grafana Loki is designed to be cost-effective and easy to operate for DevOps and SRE teams, but running queries in Loki can be confusing for those who are new to it. Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It doesn’t index the content of the logs, but rather a set of labels for each log stream.
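To make the label-versus-content distinction concrete, here is a toy Python model. It is a sketch under stated assumptions, not Loki's actual storage engine, and the class `ToyLogStore` and its methods are hypothetical. The label set decides which streams are read, and only those streams are then scanned line by line, roughly what a LogQL query such as `{app="api"} |= "error"` does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLogStore:
    """Hypothetical toy model of label-only indexing (not Loki's real implementation)."""

    # Index: frozen label set -> raw log lines. Line content itself is never indexed.
    streams: dict[frozenset, list[str]] = field(default_factory=dict)

    def push(self, labels: dict[str, str], line: str) -> None:
        # Each distinct label set identifies one stream.
        key = frozenset(labels.items())
        self.streams.setdefault(key, []).append(line)

    def query(self, selector: dict[str, str], contains: str = "") -> list[str]:
        # Step 1: use the label index to pick matching streams (cheap lookup).
        matched = [
            lines
            for key, lines in self.streams.items()
            if all((k, v) in key for k, v in selector.items())
        ]
        # Step 2: brute-force scan only those streams' content, like a line filter.
        return [line for lines in matched for line in lines if contains in line]


store = ToyLogStore()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "POST /login 500 error")
store.push({"app": "web", "env": "prod"}, "render error")
print(store.query({"app": "api"}, "error"))
```

In this model, every new label value creates a new stream, which hints at why high-cardinality labels (such as per-request IDs) are usually discouraged in Loki.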
We just celebrated Prometheus's 10th birthday last month. In 2016, Prometheus became the second project to join the Cloud Native Computing Foundation, after Kubernetes, and it quickly became the de facto way to monitor Kubernetes workloads. The plug-and-play experience, where you simply deploy a Prometheus server and start seeing metrics flow in, tagged with Kubernetes labels, was a compelling offer.
A new year has started, and I've been pondering my hopes and dreams for the year to come. In the world of SRE, observability is the most prominent pillar of my work. So, I decided to drill into the topic of observability and what I'd like to see happen in the industry in 2023. Rather than focusing on any tool, technology, or methodology, I'll be exploring concepts that can be broadly applied in any organization.
Real User Monitoring (RUM) is a method of web performance monitoring that captures user experience metrics from visitors to your website. It is also known as real user metrics, end-user experience monitoring, or simply user monitoring. You can think of Real User Monitoring as an automated way to get user feedback on your website. Not every user will complete a survey or fill out a feedback form, but RUM listens to every one of your users.
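Because RUM collects a sample from every visit, the useful output is a distribution rather than a single number. The Python sketch below assumes a hypothetical beacon format (a dict with `page` and `load_ms` keys) and summarizes each page by its 75th percentile load time using the nearest-rank method. The 75th percentile is a common choice, but the field names and percentile here are illustrative assumptions, not any particular RUM product's schema.

```python
import math
from collections import defaultdict


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def summarize(beacons: list[dict]) -> dict[str, float]:
    """Group per-visit samples by page and report the 75th percentile load time."""
    by_page: dict[str, list[float]] = defaultdict(list)
    for beacon in beacons:
        # Hypothetical beacon shape: {"page": "/checkout", "load_ms": 1234}
        by_page[beacon["page"]].append(beacon["load_ms"])
    return {page: percentile(times, 75) for page, times in by_page.items()}


samples = [{"page": "/", "load_ms": v} for v in (400, 100, 300, 200)]
samples.append({"page": "/checkout", "load_ms": 1000})
print(summarize(samples))
```

Using a percentile instead of an average keeps a handful of very slow or very fast visits from hiding what most users actually experience.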
Without an active SSL certificate, communication between users and the website is no longer secure, making it possible for malicious parties to access private user information. Worse, users who see a browser security warning are unlikely to return to the website. The simplest way to keep track of when your site's certificates expire is to use an automated SSL certificate expiry monitoring solution.
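As a minimal sketch of what such a monitor checks, the Python standard library can open a TLS connection, read the server certificate's `notAfter` field, and compute how many days remain. The function names and the 14-day warning threshold are hypothetical choices for illustration. A real monitor would run this on a schedule and send alerts, which is not shown here.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (e.g. 'Jan  5 09:34:43 2018 GMT')."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str, warn_days: float = 14) -> bool:
    """Return True if the certificate stays valid for more than warn_days (hypothetical threshold)."""
    return days_until_expiry(fetch_not_after(host)) > warn_days
```

For example, `check("example.com")` returns `False` when the certificate expires within two weeks. An already-expired certificate usually fails verification in `wrap_socket` and raises `ssl.SSLCertVerificationError` instead, which a monitor should also treat as an alert.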
Just two days into the new year, we saw our first BGP route leak, followed by a couple more in the days after. Although these incidents were brief and had marginal operational impact on the internet, they are still worth analyzing because they shed light on the cracks in the internet’s routing system.
Elastic Observability 8.6 introduces capabilities that improve production operations: host (EC2/GCP compute/Azure compute) observability, application dependency operations views (insights into databases, caches, etc.), and a new connector for Opsgenie. These new features allow customers to: Elastic Observability 8.6 is available now on Elastic Cloud, the only hosted Elasticsearch offering to include all of the new features in this latest release.