The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Bootstrapping a multi DC cloud native observability stack by Bram Vogelaar

An introduction to observability and how to set up a highly available monitoring platform across multiple data centers. During this talk we investigate how to configure a monitoring setup across two DCs using Prometheus, Loki, Tempo, Alertmanager and Grafana. Bram Vogelaar spent the first part of his career as a molecular biologist, then moved on to supporting his peers by building tools and platforms for them with a lot of open source technologies. He now works as a DevOps Cloud Engineer at The Factory.

Tales of A11y In Grafana OS: Introducing Pa11y CI into our pipeline by Alexa Vargas

We want to make Grafana accessible to everyone! In this talk, Alexa will share how Grafana recently introduced Pa11y CI into the Grafana Continuous Integration pipeline. The library helps our developers and contributors highlight a11y issues. More importantly, it acts as a gatekeeper, stopping new a11y issues from making it into the project. You will also hear about the alternatives that were considered and their challenges. This talk will have everything!

Self Healing Kubernetes at the edge

As developers and businesses are shifting their attention to the edge, everyone wants to build their own edge clusters and manage them. However, building a highly available edge cluster is not easy. Kubernetes simplifies container deployments by abstracting the resource management details from the users, allowing them to deploy using standard CLI or templates.

What SREs Can Learn from Facebook's Largest Outage

Facebook’s October 2021 outage was the type of event that gives SREs nightmares: A series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about 60 million dollars. As incidents go, this was a pretty big one.

Announcing HAProxy Kubernetes Ingress Controller 1.7

We’re proud to announce the release of version 1.7 of the HAProxy Kubernetes Ingress Controller! In this release, we added support for custom resource definitions that cover most of the configuration settings. Definitions are available for the global, defaults and backend sections of the configuration. This promotes a cleaner separation of concerns between the different groups of settings and strengthens validation of those settings.

10 years of cloud infrastructure with Eric Brewer

In this video, Google Cloud Developer Advocate Stephanie Wong speaks with Google Fellow Eric Brewer about his experience building infrastructure, including Kubernetes, over the last decade at Google. You’ll get a window into what it was like to help propel Kubernetes into one of the largest open source projects today.

GitLens for Visual Studio Code, and its Creator Eric Amodio, Join GitKraken

For those of you who don’t know me, I’m Eric Amodio, creator of GitLens. I’m an innovator, leader, architect, and seasoned full-stack developer. I started developing GitLens way back in 2016 when I fell in love with Visual Studio Code and wanted to play with what was then newly released extension support. It all started with a simple question: could I add Git insights via CodeLens (hence GitLens) to any document? The answer, of course, was yes, and a whole lot more.

What Is Kubernetes Pod Disruption?

Kubernetes pods are the smallest deployable units in the Kubernetes platform. Each pod represents a single instance of a running process in the cluster and runs on a node, a worker machine that may be virtual or physical. Occasionally, pod disruptions occur, from either voluntary or involuntary causes.
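
To make the voluntary case concrete, here is a minimal Python sketch of the budget idea behind Kubernetes' PodDisruptionBudget: a voluntary eviction is allowed only while enough healthy pods would remain. The function names and the numbers are hypothetical, chosen for illustration; this is not the Kubernetes implementation, and involuntary disruptions (such as a node failure) are not bound by any such budget.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of pods that may be evicted voluntarily right now.

    Assumption: the budget is expressed as an absolute minimum number
    of pods that must stay healthy (hypothetical simplification).
    """
    return max(0, healthy_pods - min_available)


def can_evict(healthy_pods: int, min_available: int) -> bool:
    """True if one more voluntary eviction stays within the budget."""
    return allowed_disruptions(healthy_pods, min_available) > 0


# Hypothetical application: 5 healthy replicas, at least 3 must stay up.
print(allowed_disruptions(healthy_pods=5, min_available=3))
print(can_evict(healthy_pods=3, min_available=3))
```

With five healthy replicas and a minimum of three, two pods can be drained at once; once only three are healthy, further voluntary evictions are refused until replacements become ready.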

An intro to Infrastructure as Code

Infrastructure as Code (IaC) is the practice of recording the desired state of your infrastructure using a declarative language. In this article, I’m going to assume that your team is starting from scratch. Maybe some of your build process has been scripted, and maybe there is some manual testing and quality assurance work happening. Many readers will find that they are midway through the IaC adoption journey I’ll describe, or that they have missed some steps.
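
The "desired state" idea can be sketched in a few lines of Python: describe what should exist, compare it with what does exist, and derive the changes. This is only an illustration under stated assumptions; the resource names, the `Plan` type and the `plan_changes` function are hypothetical, and real IaC tools add dependency ordering, state storage and provider APIs on top of this.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    create: dict  # resources that are desired but do not exist yet
    update: dict  # resources that exist but differ from the desired spec
    delete: list  # resources that exist but are no longer desired


def plan_changes(desired: dict, actual: dict) -> Plan:
    """Compare desired and actual state and return the changes needed."""
    create = {name: spec for name, spec in desired.items() if name not in actual}
    update = {
        name: spec
        for name, spec in desired.items()
        if name in actual and actual[name] != spec
    }
    delete = sorted(name for name in actual if name not in desired)
    return Plan(create, update, delete)


# Hypothetical infrastructure: the declared state versus what is running.
desired = {
    "web": {"size": "small", "count": 2},
    "db": {"size": "large", "count": 1},
}
actual = {
    "web": {"size": "small", "count": 1},
    "cache": {"size": "small", "count": 1},
}

plan = plan_changes(desired, actual)
print(plan)
```

Running the comparison again after the plan has been applied yields an empty plan, which is the property that makes a declarative description safe to apply repeatedly.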