
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Cross-tenant monitoring with Azure Lighthouse and Datadog

Azure Lighthouse is a new feature that provides improved access management for users and applications across different Azure tenants. With Azure Lighthouse, managed service providers (MSPs) can manage their customers’ environments more easily and efficiently than ever before. Datadog is proud to announce support for Azure Lighthouse, so MSPs can take a streamlined, scalable approach to monitoring their customers’ Azure environments.

A Guide to Open Source Monitoring Tools

Open source is one of the key drivers of DevOps. The need for flexibility, speed, and cost-efficiency is pushing organizations to embrace an open source-first approach when designing and implementing the DevOps lifecycle. Monitoring (the process of gathering telemetry data on the operation of an IT environment to gauge performance and troubleshoot issues) is a perfect example of how open source acts as both a driver and an enabler of DevOps methodologies.

What do these error codes mean?

The other day, whilst using a very popular website, I came across a series of 404 "Page Not Found" messages. I didn’t think much about it at the time, but on reflection it made me wonder how many people actually understand what the different error codes mean. Hands up: I only know a few, and I work in the website monitoring sector. To most people, an error code is just a weird IT message that appears when things go wrong.
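
A 404 status means "Not Found": the server was reached, but it has nothing at the requested URL. As a minimal sketch, Python's standard-library http.HTTPStatus enum maps each registered status code to its standard reason phrase, and the first digit of any code gives its class. The describe helper below is hypothetical, written only for illustration.

```python
from http import HTTPStatus

# The first digit of an HTTP status code identifies its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code: int) -> str:
    """Return the reason phrase and class for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Raised for numbers that are not registered status codes.
        return f"{code}: not a registered status code"
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"

for code in (200, 301, 404, 500, 503):
    print(describe(code))
```

Running the loop prints one line per code, for example "404 Not Found (client error)" and "503 Service Unavailable (server error)".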

Prometheus v2.11 Released

Since graduating from the CNCF last August, Prometheus has moved to a six-week release cycle. The latest release, v2.11, arrived on July 9. Prometheus 2.11 includes a new option to compress WAL (write-ahead log) records using Snappy, query performance improvements, the option to use Alertmanager API v2, and more. The metric prometheus_tsdb_wal_reader_corruption_errors has been renamed to prometheus_tsdb_wal_reader_corruption_errors_total.

An Introduction to Python List Comprehensions

Python list comprehensions offer a concise way to build a new list by applying an expression to each element of an existing list (or any other iterable). Even though they’ve been available since Python 2.0, their syntax often discourages people from using them. This article aims to introduce list comprehensions in a friendly way and offer you one more Python feature to add to your scripting toolbox.
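
The basic shape is [expression for item in iterable if condition]. Here is a short sketch comparing an ordinary loop with the equivalent comprehension; the host names and URLs are hypothetical placeholders.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# The same result as a list comprehension:
# [expression for item in iterable if condition]
squares_comp = [n * n for n in numbers if n % 2 == 0]

# Without a condition, the expression is applied to every element.
hosts = ["web01", "web02", "db01"]
urls = [f"https://{h}.example.com/health" for h in hosts]

print(squares_comp)  # [4, 16, 36]
print(urls[0])       # https://web01.example.com/health
```

The comprehension reads left to right as "square n, for each n in numbers, if n is even", which is exactly what the four-line loop does.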

Development workflow for serverless applications

Serverless applications require a whole new approach to the development workflow. In this article, Lumigo Director of Engineering Efi Merdler-Kravitz details the guiding principles and tools used at a 100% serverless company to ensure the most efficient workflow possible. We are not going to talk about product development flow (no product managers were harmed during the making of this post!).

5 Best Practices for Using AI to Automatically Monitor Your Kubernetes Environment

If you run multiple clusters, each with a large number of services, you’ll find it impractical to rely on static alerts, such as “number of pods < X” or “ingress requests > Y”, or to simply count HTTP errors. Normal values differ for every region, data center, cluster, and so on. Manually tuning alerts is difficult, and when it is not done properly you either get far too many false positives or miss a key event.
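
To make the contrast concrete, here is a minimal Python sketch of an adaptive threshold. It is a deliberately simple statistical stand-in (a rolling mean plus or minus k standard deviations), not the AI-based approach the article describes; the function name anomalies, the window and k defaults, and the sample series are all hypothetical.

```python
from statistics import mean, stdev

def anomalies(values, window=5, k=3.0):
    """Return indices of points more than k standard deviations from a rolling baseline.

    Each point is compared with the mean and standard deviation of the
    `window` points that precede it, so the threshold adapts to each series.
    This is a simple statistical baseline, not a machine-learning model.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical requests-per-second samples from two clusters with different normal levels.
cluster_a = [100, 102, 98, 101, 99, 100, 250, 101]
cluster_b = [1000, 1010, 990, 1005, 995, 1000, 1002, 998]

# A single static rule ("requests > 500") fires on every cluster_b sample
# and never catches the spike in cluster_a.
static_a = [i for i, v in enumerate(cluster_a) if v > 500]
static_b = [i for i, v in enumerate(cluster_b) if v > 500]

print(anomalies(cluster_a))  # [6]
print(anomalies(cluster_b))  # []
```

The rolling baseline flags only the spike at index 6 in cluster_a and stays quiet on cluster_b, whose values are higher but steady; the static rule gets both clusters wrong.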

How to use ApacheBench for web server performance testing

When developing web services and tuning the infrastructure that runs them, you’ll want to make sure that they handle requests quickly enough, and at a high enough volume, to meet your requirements. ApacheBench (ab) is a benchmarking tool that measures the performance of a web server by inundating it with HTTP requests and recording metrics for latency and success.
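
Here is a minimal Python sketch of driving ab from a script and pulling a few headline numbers out of its text report. It assumes the ab binary is installed and on PATH (it ships with the Apache HTTP Server utilities, for example the apache2-utils package on Debian and Ubuntu). The function names run_ab and parse_ab and the default request counts are hypothetical, and the regular expressions assume ab's usual report labels such as "Requests per second:".

```python
import re
import subprocess

def run_ab(url, requests=1000, concurrency=10):
    """Run ApacheBench against url and return its text report.

    Assumes the `ab` binary is on PATH. -n sets the total number of
    requests and -c the number of requests made concurrently.
    """
    result = subprocess.run(
        ["ab", "-n", str(requests), "-c", str(concurrency), url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ab exits non-zero
    )
    return result.stdout

# Labels as they usually appear in ab's summary; adjust if your version differs.
PATTERNS = {
    "requests_per_second": r"Requests per second:\s+([\d.]+)",
    # Matches the per-request mean, not the
    # "(mean, across all concurrent requests)" line that follows it.
    "time_per_request_ms": r"Time per request:\s+([\d.]+) \[ms\] \(mean\)",
    "failed_requests": r"Failed requests:\s+(\d+)",
}

def parse_ab(report):
    """Extract headline latency and throughput metrics from an ab report."""
    metrics = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, report)
        if match:
            metrics[name] = float(match.group(1))
    return metrics
```

For example, parse_ab(run_ab("http://localhost:8080/", requests=500, concurrency=20)) would benchmark a local server. ab expects the URL to include a path, so keep the trailing slash.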