The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Common Kafka Cluster Management Pitfalls and How to Avoid Them

Managing a Kafka cluster is no small feat. While Kafka’s distributed messaging system is incredibly powerful, keeping it running smoothly takes careful planning and a keen eye for detail. Small mistakes in Kafka management can quickly add up, leading to bottlenecks, unexpected downtime, and reduced overall performance. Let’s explore some common Kafka management pitfalls and, more importantly, how to steer clear of them.

Anatomy of an OTT Traffic Surge: The Fortnite Chapter 2 Remix Update

On Saturday, November 2, the wildly popular video game Fortnite released its latest game update: Fortnite Chapter 2 Remix. The result was a surge of traffic as gaming platforms around the world downloaded the latest update for the seven-year-old game. Doug Madory looks at how the resulting traffic surge can be analyzed using Kentik’s OTT Service Tracking.

Monitoring domains and DNSSEC properly

First of all, if you own a domain, the following text is for you. In production, you obviously want to minimize outages, and an outage of a DNS domain takes down every service under that domain, at least from the users’ perspective, no matter whether your LAMP components are all up and running. As usual, roughly speaking, monitoring has to “play end user” to properly discover failures end-to-end. At best you have an Icinga satellite (e.g.

Prometheus 3.0 and OpenTelemetry: a practical guide to storing and querying OTel data

Over the past year, a lot of work has gone into making Prometheus work better with OpenTelemetry, a move that reflects the growing number of engineers and developers who rely on both open source projects. Historically, Prometheus users have faced a number of challenges when trying to work with OpenTelemetry (and vice versa).

Orchestrated vs. Unorchestrated Data? A Simplified Guide

In today’s data-driven world, data is the lifeblood of businesses. As organizations strive to extract maximum value from their data, the concept of “data orchestration” has emerged as a critical tool. Data orchestration is the process of automating and streamlining data flows between various systems and applications. It involves coordinating data ingestion, transformation, and delivery to ensure data consistency, reliability, and accessibility.
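The ingestion, transformation, and delivery stages described above can be sketched as a minimal orchestrator. This assumes in-memory stand-ins for real systems. The stage functions `ingest`, `transform`, and `deliver` and the helper `run_pipeline` are hypothetical illustrations, not part of any orchestration product. The sketch only shows the core idea: stages run in a fixed order, and each stage is retried on failure.

```python
from typing import Callable

# Hypothetical source stage: a real pipeline would read from a database, API, or queue.
def ingest() -> list[dict]:
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]

# Hypothetical transform stage: normalize types so every consumer sees consistent data.
def transform(records: list[dict]) -> list[dict]:
    return [{"id": r["id"], "amount": float(r["amount"])} for r in records]

# Hypothetical delivery stage: append to an in-memory sink standing in for a warehouse.
def deliver(records: list[dict], sink: list[dict]) -> None:
    sink.extend(records)

def run_pipeline(sink: list[dict], max_retries: int = 2) -> None:
    """Run ingest -> transform -> deliver in order, retrying each stage on failure."""

    def with_retries(step: Callable, *args):
        last_error = None
        # One initial attempt plus up to max_retries retries.
        for _ in range(max_retries + 1):
            try:
                return step(*args)
            except Exception as exc:
                last_error = exc
        raise RuntimeError(f"stage {step.__name__} failed") from last_error

    raw = with_retries(ingest)
    clean = with_retries(transform, raw)
    with_retries(deliver, clean, sink)

if __name__ == "__main__":
    warehouse: list[dict] = []
    run_pipeline(warehouse)
    print(warehouse)
```

Without orchestration, each of these steps would run independently, with nothing enforcing order or handling failures. Production orchestrators layer scheduling, dependency graphs, and monitoring on top of this basic pattern.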

Datadog on Building Reliable Distributed Applications Using Temporal

Temporal is an open source platform for building resilient and reliable distributed systems. Datadog started using Temporal in 2020 as the foundation for our internal software delivery platform. Since then, it has been widely adopted as a platform that any engineering team can use to build its systems. In this Datadog on episode, Ara Pulido chats with Loïc Minaudier, Senior Software Engineer in the Atlas team, which is responsible for providing a developer platform on top of Temporal, and Allen George, Engineering Manager in the Datadog Workflows team.

What is a file system?

A file system determines how the operating system stores, organizes, manages, and retrieves data from a storage device. With a file system in place, files are systematically stored and accessed. File systems should not be confused with storage devices like hard disks, SSDs, or USB drives. Let's learn what file systems are, their types, and why they are critical in enterprise environments.

What's new in .NET 9: System.Text.Json improvements

.NET 9 will be released in mid-November 2024. Like every .NET version, it introduces several important features and enhancements that keep developers aligned with an ever-changing development ecosystem. In this blog series, I will explore critical updates in different areas of .NET. In this post, I look at the advancements in System.Text.Json.