The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Using a transformer-based text embeddings model to reduce Sentry alerts by 40% and cut through noise

Sentry uses Issue Grouping to aggregate identical errors and prevent duplicate issues from being created and duplicate alerts from being sent. One of the chief complaints we’ve heard from our users is that in some cases the existing algorithm did not sufficiently group similar errors together, so Sentry would create separate issues and alerts, causing unnecessary disruption (or at least annoyance) to developers.

AWS CloudWatch Custom Metrics: Types & Setup Guide [With Examples]

Amazon CloudWatch is a monitoring and observability service that provides real-time insights into AWS resources and applications. While CloudWatch provides many default metrics, sometimes you need custom metrics to monitor specific aspects of your infrastructure or applications. This guide covers everything you need to know about CloudWatch custom metrics, from basics to advanced use cases.

Getting Started with OpenTelemetry Java SDK

Understanding how your applications perform is crucial. OpenTelemetry has emerged as a powerful observability framework, offering a standardized approach to collecting telemetry data such as metrics, logs, and traces. For Java developers, the OpenTelemetry Java SDK provides the tools necessary to instrument applications effectively. This guide is all about the OpenTelemetry Java SDK, exploring its components, configuration, and advanced features to help you harness its full potential.

AppSignal Now Offers Support for Long-Running Streaming Rack Responses in Ruby

We're excited to announce that AppSignal now offers improved monitoring for long-running streaming Rack responses. Our improved Rack response monitoring means you can gain deeper visibility into the health of your Ruby application's long-running responses, allowing you to catch errors that may arise minutes or even hours after a request's body is served. This new layer of observability results from a valuable contribution from Julik Tarkhanov, Director of Engineering at Cheddar Payments.

5 Ways to Prevent CPU Overload on Linux Servers

Every server administrator’s nightmare starts with a message: “CPU usage at 100%.” It’s that critical moment when your Linux server transforms from a reliable workhorse into a sluggish mess, taking your applications and user experience down with it. We’ve all been there… staring at a terminal, watching load averages climb, while frantically trying to figure out which process decided to throw a CPU-hungry party on our server.

How to Stop Memory Leaks Before They Crash Your Linux System

Imagine you’ve got a leaky faucet in your kitchen. At first, it’s just a drip here and there—annoying, sure, but not enough to ruin your day. But leave it unchecked, and soon that drip turns into a steady trickle. Your water bill skyrockets, the sink overflows, and before you know it, you’re ankle-deep in chaos. Now, replace that faucet with a Linux system, and you’ve got a memory leak.

Full Guide to Linux Disk IO Monitoring, Alerting and Tuning

Disk IO (Input/Output) is a core aspect of system performance. Whether you’re managing a database, a web application, or a cloud server, how efficiently your system reads and writes data affects everything from response times to stability. Unlike high CPU usage or memory bottlenecks that often manifest immediately, disk IO issues tend to creep up silently—until they slow down critical processes.

Why Observability 2.0 Is Such a Gamechanger

One of the hardest parts of my job is to get people to appreciate just how big a difference Honeycomb/observability 2.0 makes compared to their current way of working. It’s not just a small step up or a linear improvement. Rather, it’s an entire step change in the way that you write, deploy, and operate software for your customers.