Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How sum_over_time Works in Prometheus

The sum_over_time() function in Prometheus gives you a way to aggregate counter resets, gauge fluctuations, and histogram samples across specific time windows. Instead of seeing point-in-time values, you get the cumulative total of all data points within your chosen range—useful for calculating totals from rate data, tracking accumulated errors, or understanding resource consumption patterns over custom intervals.

Tracking planes with Grafana in real time: How to visualize the aircraft overhead with your own dashboard

Ever since I was little, I’ve been fascinated by airplanes. Whether it was the excitement of boarding a flight for a holiday or the wonder of admiring them from the ground, there’s always been something magical about these incredible machines. Fast forward a few years, and now we have the ability to track aircraft in real-time from the palm of our hands using a variety of apps.

Observability Data: Ingestion Pipeline Best Practices

Great data is a prerequisite to all things AIOps and observability. Great observability data results in fewer observability gaps, better analysis and insights, and more confidence within teams that rely on the power of modern AIOps and observability technologies. Goals for improved automation, IT efficiencies, intelligent triage and remediation all become more achievable with better data.

StatusGator now supports Microsoft Teams Workflows

We’ve updated our Microsoft Teams integration to support workflows — Microsoft’s new and recommended approach to incoming webhooks. As Microsoft evolves its platform, it is phasing out the legacy Connectors feature in favor of Workflows. At StatusGator, we’re committed to keeping up with these changes so your integrations remain reliable and future-proof.

How Sentry could stop npm from breaking the Internet

Caching is great! When it works… When it fails, it puts a big load on your backend, resulting in either a self-inflicted DoS, increased server bills, or both. This article is inspired by a real-world incident that happened to npm back in 2016. In the next part, Ben recounts his personal experience responding to the incident while working at npm.

Why continuous profiling is the fourth pillar of observability

Developers have long used profilers to diagnose performance bottlenecks and improve the efficiency of their code. But a modern version of profiling, continuous profiling, is quietly redefining what profiling is and what it can do. By running nonstop in production with very low overhead, continuous profilers give teams always-on visibility into how their code behaves in the real world.

How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Let’s bring that back. Today you’ll learn how to configure high availability for the OpenTelemetry Collector so you don’t lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes samples with hands-on demos of configs. But first, let’s lay some groundwork.

7 Clear Signs Your Team Needs Centralized Monitoring

Managing multiple systems without centralized monitoring is like trying to watch security footage from 20 different screens simultaneously. You might catch some issues, but you'll inevitably miss critical problems until they explode into major incidents. If your team is struggling with scattered monitoring tools, delayed incident responses, or constant firefighting mode, it's time to evaluate whether you need a centralized monitoring solution. Here are the key warning signs to watch for.