The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Monitor Everything is an Anti-Pattern!

Bullshit and nonsense. But let’s take it from the beginning. The industry’s story goes something like this: Then, in the same breath: You see the contradiction already, right? The same industry that tells you “collect less, simplify, trust the experts” is also the industry where: This isn’t an observability strategy. It’s observability by hindsight. Right. Good. Now we’re having fun.

How to Write a Cover Letter That Actually Helps You Get the Job

Cover letters are supposed to help you shine, but most of them blur together into the same polite, forgettable paragraphs. The intention is good (“I want them to notice me!”), but the execution… not so much. So, here’s a simple, honest guide to writing a cover letter that actually works, especially if you’re applying to Checkly. Spoiler: shorter is better. And in this AI era, authenticity beats polished perfection.

How To Migrate Away From DogStatsD Using Telegraf

Datadog is a popular monitoring platform, and one of its key components is DogStatsD, a customized extension of the original open-source StatsD protocol. DogStatsD adds powerful features like tagging, histograms, and distributions, but it also introduces vendor lock-in, because DogStatsD metrics follow a specific wire format that many other monitoring platforms do not natively support.
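To make the wire-format point concrete, here is a minimal sketch of what a DogStatsD datagram looks like and what is lost when it is reduced to plain StatsD. It assumes the documented datagram layout `name:value|type|@sample_rate|#tag:value,...`; the `Metric` class and the `parse_dogstatsd` and `to_plain_statsd` helpers are hypothetical names chosen for illustration, not part of any Datadog or Telegraf API.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: str          # kept as text: set members need not be numeric
    type: str           # e.g. "c", "g", "ms", "h", "s", "d"
    sample_rate: float = 1.0
    tags: dict = field(default_factory=dict)


def parse_dogstatsd(line: str) -> Metric:
    """Parse one DogStatsD metric datagram (hypothetical helper)."""
    head, *sections = line.strip().split("|")
    name, _, value = head.partition(":")
    if not name or not value or not sections:
        raise ValueError(f"not a metric datagram: {line!r}")
    metric = Metric(name=name, value=value, type=sections[0])
    for section in sections[1:]:
        if section.startswith("@"):
            # Optional sample rate, e.g. "@0.5"
            metric.sample_rate = float(section[1:])
        elif section.startswith("#"):
            # The DogStatsD tag extension, e.g. "#env:prod,region:eu"
            for tag in section[1:].split(","):
                key, _, tag_value = tag.partition(":")
                metric.tags[key] = tag_value
    return metric


def to_plain_statsd(metric: Metric) -> str:
    """Render as plain StatsD, which has no tag section, so tags are dropped."""
    line = f"{metric.name}:{metric.value}|{metric.type}"
    if metric.sample_rate != 1.0:
        line += f"|@{metric.sample_rate}"
    return line


parsed = parse_dogstatsd("page.views:1|c|@0.5|#env:prod,region:eu")
plain = to_plain_statsd(parsed)
```

The `#`-prefixed tag section is the part a plain StatsD receiver does not understand, which is why a translation layer is needed when moving to another backend.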

Honeycomb Frontend Observability - See Everything

In this video we take a tour through Honeycomb's Frontend Observability offerings for Web and Mobile. We see how the launchpads can help spot performance errors, how the error investigations feature makes it easy to trace frontend errors all the way to their cause in backend services, and how easy it is to find differences between traces across various devices.

All Is Calm, All Is Compliant: Staying Audit-Ready Through the Year-End Rush

As the year winds down, I find that most cybersecurity and compliance teams are focused on closing projects, hitting targets, and maybe even planning a well-earned break. But regulators? They don’t take holidays. FCA, PRA, GDPR – they remain vigilant, and so should you. For IT leaders, this season often feels like walking a tightrope: balancing operational demands with the relentless need for compliance.

Grafana Service Center: Simplify Service Reliability in One Place

Grafana Service Center gives engineers and stakeholders a single place to ensure service reliability. In this video, Staff Product Manager Ryan Kehoe walks through how Service Center ties together alerts, SLOs, dashboards, incidents, and metadata for each service. Learn how to centralize reviews, speed up investigations, and improve visibility across your teams—all within Grafana Cloud.

Improve service reliability and ops culture with Grafana Cloud Service Center

Today’s engineering organizations are built around service ownership. Service owners are accountable for keeping their services reliable, performant, and ready to scale. But no service operates in isolation; every team depends on others, and those dependencies form a complex web that can be hard to see, let alone understand. To truly deliver reliable systems, you need visibility not only into how your own service performs, but also how it affects others.

AI Agent for Business SLA Predictions: Safeguarding Business Continuity with Predictive Intelligence

Modern business functions are built on the promise of a smooth, seamless experience, without downtime or long waits for backend processes to finish. For such digital operations, timely execution of business processes (like financial closings, order fulfilment, and report generation) is non-negotiable.