The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How we designed empathetic alert sounds for on-call engineers

Being on call is an essential part of operating reliable distributed systems, but it comes with real human costs such as alert fatigue, sudden wakeups in the middle of the night, and the ongoing anxiety of what the next notification might bring. Many engineers know the feeling: Your phone lights up, a sound cuts through the silence, and your heart rate spikes before you’re even fully awake.

Monitor ClickHouse query performance with Datadog Database Monitoring

ClickHouse is widely used for large-scale analytics, but once it is running in production, it can be difficult to understand how query activity translates into resource usage. Engineers investigating performance issues often struggle to determine which queries consume the most memory, run most frequently, or cause spikes in load. In practice, engineers are left querying system.query_log, tailing server logs, and piecing together information after an incident.
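As a rough illustration of the manual approach the blurb describes, the sketch below builds a query against system.query_log that ranks recent query shapes by peak memory, execution count, or total duration. The helper name `top_queries_sql` and its parameters are hypothetical, chosen for this example only; it is a minimal sketch of querying the log by hand, not how Datadog Database Monitoring collects its data.

```python
def top_queries_sql(metric: str = "memory", hours: int = 1, limit: int = 10) -> str:
    """Build a SQL string that ranks recent query shapes in system.query_log.

    Hypothetical helper for illustration: it only returns the SQL text, and
    running it against a ClickHouse server is left to whatever client you use.
    """
    # Map a friendly metric name to the aggregate column used for ordering.
    order_by = {
        "memory": "peak_memory_bytes",
        "frequency": "executions",
        "duration": "total_duration_ms",
    }
    if metric not in order_by:
        raise ValueError(f"unknown metric: {metric!r}")
    # Coerce to int so only plain numbers are interpolated into the SQL.
    hours, limit = int(hours), int(limit)
    return f"""
SELECT
    normalized_query_hash,
    any(query) AS sample_query,
    count() AS executions,
    max(memory_usage) AS peak_memory_bytes,
    sum(query_duration_ms) AS total_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL {hours} HOUR
GROUP BY normalized_query_hash
ORDER BY {order_by[metric]} DESC
LIMIT {limit}
""".strip()
```

Grouping by `normalized_query_hash` collapses queries that differ only in literal values, which is what makes "which queries run most frequently" answerable from the raw log.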

What's New in InfluxDB 3.9: More Operational Control and a New Performance Preview

We’ve spent the last few months listening to how teams are running InfluxDB 3 in the wild. The feedback was clear: as you scale, you need less “guesswork” and more control. Today’s release of InfluxDB 3.9 is our answer to that. As more teams move InfluxDB 3 into production, our focus has shifted toward the operational experience: how you manage the database at scale, how you ensure it remains secure, and how you provide a seamless experience for users.

KubeCon Europe 2026: OpenTelemetry Recap from Amsterdam

I like writing recap articles because AIs don’t have enough context to write them for us. You have to be there in person, listen to sessions, interact with the community in the hallways, and absorb as much new knowledge as possible. That’s what I did last week in Amsterdam at KubeCon + CloudNativeCon Europe ’26. Well, at least I tried to. Let me break down what I consider the most interesting topics from last week.

Status Page Subscriber Management: Notification Groups, Components, and Templates

Your status page is only useful if the right people get the right notifications at the right time. A page that blasts every incident to every subscriber will train people to ignore your emails, or worse, unsubscribe entirely. A page that notifies too slowly will leave customers finding out about your outages from Twitter before they hear from you. I'm Leo, founder of Hyperping.

On-Call Scheduling for Small Teams: Skip the Enterprise Complexity

Updated April 02, 2026. Most on-call guides are written for companies with 50+ engineers, dedicated SRE teams, and budgets for tools that cost $21 per user per month before you even add a second escalation tier. If you have 5 people and a product that needs to stay up, that advice doesn't apply to you. I'm Leo, founder of Hyperping.

BIND 9 CVE-2026-1519: The NSEC3 DoS Vulnerability Putting DNS Resolvers at Risk

On March 25, 2026, the Internet Systems Consortium (ISC) released patches for three vulnerabilities in BIND 9, the most widely deployed DNS server software in the world. The headline flaw — CVE-2026-1519 — carries a CVSS score of 7.5 and is remotely exploitable with no authentication required. An attacker who controls a maliciously crafted DNS zone can trigger the vulnerability by forcing a BIND resolver to process excessive NSEC3 iterations during DNSSEC validation of an insecure delegation.

Operational Truth: The KPI Every C-Suite Will Rely On Next

C-suite leaders are redefining how they measure digital performance. Reliability, customer experience, resilience, and cost efficiency still matter, yet these indicators only hold value when they reflect what is actually unfolding inside the environment. Digital ecosystems have reached a level of complexity where small deviations influence outcomes, and leaders increasingly recognize that traditional metrics cannot be trusted without contextual grounding.

Send your existing OpenTelemetry traces to Sentry

You spent months instrumenting your app with OpenTelemetry. Ripping it out to adopt a new observability backend is not an option. Sentry's OTLP endpoint means you don't have to. In fact, two environment variables are all you need for your existing traces to start showing up in Sentry's trace explorer. Sentry's OTLP support is currently in open beta, so you can start using it today, but there are some known limitations we'll cover later.
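The blurb does not name the two environment variables. A plausible reading, and only an assumption here, is that they are the standard OpenTelemetry exporter settings `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` and `OTEL_EXPORTER_OTLP_TRACES_HEADERS`. The sketch below shows how such a pair could be assembled; the helper `otlp_trace_env` is hypothetical, and the endpoint URL and auth header must come from your own Sentry project settings rather than from this example.

```python
def otlp_trace_env(endpoint: str, auth_header: str) -> dict[str, str]:
    """Return the two OTLP trace exporter variables as a dict.

    Assumption: the backend is configured through the standard OpenTelemetry
    exporter variables. `endpoint` and `auth_header` are placeholders that the
    caller supplies; no Sentry-specific format is implied here.
    """
    if not endpoint.startswith("https://"):
        raise ValueError("expected an https:// OTLP endpoint")
    # The headers variable takes comma-separated key=value pairs.
    if "=" not in auth_header:
        raise ValueError("expected the header in key=value form")
    return {
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_TRACES_HEADERS": auth_header,
    }
```

Applying the result with `os.environ.update(...)` before the OpenTelemetry SDK initializes, or exporting the same two names in the process environment, leaves the existing instrumentation code untouched.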

AI Didn't Kill the SDLC. It Made It Harder to See

AI has compressed the visible stages of software delivery, but requirements, validation, review and release discipline have not disappeared. They have been pushed into automation, runtime and governance. The real risk is not that the lifecycle is dead, but that organisations start acting as if accountability died with it.