Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Fixing Broken Traces in GCP Cloud Run: A Custom OpenTelemetry Propagator

GCP's load balancer silently rewrites your traceparent header, orphaning spans in any OTLP backend. Here's the custom propagator that fixes it. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

The Hidden Cost of DIY DevOps: Why Growing Companies Bring in the Experts

Companies are scaling faster than ever, but infrastructure rarely keeps up with the product. When developers take on operational work on top of everything else, it feels like a smart way to cut costs. In practice, it's one of the most expensive mistakes a growing software team can make. This article breaks down what DIY DevOps actually costs and how a structured approach changes the equation.

Top tips: When leaders leave, here's how to keep your IT systems stable

Top Tips is a weekly column where we look at what’s shaping the tech world and share practical ways teams can stay prepared for what’s next. This week, we’re focusing on a situation many teams underestimate—what happens to your IT systems when a key leader steps away, and how you can build stability that doesn’t rely on any one person. Some problems don’t show up when things are running smoothly. They show up when someone leaves.

Beyond Uptime: Building a Self-Healing OpenClaw Observability Stack

The allure of OpenClaw is undeniable. You deploy a highly autonomous, self-hosted AI agent, give it access to your repositories and inboxes, and watch it reason through complex workflows while you sleep. It is the dream of the ultimate 10x developer tool realized. But as any veteran DevOps engineer will tell you: running an LLM-backed Node.js agent in production is vastly different from testing it on your local machine.

AI agents are only as smart as the data you feed it

AI is only as useful as the context you give it. An autonomous observability agent can unlock serious value from your telemetry, but only when the foundation is right: good telemetry, a strong data layer, and efficient access to the data. Annie Freeman and Lewis Isaac had a lot to say about this at AWS Summit London this week! hashtag#Observability hashtag#AI hashtag#AWSSummitLondon hashtag#DevOps hashtag#OpenTelemetry.

VictoriaMetrics Virtual Meetup Q1 2026 - VictoriaMetrics Updates

VictoriaMetrics continues to enhance usability and developer experience with new built-in capabilities. A lightweight UI now provides clear client setup instructions, simplifying onboarding, while an integrated inspector offers powerful debugging tools directly within the platform. Default tenant configuration further streamlines initial setup, reducing friction for new deployments. In addition, the MCP Server is now included by default in VictoriaMetrics Cloud deployments, eliminating the need for manual installation and making advanced monitoring workflows more accessible out of the box.

The product signal latency gap slowing your growth

Organizations often call product managers the CEOs of the product. But PMs know that’s a myth. When a CEO wants a status report, they get one immediately. They don’t need to negotiate for engineering time, reconcile conflicting project priorities, or wait for a data scientist to find a gap in their schedule. For most PMs, simply understanding the state of the product is where growth can stall.

Test network paths with TCP, UDP, and ICMP in Datadog

When developers and SREs design application tests, they often prioritize user workflows and API availability. Extending that suite with network tests that match your app’s traffic protocols can reveal whether issues originate in the network or application layer. In this post, we’ll explore how you can design effective network tests using the Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Internet Control Message Protocol (ICMP), including.

Announcing Icinga 2.16.0 and 2.15.3

We are happy to announce the release of two new versions of Icinga 2 today, 2.16.0 and 2.15.3. The first one includes some new features highlighted below, as well as a number of bug fixes and other improvements. The latter one is a small bug fix release that brings some of the other fixes included in 2.16.0 to the 2.15.x branch as well.

What Is Wrong With PaaS Today?

In the wake of 2010s, PaaS felt like magic. You focused on the code, and the platform did the rest. You could ship a production app without knowing anything about networking or, heck, even what a load balancer is. Heroku in particular made deployment a lost thought, especially for early-stage companies. That era is somewhat over, not because platforms got worse overnight, but because the assumptions underneath them quietly stopped being true.