
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Best Elixir APM Tools in 2026: A Developer's Guide

Last updated: May 2026

Elixir applications have performance characteristics that are genuinely different from Ruby or Python. The BEAM virtual machine handles concurrency through lightweight processes, supervision trees restart failed processes automatically, and Phoenix channels can hold tens of thousands of persistent connections on a single node. These are strengths, but they also mean that the performance problems you encounter are different from what most APM tools were built to detect.

The Best Kubernetes Monitoring Tools of 2026

Effective Kubernetes monitoring in 2026 is critical due to increased cluster scale and microservices complexity, demanding a shift toward unified observability (logs, metrics, and traces). The core focus is leveraging AI-driven features to automate anomaly detection, correlate diverse data, and significantly reduce Mean Time to Recovery (MTTR).

Why Alert Fatigue Solutions Still Miss the Root Cause

Alert fatigue solutions have never been better, but on-call engineers are still burning out. Threshold tuning, AI triage, and alert correlation reduce the noise, but every alert that clears filtering lands with the same incomplete telemetry and triggers the same manual investigation cycle. This post explains why the evidence gap survives every fix, and how runtime context changes that.

From vibe code to production-ready: observability for Next.js and Supabase apps

The way we build software has drastically changed over the past few years. What hasn’t changed is that this software ends up in front of real people: you, me, my mom. And when those users inevitably run into something broken, you as the application’s developer need to be equipped with the right tools, context and understanding of what broke, where it broke, and how to fix it as quickly as possible. Every day we’re inching closer to self-healing software.

Migrating Your DX NetOps Integrations from OData 2 to OData 4

If you integrate DX NetOps with external dashboards, reporting engines, or IT service management tools, you likely rely on our API framework. We are currently migrating this framework from OData 2 to OData 4. This transition requires you to update your existing integrations so they continue to function properly. Let me walk you through exactly what is changing, how to identify your active API queries, and the specific adjustments you need to make to your setup.

Easily connect any AI assistant (Claude, Codex, ...) to your Oh Dear data

Oh Dear keeps a watchful eye on your websites: uptime, performance, SSL certificates, broken links, DNS, cron jobs. If something can quietly break, we're already checking it for you. Today we're connecting that data to a new place: your AI assistant. We just shipped an MCP integration. If you use Claude, Cursor, or any other client that speaks the Model Context Protocol, you can now ask questions like "any broken links on my site?" or "when does my certificate expire?" in plain language.

Making Semantic Conventions Work for You With OpenTelemetry Weaver

Your dataset has hundreds of attributes. Some are self-explanatory: http.response.status_code, server.address. Others are not: meta.refinery.reason, dataset.slug, sli.latency_target_ms. If you don't know what an attribute means, you can't write a good query. And if an AI agent doesn't know what it means, it guesses.

What is an Enterprise Knowledge Graph? Definition, Benefits, and Use Cases

Are your AI systems giving answers your teams cannot trust? Most enterprises deploy LLMs expecting reliable outputs, but the results often feel inconsistent or incomplete. The problem is the missing structure behind the data. Enterprise data is usually fragmented across multiple systems, teams, and tools. Your AI does not understand how customers, products, policies, and operations connect. Without that context, it fills gaps with assumptions, which leads to unreliable results.

What is AI Agent Orchestration? Concept + How It Works

Have you tried using AI at work and felt it works well for small tasks, but not beyond that? It can handle simple things like creating a summary, writing a draft, or answering a question. This works because the task is clear. But most tasks are not that simple. They involve multiple steps. One step depends on another. Data comes from different systems, and some decisions need checks before moving ahead. This is where a single AI system starts to struggle.

Managing OpenTelemetry at Scale: Why OTel Pipelines Need a Control Plane

OpenTelemetry made telemetry possible everywhere, turning observability pipelines into distributed production infrastructure. Distributed infrastructure requires a control plane for inventory, governance, and safe change. At 500 collectors across hybrid environments, operational overhead becomes a production risk. The moment telemetry pipelines become distributed infrastructure, they inherit the operational problems of one.