The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Building Real-Time Telemetry Pipelines for IRIG 106 Compliance

Every second of a flight test produces a torrent of telemetry from engines, sensors, and control systems. Aerospace teams have captured this data for decades to verify performance and maintain safety, yet analysis often happens long after the mission ends. Engineers wait for downloads, conversions, and compliance checks before they can interpret results. That delay turns telemetry into a historical record instead of a feedback loop.

When your agents hallucinate at 2 am, it is not a model problem

The first time an AI assistant suggests "restart the service" during a live incident and nobody on the bridge can tell whether that suggestion came from a current runbook, a stale wiki page, or thin air, you stop caring about model benchmarks. You start caring about what the agent actually knew, where that knowledge came from, and whether you can trust the chain of reasoning behind it.

ITSM Maturity Playbook Live, Episode 1: Incident Management Masterclass

Join this 5-part series designed to help IT teams move from reactive, fragmented processes to a more structured, connected way of working. Each session focuses on a core area, from incident resolution and CMDB visibility to employee experience, service catalog design, and change governance, giving you practical frameworks you can apply right away. You’ll walk away with faster, more consistent incident resolution.

How to Identify LAN Issues (Local Area Network Problems)

Here is a reality that every network admin eventually runs into: users report slow apps, dropped calls, and broken connections, and the first instinct is to blame the ISP or the cloud provider. The ticket gets escalated, the ISP pushes back, and hours later, you find out the problem was sitting inside your own building the whole time. A saturated switch port. A misconfigured VLAN. A flaky patch cable in the server room.

Proactive vs Reactive Monitoring: What are the Differences?

A single hour of unplanned downtime can cost a mid-sized enterprise more than $300,000, according to an ITIC report. Most of that cost comes from one place: teams find out about the problem after users do. That is the core limitation of reactive monitoring. It tells you that something has failed, not that something is about to fail. This guide is for IT operations leads, platform and SRE engineers, and IT directors deciding how to evolve their monitoring practice.

Action trails: The missing link between AI and human trust

When people talk about trusting AI, they usually focus on the interface. It summarizes and uses confident language with a level of clarity that feels reliable. But that’s all window dressing. None of it builds trust. Trust doesn’t come from what the AI says. A verifiable record of what the AI did makes it trustworthy.

How to embed Grafana dashboards into web applications

Note: This post was originally published in October 2023 and was updated in May 2026 to include new methods and options for embedding Grafana dashboards. Grafana dashboards are powerful and flexible tools for observing applications and infrastructure, so it’s no surprise we get a lot of questions from the community about how to embed them into their web applications.

Web API: your complete guide for custom integrations

Data is almost always scattered across too many tools. Usually, if you want to see it all in one place, you're stuck building messy pipelines or paying for a warehouse you don't really want. SquaredUp is a window into all those tools. It lets you see what’s happening across your entire stack in real time without moving any of the data. Think of it as a universal translator that lets your tools talk to each other so you can stop the manual digging and just see the big picture.

7 Proven Steps to Maintain Operational Continuity During S/4HANA Migration

Migrating to SAP S/4HANA is one of the most consequential system changes your organization will undertake. The technical complexity alone is significant. But the real risk is operational: maintaining uninterrupted service delivery while transforming the core systems your business depends on. Failure to manage this well causes outages, data inconsistencies, user disruption, and cost overruns. None of those are acceptable outcomes. The good news is these risks are manageable.