The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Any Apple update can break our app. Here's how we find out first.

This is a guest post by Dan Mindru, a frontend developer and designer who co-hosts the Morning Maker Show. Dan is currently developing several applications, including PageUI, Clobbr, and CronTool. With every release, it feels like we are walking a tightrope: we need to keep our app lightweight, stable, and performant while depending on APIs that can shift at any moment, without warning.

Self-Healing ITOps: Close the Loop From Detection to Resolution

Self-healing ITOps helps restore services faster by combining AI-driven analysis, automation, and recovery validation. Organizations have invested heavily in monitoring, observability, and AIOps. These platforms are effective at identifying issues, but incident resolution often remains manual: engineers still need to investigate alerts, determine the appropriate remediation, and verify that services have recovered.

Overview of Alerts, Real-Time Analysis, & Traceroute

Learn how Uptime.com alerts you the moment a check goes Up or Down, complete with technical details and root cause analysis for API and Transaction checks. Dive into Real-Time Analysis to track outage timelines and get detailed insight into every alert. Plus, see how Traceroute from global or private probe servers helps identify connection issues quickly and accurately. Stay informed. Respond faster. Resolve smarter.

When One Agent Plans and Another Executes, the Planner's View Decides Everything

Split network operations into a planning agent and an executing agent and you have an elegant design on paper. One agent reasons about what should change and validates it. The other carries it out. The elegance is real, and so is the structural consequence: the split puts the entire weight of judgment on the planner. A plan built on a partial view, then executed precisely and at machine speed, is more dangerous than a cautious human who would have hesitated at the part that did not add up.

New in Skylar One - Kyoto: Better Context for Faster, More Confident IT Operations

Modern IT environments do not fail in neat, isolated ways. A network issue in one location can affect a business service somewhere else. A device alert may be the first sign of a larger dependency problem. And when teams are managing infrastructure across data centers, cloud, branches, campuses, and edge environments, the first challenge is often knowing where to look. The issue is not alert volume alone. It is the missing context between telemetry, service impact, probable cause, and action.

What Is NetFlow, and How Does It Reveal Where Traffic Goes?

In this video, learn what NetFlow is and why it's one of the most effective technologies for understanding network traffic. Discover how NetFlow goes beyond basic bandwidth monitoring by showing who is using your network, what applications are consuming bandwidth, and how traffic patterns change over time. Whether you're a network administrator, IT operations engineer, or infrastructure manager, this video explains NetFlow in simple terms and shows how it helps identify bandwidth hogs, troubleshoot slow networks, and make smarter capacity planning decisions.
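
As a rough illustration of the "who is using the network" idea, the sketch below totals bytes per source address and per destination port across a handful of flow records. It is not a NetFlow parser: the `FlowRecord` fields, the sample addresses, and the helper names `top_talkers` and `bytes_by_port` are all assumptions made for the example, standing in for records that a real collector would already have decoded.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (not the NetFlow wire format)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    byte_count: int


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    # Sort by total bytes, largest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def bytes_by_port(flows):
    """Total bytes per destination port, a crude proxy for application."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow.dst_port] += flow.byte_count
    return dict(totals)


# Made-up sample data for illustration only.
SAMPLE_FLOWS = [
    FlowRecord("10.0.0.5", "93.184.216.34", 443, "tcp", 1_200_000),
    FlowRecord("10.0.0.5", "93.184.216.34", 443, "tcp", 800_000),
    FlowRecord("10.0.0.7", "10.0.0.1", 53, "udp", 4_000),
    FlowRecord("10.0.0.9", "198.51.100.20", 22, "tcp", 50_000),
]

if __name__ == "__main__":
    print(top_talkers(SAMPLE_FLOWS, n=2))
    print(bytes_by_port(SAMPLE_FLOWS))
```

Grouping the same records by time window instead of by address would give the "how traffic patterns change over time" view.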

You Can't Detect What You Never Collect: Telemetry Coverage in the Agentic SOC

Every detection rule, every threat hunt, every AI agent you deploy rests on one silent assumption: that the data describing an attack actually reached your tools. When it doesn’t, nothing above it can save you, and no one gets an alert that the data was missing. Security teams invest heavily in the sharp end of the stack: detection content, threat intelligence, response playbooks, and increasingly, AI agents to triage and investigate at machine speed.
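
One simple way to surface missing data is to compare the sources you expect to hear from against when each was last seen. The sketch below is a minimal version of that check; the source names, the 15-minute threshold, and the `silent_sources` helper are assumptions for illustration, not part of any particular product.

```python
from datetime import datetime, timedelta, timezone


def silent_sources(expected, last_seen, now, max_silence=timedelta(minutes=15)):
    """Return expected sources that never reported or have gone quiet.

    expected:  iterable of source names that should be sending telemetry
    last_seen: dict mapping source name -> datetime of its latest event
    """
    missing = []
    for source in sorted(expected):
        seen = last_seen.get(source)
        # A source counts as silent if it never reported or is stale.
        if seen is None or now - seen > max_silence:
            missing.append(source)
    return missing


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Hypothetical inventory and last-seen timestamps.
    expected = {"fw-edge-01", "dc-auth-02", "k8s-audit"}
    last_seen = {
        "fw-edge-01": now - timedelta(minutes=2),
        "dc-auth-02": now - timedelta(hours=3),
    }
    print(silent_sources(expected, last_seen, now))
```

The value of a check like this is that it raises an alert on absence, which detection rules that wait for matching events cannot do.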

Could vs. Should: The First Year Managing an SRE Team

As of today, I’ve drafted this post upwards of 10 times – it’s old enough that the version I first started working on was called “Reflections on 1 Year of SRE Management” (I’m currently at 2.5 years). But everything I learned during that first year became critical for the next.

Monitor Your PHP Applications with AppSignal

Good news for PHP developers: AppSignal monitoring is now available for PHP applications. Our new package brings traces, metrics, and logs from your PHP app into AppSignal, with auto-instrumentation for frameworks like Laravel and Symfony and a foundation built on OpenTelemetry. Already using AppSignal's PHP package and want the latest updates? Migrating is straightforward: remove your current OpenTelemetry setup and follow our new install guide.