
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

What Leading Engineering Teams Teach Us About Operational Truth

Modern operational environments are intricate ecosystems shaped by distributed architectures, accelerating change cycles, and a constant influx of telemetry. The complexity itself is not the issue. The issue is how teams construct understanding inside that complexity. After years of expansion across cloud, edge, third-party services, and internal modernization efforts, many organizations now have abundant data but limited confidence in what that data means.

Getting Started with XcodeBuildMCP: Let AI Agents Debug Your iOS Apps

XcodeBuildMCP gives AI agents the ability to build, test, and debug native iOS and macOS apps. In this hands-on workshop, we show you how to use the open-source MCP server to unlock the full developer loop (build, run, debug, interact, and verify) without leaving your preferred AI coding environment.

Innovation Week Day 1: The SDLC Is Collapsing, and Observability Has Never Mattered More

The software development lifecycle is collapsing. The multi-stage pipeline that defined how software got built and shipped for decades is compressing into rapid loops of intent and validation, with agents now part of the teams building and running it. Day 1 of Innovation Week was about what that shift means for how software gets validated, where observability fits, and the problems that have always been hard but are now genuinely urgent.

Contributing Distributed Partition Ownership to the Azure Event Hub Receiver

If you're running OpenTelemetry collectors against Azure Event Hubs, distributed partition ownership and checkpointing just got significantly better. Your fleet now self-organizes. Failover is automatic. Restarts don't lose data. Here's how we got here.

AI-assisted testing, extensions updates, and more: k6 2.0 is here

For years, teams have relied on k6 to take a more proactive approach to performance testing, ensuring they can catch issues early and deliver more reliable user experiences. That approach has helped make k6 one of the most widely used performance testing tools in the open source community today, with more than 30k stars on GitHub. Last year, we introduced k6 1.0, a major release that brought TypeScript support, native extensions, revamped test insights, and production-grade stability guarantees.

Why the Operational Complexity of E-Commerce Reaches a Critical Point in 2025

Modern webshops no longer run on a single system. Behind the digital storefront lies an architecture made up of dozens of components: from product information management to caching layers, from search engines to payment providers. For operations teams, this means the classic LAMP stack from 2010 is now a distant memory.

Monitoring Your Azure to Azure Local Migration: One Dashboard for Both Sides

More organizations are moving workloads from Azure public cloud to Azure Local (formerly Azure Stack HCI) than most people realize. The reasons vary: data sovereignty requirements, latency-sensitive workloads that need to be closer to the edge, cost optimization for predictable workloads where reserved cloud capacity doesn’t make financial sense, or regulatory constraints that require data to stay on-premises.

AURA in Practice: Mezmo's SRE bot, demo walkthrough

A walkthrough of the Slack-based SRE bot Mezmo's engineering team built on AURA, the open-source agent harness, running against Mezmo's own production tooling. Adrian Furlong shows the bot answering questions in a DM with tool calls visible inline, then in a shared channel where it reads the conversation before responding. He opens a fresh PagerDuty incident on camera. The webhook fires AURA, and within seconds, the agent posts a triage note back on the incident and a structured analysis in the dedicated incident channel.