Operations | Monitoring | ITSM | DevOps | Cloud

Measuring engineering organizations in the age of AI

Engineering leadership is in the middle of a real transition, and most of the leaders I talk to know it. AI has reshaped how software gets built quickly enough that the operating models many of us spent a decade refining no longer fit cleanly, and there is a great deal of serious work happening across the industry to figure out how these models should evolve. The teams I find most impressive right now are the ones treating their operating model as an open question rather than a settled one.

IsDown is joining UptimeRobot

Today I'm sharing some big news: IsDown is joining UptimeRobot. When I started IsDown, the idea was simple. Keeping track of outages across dozens of vendor status pages was painful, and I wanted to make it easy to see, in one place, when the services you depend on go down. Thousands of teams now rely on IsDown to do exactly that. Joining UptimeRobot is the natural next step.

Visibility Isn't Reliability: Why Observability Alone Cannot Protect SLAs

Over the past decade, enterprises have invested heavily in observability platforms designed to deliver comprehensive insight into increasingly complex environments. Modern systems generate continuous telemetry across infrastructure, applications, networks, cloud services, and third-party dependencies. Metrics, logs, traces, and topology maps now provide a level of technical transparency that would have been difficult to imagine only a few years ago.

GitKraken: The Code Flow Company

From plan to main. Software is no longer just a tool. It is the infrastructure of modern life. Software keeps airplanes in the sky and power flowing into our homes. It helps doctors save lives, scientists discover cures, farmers feed cities, and astronauts navigate space. It powers economies, protects supply chains, and connects billions of people across the world. Every major system humanity depends on now depends on software. Which means developers are no longer just building applications.

Track Deployment status for your PRs (Beta)

You shouldn’t have to leave your PR list to know where your code is deployed. Yet, developers constantly lose time context-switching just to see if a change hit staging or production. To solve this, we are launching the Beta version of Deployment Status Tracking for your PRs. This feature surfaces live deployment statuses directly within your PR list view as code moves through your pipeline.

Un-observable AI is Un-trustworthy AI

Recently, someone talked Chipotle’s customer support agent into reversing a linked list – a task that has nothing to do with burritos. Screenshots circulated, people laughed, but underneath the joke sat a sharper question. If a production support agent will do that on a public channel, what else will it do that nobody is screenshotting? The bug is funny. The trust gap behind it is not.

Deep AI Investigation for ITOps: What It Is and Why It Matters

Investigation is the most time-consuming and cognitively demanding phase of incident response, and it’s the phase least served by existing tooling. Modern ITOps teams have spent years investing in better detection and alerting. The tools are faster, the dashboards are richer, and anomaly detection keeps improving.

Use This OTel Processor to Prevent Your Dashboards From Breaking

A semantic-convention rename (http.method → http.request.method) can silently break your RED metrics: there are no errors, just gaps in dashboards and alerts. The OpenTelemetry Collector's schema processor fixes it: put it first in your pipeline and it normalizes attribute names no matter what each service emits. Migration mode writes both the old and new names, so you get zero-downtime upgrades while queries keep working.
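To make the dual-write idea concrete, here is a minimal Python sketch of the normalization logic described above. It is an illustration only, not the schema processor's code or configuration: the RENAMES table, the normalize_attributes function, and the migration_mode flag are hypothetical names chosen for this example.

```python
# Hypothetical rename table (old semantic-convention name -> new name).
# Both pairs are renames from the HTTP semantic conventions.
RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
}


def normalize_attributes(attrs, migration_mode=False):
    """Return a copy of attrs with old attribute names rewritten to new ones.

    With migration_mode=True the old name is kept next to the new one,
    so queries written against either name keep matching.
    """
    out = dict(attrs)  # never mutate the caller's attributes
    for old, new in RENAMES.items():
        if old not in out:
            continue
        # Keep a value the service already emits under the new name.
        out.setdefault(new, out[old])
        if not migration_mode:
            del out[old]
    return out


if __name__ == "__main__":
    span_attrs = {"http.method": "GET", "service.name": "checkout"}
    print(normalize_attributes(span_attrs))
    print(normalize_attributes(span_attrs, migration_mode=True))
```

In this sketch, a dashboard query on http.method and one on http.request.method both keep matching while migration_mode is on. Once every query uses the new name, turning the flag off drops the old attribute.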

Eight best practices for a successful cloud migration strategy

Moving to the cloud is one of the most consequential decisions an IT organization makes. A successful cloud migration strategy sets the foundation for how your business scales, innovates, and competes. But too often, cloud migration initiatives stall, underperform, or force organizations to repatriate applications back on-premises because the groundwork wasn’t laid correctly.

Alibaba Cloud monitoring: What changes when scale, speed, and cost collide

Alibaba Cloud monitoring isn't AWS or Azure monitoring with a different logo. The way its services scale, absorb load, and send early warning signals follows its own logic, and if you're watching the wrong things, you'll find out too late. Cloud monitoring conversations often follow patterns set by AWS and Azure. The metrics are familiar, dashboards look the same, and operational playbooks are built around expected infrastructure behavior.