
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Designing for Failure: Choosing the Right Level of Redundancy, Resilience, and Control

Outages don't care how many zones you have. Power failures, software updates, and backbone disruptions all have one thing in common: they do not respect architecture diagrams. Redundancy only works if it is designed at the correct layer. Every team believes they are covered, and yet, when something breaks, the failure reveals that what looked like protection was only an illusion.

Eng Intelligence | Cortex

Unlock the power of Engineering Intelligence with Cortex. In this video, we explore how engineering teams can leverage data, insights, and best practices to improve developer productivity, accelerate software delivery, and scale platform engineering. What you’ll learn in this video: how Cortex delivers visibility across your entire ecosystem, helping teams adopt best practices, reduce bottlenecks, and align engineering with business outcomes.

Introducing Magellan: The AI data engine that builds your IDP

Building a catalog used to be a project. It meant months of tracking down owners, untangling dependencies, and manually piecing together a picture of your architecture. It was a tedious, thankless process that delayed the value of your Internal Developer Portal (IDP) before you even got started. Now, it’s a coffee break. We’re excited to introduce Magellan, our new AI-powered data engine designed to build your catalog and get your IDP live in minutes.

A new era for your developer portal: The Cortex MCP is now generally available

Here's a scenario every on-call engineer knows too well: a critical incident fires for a service you’ve never seen before. Your first ten minutes are a frantic scramble across wikis and Slack channels just to answer the most basic questions: Who owns this? What does it do? Where are the runbooks? By the time you’re oriented, the incident has escalated.

Your Password Reset Workflow Is Wasting Everyone's Time

Let’s not mince words; there’s a special place in hell for the password reset ticket. It’s the most boring, most avoidable, and arguably the most expensive waste of time on your service desk. And yet, in 2025, most enterprises still treat password resets like it’s 2005. They route them through manual queues, bury IT teams, and frustrate users who just want to log back in. Even when the password reset is finally resolved, nobody comes away from the experience feeling like a winner.

Choosing the Right APM for Go: 11 Tools Worth Your Time

If you’re building high-performance systems, Go has probably earned a spot in your stack. Its speed, lightweight concurrency, and quick compile times make it ideal for scalable APIs, microservices, and distributed systems. But the same qualities that make Go powerful can make performance monitoring tricky. Goroutines are cheap to start and run concurrently, often by the thousands, which means a simple CPU or memory graph doesn’t always tell you what’s slowing things down.