
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Strategic career decisions ft. Cate Huston, Engineering Director at DuckDuckGo

In this episode of The Confident Commit, Rob Zuber sits down with Cate Huston, Engineering Director at DuckDuckGo and author of "The Engineering Leader," for a deep dive into career ownership and sustainable engineering leadership. Cate challenges the common misconception that career growth equals promotion, introducing the concept of being the "directly responsible individual" for your own career and the crucial difference between "buying" versus "renting" your skills in the marketplace.

How to make Netflix reliable: Address low-hanging fruit

Reliability doesn’t have to be fancy and dramatic. Kolton and his team dramatically improved Netflix reliability by focusing on low-hanging fruit. FULL TRANSCRIPT: My first holiday peak at Netflix, where my VP of engineering came to me and he said, "Kolton, what do you think the chance we make it through the holiday peak without an outage is?" I thought about it for a minute and I said, "50/50."

FireHydrant 4-Minute Demo

Get a quick walkthrough of the FireHydrant platform. FireHydrant is the all-in-one incident management platform that helps teams resolve incidents up to 90% faster — and prevent them from happening again. From flexible alerting and powerful automation to retros and AI insights, it brings clarity and control to every step of your response.

DevOps Guide to Monitoring in Serverless Applications

Serverless computing helps teams move faster by removing the need to manage servers. Code runs only when needed, scaling up or down automatically. For DevOps engineers, this means quicker deployments and less infrastructure work. But serverless also brings new challenges. Functions run for short periods, making it hard to track errors, performance, and costs.
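One common way to get visibility into short-lived functions is to emit a single structured log record per invocation. The snippet below is a minimal sketch of that idea, not taken from the guide itself: the `monitored` wrapper, its `emit` parameter, and the record fields are all hypothetical names chosen for illustration, and the `(event, context)` handler signature is assumed.

```python
import json
import time
from functools import wraps


def monitored(handler, emit=print):
    """Wrap a handler so every invocation emits one JSON log record.

    `emit` is a hypothetical hook: any callable that accepts a string.
    It defaults to print, since many platforms capture standard output.
    """
    @wraps(handler)
    def wrapper(event, context=None):
        start = time.perf_counter()
        record = {"function": handler.__name__, "status": "ok"}
        try:
            return handler(event, context)
        except Exception as exc:
            # Note the failure type, then re-raise so the platform still sees it.
            record["status"] = "error"
            record["error"] = type(exc).__name__
            raise
        finally:
            # Duration is logged on success and on failure alike.
            record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
            emit(json.dumps(record))
    return wrapper


def greet(event, context=None):
    """Example handler (illustrative only)."""
    return "hello " + event["name"]


monitored_greet = monitored(greet)
```

Calling `monitored_greet({"name": "dev"})` returns the handler's result and prints one JSON line with the function name, status, and duration, which a log pipeline could then aggregate into error-rate and latency metrics.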

Behind Megaport's Network Automation Platform

We’ve teamed up with the Heavy Networking podcast to take you under the hood of Megaport’s resilient, software-driven network. Luke Gollan, Network Automation Engineer at Megaport, joins Heavy Networking hosts Ethan Banks and Drew Conry-Murray to unpack what happens when you click “provision” in the Megaport portal.

Puppet Control Repository: Your Source of Truth for Infrastructure Management

Learn the fundamentals of Puppet's Control Repository with Margaret and Tony in this comprehensive walkthrough. See how Control Repos serve as your single source of truth for managing configuration across your entire infrastructure, driving collaboration and standardization while simplifying code deployments.

AI Reliability Insights: How to Build a Gremlin MCP Server

Gremlin’s Reliability Intelligence helps teams uncover the cause behind failure modes so they can improve reliability without sacrificing velocity. The new Gremlin MCP Server, part of Reliability Intelligence, gives you new ways to explore your data, with access to insights and recommendations for improving reliability and running your systems better with Gremlin. In this webinar, Gremlin CTO Sam Rossoff shows you how to integrate your favorite LLM and use plain language to query data, uncover insights, create dynamic dashboards, and more.

Guardrails and Gains: How Flyway Brings Stability to Cloud Migrations

Avoid the pitfalls of cloud database migration with Redgate Flyway. Learn how automation, schema discipline, rollback strategies, and traceability reduce risk and enable fast, compliant cloud deployments, with these insights from John Q. Martin, Technology Partner Manager at Redgate Software. Cloud migrations promise faster releases and more flexible scaling, but a poorly executed database migration will stop you from exploiting them.