The latest News and Information on DevOps, CI/CD, Automation and related technologies.

AI Cost Optimization At Scale: How One CloudZero Customer Manages Spend Across 50+ LLMs

AI adoption isn’t just accelerating, it’s compounding. From GPT-5 to Claude to Llama and beyond, engineering teams are integrating diverse LLMs across products, experiments, and services. And finance teams are now grappling with a new kind of cloud complexity: token-based economics and volatile inference costs, often spread across multi-model, multi-cloud, and multi-region architectures. The modern FinOps stack needs to keep up. CloudZero was built for this moment.

Visualize Logs Alongside Metrics: A Complete Guide for Monitoring Slow MySQL Queries

When a service slows down, metrics tell you that it is happening, but logs tell you why. For MySQL, slow queries can be a silent performance killer, gradually chewing through resources until users start complaining. By enabling MySQL’s slow query log and forwarding it to Loki (via Promtail), you can visualize query-level details right alongside your metrics on Grafana dashboards. This makes it easy to correlate what is slow (metrics) with what is causing the slowdown (logs).
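To make "query-level details" concrete, here is a minimal Python sketch that pulls the timing fields out of slow query log entries. It assumes the default file-based slow log layout (a `# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...` line followed by the statement). The `parse_slow_log` helper and the sample entries are hypothetical illustrations, not part of the guide or of any MySQL, Promtail, or Loki API.

```python
import re

# Matches the statistics line MySQL writes for each slow-log entry
# (default file format assumed).
STATS = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_log(text):
    """Return one dict per slow-log entry with its timings and SQL text."""
    entries = []
    for line in text.splitlines():
        match = STATS.match(line)
        if match:
            entries.append({
                "query_time": float(match["query_time"]),
                "lock_time": float(match["lock_time"]),
                "rows_sent": int(match["rows_sent"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": "",
            })
        elif entries and line.strip() and not line.startswith(("#", "SET timestamp=")):
            # Remaining lines belong to the statement of the latest entry.
            entries[-1]["sql"] = (entries[-1]["sql"] + " " + line.strip()).strip()
    return entries


# Made-up sample entries in the assumed layout.
SAMPLE = """\
# Time: 2025-01-15T10:23:45.123456Z
# User@Host: app[app] @ localhost []  Id:    42
# Query_time: 3.218904  Lock_time: 0.000112 Rows_sent: 1  Rows_examined: 982113
SET timestamp=1736936625;
SELECT * FROM orders
WHERE customer_id = 7;
# Time: 2025-01-15T10:24:02.000001Z
# User@Host: app[app] @ localhost []  Id:    43
# Query_time: 1.004500  Lock_time: 0.000050 Rows_sent: 20  Rows_examined: 51000
SET timestamp=1736936642;
SELECT id FROM users ORDER BY created_at;
"""

entries = parse_slow_log(SAMPLE)
slowest = max(entries, key=lambda e: e["query_time"])
print(slowest["query_time"], slowest["sql"])
```

In a Promtail and Loki setup, this kind of field extraction would more likely happen in the log pipeline or at query time than in a standalone script. The sketch only shows what information each log entry carries.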

Practicing What I Preach, Just At Scale

I’ve spent most of my career building and optimizing cloud, on-prem, and data platforms for growing companies. It’s been an amazing journey so far. Through it all, FinOps has become more than just a methodology for me (Fred FinOps didn’t just come from my love of the Flintstones, though I do appreciate a good cartoon). It’s a community, a discipline, a tribe I’ve come to call home. Lately, some tough questions have kept me up at night. These challenges got me thinking.

How engineers can improve creativity ft. Corey Latislaw of Trainline

Engineering leadership isn't just about technical execution—it's about unlocking the creative potential that drives individual and team success. CircleCI CTO Rob Zuber sits down with Corey Latislaw, Head of Engineering at Trainline and executive coaching expert, to explore how creativity transforms both careers and team dynamics.

How Experiment Analysis uncovers the cause behind failures

Chaos Engineering has proven incredibly effective at tracking down failure modes, remediating reliability issues, and preventing risks before they happen. Unfortunately, it can also come with a steep adoption curve. To get the most out of Fault Injection testing, a practitioner needs deep knowledge of the service, its expected behavior, and the code behind it. Ultimately, the rewards are worth the time.

RKE2: Enterprise Kubernetes Made Simple & Secure!

Still wasting weeks on complex configs? Meet your new secret weapon: RKE2! Prasun Das from @Infosys reveals how you can go from zero to a hardened Kubernetes cluster in minutes: upstream Kubernetes that is ready for production, and one-step CIS hardening with no 200-page manuals. Copy. Start. Done. That easy. Why work harder when you can work smarter? Get speed, security, and enterprise power without the grind.

Cortex MCP set up

Learn how to set up the Cortex MCP in under 5 minutes. The MCP integrates directly into your IDE, giving instant access to Cortex data without leaving your coding environment. It reduces context switching by enabling natural questions about services and teams, and streamlines workflows with real-time data from Cortex, Jira, GitHub, and more.

QA Testing in 2025: Revolutionize Your Workflow with Preview Environments

Software quality assurance has changed dramatically over the past few years. Today, the velocity of software development demands more than traditional staging and shared QA environments. Releases are expected to be faster, integration cycles shorter, and quality standards higher. These pressures have inspired a growing interest in preview environments—ephemeral, production-like spaces spun up on demand for testing code changes in isolation.