The Anti-Zombie, Battle-Tested Guide To AI FinOps: 10 Insights

When CloudZero’s CTO Erik Peterson joined the FinOps Weekly podcast in October 2025, he didn’t hold back. Instead of rehashing the usual best practices of AI cost optimization, he challenged how we approach AI spending. From “zombie AI experiments” eating your budget to why you should stop apologizing for using AI, these 10 insights from the podcast are worth considering in how we approach AI FinOps.

How NRP Scales Global Scientific Research with Calico

The National Research Platform (NRP) operates a globally distributed, high-performance computing and networking environment, with an average of 15,000 pods across 450 nodes supporting more than 3,000 scientific project namespaces. With its head node in San Diego, NRP connects research institutions and data centers worldwide via links ranging from 10 to 400 Gbps, serving more than 5,000 users in 70+ locations.

Baking in site reliability with observability and AI: How SpotOn uses Grafana Assistant to keep restaurants running

When you operate a restaurant, the last thing you want to do is shut your doors and turn away guests and staff because of some technology failure. And if you’re the one providing that tech, it’s your job to make sure that doesn’t happen. “For us, observability is about a lot more than just dashboards and alerts.

Console Connect strengthens Google Cloud connectivity with new global locations

Console Connect has expanded its cloud ecosystem with four additional Google Cloud locations, bringing our total to 69 locations worldwide. This growth across three continents gives customers even more options to directly and securely connect to Google Cloud Platform (GCP) from strategic data centre hubs in key international markets, helping enterprises simplify and scale their direct access to GCP wherever they operate. New GCP locations include.

Introducing the ilert × Livewatch native integration

We’re excited to announce that ilert now offers a native integration with Livewatch, unlocking seamless incident escalation from monitoring to response. Starting today, all alerts generated by Livewatch can be automatically ingested, grouped, escalated, and managed from within ilert – closing the loop between detection and resolution.

Implement Distributed Tracing with Spring Boot 3

A slow checkout request. A background job stuck waiting on another service. A log message that looks fine until performance drops. In a microservices setup, these are the moments that test your observability. You know something's wrong, but tracing the request across dozens of services feels impossible. Distributed tracing changes that. It connects every span in the request's journey, showing exactly where time is spent and where things start to break down.

The 2025 Guide to Open Source Status Page Software

This is an updated version of the 2024 article. Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication strategy during times of outages and maintenance events. You can choose to go with a fully managed status page provider or host an open-source one yourself.

Announcing the AI chief of staff for engineering leaders

You see MTTR creeping up, but you don’t know why. You could ask your teams, but that means meetings, pulling people off projects, and waiting days for answers. What if you could just…ask? We’re excited to introduce the new strategic AI chief of staff for engineering leadership, powered by the Cortex MCP. By connecting your Engineering Intelligence data with your scorecards and standards, the MCP allows you to have a strategic conversation about your organization’s performance.

Your metrics, your way: Announcing custom views in Engineering Intelligence

Every engineering organization measures success differently. A dashboard that’s perfect for one team might be meaningless for another. While out-of-the-box views for DORA are a great starting point, leaders need the ability to define and share the specific combination of metrics that matter most to their business. Without this, you're either forcing your teams to conform to generic reports or wasting time rebuilding the same views every week.

Understand how AI is affecting your engineering team with Cortex's AI Impact Dashboard

Rolling out a powerful AI tool like GitHub Copilot is a big win for any engineering leader. But because it’s such a significant investment, leadership will inevitably ask if it was worth the cost. Until now, answering this was nearly impossible. While GitHub provides adoption stats, connecting that data to real-world performance metrics like cycle time or code quality has been a manual, frustrating process. We built the Cortex AI Impact Dashboard to provide a clear answer.