The Observability Journey: Getty Images and Cribl

I recently sat down with Simon Overbey and Lovepreet Singh, the engineering manager and systems engineer (respectively) at Getty Images, to talk about their experiences implementing Cribl. After getting a rundown of the pre-Cribl environment (described above), I asked to jump straight to the end: the net benefits. If the "before" was a terrifying tidal wave of cost and complexity, what did the "after" look like?

What's new in Calico: Spring 2026 Release

Kubernetes has come a long way since its debut in 2014. It’s gone from running a couple of containerized microservices to orchestrating fleets of production workloads spanning everything from AI agents to full-scale VMs running in pods. As Kubernetes adoption grows and its use cases stretch to cover more ground, managing its increasingly complex networking and security landscape demands operational maturity and a platform that supports it.

Lightweight Server Monitoring - One Binary, No Stack

Monitoring a single server should not require running four daemons. Yet the default open-source recipe for “I just want to watch this one box” still looks like this: install node_exporter, stand up a Prometheus server to scrape it, add Grafana to draw the graphs, and bolt on Alertmanager so you actually hear about a full disk. That is a lot of moving parts — and a lot of YAML — for one machine. This post shows a lighter path.

You don't need a paid plan to use AI Root Cause Analysis

When an error appears in production, the hardest part often isn’t seeing what broke. It’s understanding why. That’s why we built Root Cause Analysis (RCA). It helps connect the dots between an error and its likely cause, so you can spend less time investigating and more time moving forward. Until now, RCA was only available through plans that included AI credits. Starting today, free-plan users can purchase an AI credit subscription and use RCA without changing plans.

Splunk Observability at Cisco Live: Agentic Observability for the AI Era

Observability has always been about seeing clearly under pressure. But the pressure has changed. Applications are more distributed. Kubernetes environments keep expanding. Digital experiences depend on services, APIs, networks, third-party providers, and now AI models and agents that can make decisions faster than a human team can review every signal.

From Detection to Resolution: Why ServiceNow + xMatters Is the Fastest Path to Incident Resolution

AI is changing incident management, but not in the way most people think. For years, operations teams focused on getting better at detecting problems. Monitoring improved. Observability improved. AI is now helping teams correlate signals, reduce noise, and identify issues faster than ever before. That’s all valuable, but many organizations are discovering that finding the problem is no longer the hardest part. The harder part is everything that happens next. Who owns the issue?

The AI ROI Company's new groove: CloudZero's new UI, and what it means for customers

Customizability. Feature velocity. Performance. Capabilities that are critically important to all B2B software users. And capabilities in which CloudZero’s brand-new platform specializes. Pitching a total frontend overhaul didn’t necessarily make me CloudZero’s most popular new PM. But it’s made CloudZero faster, more customizable for a wider range of personas, and easier to update with the new features that matter most to our customers. And, if I may say, it also looks beautiful.

Claude Opus 4.8: Pricing, benchmarks, and which model to actually run

Anthropic shipped Claude Opus 4.8 on May 28, 2026, exactly 41 days after Opus 4.7. The SERP was empty for two days after launch. Not because nobody cared. Because engineering managers and finance teams were doing the math on whether the bill changes.

Observability Summit NA 2026: What the Community Is Thinking About

Two days in Minneapolis with the OpenTelemetry community, talking about where telemetry pipelines are headed and what the AI wave is doing to them. Two topics dominated everything: AI and cost reduction. Not as separate conversations, either. The more the community talked about AI telemetry, the more the cost question followed right behind it. I joined Diana Todea from VictoriaMetrics and Antonio Jimenez Martinez from Cisco ThousandEyes on the Telemetry That Matters panel.