
4 foundations you need to scale AI in engineering

As a baseline, engineering leaders need their teams to adopt AI tools to increase velocity and ship faster. Most organizations have already rolled out AI coding assistants or are evaluating them, but there is a significant difference between buying a tool and successfully scaling it across an engineering organization. If you layer AI on top of a chaotic codebase or a disorganized service catalog, you only accelerate the creation of legacy code.

Deploy your Spring Boot application to production

In a previous article, we covered how easy it is to create Spring Boot containers with Rockcraft. The next logical step is to deploy and operate your application in a production environment, and the Juju ecosystem is the key to making this process straightforward. In this article, we walk through the steps required to deploy a Spring Boot application to production using Juju and Kubernetes.

OpAMP Explained: Why OpenTelemetry Needed an Agent Management Protocol (and How We Use It)

OpenTelemetry makes it easy to produce and transmit any type of telemetry. In production environments, this often means deploying the OpenTelemetry Collector as an intermediary to process, enrich, and route telemetry data. As systems scale, so does this infrastructure—sometimes to hundreds or thousands of Collectors spread across environments.

Announcing HAProxy Kubernetes Ingress Controller 3.2

We’re excited to announce the simultaneous releases of HAProxy Kubernetes Ingress Controller 3.2 and HAProxy Enterprise Kubernetes Ingress Controller 3.2! All new features described here apply to both products. These releases introduce user-defined annotations, a new frontend CRD, and other minor improvements, and we’ll cover these in detail below. Visit our documentation to view the full release notes.

Optimizing BESS Operations: Real-Time Monitoring & Predictive Maintenance with InfluxDB 3

For IT and OT engineers managing Battery Energy Storage Systems (BESS) and other distributed energy resources (DER), the challenge isn't just an energy problem. It's a data problem: managing the massive stream of real-time telemetry these systems generate. A BESS site produces a constant flow of time-series data from BMS, PCS, SCADA, EMS, and more, and operating it means ingesting, correlating, and acting on that data in real time. And the challenge changes with scope.
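To make the data problem concrete, here is a minimal sketch of formatting one BESS telemetry point as InfluxDB line protocol, the ingest format InfluxDB 3 accepts. The measurement, tag, and field names are illustrative, not a real site schema.

```python
# Sketch: render one telemetry point as InfluxDB line protocol,
# i.e. "measurement,tag=... field=... timestamp".
# Names below (bess_telemetry, soc_pct, ...) are made up for illustration.

def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one point as InfluxDB line protocol with a nanosecond timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

point = to_line_protocol(
    "bess_telemetry",
    {"site": "site-01", "subsystem": "bms"},
    {"soc_pct": 87.5, "pack_temp_c": 31.2},
    1735689600000000000,
)
print(point)
# bess_telemetry,site=site-01,subsystem=bms pack_temp_c=31.2,soc_pct=87.5 1735689600000000000
```

In practice a site emits thousands of such points per second across subsystems, which is exactly the ingest-and-correlate workload the article describes.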

Bindplane + Oodle.ai: AI-Native Observability Meets AI-Driven Telemetry Pipelines

Today, we’re excited to announce a new integration between Bindplane and Oodle.ai — combining an AI-driven, OpenTelemetry-native telemetry pipeline with an AI-native observability platform built for extreme scale. With Bindplane acting as the control plane for telemetry and Oodle.ai providing AI-powered analysis across logs, metrics, and traces, you get a single, intelligent, vendor-neutral pipeline from raw telemetry to actionable insight.

Continuous Profiling Explained: Master Performance in Production

Backend systems rarely fail in obvious ways. More often, they degrade over time: CPU usage slowly increases, request latency creeps up, and costs rise without a clear explanation. Metrics tell you something is wrong, and traces show where requests go, but neither explains why your code behaves the way it does under real load. Continuous profiling fills that gap, and Atatus continuous profiling runs automatically in production with minimal overhead.
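The idea can be sketched with Python's stdlib cProfile: it is the same "which function burns the CPU" question a continuous profiler answers automatically and with far lower overhead in production. The hot function here is a deliberately naive stand-in.

```python
# Sketch: ad-hoc CPU profiling with the stdlib, the manual version of what
# a continuous profiler does in the background.
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive hot path for the profiler to surface.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top functions by cumulative time; slow_sum appears in the listing
```

Running a profiler like this on demand answers "why is it slow right now"; continuous profiling keeps that answer available for any moment in the past.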

Not everything that breaks is an error: a Logs and Next.js story

Stack traces are great, but they only tell you what broke. They rarely tell you why. When an exception fires, you get a snapshot of the moment things went sideways, but the context leading up to that moment? Gone. That's where logs come in. A well-placed log can be the difference between hours of head-scratching and a five-minute fix. Let me show you what I mean with a real bug I encountered recently.
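The principle translates to any stack (the article's example is Next.js; the sketch below uses Python's stdlib logging for a self-contained illustration): record the state leading up to a failure so the stack trace isn't your only evidence. The discount-code scenario and names here are hypothetical.

```python
# Sketch: a well-placed log captures the context an exception alone loses.
# The function, lookup table, and log fields are made up for illustration.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("checkout")

def apply_discount(cart_total, code):
    # Hypothetical lookup standing in for a real promo service.
    discounts = {"SAVE10": 0.10}
    # Log the inputs *before* anything can fail.
    log.info("applying discount code=%r cart_total=%s", code, cart_total)
    rate = discounts.get(code)
    if rate is None:
        # The log line above already recorded which code failed and for what total.
        raise ValueError(f"unknown discount code: {code}")
    return cart_total * (1 - rate)

print(apply_discount(100.0, "SAVE10"))  # 90.0
```

When the ValueError fires in production, the preceding log line tells you which code and which cart triggered it, which is the context a bare stack trace discards.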

The API Metrics Every SaaS Team Must Track In 2026

API metrics have long been a core part of building and operating reliable SaaS products. Teams track the likes of request volume, latency, and uptime to ensure APIs perform as expected under load. But today, the API metrics that matter most go beyond performance. First: API cost intelligence metrics, which measure how API usage translates into cloud, AI, and third-party spend, and attribute that cost to customers, features, workflows, and teams so SaaS businesses can protect margins as usage scales.
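Cost attribution at its core is an aggregation over request records. A minimal sketch, with made-up customers, endpoints, and per-call costs (in cents, to keep the arithmetic exact):

```python
# Sketch: attribute per-request API spend to customers so margin erosion
# is visible. All data below is fabricated for illustration.
from collections import defaultdict

requests = [
    {"customer": "acme",   "endpoint": "/v1/summarize", "cost_cents": 40},
    {"customer": "acme",   "endpoint": "/v1/search",    "cost_cents": 10},
    {"customer": "globex", "endpoint": "/v1/summarize", "cost_cents": 40},
]

cost_by_customer = defaultdict(int)
for r in requests:
    cost_by_customer[r["customer"]] += r["cost_cents"]

print(dict(cost_by_customer))  # {'acme': 50, 'globex': 40}
```

The same fold works along any attribution axis (feature, workflow, team) by changing the grouping key; real systems add pricing tiers and upstream AI/cloud unit costs on top.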

Your Cloud Economics Pulse For January 2026

Welcome to January’s Cloud Economics Pulse, CloudZero’s monthly look at cloud spend as AI moves from pilots to production. The related news flash: AI spend keeps hitting new highs. In last month’s Pulse, we explored the compounding effect of AI becoming part of everyday cloud operations. This month, we see that pattern harden into year-end results.