The AI-Empowered Site Reliability Engineer: Automating the Balance of Risk and Velocity

You might expect an AI-SRE agent to target 100% reliable services, ones that never fail. Past a certain point, however, increasing reliability makes a service worse for its users rather than better. Extreme reliability comes at a non-linear cost: maximizing stability limits how fast new features can be developed, dramatically increases operational cost, and reduces the features a team can afford to offer.
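To make the trade-off concrete, here is a minimal sketch of the arithmetic behind availability targets. It does not model cost. It only shows how quickly the permitted downtime shrinks as the target rises. The function name and the choice of a 365-day year are assumptions made for illustration, not something taken from the article.

```python
# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be unavailable at the given target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Each additional "nine" leaves one tenth of the previous allowance.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

Each additional nine cuts the allowance by a factor of ten, so every step toward 100% leaves a team far less room for releases, experiments, and routine failures.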

Agentic AI Essentials: The Dashboard and Changing IT Roles

Dashboards provide a useful prism through which we can study the broader evolution of the IT professional’s role in the era of agentic AI. For years, dashboards have been the centerpiece of IT work, serving as the interface where teams interpret system behavior, diagnose issues, and plan actions. Dashboards epitomize the relationship between humans and their systems: humans observe, interpret, and act. As agentic AI enters the picture, that relationship begins to change. Let’s explore how.

From Blueprint to Production: Building a Kubernetes MCP Server

As Large Language Models (LLMs) evolve from simple chatbots into agentic workflows, the need for a standardized way to connect them to external data and infrastructure has become critical. In a recent workshop hosted by Nir Adler, Innovation Engineer at Komodor, we explored how to bridge this gap using the Model Context Protocol (MCP).

Why MCP is becoming part of your product surface

AI assistants are quickly becoming a primary interface for how people interact with software. Developers ask them how to integrate APIs. Users ask them how products work. Buyers ask them how tools compare. Increasingly, the first explanation someone receives about your product does not come from your website, your documentation, or your sales team. It comes from an AI assistant. That shift has an important consequence that many organizations are only starting to notice.

Upsun's AI story: the 5% path from pilots to production value at scale

Here’s the uncomfortable truth: most companies do not have an AI problem. They have a delivery problem wearing an AI costume. MIT’s Project NANDA research has been widely cited for a brutal headline statistic: roughly 95% of corporate generative AI pilots fail to produce measurable business impact or returns, while only about 5% break through to meaningful outcomes. (Yahoo Finance) The models are impressive. The demos are dazzling. The budgets are real.

Intelligent FinOps: AI-Informed, AI-Enabled

AI is the new frontier for FinOps maturity. It introduces fresh spend patterns and new opportunities for value. As GPUs, inference, and retraining reshape costs, FinOps maturity grows through visibility, forecasting, and a shared mindset about how these workloads drive business impact. In this 2025 post, I gave my guidelines for implementing AI tagging to give business context and clarity to vague AI invoices. Now, I’m sharing the next level up: how to drive FinOps in AI with AI.

(Tech Talk) Shipping with Context: Knowledge Graphs as the Backbone of AI-First Software Delivery

Knowledge graphs are essential to solving the context bottleneck in AI-First software delivery, which occurs because workflows, policies, and dependencies are siloed and invisible to AI agents. In this Tech Talk, Prateek Mittal (Product Director of AI Core and Data Platform at Harness) discusses the key concepts. Knowledge Graphs vs. Observability: observability tells you "what is happening," while knowledge graphs tell you "what does that mean" by modeling structured relationships. The two work together to link live signals to affected services or SLAs.
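The idea of linking a live signal to affected services or SLAs can be sketched as a small graph walk. This is an illustrative sketch only, not how Harness implements it: the service names, the `DEPENDENTS` and `SLAS` tables, and the `affected_by` and `explain` helpers are all hypothetical.

```python
from collections import deque

# Hypothetical relationships: each key maps to the components that depend on it.
DEPENDENTS = {
    "postgres-primary": ["orders-api"],
    "orders-api": ["checkout-web"],
    "checkout-web": [],
}

# Hypothetical SLA attached to one service in the graph.
SLAS = {"checkout-web": "99.9% monthly availability"}


def affected_by(component):
    """Breadth-first walk collecting everything downstream of a component."""
    seen = []
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in seen:
                seen.append(dependent)
                queue.append(dependent)
    return seen


def explain(signal):
    """Turn a raw signal ("what is happening") into its impact ("what it means")."""
    impacted = affected_by(signal["component"])
    return {
        "signal": signal["message"],
        "affected_services": impacted,
        "slas_at_risk": {name: SLAS[name] for name in impacted if name in SLAS},
    }


if __name__ == "__main__":
    alert = {"component": "postgres-primary", "message": "replication lag high"}
    print(explain(alert))
```

The observability side supplies only the alert. The graph supplies the relationships that tell an agent which downstream services and commitments the alert touches.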

We Built an MCP Server

When I joined Kubex last year, the company was already well aware of the growing power of Large Language Models. As a company focused on intelligent resource optimization for Kubernetes, GPUs, and cloud infrastructure, generative AI didn’t feel like a threat so much as a natural extension of where the industry was heading. Kubex had already invested heavily in machine learning, but it was becoming clear that foundation models could unlock an entirely new class of capabilities for our customers.