Operations | Monitoring | ITSM | DevOps | Cloud

90% AI Adoption. Still Failing. DORA Explains Why.

AI adoption is nearly universal. So why are most teams still struggling? In this session from GitKon, Nathen Harvey, head of DORA at Google Cloud, shares findings from the 2025 DORA State of AI-Assisted Software Development report, drawing on data from nearly 5,000 developers worldwide. The answer isn't more AI. It's what surrounds it.

That's Not a Job for an LLM: The Right Way to Apply AI to Network Operations

LLMs have sucked all the oxygen out of the AI conversation — but AI is much more than just LLMs, and network engineers have been using AI techniques (machine learning, statistics, fuzzy logic, expert systems, neural networks) for decades. So what should LLMs be doing in network operations, what shouldn't they be doing, and how do agentic AI architectures fit in?

Why Your Agentic AI Aspirations Need to Evolve from Models to a Workflow Data Fabric

Enterprise conversations today are dominated by one phrase: Agentic AI. Across boardrooms and innovation labs, organizations are experimenting with copilots, autonomous agents, and AI bots capable of resolving tickets, recommending actions, and orchestrating complex processes. The promise is real — AI that doesn't just generate insights, but takes meaningful action. Here's the uncomfortable truth: most enterprises are architecturally unprepared for the agentic future they're trying to build.

Understanding disaggregated GenAI model serving with llm-d

llm-d is an open source solution for managing high-scale, high-performance Large Language Model (LLM) deployments. LLMs are at the heart of generative AI – so when you chat with ChatGPT or Gemini, you’re talking to an LLM. Simple LLM deployments – where an LLM is deployed to a single server – can suffer from latency issues, even with just one user. This can be because of lack of memory-bandwidth on the server, or because of KV cache pressure on system memory.

SRE agent vs. traditional engineer: 7 key differences

The role of a Site Reliability Engineer (SRE) is evolving. The focus has shifted from simply working harder during an outage; A new kind of teammate is here to help: the SRE Agent. But what are the key differences when you compare an SRE agent versus a traditional site reliability engineer? This isn’t just a superficial change. It signifies a fundamental alteration in how teams construct and sustain dependable services.

What is AI SRE? The Complete Guide to AI-Assisted Site Reliability Engineering

It's 2:47 AM. PagerDuty fires. You open a Slack alert and see: p99 latency spike on checkout-service. You SSH into the host, check dashboards in four tabs, grep logs for the last 20 minutes, and eventually find a slow query introduced in a deploy six hours ago. It took 34 minutes. You resolved it, w Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Code Agents Need Observability

For those of us using tools like Claude Code, Codex, or Gemini, we already know they’re powerful. They can write code, refactor functions, open PRs, even run commands. For a lot of developers, they’re already part of the daily workflow. But once you zoom out beyond the individual developer, the biggest problem isn’t productivity. It’s control. AI coding tools are powerful, but they introduce a new, unpredictable cost layer that most teams don’t fully understand.

How AI Is Reshaping Bill of Materials Management

Most of what gets written about AI in manufacturing is hype. I've sat through enough vendor demos to recognize the pattern: a slick interface, cherry-picked examples, and a vague promise that machine learning will "transform" something. Half the time the underlying problem could have been solved with a structured database and a junior analyst.