The latest news and information on Site Reliability Engineering and related technologies.

4 Cloud-Native Challenges AI SRE Is Solving in 2026 and the 3 New Ones to Look Out For

Jul 30, 2026 By Komodor In Komodor

AI SRE is making real strides in resolving some of the greatest pains related to incident response, troubleshooting, and complex root cause analysis. The on-call rotation, the war room, the week-long RCA, and the ticket queue that ate a third of every platform engineer’s week all look different now than they did two years ago.

Read Post

Install AURA to Debug Incidents Using an Open Source SRE Agent

Jul 28, 2026 By Mezmo In Mezmo

AURA is a fully open-source agentic harness built for SRE and production operations work. In this walkthrough, Mezmo forward-deployed engineer Jeff installs AURA on a local desktop, runs `aura init` to generate the config and connect it to an Anthropic Sonnet model, then wires in a Grafana MCP server pointed at his homelab. He hands AURA a live incident: a set of addressable LED lights that stopped responding to Home Assistant.

View Video

What SREs Can Learn from Revenue Operations (and Vice Versa)

Jul 28, 2026 By OpsMatters In OpsMatters

Site reliability engineers and revenue operations teams rarely sit at the same desk. SREs look after cloud infrastructure, while revenue operations professionals look after data pipelines and sales funnels. Yet both teams spend their days managing complex systems that can't afford to crash. Look past the different tools they use, and the underlying principles of both roles are almost identical. Let's examine how these two technical worlds can share practical insights to build better business systems.

Read Post

Your AI agents are lost: give them a graph

Jul 21, 2026 By Rootly In Rootly

The biggest limitation facing enterprise AI agents may not be the model. It may be the context surrounding it. Anthony Alcaraz, Senior AI/ML Portfolio Growth Manager at AWS and co-author of O'Reilly's *Agentic GraphRAG*, joins Humans of Reliability to explain why reliable agents need more than a vector database and a large context window. They need structured knowledge they can navigate, memory they can prune, constraints they can follow, and feedback loops that help them improve.
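
To make "structured knowledge they can navigate" concrete, here is a minimal sketch that is not taken from the episode. It assumes a hypothetical service dependency graph and shows how an agent could walk it breadth-first from a failing service to candidate upstream causes, with a hop limit acting as a simple constraint on the search. The service names and the `upstream_candidates` helper are illustrative assumptions, not part of any product or of the GraphRAG approach discussed.

```python
from collections import deque

# Hypothetical service dependency graph (illustrative only):
# each service maps to the services it calls.
DEPENDS_ON: dict[str, list[str]] = {
    "checkout": ["payments", "inventory"],
    "payments": ["postgres"],
    "inventory": ["postgres", "cache"],
    "postgres": [],
    "cache": [],
}


def upstream_candidates(
    graph: dict[str, list[str]], start: str, max_depth: int = 3
) -> list[str]:
    """Breadth-first walk from a failing service to its dependencies.

    Returns dependencies in discovery order (nearest hops first) and stops
    expanding after max_depth hops, so the agent's search stays bounded.
    """
    seen = {start}
    order: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # constraint: do not expand beyond the hop limit
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append((dep, depth + 1))
    return order


if __name__ == "__main__":
    print(upstream_candidates(DEPENDS_ON, "checkout"))
    print(upstream_candidates(DEPENDS_ON, "checkout", max_depth=1))
```

A similarity search over text chunks returns passages that resemble the query; a walk like this one instead returns what the failing service actually depends on, nearest first, which is one way structured context can narrow an agent's investigation.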

View Video

Better Together: Last9 + Altinity

Jul 21, 2026 By Last9 In Last9

Last9 and Altinity now run observability entirely in your own cloud: metrics, logs, traces, and profiles on an open-source ClickHouse stack, priced on capacity instead of ingestion, with Altinity operating the database so your team doesn't have to. Last9 is an observability platform built for high-cardinality telemetry. It unifies logs, metrics, and traces with native OpenTelemetry and Prometheus support, real-time alerting, and long-term retention.

Read Post

View Kubernetes events in Last9

Jul 20, 2026 By Last9 - Monitoring for AI Native SDLC In Last9

View Kubernetes events in Last9 across clusters, Deployments, and StatefulSets, even correlated with services.

View Video

Keyboard Shortcuts for Log Viewer - ASMR Style Video

Jul 16, 2026 By Last9 - Monitoring for AI Native SDLC In Last9

Keyboard shortcuts for the Log Viewer, ASMR style. Changelog: https://last9.io/changelog/changelog-january-2026/

View Video

Self-Improving Agents in Software Engineering

Jul 16, 2026 By Last9 - Monitoring for AI Native SDLC In Last9

How are you teaching your agents? Are they learning on their own? Does that lead to better results? Listen to @prathameshsonpatki7217 talk about our experience over the last 8 months running self-improving agents in production.

View Video

AI vs. AI: from alert fatigue to agentic cybersecurity

Jul 15, 2026 By Rootly In Rootly

AI is transforming cybersecurity on both sides of the battlefield. Attackers can now launch highly personalized phishing campaigns at scale and build malware capable of making autonomous decisions. At the same time, security teams are using AI agents to investigate alerts, reduce noise, and respond to threats faster. In this episode of Humans of Reliability, we speak with Nir Soudry, Head of R&D at 7AI, about the shift from alert fatigue to agentic cybersecurity.

View Video

An SRE agent for production

Jul 14, 2026 By Mezmo In Mezmo

AI has changed how software gets built. It hasn't changed how software gets run. Most of the AI money in software has gone into the IDE: code generation, copilots, developer assistants, faster pull requests. That work matters. But writing software is one slice of the lifecycle. The harder problem, and the more expensive one, is running that software in production. Production is where systems fail in ways nobody predicted. Incidents don't stay inside one service.

Read Post