{unscripted} AI in Chaos Engineering

Harness AI enhances your chaos engineering capabilities by using artificial intelligence to automate and optimize reliability testing and analysis. One challenge of scaling a chaos engineering practice within an organization is upskilling users to create and run chaos experiments and to come up with solutions that mitigate the risks identified during experiment execution. The Chaos Engineering module comes with an AI agent called "AI Reliability Agent" that helps with both.

AI-Powered Chaos Engineering with Harness MCP Server and Cursor

The Harness MCP Server integration with Cursor transforms chaos engineering from a complex, specialized discipline into an accessible, conversational workflow that any developer can leverage directly within their AI-powered IDE. By combining natural language prompts with comprehensive resilience testing tools, teams can discover, execute, and analyze chaos experiments without vendor-specific expertise, democratizing system reliability across DevOps, QA, and SRE functions.

Security vs. ops: the two sides of reliability

Security and ops work together to keep your systems reliable, but why do we treat them so differently? Reliability results start when you proactively take charge of your infrastructure and application risks. Transcript: When we talk about reliability in the software space and the digital operations space, you really end up falling into these two different mindsets.

Reliability means smooth on-call and a strong team

True reliability is when your engineers have confidence in their systems and their teams. Full transcript: Reliability to me means my on-call shift is gonna be smooth because everybody is making the attempts to be smart about the type of code that we're writing. And we're regularly testing to make sure that our system has redundancy and can withstand latency spikes, it can withstand resource spikes.

AI Reliability Insights: How to Build a Gremlin MCP Server

Gremlin’s Reliability Intelligence helps teams uncover the cause behind failure modes so they can move faster and improve reliability without sacrificing velocity. The new Gremlin MCP Server, part of Reliability Intelligence, gives you new ways to explore your data, with access to insights and recommendations that help you improve reliability and run your systems better using Gremlin. In this webinar, Gremlin CTO Sam Rossoff shows you how to integrate your favorite LLM and use plain language to query data, uncover insights, create dynamic dashboards, and more.

How to make Netflix reliable: Address low-hanging fruit

Reliability doesn’t have to be fancy and dramatic. Kolton and his team dramatically improved Netflix reliability by focusing on low-hanging fruit. Full transcript: My first holiday peak at Netflix, where my VP of engineering came to me and he said, "Kolton, what do you think the chance we make it through the holiday peak without an outage is?" I thought about it for a minute and I said, "50/50."

Proactive testing means less stress and better results

Proactive reliability not only prevents costly outages, it also means your engineers are less stressed so they do their best work. Full transcript: It's not only helping when outages occur, but it's also helping reduce outages. It's this whole culture of blamelessness, right? And oftentimes, when you're in an environment where people are pointing fingers and saying, "Whose fault was it? And why is this thing broken?" and all these other things that are stressing you out.

Reliability results require visibility & accountability

Reliability doesn’t just happen if you build a good tool. It takes visibility and accountability to get results. Full transcript: One of the things I've observed over the last 10 years in the software engineering culture is this idea of kind of Field of Dreams DevOps. If you build it, they will come. And there's a lot of this in the developer tool space in particular. "Hey, if we just build a great tool and we make it easy to use, engineers will use it because they want to do the right thing."