AI Reliability Insights: How to Build a Gremlin MCP Server

Sep 11, 2025 By Gremlin In Gremlin

Gremlin’s Reliability Intelligence helps teams uncover the causes behind failure modes so they can improve reliability without sacrificing velocity. The new Gremlin MCP Server, part of Reliability Intelligence, gives you new ways to explore your data, with access to insights and recommendations for improving reliability and running your systems with Gremlin. In this webinar, Gremlin CTO Sam Rossoff shows you how to integrate your favorite LLM and use plain language to query data, uncover insights, create dynamic dashboards, and more.

How to make Netflix reliable: Address low-hanging fruit

Sep 11, 2025 By Gremlin In Gremlin

Reliability doesn’t have to be fancy and dramatic. Kolton and his team dramatically improved Netflix reliability by focusing on low-hanging fruit. Full transcript: My first holiday peak at Netflix, where my VP of engineering came to me and he said, "Kolton, what do you think the chance we make it through the holiday peak without an outage is?" I thought about it for a minute and I said, "50/50."

The 5Rs of reliability

Sep 9, 2025 By Gremlin In Gremlin

Ganesh Seetharaman, Managing Director at @Deloitte, talks about the five Rs of reliability: readiness, resilience, recovery, responsiveness, and reinforcement.

Proactive testing means less stress and better results

Sep 5, 2025 By Gremlin In Gremlin

Proactive reliability not only prevents costly outages, it also means your engineers are less stressed so they do their best work. Full transcript: It's not only helping when outages occur, but it's also helping reduce outages. It's this whole culture of blamelessness, right? And oftentimes, when you're in an environment where people are pointing fingers and saying, "Whose fault was it? And why is this thing broken?" and all these other things that are stressing you out.

Reliability results require visibility & accountability

Sep 4, 2025 By Gremlin In Gremlin

Reliability doesn’t just happen if you build a good tool. It takes visibility and accountability to get results. Full transcript: One of the things I've observed over the last 10 years in the software engineering culture is this idea of kind of Field of Dreams DevOps. If you build it, they will come. And there's a lot of this in the developer tool space in particular. "Hey, if we just build a great tool and we make it easy to use, engineers will use it because they want to do the right thing."

Reliability is more than just numbers

Sep 2, 2025 By Gremlin In Gremlin

Anish Behanan from @CapgeminiGlobal talks about how reliability is more than just numbers—it’s about trust. Full transcript: Reliability is a little bit more emotional than just numbers, right? How does a customer experience remain the same and is trustworthy?

How to get fast, easy insights with the Gremlin MCP Server

Aug 28, 2025 By Gavin Cahill In Gremlin

Chaos Engineering and reliability testing give you visibility into the actual reliability of your services by simulating real-world failure conditions. But what if you could dig into the testing and results data using AI to quickly uncover new insights? That’s the logic behind the Gremlin MCP Server. Released as part of Reliability Intelligence, the Gremlin MCP Server allows you to bring your LLM of choice to explore your Gremlin data and find opportunities to get more out of Gremlin.

You don't have to live with outages and late nights

Aug 28, 2025 By Gremlin In Gremlin

Outages don’t have to be part of your life, and engineers don’t have to burn out playing the hero. Spread out your effort and build reliability without the drama. Transcript: You should be great at dealing with outages, but your customers don't care. There's no medals here. No one should have incentive to be paged. There's nothing good about being in a war room for 10 days or in the holiday season in 12-hour shifts around the clock just in case something happens.

Failover and cloud aren't enough for reliability

Aug 26, 2025 By Gremlin In Gremlin

Amin Momin of @CapgeminiGlobal talks about how reliability takes dedicated effort beyond just using the cloud and setting up failover. Full transcript: There are two misconceptions about reliability. One is people only think failover is reliability. Just doing the failover, that will be enough from the reliability point of view. That's the first one. And the second one: we are deployed into the cloud, so it is the service provider's responsibility to provide the reliability.

Fix issues faster with Recommended Remediations

Aug 22, 2025 By Gavin Cahill In Gremlin

You’ve successfully run a Fault Injection test and uncovered a new failure mode before it impacted customers. And the failure could have taken down your whole system if it had happened in production. Now what? Since this is a potential P1 outage, you absolutely need to address the issue, but that’s going to take some time as you dig through the service to track down the problem. Unfortunately, this is a common conflict.
