Introducing Process Exhaustion: How to scale your services without overwhelming your systems

We rarely think about how many processes are running on our systems. Modern CPUs are powerful enough to run thousands of processes concurrently, but at what point do our systems become oversaturated? When you’re running large-scale distributed applications, you might reach this limit sooner than you'd expect. How can you determine what that limit is, and how does that affect the number and complexity of the workloads you deploy?

How to validate memory-intensive workloads scale in the cloud

Memory is a surprisingly difficult thing to get right in cloud environments. The amount of memory (also called RAM, or random-access memory) in a system indirectly determines how many processes it can run and how large those processes can grow. You might be able to run a dozen database instances on a single host, but that same host may struggle to run a single large language model.

Your reliability scorecard: How to measure and track service reliability

If your organization asked you to report on the reliability improvements you’ve made over the past 90 days, would you be able to pull one up? If you’re like many engineers, this question might make you anxious. Reliability is a difficult metric to quantify in a meaningful way, let alone measure and track over time.

The case for Fault Injection testing in Production

Many organizations that are looking to introduce Fault Injection as a testing technique start with non-production environments, but they don’t always revisit that choice as they mature beyond their initial assessments. However, there’s a strong case for running these tests in your live systems. The trade-offs between testing in production and non-production environments are worth weighing carefully, because the choice has far-reaching impacts on both the efficacy and the cost of improving your software’s resilience.

How to find and test critical dependencies with Gremlin

Part of the Gremlin Office Hours series: a monthly deep dive with Gremlin experts. Pop quiz: what are all of the dependencies your services rely on? If you’re like most engineers, you’ll probably struggle to come up with an answer. Modern applications are complex and rely on dozens (if not hundreds) of dependencies. Many teams track them in spreadsheets, but manual processes like these break down over time. What if you had a tool that found and tracked dependencies for you?

How to use host redundancy to improve service reliability and availability

Cloud computing has made provisioning new servers easy, fast, and relatively cheap. Almost anyone can log into a cloud console, spin up a new server, and deploy an application. And for teams that need greater uptime, major cloud providers offer all kinds of settings, services, and configurations for adding fault tolerance and failover. So why do so many services still fail when a single server instance goes down?

10 Most Common Kubernetes Reliability Risks

Reliability risks are potential points of failure in your system where an outage could occur. If you can find and remediate reliability risks, then you can prevent incidents before they happen. In complex Kubernetes systems, these reliability risks can take a wide variety of forms, including node failures, pod or container crashes, missing autoscaling rules, misconfigured load balancing or application gateway rules, pod crash loops, and more. And they’re more prevalent than you might think.
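To make the idea of finding these risks concrete, here is a minimal sketch of how you might scan a cluster for just two of them: Deployments whose containers have no resource limits, and Deployments with no HorizontalPodAutoscaler targeting them. It uses the official Kubernetes Python client and assumes you have kubeconfig access to a cluster; it is an illustration of the general approach, not how Gremlin itself detects risks.

# Minimal sketch: flag Deployments that lack resource limits or an HPA.
# Assumes a reachable cluster and the official `kubernetes` Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run inside a pod

apps = client.AppsV1Api()
autoscaling = client.AutoscalingV1Api()

# Collect (namespace, name) pairs that already have an HPA targeting them.
hpa_targets = {
    (hpa.metadata.namespace, hpa.spec.scale_target_ref.name)
    for hpa in autoscaling.list_horizontal_pod_autoscaler_for_all_namespaces().items
}

for deploy in apps.list_deployment_for_all_namespaces().items:
    ns, name = deploy.metadata.namespace, deploy.metadata.name

    # Risk: containers without CPU/memory limits can starve their neighbors.
    for container in deploy.spec.template.spec.containers:
        if container.resources is None or not container.resources.limits:
            print(f"{ns}/{name}: container '{container.name}' has no resource limits")

    # Risk: no autoscaling rule means the Deployment can't scale under load.
    if (ns, name) not in hpa_targets:
        print(f"{ns}/{name}: no HorizontalPodAutoscaler targets this Deployment")

A scan like this only covers a narrow slice of the risks listed above; node failures, crash loops, and misconfigured load balancers need their own checks or fault injection experiments to surface.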

How dependency discovery works in Gremlin

Modern applications are rarely created entirely from scratch. Instead, they rely on a foundation of pre-existing applications and services, each adding specific features and functionality. These dependencies empower teams to build and deploy applications more efficiently, but they bring their own set of challenges. Tracking, managing, and updating these dependencies is difficult, especially in large, complex applications where dependencies are likely managed by different teams.

How to make your services zone redundant

In January of 2020, an entire availability zone (AZ) in AWS’ Sydney region suddenly went dark. Multiple facilities lost power, preventing customers from accessing EC2 instances and Elastic Block Storage (EBS) volumes. Customers who didn’t have backup infrastructure in another zone had to wait nearly 8 hours before service was restored, and even then, some EBS volumes couldn’t be recovered. Major cloud provider outages are rare, but they happen nonetheless.