True reliability takes the whole team

Aug 22, 2025 By Gremlin In Gremlin

Reliability takes the whole team working together. Full transcript: If you really want to get good at measuring your reliability, then you have to work together as a team. Once your software engineering organization has decided, "We're gonna test these applications to make sure that they have redundancy, availability, resilience," just stick to the framework that you come up with as a team.

Encourage the boring reliability work

Aug 20, 2025 By Gremlin In Gremlin

Proactive, regular reliability work is boring, repetitive, and EFFECTIVE. If leadership wants the incredible results this work brings, it has to encourage the right behavior.

Reliability upholds your promise to users

Aug 19, 2025 By Gremlin In Gremlin

Consistent systems are reliable systems, according to Ganesh Seetharaman, Managing Director at Deloitte. Full transcript: Strong reliability is demonstrated when systems consistently work as expected, even during peak demand or unexpected events. When issues do happen, they are resolved quickly and transparently so users experience minimal disruption. Reliability also means data integrity. No matter how much stress the system is under, information needs to be accurate and secure.

How Experiment Analysis uncovers the cause behind failures

Aug 15, 2025 By Gavin Cahill In Gremlin

Chaos Engineering has proven itself to be incredibly effective at tracking down failure modes, remediating reliability issues, and preventing risks before they happen. Unfortunately, it can also come with a steep adoption curve. In order to get the most out of Fault Injection testing, a practitioner needs to have a deep knowledge of the service, its expected behavior, and the code behind it. Ultimately, the rewards are worth the time.

Reliability is when customers aren't impacted

Aug 14, 2025 By Gremlin In Gremlin

Ultimately, a system is reliable when customers and engineers can count on it. Full transcript: When I get to hear stories like, "Hey, we just had our holiday sales event kick off and everything went smoothly and I didn't have to wake up in the middle of the night," that is really the true definition of reliability. These are the people who are constantly hands-on-keyboard, in charge of making sure that people like myself and like you aren't impacted when we're going to, for example, buy a new pair of sneakers, or get some sort of limited edition release that's coming out, right?

Reliability isn't an afterthought

Aug 12, 2025 By Gremlin In Gremlin

“Reliability must be a crucial outcome for all of the architectures,” says Anish Behanan from Capgemini.

Introducing Reliability Intelligence

Aug 11, 2025 By Gremlin In Gremlin

Reliability Intelligence draws on Gremlin expertise with every test to show you how the test failed and to recommend remediation.

Reliability Intelligence: your reliability expert

Aug 11, 2025 By Gavin Cahill In Gremlin

For the last decade, Gremlin has helped Fortune 500 organizations with critical uptime requirements proactively uncover reliability risks and prevent costly outages. We started with Chaos Engineering, then built Reliability Management to help teams standardize and scale their testing efforts. Today, we take another leap forward with the release of Reliability Intelligence. Reliability Intelligence draws on Gremlin expertise with each test to show you what happened and recommend remediation.

The riskiest thing you can do is not measure your risk

Aug 8, 2025 By Gremlin In Gremlin

Hiring good engineers is important, but it’s not enough to prevent outages. You need to measure and track your risk to get real results. Full transcript: My name's Jeff Nickoloff. I'm a principal engineer here at Gremlin. What I hear from non-technical functions is really that they are much happier to sort of lean on their great engineers: "Oh, we've got a great engineering culture. We don't have reliability issues because we hire the best people."

Avoid the Chaos Engineering bottleneck

Aug 6, 2025 By Gremlin In Gremlin

Chaos Engineering is great, but by itself it can create bottlenecks that limit your reliability journey. Full transcript: One of the things we've learned while building Gremlin and being the first Chaos Engineering tool to market is that, with all the greatness that comes with this approach, there are some downfalls, some drawbacks. And one of those is how you scale this practice.
