Gremlin

Podcast: Break Things on Purpose | Carmen Saenz, Senior DevOps Engineer at Apex Clearing

This week Ana sits down with Carmen Saenz, Senior DevOps Engineer at Apex Clearing and PhD student at DePaul University in Chicago, to talk about her history in engineering. She brings to the table some anecdotes about her own time engineering chaos. Carmen goes into detail about the early days of chaos engineering and her work there, going from on-prem to the cloud, how she is always learning, her passion for teaching, and more.

Podcast: Break Things on Purpose | Zack Butcher, Founding Engineer at Tetrate

Welcome back to another edition of “Build Things on Purpose.” This time Jason is joined by Zack Butcher, a founding engineer at Tetrate. They break down Istio’s ins and outs and the lessons learned there, the role of open source projects and their reception, and more. Tune in to this episode and others for all things chaos engineering!

Gremlin Chaos Engineering Practitioner Certificate Prep Session

Looking to become one of the world’s first Gremlin-certified Chaos Engineering Practitioners? Find everything you need to prepare for the exam during our prep session! Get an in-depth understanding of exactly what you need to focus on in order to pass the Gremlin Chaos Engineering Practitioner Certificate exam.

Podcast: Break Things on Purpose | Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable

Break Things on Purpose is a podcast for all things Chaos Engineering. In this episode, we speak with Paul Marsicovetere, Senior Cloud Infrastructure Engineer at Formidable.

When Disaster Strikes: Ensuring Your DRP Actually Works

Black swan events are inherently unpredictable—you can’t prepare for every possible threat. Instead, you must identify the ways systems can fail and develop strategies to restore them to full service when these failures happen. But a disaster recovery plan (DRP) can’t be relied on until it’s been proven to work. The use of Chaos Engineering allows you to test your DRP much more safely and predictably than you could otherwise.

SRE's Guide to Chaos & Observability

Today’s distributed, cloud-based environments are incredibly complex. Not only does each component depend on many others, but modern systems are also highly dynamic—changing frequently as teams push new code or make updates to infrastructure. Taming this complexity to ensure reliability requires end-to-end observability to understand how components depend on each other. Additionally, proactive Chaos Engineering combined with AI-driven observability lets you uncover “unknown unknowns” that impact how your system will respond to different failure scenarios.

Building Reliable Applications Webinar 6/17/21

Test-driven development (TDD) is a process that ensures quality in the applications we develop while guarding against feature creep/skew. But as our applications have become increasingly complex, traditional testing methods are not enough. Traditional testing only evaluates what we know, but complex systems often fail due to unknowns—the things that are almost impossible to test because we are unaware of them. Chaos Engineering is the exception that allows us to test for what we don’t know.