Chaos Engineering

How to Convince Your Organization to Adopt Chaos Engineering

Jul 1, 2020 By Gremlin

Win over your coworkers and management and convince them to explore and adopt Chaos Engineering and Site Reliability Engineering (SRE). This playbook provides ideas and techniques you can use to articulate the need and the benefits to internal stakeholders in your organization. It also guides the initial implementation in a way that leads to success and growth across the organization. Successfully implementing something new like Chaos Engineering is a good way to get promoted and to help the organization succeed, and this guide is here to help you.

Get White Paper

Chaos Engineering for MongoDB

Jul 1, 2020 By Gremlin

MongoDB is designed for performance, scale, and high availability. But, as with any software, you need to test your configuration to verify that it works as advertised, especially if you want to get the most out of MongoDB's rich features, including built-in data sharding and replication. This guide includes four experiment tutorials that use Chaos Engineering to test four key features and verify that MongoDB performs reliably.

Get White Paper

Performance tuning MongoDB with Chaos Engineering

Jun 26, 2020 By Andre Newman In Gremlin

You’ve pored over the MongoDB documentation, crafted highly polished and well-tuned queries, and confidently deployed your new code to production. Everything ran great at first, but once CPU or RAM usage hit a certain point, your queries suddenly slowed to a crawl. What happened, and how can you prepare for situations like this in the future? This is an unfortunate but common scenario with databases like MongoDB.

Read Post

Announcing Status Checks to Ensure Safe Chaos Engineering Scenarios

Jun 23, 2020 By Matt Schillerstrom In Gremlin

One of the most important aspects of any Chaos Engineering program is knowing that every experiment is being run safely. And one of the simplest ways to ensure safe experiments is by having safeguards that prevent running chaos experiments on a system that is unhealthy or has an incident in progress. Today, Gremlin is excited to announce Status Checks, which run before you kick off a Chaos Engineering Scenario in order to verify your system is in a steady state.

Read Post

Chaos Engineering and Windows: Mitigating common Windows failure scenarios

Jun 18, 2020 By Matthew Helmke In Gremlin

Microsoft Windows is a popular operating system for many enterprise applications, such as Microsoft SQL Server clusters and Microsoft Exchange Servers. About 30% of the world’s web application hosting systems are running Windows, making it an important part of every enterprise’s plans to prevent outages and enhance reliability.

Read Post

Achieving AWS DevOps Competency Status (and What it Means for You)

Jun 16, 2020 By Eugene Wu In Gremlin

Chaos Engineering was conceived as a direct response to the complexity and nondeterministic nature of cloud-based applications. Thoughtful fault injection closes the gap between traditional testing methodologies and modern approaches to software engineering like microservices, continuous delivery, and DevOps.

Read Post

Performing chaos in a serverless world (Gunnar Grosch, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. Its principles have been around for years, and chaos engineering has now gone from being a buzzword and a practice used by a few large organizations in very specific fields to being put into use by companies of all sizes and industries.

View Video

Swim Don't Sink: Why Training Matters to a Site Reliability Engineering Practice (Jennifer Petoff)

May 5, 2020 By Gremlin In Gremlin

Do you offer training to the engineers in your organization, or do you throw them in at the deep end to “sink or swim”? Providing training and education is universally important for setting team members up for success in your organization, and it is critical for establishing a thriving Site Reliability Engineering (SRE) or DevOps practice and culture in the first place.

View Video

Fight, Flight, or Freeze - Releasing Organizational Trauma (Matt Stratton, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

When humans are faced with a traumatic experience, our brains kick in with survival mechanisms. These include the familiar fight-or-flight response, but also the freeze response, which occurs when we are terrified or feel that there is no chance of escape.

View Video

Y2K and Other Disappointing Disasters: Risk Reduction and Harm Mitigation (Heidi Waterhouse)

May 5, 2020 By Gremlin In Gremlin

Every disaster is a concatenation of smaller failures. How can we design software and processes to accept that we live in an imperfect world? Explore the concepts of resiliency, harm reduction, over-engineering, and planning for failure with real examples.

View Video