Performance tuning MongoDB with Chaos Engineering

Jun 26, 2020 By Andre Newman In Gremlin

You’ve pored over the MongoDB documentation, crafted highly polished and well-tuned queries, and confidently deployed your new code to production. Everything ran great at first, but once CPU or RAM usage hit a certain point, your queries suddenly slowed to a crawl. What happened, and how can you prepare for situations like this in the future? This is an unfortunate but common scenario with databases like MongoDB.

Announcing Status Checks to Ensure Safe Chaos Engineering Scenarios

Jun 23, 2020 By Matt Schillerstrom In Gremlin

One of the most important aspects of any Chaos Engineering program is knowing that every experiment is being run safely. And one of the simplest ways to ensure safe experiments is by having safeguards that prevent running chaos experiments on a system that is unhealthy or has an incident in progress. Today, Gremlin is excited to announce Status Checks, which run before you kick off a Chaos Engineering Scenario in order to verify your system is in a steady state.

Chaos Engineering and Windows: Mitigating common Windows failure scenarios

Jun 18, 2020 By Matthew Helmke In Gremlin

Microsoft Windows is a popular operating system for many enterprise applications, such as Microsoft SQL Server clusters and Microsoft Exchange Servers. About 30% of the world’s web application hosting systems are running Windows, making it an important part of every enterprise’s plans to prevent outages and enhance reliability.

Achieving AWS DevOps Competency Status (and What it Means for You)

Jun 16, 2020 By Eugene Wu In Gremlin

Chaos Engineering was conceived as a direct response to the complexity and nondeterministic nature of cloud-based applications. Thoughtful fault injection closes the gap between traditional testing methodologies and modern approaches to software engineering like microservices, continuous delivery, and DevOps.

Built-in Application Resiliency (Allan Shone, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

When starting a new application build, keeping an eye on resiliency from the outset prevents headaches down the line. There are many ways to tackle this, especially across different language environments and system ecosystems, but many approaches are shared across them all. In this talk, viewers will learn how to develop software that is more fault-tolerant and able to withstand the impact of failures, and will come away with a high-level list to use as a reference later.

Pitfalls in Measuring SLOs (Danyel Fisher & Liz Fong-Jones, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

We built support for SLOs (Service Level Objectives) against our event store so we could monitor our own complex distributed system. In the process of doing so, we learned that there were a number of important aspects that we didn’t expect from carefully reading the SRE workbook. This talk is the story of the missing pieces, unexpected pitfalls, and how we solved those problems. We’d like to share what we learned and how we iterated on our SLO adventure.

Human-in-the-Loop DevOps (Taylor Barnett, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

Within DevOps, automation has become a North Star. We want to automate the toil away, but the goal of "no toil" is unattainable. Many runbooks can only be partially automated because they still require human intervention and insights. Human-in-the-Loop DevOps is the idea that we can benefit from automating toil while still embracing the human interaction in specific tasks.

The Future of DevOps is Resilience Engineering (Amy Tobey, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

For more than a decade, many of us have been working to bring DevOps to organizations around the world. We’ve made amazing progress, but there’s so much more to do. Now that continuous integration and deployment are widespread and developers are taking more ownership of production, what’s next? Amy will talk about what Resilience Engineering is, how it relates to DevOps, and how she thinks it gives us the science and research we need to take our organizations to the next level of robustness while remaining agile and growing our ability to care for the people around us.

Performing chaos in a serverless world (Gunnar Grosch, Failover Conf 2020)

May 5, 2020 By Gremlin In Gremlin

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. The principles of chaos engineering have been around for years, and it has now gone from being just a buzzword and a practice used by a few large organizations in very specific fields to being put into use by companies of all sizes and industries.

Swim Don't Sink: Why Training Matters to a Site Reliability Engineering Practice (Jennifer Petoff)

May 5, 2020 By Gremlin In Gremlin

Do you offer training to the engineers in your organization, or do you throw them in at the deep end to “sink or swim”? Providing training and education is universally important for setting team members up for success in your organization, and it is critical for establishing a thriving Site Reliability Engineering (SRE) or DevOps practice and culture in the first place.
