Chaos Engineering in 60 seconds - Gremlin Host I/O Attack

Nov 9, 2020 By Gremlin In Gremlin

Chaos Engineering in 60 seconds - Gremlin Host I/O Attack.

View Video

Gremlin

Chaos Engineering in 60 seconds - Scheduled Shutdown Attack (aka Chaos Monkey)

Nov 9, 2020 By Gremlin In Gremlin

Chaos Engineering in 60 seconds - Scheduled Shutdown Attack. This is commonly referred to as a Chaos Monkey style attack.

View Video

Gremlin

Is your online gaming platform "Chaos Monkey"-proof?

Nov 9, 2020 By David Sachs In Exigence

Try to imagine a bunch of monkeys running around your data center, pulling cables, trashing routers and wreaking havoc on your applications and infrastructure. In these days of heated competition between online gaming operators, player experience is ever more crucial. Continuity of operations is paramount, and avoiding churn due to service disruption is the organizational mantra.

Read Post

Exigence

Grubhub and JPMC Shift Reliability Testing Left at Chaos Conf 2020

Nov 5, 2020 By Taylor Smith In Gremlin

Gremlin’s Chaos Conf is always an exciting event, bringing together leaders at the forefront of Chaos Engineering practices. This year was no exception, moving beyond defining Chaos Engineering to more advanced adoption and best practices discussions.

Read Post

Gremlin

ObservabilityCON Day 4 recap: a panel discussion on observability (and its future), the benefits of Chaos Engineering, and an observability demo showcase

Oct 30, 2020 By Joey Bartolomeo In Grafana

Over the past four days, Grafana Labs' ObservabilityCON 2020 brought together the Grafana community for talks dedicated to observability. We hope you enjoyed all of the sessions, which are available on demand now. (They are linked from the schedule on the event page.) The conference wrapped up with predictions and advice from observability experts, lessons in failure, and Grafana Labs team members showcasing ways Grafana and other tools fit into an observability workflow.

Read Post

Grafana

Chaos Engineering: How to create an automated Chaos Gauntlet with Gremlin and Jenkins on AWS

Oct 29, 2020 By Gremlin In Gremlin

In this video, we will demonstrate how to use Gremlin and Jenkins to create an automated Chaos Gauntlet. This will be done using Jenkins Pipelines and Stages to inject a controlled amount of failure with the Gremlin API. We then add a final stage that allows you to optionally halt the attack from the pipeline, rather than having to wait for the full duration of the attack.

View Video

Gremlin

Chaos Engineering: The Path to Reliability - Kolton Andrus

Oct 15, 2020 By Gremlin In Gremlin

We’re all here for the same purpose: to ensure the systems we build operate reliably. This is a difficult task, one that must balance people, process and technology under difficult conditions. We operate with incomplete information, assessing risks and dealing with emerging issues. We’ve found Chaos Engineering to be a valuable tool in addressing these concerns. Learn from real-world examples what works, what doesn’t, and what the future holds.

View Video

Gremlin

Identifying Hidden Dependencies - Liz Fong Jones

Oct 15, 2020 By Gremlin In Gremlin

You don't need to write automation or deploy on Kubernetes to gain benefits from resilience engineering! Learn how Honeycomb improved the reliability of our Zookeeper, Kafka, and stateful storage systems through terminating nodes on purpose. We'll discuss the initial manual experiments we ran, the bugs in our automatic replacement tools we uncovered, and what steps we needed to progress towards continuously running the experiments. Today, no node at Honeycomb lives longer than 12 months, and we automatically recycle nodes every week.

View Video

Gremlin

Lessons from Incident Management and Postmortems at Atlassian - Jim Severino

Oct 15, 2020 By Gremlin In Gremlin

How do you run incidents and postmortems at a company with thousands of engineers spread across the globe? Jim Severino shares what worked (and what didn't) for Atlassian.

View Video

Gremlin

Looking back on Chaos Conf 2020

Oct 15, 2020 By Andre Newman In Gremlin

It’s already been a week since we closed our third annual Chaos Conf! While we were forced to take the conference online, this meant that more of you could join us. Over 3,500 people signed up to help make this the world’s largest Chaos Engineering conference. That’s 5x more than 2019, and nearly 10x more than 2018! This is a testament to the growth of Chaos Engineering as a practice across many different industries and around the world.

Read Post