The latest news and information on Site Reliability Engineering and related technologies.

Creating Chaos to Achieve Reliability

Apr 22, 2021 By JJ Tang In Rootly

How can creating chaos achieve better reliability? Chaos and reliability might seem mutually exclusive, but through the use of Chaos Engineering, SREs can bring about meaningful changes to system resiliency.

Using Coralogix + StackPulse to Automatically Enrich Alerts and Manage Incidents

Apr 20, 2021 By Jonathan Brown In Coralogix

Keeping digital services reliable is more important than ever. When something goes wrong in production, on-call teams face significant pressure to identify and resolve the incident quickly to keep customers happy. But it can be difficult to get the right signals to the right person in a timely fashion.

Should You Be an SRE or a DevOps Engineer?

Apr 15, 2021 By Quentin Rousseau In Rootly

SREs may have better long-term job prospects, but DevOps might be an easier career to pursue.

Creating Custom Slack Commands

Apr 15, 2021 By FireHydrant In FireHydrant

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.

Catchpoint Announces Virtual SRE Community Event on June 10

Apr 13, 2021 By Catchpoint In Catchpoint

'SRE From Anywhere' will be the largest community event for Site Reliability Engineers to learn and share best practices for delivering the best digital performance.

How Would an SRE Conduct a Postmortem on the Suez Canal Incident?

Apr 7, 2021 By JJ Tang In Rootly

The Suez Canal has been big news over the last couple of weeks. We wondered how a Site Reliability Engineer (SRE) might conduct a postmortem on what happened with the Ever Given, and what that might mean if a comparable incident occurred at a modern tech company.

How Netflix Uses Fault Injection To Truly Understand Their Resilience

Apr 6, 2021 By Thomas Russell In Coralogix

Distributed systems such as microservices have defined software engineering over the last decade. Most advancements have focused on increasing resilience, flexibility, and speed of deployment at ever larger scales. For streaming giant Netflix, the migration to a complex cloud-based microservices architecture would not have been possible without a revolutionary testing method known as fault injection. Netflix employs a cutting-edge testing toolkit that includes tools like Chaos Monkey.

How SREs Can React to COVID-19's Impact on Incident Management

Apr 2, 2021 By Quentin Rousseau In Rootly

By adding new complexity to reliability engineering and making physical war rooms a thing of the past, COVID-19 has imposed permanent changes on incident management. Here’s how SREs can respond.

Coffee Break Webinar Series: Intelligent Observability for SRE

Mar 30, 2021 By David Conner In Moogsoft

A selection of live questions and answers from the audience of our recent webinar on how site reliability engineers can best leverage intelligent observability to monitor SLIs and SLOs, prioritize reliability over functionality, and more.

A Day in the Life: Intelligent Observability at Work with a Super SRE

Mar 23, 2021 By Helen Beal In Moogsoft

After we’d fixed Aparna’s network issue, James came to see me at my desk. Masks on, socially distanced and all that, but it was nice to have some face-to-face time. James is cool – that dry British humor and not your classic IT Ops dude. He’s been here forever and mentored me when the CIO, Charlie, hired me as the first SRE here a year or so ago. I lucked out really.