The latest News and Information on Service Reliability Engineering and related technologies.

De-Siloing Incident Management: How to Make Reliability Engineering Everyone's Job

Jul 15, 2021 By JJ Tang In Rootly

4 best practices for breaking down silos and establishing a culture of shared responsibility toward reliability.

Pragmatic Incident Response: 3 Lessons Learned from Failures

Jul 15, 2021 By Robert Ross In FireHydrant

In my past experience as an SRE, I’ve learned some valuable lessons about how to respond to and learn from incidents. Declare and run retros for the small incidents: it's less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident: you'll remember more and learn more from it. Alert on pain felt by people, not computers: the only reason we declare incidents at all is because of the people on the other side of them.

Rootly Announces $3.2 Million in Seed Funding from XYZ Venture Capital, 8VC, & Y Combinator

Jul 8, 2021 By Quentin Rousseau In Rootly

Rootly is on a mission to create a world where maintaining reliability is frictionless, delightful, and accessible to anyone, making resolving and learning from incidents every organization's superpower.

The Incident Review: 4 Incidents in Outer Space

Jul 6, 2021 By JJ Tang In Rootly

From network problems to computer failures, a variety of incidents can disrupt operations for systems in outer space.

SRE Report 2021: The Highlights

Jun 30, 2021 By Anna Jones In Catchpoint

Our fourth annual SRE Report launched last week. I had the good fortune to be involved in writing and editing it this year for the first time alongside our very own driving force Leo Vasiliou and the brilliant Eveline Oehrlich at DevOps Institute (check out Eveline’s take on the report’s Key Takeaways here), in addition to a number of folks at VMware Tanzu.

7 Essential Tools for SREs

Jun 25, 2021 By Quentin Rousseau In Rootly

From chaos engineering to monitoring and beyond, SREs rely on several key types of tools to do their jobs.

"Should SRE Be Broken Up?": SRE from Anywhere Recap

Jun 25, 2021 By Anna Jones In Catchpoint

This year’s SRE from Anywhere (SREFA) brought together hundreds of registrants from around the world to gather virtually, share experiences, and network around all things SRE. We were thrilled to see so many friendly faces!

Practical Guide to SRE: Incident Severity Levels

Jun 17, 2021 By Nancy Chauhan In Rootly

Incident severity levels are a measurement of the impact an incident has on the business. Classifying the severity of an issue is critical to deciding how quickly and efficiently problems get resolved.

Catchpoint SRE Study Reveals a Global Drop in Toil, Warns of Looming Scalability Ceiling, and Highlights the Need for New Operational Capabilities

Jun 15, 2021 By Catchpoint In Catchpoint

Adoption of AIOps is slow.

Service quality and the rising need for enterprise SRE

Jun 15, 2021 By Valerie O'Connell In ServiceNow

In its DevOps 2021 survey of global IT professionals, Enterprise Management Associates (EMA) found that 95% of organizations with highly successful DevOps initiatives were predominantly decentralized and purposefully becoming more so as fast as possible (see Figure 1). This decentralization of development and DevOps teams is making site reliability engineering (SRE) both critical and difficult to achieve.