Latest News

7 Ways SRE Is Changing IT Ops And How To Prepare For Those Changes

Apr 29, 2021 By Squadcast Community In Squadcast

SRE best practices are disrupting and catalyzing change in the ways organizations approach IT Operations. In this blog, we look at 7 ways SRE is driving this transition. Site Reliability Engineering (SRE) is a relatively new practice that has been growing in popularity among many businesses. It puts a premium on monitoring, tracking bugs, and building systems and automation that solve problems for the long term.

How Kubernetes Can Both Help and Hinder Incident Management Teams

Apr 29, 2021 By Quentin Rousseau In Rootly

Kubernetes makes it easier in certain ways to manage reliability. But incident response teams and SREs must also be prepared to handle the unique reliability challenges that K8s creates.

What is Site Reliability Engineering [Simple Intro to SRE]

Apr 26, 2021 By Emily Arnott In Blameless

Wondering what SRE is all about? We will explain what it is, how it works, why it was developed, and how it can help your organization. So what is Site Reliability Engineering (SRE)? It is a methodology that fuses software and operations teams, with the goal of producing reliable, resilient, and scalable systems. SRE was developed by Google engineer Ben Treynor Sloss in 2003 to increase the reliability of Google's sites and services.

4 Characteristics of Monitoring Essential to Implementing DevOps

Apr 23, 2021 By Theo Schlossnagle In Circonus

In the new world of rapid releases, continuous change, and increasingly high user expectations, more organizations are embracing DevOps. One of the primary drivers for adopting DevOps is speed — particularly the reduction of risk at speed. As DevOps seeks to reduce risk and deliver insight at an increasingly faster pace, new tools have emerged in the monitoring space. But these tools alone will not deliver us into the low-risk world of DevOps — not without new and updated thinking.

Creating Chaos to Achieve Reliability

Apr 22, 2021 By JJ Tang In Rootly

How can creating chaos achieve better reliability? Chaos and reliability might seem mutually exclusive, but through the use of Chaos Engineering, SREs can bring about meaningful changes to system resiliency.

SREview Issue #12 April 2021

Apr 20, 2021 By Blameless Community In Blameless

Spring is here! We have rain! We have flowers! We have allergies! We also have some of the most exciting Tweets, content, and events happening in the SRE and resilience engineering community this month.

Using Coralogix + StackPulse to Automatically Enrich Alerts and Manage Incidents

Apr 20, 2021 By Jonathan Brown In Coralogix

Keeping digital services reliable is more important than ever. When something goes wrong in production, on-call teams face significant pressure to identify and resolve the incident quickly – in order to keep customers happy. But it can be difficult to get the right signals to the right person in a timely fashion.

Resilience in Action E6: Oversize Coffee Mugs, SLOs, and ML with Todd Underwood

Apr 19, 2021 By Blameless Community In Blameless

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.

Creating Custom Slack Commands

Apr 15, 2021 By FireHydrant In FireHydrant

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on-demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.
