Latest News

On Not Being a Cog in the Machine

Feb 9, 2021 By Fred Hebert In Honeycomb

This is my first week here as the first dedicated SRE for Honeycomb, and in a welcoming gesture, I was asked if I wanted to write a blog post about my first impressions and what made me decide to join the team. I’ve got a ton of personal reasons for joining Honeycomb that may not be worth being all public about, but after thinking for a while, I realized that many of the things I personally found interesting could point towards attitudes that result in better software elsewhere.

Communication Tool Down? Here are 3 Ways to Handle it

Feb 8, 2021 By Emily Arnott In Blameless

On January 4th, 2021, the communication service Slack suffered a major outage. Teams working remotely found their primary communication method unavailable. The incident lasted over 4 hours, during which some customers had intermittent or delayed service, and others had no service at all. It was a reminder that even the most established tools are susceptible to downtime. This is a core lesson of SRE: that failure is inevitable.

Beginners Guide to Incident Postmortems

Feb 7, 2021 By Camille Hodoul In Rootly

Successful and blameless postmortems can turn incidents into a gift of learning and prevent repeat mistakes.

"I'm Just Doing my Job," An SRE Myth

Feb 2, 2021 By Darrell Pappa In Blameless

"Sorry, but I'm just doing my job." I heard this recently from a customer service representative. What they were saying made sense (after all, we don’t have total control over our work environments), but it felt wrong. As a customer, I was left dissatisfied with our interaction. However, the representative assured me that they were simply following protocol. This got me thinking: can established practices and protocols sometimes get in the way of excellent customer experience?

Who Else Wants to Increase Development Velocity?

Jan 26, 2021 By Emily Arnott In Blameless

Implementing SRE is fundamentally about shifting culture, but it often means adding new tooling and processes to your team's workflows to support that cultural change. Teams add new steps and checks to incident response procedures. Incident responders write retrospectives and create new meetings to review them. Engineers consult new tools like monitoring dashboards and SLOs. In other words, SRE creates another layer of consideration in development and operations.

7 Tips On Building And Maintaining An SRE Team In Your Company

Jan 22, 2021 By Squadcast Community In Squadcast

In today's "always on" world, reliability is a primary business KPI. Plant the culture of reliability by implementing these 7 simple tips to build a solid SRE team in your organization. Many of today’s hottest jobs didn’t exist at the turn of the millennium. Social media managers, data scientists, and growth hackers were unheard of back then. Another relatively new role in demand is that of a Site Reliability Engineer, or SRE.

Take the first step toward SRE with Cloud Operations Sandbox

Jan 22, 2021 By Simon Zeltser In Google Operations

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.

The Key Differences between SLI, SLO, and SLA in SRE

Jan 20, 2021 By Biju Chacko In Squadcast

To incentivize reliability in your platform, there should be shared goals across your team to measure and quantify the capabilities of your product or service along with customer experience. Define the path to "Always-On" services by understanding a few key SRE fundamentals and their implications: SLIs, SLOs, and SLAs. Framing SRE metrics for building or scaling a product is quite a daunting task.

2021 is the Year of Reliability

Jan 20, 2021 By Robert Ross In FireHydrant

There’s no better time than now to dedicate effort to reliable software. If it wasn’t apparent before, this past year has made it more evident than ever: People expect their software tools to work every time, all the time. The shift in the way end users think about software was inevitable as everyday applications entered our lives, much as water and electricity entered our homes.

SREview Issue #9 January 2021

Jan 19, 2021 By Blameless Community In Blameless

New year, new SRE! We’ve said goodbye to 2020 and hello to 2021. Here are some of the most exciting tweets, content, and events happening in the SRE and resilience engineering community so far this year.
