The latest news and information on Site Reliability Engineering (SRE) and related technologies.

Enabling SRE best practices: new contextual traces in Cloud Logging

Nov 10, 2021 By Eyamba Ita In Google Operations

The need for relevant and contextual telemetry data to support online services has grown in the last decade as businesses undergo digital transformation. These data are typically the difference between proactively remediating application performance issues and suffering costly service downtime. Distributed tracing is a key capability for improving application performance and reliability, as noted in SRE best practices.

SLA vs. SLO vs. SLI: Understanding the Similarities and Differences

Nov 5, 2021 By JJ Tang In Rootly

An explanation of the meaning of SLA, SLO and SLI, and how SREs should use each concept to manage reliability.

Podcast: Break Things on Purpose | Gustavo Franco, Senior Engineering Manager at VMware

Nov 3, 2021 By Jason Yee In Gremlin

In this episode, Jason is joined by Gustavo Franco, Senior Engineering Manager at VMware, to chat about chaos in Gustavo's early days. Gustavo reflects on everything from Google's early disaster recovery practices to the contemporary SRE movement.

How they SRE: Insights from the Cloudflare SRE team

Nov 3, 2021 By Pruthvi In Spike

Cloudflare is a global cloud services provider with offices around the world, from San Francisco, US to London, England to Sydney, Australia. Their mission, as stated front and center on their homepage, is to help build a better Internet. While that may read like hyperbole, their numbers are impressive: Cloudflare has over 126,000 paying customers, and 95% of Internet users in the developed world are within 50ms of their network.

SRE vs. SWE: Similarities and Differences

Oct 29, 2021 By Quentin Rousseau In Rootly

SREs and SWEs complement each other, but they perform different tasks and focus on different priorities.

Site Reliability Engineer (SRE) Roles and Responsibilities

Oct 29, 2021 By Greg Leffler In Splunk

Software development is getting faster and more complex, frustrating IT operations teams more than ever. DevOps gained popularity as a way to combat siloed workflows, decreased collaboration and a lack of visibility. While establishing a culture of DevOps has helped teams collaborate better and deliver reliable software faster, DevOps teams don’t necessarily have someone specifically dedicated to developing systems that increase site reliability and performance.

How Changelog monitors and optimizes website performance with Grafana Cloud

Oct 27, 2021 By Lauren Johnson In Grafana

Developers around the world get their news from Changelog, an indie media company on a mission to create inspiring content for software developers. Through their popular podcasts, including The Changelog, Go Time, JS Party, and Ship It!, the team at Changelog helps listeners stay up-to-date on the latest happenings, trends, and tools in a constantly evolving industry.

How We Use Sloth to do SLO Monitoring and Alerting with Prometheus

Oct 26, 2021 By Stavros Foteinopoulos In Mattermost

One of the most challenging tasks for Site Reliability Engineers is aligning the reliability of their systems with business goals. There is a constant battle between delivering more features, which increases the product’s value, and keeping the system reliable and maintainable. A significant ally in achieving both objectives is the Service Level Objective framework.

Differences between Site Reliability Engineer Vs. Software Engineer Vs. Cloud Engineer Vs. DevOps Engineer

Oct 26, 2021 By Squadcast Community In Squadcast

The evolution of Software Engineering over the last decade has led to the emergence of numerous job roles. So how different are a Software Engineer, a DevOps Engineer, a Site Reliability Engineer and a Cloud Engineer from one another? In this blog, we drill down and compare the differences between these roles and their functions.
