Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Service Reliability Engineering and related technologies.

Are your SLOs realistic? How to analyze your risks like an SRE

Setting up Service Level Objectives (SLOs) is one of the foundational tasks of Site Reliability Engineering (SRE) practices, giving the SRE team a target against which to evaluate whether or not a service is running reliably enough. The inverse of your SLO is your error budget — how much unreliability you are willing to tolerate.

SRE vs DevOps: What's The Difference?

Whether you’ve heard of or fully jumped on the DevOps or SRE bandwagon, you may have also wondered how the two relate. What’s the difference? Are they really just different ways of looking at the same problem? The term DevOps hit the market first, but SRE wasn’t too far behind. And though they have different origin stories, they both focus on autonomy, automation, and iteration. So why do these paradigms exist? And why do we need both? Let’s look at this further.

The Pros and Cons of Embedded SREs

To embed or not to embed: That is the question. At least, that’s one of the questions that companies have to answer as they decide how to implement Site Reliability Engineering. They can either embed SREs into existing teams, or they can build a new, separate SRE team. Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.

Freshdesk + Squadcast: Enabling Streamlined Incident Response for Enterprises

Freshdesk is a cloud-based customer service platform used by enterprises that provides a centralized help desk(with the help of support tickets) across multiple channels, including email, phone, chat, and social media. Squadcast is an incident management platform that integrates with major monitoring, ChatOps and project management tools to provide a centralized place for reliability.

Site Reliability Engineering: An Imperative in Today's Enterprise IT

Site reliability engineering (SRE) is fast becoming an essential aspect of modern IT operations, particularly in highly scaled, big data environments. As businesses and industries shift to the digital and embrace new IT infrastructures and technologies to remain operational and competitive, the need for a new approach for IT teams to find and manage the balance between launching new systems and features and ensuring these are intuitive, reliable, and friendly for end users has intensified as well.
Sponsored Post

Show character with Blameless Postmortems (part one)

This is Part 1 of a two-part series on Blameless Postmortems. Today, we'll discuss why blameless postmortems are so important and their implications for your team; the second part will go into detail on how to set them up as a process and make them successful. Somebody wise may have once told you that how we handle adversity shows our character. Being able to acknowledge and admit mistakes is the first step towards learning - it's a key part of success both in personal relationships and in large companies.

New StackPod Episode: Implementing an SRE Practice with Yousef Sedky of Axiom/Hyke

For our latest StackPod episode, we invited Hyke’s DevOps team lead and AWS Cloud architect: Yousef Sedky. Axiom Telecom is one of the largest telephone retailers in the United Arab Emirates and Saudi Arabia and Hyke, its sister company, is a distribution platform for mobile products.

SRE vs. Platform Engineering: The Key Differences, Explained

Site Reliability Engineering (SRE) teams and Platform Engineering teams share similar goals -- like maximizing automation and reducing toil -- and similar methodologies. But they have different priorities, and use somewhat different tools to achieve them. What are SREs, what are platform engineers and how is each role similar and different? This article explains.