What Managed Kubernetes Service is Best for SREs?
A comparison of EKS, AKS, GKE, Rancher and OpenShift from an SRE’s perspective.
The latest News and Information on Service Reliability Engineering and related technologies.
Catchpoint is proud to present the top SRE tools as voted on by SREs. In our fourth annual SRE Survey, compiled in partnership with VMware Tanzu Observability and DevOps Institute, we simply asked, “What are a few tools that every SRE should have available in their toolbelt?” Today, we are excited to share the findings with you. While some of the answers were not strictly tools, the analysis gives us valuable insight into the mindset of an SRE.
Facebook’s October 2021 outage was the type of event that gives SREs nightmares: A series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about 60 million dollars. As incidents go, this was a pretty big one.
xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ll likely have seen a number of valuable use cases for the platform: it can alert SREs when there’s a website outage, accelerate product development for DevOps teams, and manage on-call schedules and alerts for support teams.
In a world with everything digital, you need AIOps to help ensure uptime and break through the noise. Still not sold? Let's explore 5 ways SRE and DevOps teams are using AIOps to boost existing monitoring tools.
A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. It also encompasses a strategy and set of practices and principles across service offerings and is closely tied to DevOps and operations. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.
The four key takeaways for SREs from Google’s State of DevOps 2021 report
If you're an engineer reading this, you might be wondering what I mean by the title. You might be a Site Reliability Engineer whose primary responsibility is to maintain the reliability of your company’s product/solution. You might be a software builder, a programmer responsible for building new capabilities and shipping them to production. All of these are important for any business to remain competitive.
In 2016, Google released the definitive book on Site Reliability Engineering (SRE) - a practice that had originated in the company to take care of a monumental problem - how to keep the Google services running with high reliability. Over the years, SRE has been widely adopted by dev teams across the globe and is a popular role at startups and enterprises alike. Here is a look at how search for SRE has trended over the years.
SRE and DevOps are closely related concepts, and many businesses can benefit from embracing both of them. Nonetheless, there are important distinctions between SRE and DevOps.