SRE

The latest news and information on Site Reliability Engineering and related technologies.

Tools and Trends in Site Reliability Engineering according to Gartner's 2023 Hype Cycle

Gartner recently published its report, Hype Cycle for Site Reliability Engineering, 2023. This blog reviews the future of site reliability engineering based on Gartner’s Hype Cycle. Additionally, the OnPage team is pleased that Gartner mentioned OnPage as a sample vendor in the Automated Incident Response category.

Evolution of Site Reliability - Incidentally Reliable with Manoj Sebastian

Catch Manoj Sebastian (ex-Flipkart, Amazon, Atlassian, Intuit, Yahoo) talking about the evolution of SRE over 20 years, incident response and post-incident culture at Big Tech, and the future of reliability with AI ramping up at full speed. The freshest podcast for Site Reliability Engineers, hosted by Vishwa and Shubham from Zenduty.

Unveiling Squadcast's Enhanced Status Pages

Meet Kevin and Mai (again): Navigating the Troublesome Waters of Platform Downtime. Kevin is a Site Reliability Engineer (SRE), constantly on the lookout for potential downtime that could impact his platform, kryptobro.com. Mai is his adept partner, ever-ready to troubleshoot. In their journey, the previous version of Squadcast Status Pages served as a helpful tool, but they soon found room for improvement.

SRE Redefines IT Operations as Architect of Sustainable Systems

Site Reliability Engineering (SRE) is a term that’s getting attention and gaining momentum, and for good reason. SRE takes practices from software engineering and applies them to problems in infrastructure and operations. Organizations build SRE teams with a couple of goals in mind, including increasing scalability and developing solid software systems.

Kubernetes Incident Management Best Practices

Creating just any infrastructure on Kubernetes is not enough. Many basic configurations will stand up the infrastructure for your application, and they might work just fine for the time being. But incident response won’t always remain 100% reliable. You will run into new potholes, and that’s okay.

Understanding Blameless Postmortems

Progress in organizations often comes with unforeseen challenges and mishaps. Traditionally, these setbacks led to finger-pointing, which hindered progress and created a negative work atmosphere. However, a "Blameless Postmortems" approach transforms how organizations respond to failure. In this blog, we will delve into the importance of cultivating a blameless postmortem culture when faced with setbacks.

Introducing Squadcast's Key Based Deduplication

We are excited to share another feature update with all our valued customers! We have recently gone live with our Key Based Deduplication feature, enabling you to define dedup keys using customizable templates for configured alert sources. With this feature, you can automatically group similar incidents and effectively deduplicate alerts.
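To make the idea of a template-driven dedup key concrete, here is a minimal sketch in Python. It is not Squadcast's implementation or API: the `${field}` template syntax, the function names `dedup_key` and `group_alerts`, and the alert fields are all assumptions chosen for illustration. It renders a key from each alert payload and groups alerts that share the same key.

```python
from collections import defaultdict
from string import Template


def dedup_key(template: str, alert: dict) -> str:
    """Render a dedup key by substituting alert fields into the template.

    The ${field} syntax is an assumption for this sketch, not the
    vendor's template language.
    """
    # safe_substitute leaves unknown placeholders in place instead of raising
    return Template(template).safe_substitute(alert)


def group_alerts(alerts: list, template: str) -> dict:
    """Group alerts whose rendered dedup keys are identical."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[dedup_key(template, alert)].append(alert)
    return dict(groups)


# Hypothetical alert payloads from a single alert source
alerts = [
    {"host": "db-1", "check": "disk_full", "value": "91%"},
    {"host": "db-1", "check": "disk_full", "value": "95%"},
    {"host": "web-2", "check": "http_5xx", "value": "12/s"},
]

# Alerts for the same host and check collapse into one group
groups = group_alerts(alerts, "${host}:${check}")
```

In this sketch the two `disk_full` alerts for `db-1` land in the same group because the template ignores the changing `value` field, while the `web-2` alert stays separate. Choosing which fields go into the template is what controls how aggressively alerts are deduplicated.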