The latest News and Information on Service Reliability Engineering and related technologies.

Elastic AI Assistant for Observability

Harness the power of generative AI to turn insights into actions. Powered by the Elasticsearch Relevance Engine™ (ESRE™), Elastic’s AI Assistant (in technical preview for Observability) transforms problem identification and resolution by replacing manual data chasing across silos with an interactive assistant that delivers accurate, context-aware remediation for SREs.

Seven Models of Cloud Native Applications

In today's cloud-driven landscape, organizations are transitioning from legacy monolithic systems to agile, scalable, and secure cloud-native solutions. Some are even building new cloud-native applications from scratch. However, the concept of cloud-native design remains subjective and lacks a universal blueprint. This blog aims to provide clear, precise guidance for designing cloud-native applications and container deployments.

Enhancing Incident Management: Seven Integrations to Complete Your Ticketing Systems

Squadcast offers powerful integrations that simplify Incident Management processes and complete your ticketing systems, ensuring seamless collaboration and timely issue resolution.

Practical guidance for getting started as a site reliability engineer

At the beginning of May, I joined incident.io as the first site reliability engineer (SRE), a very exciting but slightly daunting move. With only some high-level knowledge of what the company and its systems looked like prior to this point, it’s fair to say that I didn’t have much certainty about what exactly I’d be working on or how I’d deliver it.

SLO Driven Incident Response: Service Level Objectives for Effective Incident Management | Squadcast

In today's tech-driven landscape, effective Incident Management is vital for seamless service and customer satisfaction. This webinar explores the role of Service Level Objectives (SLOs) in structuring incident response processes, where they act as a compass that guides incident prioritization and resolution to minimize customer impact and downtime. The webinar will help you demystify SLOs, understand their data-driven role in incident decision-making, and learn how to identify and prioritize critical incidents to lessen customer impact.

Latest Developments in Site Reliability Engineering, 2023

Gartner recently published its Hype Cycle for Site Reliability Engineering, 2023 report (July 2023). Inspired by this report, OnPage shares its predictions about the future of site reliability engineering. In this blog, OnPage reviews evolutionary tools that can improve site reliability engineering practices.