Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

PagerTree Account Admin QuickStart Guide

In this quickstart guide, we cover the basics of getting started as an account admin in PagerTree. Transcript: In this quickstart guide, we will show you the basics of being an account admin in PagerTree. Before watching this video, we suggest reading and watching the Architecture Guide to build a strong foundation for understanding PagerTree and how it works. Here is a brief overview of the alert workflow.

Maximizing Uptime: Four Essential System Monitoring Best Practices

System uptime is a fundamental necessity for every organization that values customer experience and satisfaction. A single minute of downtime can trigger a cascade of negative consequences, affecting everything from revenue streams to customer loyalty. So why exactly is system uptime important? Downtime translates to lost revenue, frustrated users, and operational disruption.

Building AI features? Don't forget your product principles

It’s fair to say that AI is here to stay. So, as companies grapple with this reality, they’re putting their best foot forward to build AI features that really make a difference for their customers. But should you be building these features if there’s no obvious fit in your product? And even if there is, are you making sure to stay true to your product principles? The reality is that deciding to build AI into your product isn’t a decision you make on a whim.

Install OneUptime with Docker Compose

Welcome to our step-by-step tutorial on how to install OneUptime using Docker Compose! In this video, we'll guide you through the entire process of setting up OneUptime on your system using Docker Compose. OneUptime is a powerful tool that helps you monitor your websites and services, ensuring they're always up and running.

PagerTree Team Admin Quickstart Guide

In this quickstart guide, we cover the basics of getting started as a team admin in PagerTree. Transcript: In this Team Admin quickstart guide, we will explore the basics of team management in PagerTree. Team admins are responsible for managing teams within PagerTree. On the Team page, admins can edit current teams, on-call schedules, and escalation policies.

The importance of psychological safety in incident management

When an incident strikes, it often brings a whirlwind of stress for everyone involved—from the teams directly handling the issue to the stakeholders making crucial decisions. Imagine support teams on high alert, customers anxiously awaiting resolutions, and executives probing for answers to steer the company through turbulent times. This mounting pressure can make a challenging situation nearly unmanageable, especially when faced with problems that are new or unexpected.

Post-Incident Reviews: Turning Failures into Learning Opportunities

Incidents are inevitable. From software failures to service disruptions, unexpected events can disrupt the smooth functioning of systems and processes, causing frustration for users and impacting business operations. However, what separates successful organizations from the rest is not the absence of incidents, but rather their approach to handling and learning from them.

Reliability for the Books - Incidentally Reliable with Niall Murphy

Catch Niall Murphy (Co-Founder of Stanza Systems) talking about graceful degradation, what startups are getting wrong about reliability, and how well-thought-out user experiences can communicate credibility to current and potential customers. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.