The PagerDuty Incident Response Process is a detailed document that provides a framework for structuring your incident response process. But sometimes it helps to understand how these seemingly abstract concepts play out in real-world scenarios. You can now hear an incident call recording that’s based on a real PagerDuty incident. Due to the nature of incident response practices, the process guide we publish is filled with very explicit details regarding a variety of situations.
There are some very good sources on how to manage a critical incident. Among them is a chapter on incident management that Google wrote for its book, “Site Reliability Engineering”. In this chapter, the folks at Google present their approach to a well-designed critical incident management process.
This is the second edition of our review of features from the past few years. Here we share some features that were created or updated in 2017. We are currently moving content from our old blog platform, so not all of the features mentioned here are new, but it is always good to take a fresh look at things. Because we are always upgrading and enhancing features, some of the items have been edited to reflect their current state.
This month, we are excited to announce a new set of product capabilities and enhancements designed to ensure that teams can work in real time, all the time, wherever they are. Whether teams are on the go with their mobile devices or at their desks on a typical workday, we will continue to innovate for them without sacrificing ease of use or adoption.
Let’s set the scene: You’re an on-call engineer working for a dedicated support team. Your priorities are twofold: (1) speedy incident resolution and (2) satisfied clients and stakeholders. With these demands in mind, you adopt OnPage’s integration with ConnectWise. The integration streamlines the ticketing-to-alerting process, helping your team achieve client service excellence.
At Monitorama 2018, Engineering Manager Kale Stedman shared Demonware’s journey to assisted remediation, or as he likes to call it: “How my team nearly built an auto-remediation system before we realized we never actually wanted one in the first place.” In this post, I’ll recap Kale’s Monitorama talk, highlighting the key decisions that helped his team reduce daily alerts, fix underlying problems, and establish a more engaged Monitoring Team — including the steps the
COLUMBIA, Md., June 18, 2019 -- StatusCast, a leading SaaS provider of Corporate Status Pages, is pleased to announce the successful completion of a Service Organization Controls (SOC) 2 attestation engagement covering the trust principles of Security, Confidentiality, and Availability.