Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Getting Started with PagerDuty

In this video you will achieve a baseline understanding of what PagerDuty does and how to configure your PagerDuty account. To dive deeper into the PagerDuty platform, select relevant topics in our complimentary on-demand e-learning center at university.pagerduty.com. The PagerDuty Operations Cloud is essential infrastructure that detects and diagnoses disruptive events, mobilizes the right team members to respond, and automates workflows across your digital operations - so that your business moves forward, faster. Get started now!

What's missing from your incident management workflow

The first fifteen minutes of an incident set the tone for the rest of the resolution process. But what makes the difference between a rapid response and a stressful scramble—clear ownership—hasn't always been easy to ascertain. In this article, we’ll cover how Cortex, an internal developer portal, can be your team’s source of truth to accelerate the incident management process, and reduce MTTR.

Synced for Success: OnPage & Slack for Incident Response

As the post-pandemic world finds its footing again, a resilient spirit drives the revival, propelling businesses to embrace a new era of technological innovation. Notably, IT teams are swiftly adopting the digital transformation of their processes, particularly in incident response. From virtual collaboration tools and remote IT support to automated incident management, teams have found innovative ways to ensure seamless business continuity while delivering IT services with minimum downtimes.

Scaling Up to Keep Costs Down: Automation for Web Application Incident Management

Any organization that’s keeping up with today’s sharp rise in business demands (or better yet, getting ahead of the game) is doing so by getting innovative and jumping at the chance to do things differently. They’re not relying on the old ways or trying to use their existing toolbox. Instead, organizations are looking to the newest technologies and means of adding efficiency to as many day-to-day functions as possible.

Evolution of Site Reliability - Incidentally Reliable with Manoj Sebastian

Catch Manoj Sebastian(ex-Flipkart, Amazon, Atlassian, Intuit, Yahoo) talk about The Evolution of SRE through 20 years, Incident Response and Post Incident Culture at Big Tech and the Future of Reliability with AI ramping up at full speed. The freshest podcast for Site Reliability Engineers, hosted by Vishwa and Shubham from Zenduty.

incident.io: A scalable incident management solution built for enterprises

For enterprise businesses, a lot is riding on the efficiency of their incident response. These organizations have large customer bases, complex products, and many incidents. They also have loads of incident responders across various roles, making it difficult to coordinate internally.

Unveiling Squadcast's Enhanced Status Pages

Meet Kevin and Mai (again): Navigating the Troublesome Waters of Platform Downtime. Kevin is a Site Reliability Engineer (SRE), constantly on the lookout for potential downtime that could impact their platform, kryptobro.com. Mai is his adept partner, ever-ready to troubleshoot. In their journey, the previous version of Squadcast Status Pages served as a helpful tool, but they soon found room for improvements.

Discover what's driving the recognition behind BigPanda's AIOps innovations

Every day, BigPanda is transforming the way our customers operate. Our advanced AIOps technology redefines incident management, prevents service disruptions, and elevates customer satisfaction – and I couldn’t be more thrilled to see industry experts take notice. I’m particularly proud to see BigPanda mentioned in nine of the highly esteemed 2023 Gartner Hype Cycle reports.