
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

PagerDuty Copilot | Generative AI for PagerDuty Operations Cloud

Introducing PagerDuty Copilot: your GenAI assistant for critical operations work. It is built to help you scale your teams, sustain customer experiences, and move the business forward faster. Work more efficiently, protect more revenue, and build greater operational resilience. PagerDuty Copilot is the AI assistant operations teams trust to help them manage business-impacting issues in seconds, not hours. From event to resolution, PagerDuty Copilot's automations help you resolve issues faster, reduce risk, and control costs.

Improving Customer Support with Squadcast Webforms: A Smart Solution for MSPs

Managed Service Providers (MSPs) handle a multitude of customer support cases, each of which must be routed efficiently to the right team member. Squadcast's Webforms offer a way to speed up issue reporting and streamline resolution. In this blog post, we explore how MSPs can use webforms to improve the customer support experience.

Introducing Workflows: Enhancing Automation in Incident Response

At Squadcast, we advocate for the principles of Site Reliability Engineering (SRE), which emphasize automating routine tasks to improve efficiency in incident management. We're helping organizations put these principles into practice with one of our newest features, Workflows. Workflows is designed to automate the manual parts of your incident lifecycle while keeping a human in the loop for critical decisions.

Best Practices to Avoid Website Outages on Black Friday

Black Friday, the most frenzied shopping day of the year, is fast approaching, and businesses around the globe are bracing themselves. Imagine a flood of eager shoppers ready to snag the hottest deals, and just when your website needs to perform at its best, it crashes, leaving frustrated customers behind and letting potential revenue slip through your virtual fingers. This scenario is not entirely fictional.

Resilience Engineering in 2024: Challenges, Trends, & Priorities

Is your organization ready to fortify, expand, and cultivate a robust resilience engineering culture in 2024? In this webinar, Chris Evans (Co-founder & Chief Product Officer, incident.io) and Courtney Nash (Internet Incident Librarian, The VOID) delve into the key considerations and top priorities for improving your organization's ability to build safer, more reliable complex systems, and share insights to help shape your plans for 2024 and beyond.

Quick start guide to Unified Analytics dashboards

When it comes to observability, we've found that most organizations have around 20 tools installed in their IT environments. With so many tools, it's difficult for IT leaders to see how those tools are performing and to determine how much value ITOps is bringing to the organization.

Weathering Black Friday and Other Storms Reliably

If you work in eCommerce, you can see the storm on the horizon. Black Friday, the biggest shopping day of the year both online and off, is only a few days away. Your services may hit usage spikes you have never seen before, and every aspect of them will be pushed to its limit: people won't just be searching, or just buying, or just signing up for programs; they'll be doing all of these at once. Most crucially, everyone else is offering deals too.

Should data teams consider incident management tools to respond to pipeline issues?

Data teams are adopting more processes and tools that align with software engineering, and judging from the talks at the dbt Coalesce conference in 2023, there is a clear push toward adopting software engineering practices at enterprise-scale companies. At the moment, the data space has plenty of tools for identifying errors in data pipelines but none for responding to those errors, such as coordinating fixes. This is exactly where an incident management platform makes sense.