The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Weaving AI into SIGNL4

Over the past two years, artificial intelligence (AI) has experienced remarkable growth, significantly influencing various sectors and daily life. In 2023, the release of advanced large language models (LLMs), such as OpenAI’s GPT-4 and Google DeepMind’s Gemini, marked a pivotal shift by enabling AI systems to process and generate diverse data types, including text, images, and audio.

PagerDuty Operations Cloud Spring 25 Release: Reimagining Operations in the Age of AI and Automation

Operational excellence isn’t just a goal; it’s critical to every company’s survival. And when it is powered by AI and automation, it becomes a strategic competitive differentiator. With over a decade of AI and ML experience built into our platform, PagerDuty pioneered the Incident Response space. Now PagerDuty is redefining what modern operations can look like in the era of AI and automation.

Microsoft Entra ID Outage: How Vantage DX Detected the Issue Before Microsoft Acknowledged It

On February 25, 2025, at 11:32 AM EST, Martello’s Vantage DX monitoring began alerting on an issue affecting Microsoft Entra ID (Azure AD SSO). Microsoft had not yet acknowledged the incident, but users on Reddit forums had already noted the issue, and our Vantage DX proactive monitoring detected disruptions impacting authentication across multiple workloads. See the critical warning for Exchange in Vantage DX Monitoring, as well as the critical warning for OneDrive and SharePoint in Vantage DX.

Operational excellence in the age of AI and Automation

The future of operations is here with PagerDuty's groundbreaking AI and automation innovations. Learn how PagerDuty AI agents, powered by PagerDuty Advance, and new use cases like security incident management and LLMOps can help your organization achieve operational excellence to reduce cost, mitigate the risk of outages, and accelerate innovation.

February 2025 Box Outage: Timeline and Post-Mortem

Box.com is a cloud-based content management and file-sharing platform designed for the enterprise and used by nearly 100,000 companies around the world. When a Box outage strikes, businesses can experience costly disruptions. On February 19, 2025, a disruption in core Box services, including uploads, downloads, and the All Files page, affected thousands of users who depend on the cloud storage and collaboration platform.

Feature Spotlight - Post-Incident Reports

The Post-Incident Report builder is available to Advanced plan customers to help document the incident post-mortem process. It allows users to share key information about why an incident occurred, how resolvers responded, and what preventive actions can be taken to ensure it doesn’t happen again. After creating a Post-Incident Report, you can share it with colleagues or stakeholders to keep them informed about the steps you’re taking to mitigate and prevent recurrences.

How to connect Google Calendar events and Slack

Managing Google Calendar events within Slack has never been easier! Pagerly’s Slack integration is the ultimate solution for teams looking to streamline event management, on-call scheduling, and team communication, all without leaving Slack. Whether you need event reminders, real-time Slack status updates, or automated Slack notifications about important events, Pagerly keeps your team informed and organized.

New Integration: ilert + RapidSpike for Proactive Website Monitoring

We are pleased to announce a new inbound integration in the ilert catalog: RapidSpike. This integration enhances incident management by connecting ilert with RapidSpike’s website monitoring capabilities, ensuring teams receive real-time alerts on website performance, uptime, and security threats.