
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Managing Alerts: Car Alarms and Smoke Alarms

Nov 3, 2025 By Ritik In Spike

Building and shipping an application is exciting: you watch your idea come alive and reach users. But once it’s out there, your real job begins: keeping it alive. An app in production isn’t just running code; it’s a living system. It needs monitoring to stay healthy and alerting to warn you when something’s off. But there’s a catch: too few alerts, and you’ll miss real issues; too many, and you’ll drown in noise.

The one where we scaled

Oct 31, 2025 By incident-io In Incident.io

From 3 people in 2020 to 93 in 2025, incident.io has come a long way, and we’re just getting started. Whether you’ve been here since the early days or just joined, this is what it looks like to build something great *together*. If you're after great people, real impact (across the globe, not just in Greece), a place where growth is the default, and teammates who’ll always be there for you, we’re hiring! (And we're going to need a bigger couch…)

View Video

PagerDuty MCP AIOps enhancements: Incident Insights, Service & Global Orchestrations

Oct 30, 2025 By PagerDuty Inc. In PagerDuty

View Video

We Built an SRE Agent With Memory And It's Transforming Incident Response

Oct 30, 2025 By Julia Nasser In PagerDuty

If you feel like your incidents are multiplying while your stack gets more complex by the week, you’re not alone. Event volumes keep climbing, signals live in a dozen tools, and human responders are stretched thin. That’s exactly why we built the PagerDuty SRE Agent—a vendor‑agnostic AI teammate that improves with every response to make the next one faster, smarter, and more reliable.

Too Late to Learn: Why Security Post-Mortems Fail and How AI Can Help

Oct 30, 2025 By Casey Lems In PagerDuty

An effective post-mortem can turn a security breach into a blueprint for lasting resilience. But too often, in the stress of an incident, documenting what happened takes a back seat to containment and recovery. The resulting analysis relies heavily on memory, scattered notes, and competing narratives. Valuable context gets lost, timelines blur, and lessons that could strengthen defenses never become institutional knowledge.

Automating the First Hour of Troubleshooting with Netdata AI

Oct 30, 2025 By Netdata In netdata

Avoid the most expensive hour of incident response. Learn how Netdata AI uses hybrid AIOps to detect, reason, and summarize incidents.

View Video

Same code, same infra but your model is now broken #ai #devops

Oct 30, 2025 By Rootly In Rootly

View Video

How agentic ITOps helps ensure resilient IT infrastructures

Oct 29, 2025 By C Beers In BigPanda

Infrastructure resilience is essential for any modern IT environment. Downtime is expensive. Beyond the stresses of day-to-day operations, you want to be confident that your IT systems will continue functioning during service disruptions, hardware failures, or natural disasters. Agentic ITOps can help ensure a reliable, resilient IT infrastructure environment. These systems use agentic AI to help IT teams minimize downtime, improve customer trust, and protect your business’s revenue and reputation.

Jira Service Management (JSM) Review for Alerting (2025)

Oct 29, 2025 By Sreekar In Spike

Atlassian is shutting down OpsGenie. New sales stopped on June 4, 2025, and the platform will be completely offline by April 5, 2027. As an OpsGenie user, you now face a critical decision: migrate to Jira Service Management (JSM), Atlassian’s recommended path, or choose a different solution. If you’re not sure JSM is the right fit for your team’s alerting needs, this review will help you decide. I signed up for JSM and put it through real-world testing.

Product Update - Turn Off Alerts, Use Microsoft Teams, and Custom Domains

Oct 29, 2025 By Hrishikesh Barua In IncidentHub

Over the last few months, IncidentHub has added several new features to make it easier to fine-tune your alerts. IncidentHub now also integrates with Microsoft Teams and supports custom domains for your public status pages. Let's take a comprehensive look at what's new.
