Monthly Archive

Customers over control: how we measure On-call reliability

May 28, 2026 By Article In Incident.io

Our On-call product has a lot of great features: configuring escalation paths, viewing rotas and schedules, requesting cover, etc. However, when framing its reliability, we reduce it down to two critical pieces of functionality: It’s not that we’re happy if only these parts are working, but they are the most important parts. In this post, I'll go into more detail on how we think about their reliability.

Engineering teams in 2027

May 19, 2026 By Article In Incident.io

There's a conversation I keep having with our design partners at incident.io. It starts when I ask "what are you doing with AI internally?" and lands in a similar place every time. The shape of how their engineering teams work is changing fast. Not in vague "AI is transforming everything" ways, but in concrete, repeatable patterns. Different companies are building the same things. The frontier teams are six to twelve months ahead of the average, and they're describing the same future.

PagerDuty Rescue Program

May 13, 2026 By incident-io In Incident.io

We're announcing the PagerDuty Rescue Program. PagerDuty worked. For a long time, it was the standard. But the world's changed, and PagerDuty hasn't. The single biggest reason teams stay on PagerDuty isn’t the product - it’s the pain of leaving. So, we’ve removed every barrier. You've wanted out for a while. Now, nothing is stopping you.

Humans aren't fast enough for 4 9's

May 11, 2026 By Article In Incident.io

When thinking about Service Level Objectives (SLOs) and contractual Service Level Agreements (SLAs) for availability, I always like to put the percentages into concrete numbers. It’s easy to lose track of what “99.95% availability” actually means, and easier still to underestimate how much harder 99.99% is to achieve than 99.95%. In concrete terms, 99.95% availability gives you 21 minutes and 55 seconds of downtime per month.
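The monthly figure can be checked with a few lines of Python. This is a minimal sketch, not the author's own calculation: it assumes an "average month" of 365.25 / 12 days, and the names `AVERAGE_MONTH`, `monthly_downtime_budget`, and `format_budget` are hypothetical, chosen here for illustration.

```python
from datetime import timedelta

# Assumption: an "average month" is 365.25 / 12 days (about 30.44 days).
AVERAGE_MONTH = timedelta(days=365.25 / 12)


def monthly_downtime_budget(availability_percent: float) -> timedelta:
    """Return the downtime allowed per average month at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return AVERAGE_MONTH * unavailable_fraction


def format_budget(budget: timedelta) -> str:
    """Render a downtime budget as minutes and seconds, rounded to the second."""
    total_seconds = round(budget.total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds:02d}s"


if __name__ == "__main__":
    for target in (99.95, 99.99):
        print(f"{target}% -> {format_budget(monthly_downtime_budget(target))}")
    # 99.95% -> 21m 55s
    # 99.99% -> 4m 23s
```

Under the same assumption, moving from 99.95% to 99.99% cuts the monthly budget from roughly 22 minutes to under four and a half.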
