
5 things to do before you go on-call for the first time

Going on-call for the first time can feel a bit overwhelming, but a little prep work makes it smooth and stress-free. This guide covers five things to set up before you start your first on-call shift. They help you stay on top of your schedule, get on-call notifications, and have a backup in place. By the end, you’ll be ready to handle your first on-call shift with confidence.

Getting started with on-call

Setting up on-call is simpler than it seems. It comes down to a few clear decisions about your team and what your service actually needs. This guide walks you through those decisions. You’ll learn who to add to your rotation, how long shifts should last, when to hand off, and what coverage makes sense for your service. By the end, you’ll know exactly how to set up your first schedule and move from ad-hoc firefighting to organized incident response.

A Recap of 2025

In the past, our yearly recaps were mostly about numbers: what we shipped, how much Spike grew, and a long list of stats. See past recaps: 2023, 2024. But 2025 felt different to me. It had many moments that shaped how Spike, both as a product and as a company, looks today. Some of them were exciting, some were uncomfortable, and all of them changed how I think about building Spike. We’re still bootstrapped and operating lean, with a team of fewer than ten people.

Introducing a More Flexible On-Call Schedule

Today, we are introducing some new on-call features: Add Gaps to on-call, Scheduled Layers, Handoff Days, and more. Flexibility in on-call schedules is the single focus of this release. These features give you much finer control over when people are on-call, how handoffs work, and what your schedule looks like around holidays and time off.

Incident Postmortem: How to Learn From Failures and Build Reliable Systems

When the issue settles and systems are back up, one question always remains: what actually happened, and how do we stop it from happening again? That’s where incident postmortems come in. Not just as documentation, but as a structured way to learn, improve reliability, and replace guessing with clarity. A good postmortem isn’t about blame, heroics, or perfect narratives. It’s about truth, learning, and building systems that get stronger with every failure.

7 Common Incident Response Challenges and How to Overcome Them

Incident response teams deal with several challenges: alert noise, unclear ownership, lack of automation, and more. It’s important to review these challenges regularly and resolve them, because they can turn minor issues into major outages. In this blog, we’ll discuss some of the common incident response challenges, how they affect your team, and how you can resolve them. Let’s dive in!

Incident Response Team: Roles, Responsibilities, and Structure Explained

Incidents don’t wait. They hit production, disrupt users, and pull teams into long recovery cycles. A well-structured incident response team helps you move fast, limit damage, and restore services without chaos. In this blog, we’ll explain what an incident response team is, its key functions, team composition, and different types of teams. Let’s get started!

4 Golden Signals of System Reliability: A Practical Guide for Your Team

Modern systems produce endless streams of metrics: CPU usage, request volume, cache hit rates, node counts, queue depth, and the list keeps growing. With this much data, it’s easy for teams to get lost in dashboards without knowing what actually matters. That’s why DevOps and SRE teams rely on the 4 Golden Signals of System Reliability. They provide a simple, clear way to understand user experience and system health.