Latest Posts

The unreasonable effectiveness of shipping every day

Aug 26, 2022 By Max Rozen In OnlineOrNot

It's fairly common for folks in tech to dream of quitting their day job and working on their side projects. I find when you ask them how their projects are going, they tend to have 2-3 projects running at the same time, none of which are actually available for potential users to try out. The question they seem to ask me most is "you seem to complete your projects, how do you stay motivated?" My secret? It's a habit. I ship something every day.

How I accidentally told 19k people Hacker News was down

Jul 12, 2022 By Max Rozen In OnlineOrNot

In case you missed it, Hacker News had an extremely rare outage last week.

On moving over a million uptime checks per week onto fly.io

Jun 29, 2022 By Max Rozen In OnlineOrNot

The other day, a friend told me about fly.io's nice developer experience (DX). For my day job, I work on improving wrangler2's DX, so naturally it had me curious. I went from "I'll just play around with it, maybe give it a toy workload" to "holy shit, what if I quickly rewrite my business's AWS Lambda + SQS stack to fit entirely within their free tier" in about 90 minutes. It wasn't that simple in the end, but I did manage to migrate most of my active workload from AWS Lambda to fly.io.

What I learned running a SaaS for a year

Feb 21, 2022 By Max Rozen In OnlineOrNot

This time last year, I showed the internet a little prototype uptime checker I built using Next.js as the frontend, with services running on AWS Lambda. I gave myself one week to put it together. I wrote a few articles about how the business was going throughout the year. The gist of my approach is as follows: I started with a single Lambda function that checked whether static websites were still online, added an email alert for when a site went offline, wrapped authentication around it, integrated Stripe, and shipped it.

How to monitor your uptime with OnlineOrNot

Feb 3, 2022 By Max Rozen In OnlineOrNot

Jumping into monitoring software for the first time can be pretty overwhelming. If you're not in an exploring mood, it can be easy to get lost when you're not entirely sure what all these knobs and buttons do. To help lighten this feeling for OnlineOrNot, I thought it might be useful to let folks know how I use OnlineOrNot to monitor OnlineOrNot (as part of running OnlineOrNot day to day). Also, our friends at DebugBear wrote a similar article about how DebugBear uses DebugBear to keep their site fast.

Communicating to Users During Incidents

Jan 23, 2022 By Max Rozen In OnlineOrNot

Imagine you're having a regular day at work, opening up your browser, double-checking something for a client in that web app your team built for them, when suddenly, you see this screen: You hit refresh a few times, just to be sure. Nope. Still down. What happens next depends on how well your team has planned for incidents like this (some folks call it unplanned downtime).

Improving your team's on-call experience

Jan 22, 2022 By Max Rozen In OnlineOrNot

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call and deals with recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Getting over on-call anxiety

Jan 21, 2022 By Max Rozen In OnlineOrNot

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together, so how are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

Communicating to Users During Incidents

Jan 14, 2022 By Max Rozen In OnlineOrNot

Imagine you're having a regular day at work, opening up your browser, double-checking something for a client in that web app your team built for them, when suddenly, you see this screen: You hit refresh a few times, just to be sure. Nope. Still down. What happens next depends on how well your team has planned for incidents like this (some folks call it unplanned downtime).

What we learned from AWS's us-east-1 outage

Dec 8, 2021 By Max Rozen In OnlineOrNot

In case you missed it, for several hours on December 7, 2021, AWS's us-east-1 region had an outage impacting multiple AWS APIs, taking out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and began to recover well at 2021-12-07 22:48 UTC (with minor signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.
