
OnlineOrNot

On moving over a million uptime checks per week onto fly.io

The other day, a friend told me about fly.io's nice developer experience (DX). For my day job, I work on improving wrangler2's DX, so naturally it had me curious. I went from "I'll just play around with it, maybe give it a toy workload" to "holy shit, what if I quickly rewrite my business's AWS Lambda + SQS stack to fit entirely within their free tier" in about 90 minutes. It wasn't that simple in the end, but I did manage to migrate most of my active workload from AWS Lambda to fly.io.

How to monitor your uptime with OnlineOrNot

Jumping into monitoring software for the first time can be pretty overwhelming. If you're not in an exploring mood, it's easy to get lost, not entirely sure what all these knobs and buttons do. To ease that feeling, I thought it might be useful to let folks know how I use OnlineOrNot to monitor OnlineOrNot (as part of running OnlineOrNot day to day).

Communicating to Users During Incidents

Imagine you're having a regular day at work: you open your browser to double-check something for a client in that web app your team built for them, when suddenly you're staring at an error page. You hit refresh a few times, just to be sure. Nope. Still down. What happens next depends on how well your team has planned for incidents like this (some folks call it unplanned downtime).

Improving your team's on-call experience

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call and handles recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Getting over on-call anxiety

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together; how are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you feel less nervous.

What we learned from AWS's us-east-1 outage

In case you missed it, for several hours on December 7, 2021, AWS's us-east-1 region had an outage impacting multiple AWS APIs, taking out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and only properly began to recover at 2021-12-07 22:48 UTC (with brief signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.

Dealing with Noisy Error Monitoring

Say you've been tasked with monitoring an application, so you set up some alerts to let you know when errors are coming in. The minutes roll by, the errors start coming... ...and they don't stop coming... Oh my, there seem to be quite a few errors coming through. Alerting on each error isn't going to help, so better to report on changes in the error rate instead, right? Not quite.
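
To make the trade-off concrete, here's a minimal sketch (not OnlineOrNot's implementation) of the naive error-rate alert the excerpt pushes back on: count errors against total requests over a fixed window and page when a threshold is crossed. The names and the 5% threshold are illustrative assumptions.

```typescript
interface WindowStats {
  requests: number; // total requests seen in the window
  errors: number;   // errors seen in the same window
}

// Hypothetical threshold: page when more than 5% of requests error.
const ERROR_RATE_THRESHOLD = 0.05;

function shouldAlert(window: WindowStats): boolean {
  if (window.requests === 0) return false;
  const errorRate = window.errors / window.requests;
  return errorRate > ERROR_RATE_THRESHOLD;
}

// shouldAlert({ requests: 1000, errors: 80 }) -> true
// The excerpt's "Not quite" hints that a fixed rate threshold alone
// still isn't enough (e.g. it behaves badly in low-traffic windows).
```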

Scaling AWS Lambda and Postgres to thousands of uptime checks

When you're building a serverless web app, it can be pretty easy to forget about the database. You build a backend, send some data to a frontend, write some tests, and it'll scale to infinity with no effort, right? Not quite. Especially not with Postgres. As the number of users of your frontend increases, your app will open more and more database connections until the database is unable to accept any more. It gets worse.
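
As a rough illustration of one common mitigation for that connection exhaustion (not necessarily the article's exact fix), the sketch below creates the Postgres pool once per Lambda container, outside the handler, and caps it at a single connection, so N warm containers hold at most N connections. The DATABASE_URL variable, uptime_checks table, and event shape are assumptions for the example.

```typescript
import { Pool } from "pg";

// Created once per container and reused across warm invocations,
// instead of opening a fresh connection on every request.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 1,                    // one connection per Lambda container
  idleTimeoutMillis: 10_000, // release idle connections quickly
});

export async function handler(event: { checkId: string }) {
  const { rows } = await pool.query(
    "SELECT url FROM uptime_checks WHERE id = $1",
    [event.checkId]
  );
  return rows[0];
}
```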