Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

How to Improve Downtime Response: Error Budgeting and Unplanned Downtime

Every one of us reading this blog has seen a fire spring up and quietly walked away from the impending chaos. And every one of us has managed to live this long because we understand when to react to a fire. A real fire affects our Service Level Objectives (SLOs) and our user base. You need to figure out where it is, what started it, and what your team will do about it, and you need to do that now.

Why Your Status Page Matters and How to Use It

When an outage hits your service, everybody starts talking. Your engineers are talking about what caused the problem, and how to fix it; your management is asking about when it’ll be fixed; and your customers are telling the world that they’re not happy. But there’s an even more important conversation you should be having: communicating with your users about the issue.

January 2020 Outage Report

Welcome to 2020, where Google Drive can fail for some of you but not others, you can’t access your passwords, and you can’t withdraw cash on vacation. This stranded-on-a-desert-isle dream was reality in the month of January, which saw drama in the financial services and internet infrastructure sectors. January’s downtime reinforces just how connected we have become, and how reliant we are on infrastructure that can seemingly fail on a whim.

Transaction Monitoring | Upgrades and Use Cases in 2020

Synthetic monitoring takes care of all of the small interactions on our website that QA can’t catch. If you’re building an application for the web, a transaction check is an integral part of proactive downtime resolution. What we call transaction monitoring, or a transaction check, is a set of instructions that a probe server follows.
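To make the idea of "a set of instructions that a probe server follows" concrete, here is a minimal sketch in Python. It is an illustration only, not Uptime.com's implementation or API: the `Step` type, the `run_transaction` function, and the stubbed login steps are all hypothetical names, and a plain dictionary stands in for a real browser session.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One instruction in a transaction check (hypothetical structure)."""
    name: str                        # label shown when reporting a failure
    action: Callable[[Dict], bool]   # returns True when the step succeeds


def run_transaction(steps: List[Step], session: Dict) -> Dict:
    """Run the steps in order and stop at the first one that fails."""
    for index, step in enumerate(steps, start=1):
        try:
            ok = step.action(session)
        except Exception:
            # Treat any unexpected error in a step as a failed step.
            ok = False
        if not ok:
            return {"up": False, "failed_step": step.name, "steps_run": index}
    return {"up": True, "failed_step": None, "steps_run": len(steps)}


# Stubbed actions: a dict stands in for a real browser session (assumption).
def open_login(session: Dict) -> bool:
    session["page"] = "login"
    return True


def submit_credentials(session: Dict) -> bool:
    if session.get("page") != "login":
        return False
    session["page"] = "dashboard"
    return True


def dashboard_loaded(session: Dict) -> bool:
    return session.get("page") == "dashboard"


LOGIN_CHECK = [
    Step("open login page", open_login),
    Step("submit credentials", submit_credentials),
    Step("confirm dashboard loaded", dashboard_loaded),
]

result = run_transaction(LOGIN_CHECK, {})
```

Stopping at the first failed step is a design choice in this sketch: it lets the result name the exact interaction that broke, which is the kind of detail a simple "is the homepage up" check would miss.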

Got Game? Secrets of Great Incident Management

When his phone wakes him at two in the morning, operations engineer Andy Pearson knows it’s bad news. There’s a major server problem, and hundreds of client websites are down. Automated monitoring checks detected the outage within seconds, and paged the on-call engineer. This time, it’s Pearson in the hot seat. Pearson quickly confirms the issue is real and escalates it to his boss, tech lead Lewis Carey.

December 2019 Outage Report

December was a busy month with systems we don’t normally see experiencing server downtime. Our first story, in particular, is an excellent example of how complicated monitoring can get as infrastructure grows. We saw every level get hit, from the government to big-name players, with ransomware being one of the major thorns in our collective sides. But we also bring you the heartwarming tale of the little Minecraft Server that could.

What We Learned About Uptime from 2019 Website Outages

One thing we’ve always known: there’s no such thing as 100% uptime for any website. Too many variables are at play to keep a site up all the time. From traffic surges to hardware failures and everything in between, keeping sites up and running is a full-time job for SREs and IT pros. Here at Uptime.com, we track major downtime all year long to provide websites of all sizes with lessons in how to catch downtime and resolve incidents quickly.

Black Friday 2019 Website Performance Report by Uptime.com

’Tis the season for website downtime. No matter how great your web infrastructure is, many ecommerce sites experience downtime on Black Friday. You’re not alone. New research by Google Cloud says that one out of ten retail executives reported downtime on Black Friday 2018, and four out of ten reported outages during one of the past three years.

October 2019 Social Media Outages by Uptime.com

This past month was haunted by outages across social media, finance, and tech services. DDoS attacks affected entire countries and broadband services in Africa. We’re continuing to see a pattern of outages from social media services with multiple applications. When one of these apps goes down, at least one of the others usually goes as well. The worst of these outages occurred in March, when Facebook’s entire suite of apps went down.