The latest news and information on DevOps, CI/CD, automation, and related technologies.
Facebook’s October 2021 outage was the type of event that gives SREs nightmares: a series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about $60 million. As incidents go, this was a pretty big one.
In my interactions at industry events like AWS re:Invent and KubeCon, I talk with a lot of developers. Devs often tell stories of things that prevent them from working quickly and efficiently. Many involve frustrating interactions with sysadmins, SREs, or DevOps colleagues. One story I have heard several times involves a conversation like this: Dev: Hey, SRE team. My build is failing and I don’t know what’s happening with the app in the build node.
If you’ve been building client websites for a while, you may remember a time before WordPress, a time when building websites meant creating every HTML page by hand. At some point, you probably decided that there were common features that every customer needed on their site, so you started using one customer’s website as the template for the next. Of course, these days WordPress is the underlying software for many modern websites, and there’s no need to reinvent core functionality.
Persistent storage is one of the more difficult aspects of managing distributed systems. When we attach a storage device to a host (whether it’s flash storage, network-attached storage (NAS), or old-fashioned spinning disks), we generally don’t give it much thought until we start running distributed applications or need to increase capacity. But a lot can go wrong with storage, and this can have unexpected consequences for our systems, services, and applications.
For decades, the development and operations teams within companies were siloed. Developers created the software. Operations tested and deployed it. But in 2009, IT consultant Patrick Debois coined the term “DevOps,” a merging of development and operations intended to improve communication, establish best practices, and create feedback loops so that organizations keep improving the overall process.