The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.
As autumn graced the vibrant city of Chicago, I had the distinct opportunity to immerse myself in the heart of innovation and camaraderie at the CNCF’s KubeCon North America conference. Over the span of four remarkable days, from Nov 6-9, I was fortunate enough to walk alongside the many enthusiasts, contributors and organizers of open source and cloud native communities.
As someone who has been living the Honeycomb ops life for a while, I can say that SLOs have been the bread and butter of our most critical and useful alerting. However, they had severe, long-standing limitations. In this post, I will describe these limitations, and how our brand new feature, budget rate alerts, addresses them. We usually don’t have SREs writing product announcements, but I’m so excited about this one that I said, “Screw it, I’m doing it!”
Last week a major internet outage took out one of Australia’s biggest telecoms. In a statement out yesterday, Optus blames the hours-long outage, which left millions of Aussies without telephone and internet, on a route leak from a sibling company. In this post, we discuss the outage and how it compares to the historic outage suffered by Canadian telecom Rogers in July 2022.
Information technology (IT) departments are always juggling multiple tasks in the fast-paced world of modern business, from maintaining the network infrastructure to guaranteeing data security and enhancing system performance. Even the most seasoned IT professionals can get overwhelmed by the sheer volume of data created and the rising complexity of IT systems. This is where artificial intelligence for IT operations (AIOps) can revolutionize the way businesses manage their IT environments.
As a Site Reliability Engineer (SRE) or DevOps professional, you are well aware of the importance of observability in ensuring the smooth functioning and performance of your applications. Observing and monitoring your applications can help you identify and resolve issues in real time, resulting in increased reliability and improved user experience. Logs play a crucial role in this process as they provide detailed information about the activity and behavior of your applications.