The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Cloud Imperium Games moves ELK stack with ChaosSearch

Cloud Imperium Games (CIG) is a prominent video game development company known for its ambitious project, Star Citizen, which aims to be an open-world, massively multiplayer online space simulation game. As the game grew in popularity, the metrics, events, and logs generated to track every action during gameplay grew explosively, both in volume and in diversity (the latter a consequence of the dynamic, fast-paced development environment).

Demo of Internet Sonar: From Disruption to Instant Detection

Catchpoint's new Internet Sonar shows you global Internet status at a glance in an AI-powered, real-time, interactive dashboard and map. It answers the first question any IT team needs to ask when there's an outage: "Is it me, or is it something else?" Key product features: In this recorded live demo session, leaders from our Product team will walk you through how Internet Sonar works, how you can use it to lower MTTR, and how organizations are using it to save millions.

What is Zero Trust Reliability in engineering: Piyush Verma - The Reliability Podcast

The Reliability Podcast speaks with engineers who have worked on large, complex systems and draws out what they have learned. What best practices should one adopt? What lessons are non-negotiable for getting better at the craft? What will ‘engineering’ look like with the advent of AI? We answer these questions and more by tracing the personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Production vs Local in engineering: Piyush Verma - The Reliability Podcast

The Link Between Early Detection and Internet Resilience: A Lesson from Salesforce's Outage

Nearly every study of the hourly cost of outages reaches the same clear, undeniable conclusion: outages are expensive. According to a 2016 study, the average cost of downtime was approximately $9,000 per minute. In a more recent study, 61% of respondents said outages cost them at least $100,000, 32% reported costs of at least $500,000, and 21% reported costs of at least $1 million per hour of downtime.
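The two studies quote costs in different units (per minute versus per hour). As a rough, back-of-the-envelope comparison, assuming the 2016 per-minute average holds constant across a full hour (real outages need not behave this way), the older figure can be restated as an hourly rate:

```latex
\$9{,}000\ \text{per minute} \times 60\ \text{minutes per hour} = \$540{,}000\ \text{per hour}
```

On that simplified assumption, the 2016 average lands in the same range as the thresholds reported in the more recent study.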

The Single Pane of Glass in Modern Observability

Recently I caught up with Jamie Allen on Episode 67 of the Slight Reliability podcast to discuss the idea of a single pane of glass (SPOG). Jamie had written an article titled "The Single Pain of Glass", which, coincidentally, was also the title of Slight Reliability Episode 10. Given our shared fondness for puns and our interest in the topic, I thought it was worth a conversation! So, what is a single pane of glass? Is it an idea with practical application? How does it fit into the world of modern observability?

Harmonizing Digital Channels and Business Operations to Deliver a Good Customer Experience

In celebration of Customer Experience Day 2023, this post is part of a series on customer experience and the ways that Splunk strives to deliver superior customer experience at every level. Today, customers interact with brands through a variety of channels and platforms. In fact, 57% of customers prefer to engage with brands through digital channels first.

Simplifying Microsoft Teams Troubleshooting for IT Teams

Microsoft Teams has become the go-to platform for seamless collaboration and communication. However, as with any technology, performance issues can arise, and they affect both user experience and productivity. For IT teams tasked with troubleshooting Microsoft Teams, access to comprehensive data is key. In this post, we explore the challenges IT teams face and how harnessing more data can make the process significantly easier.