
Global Event Rulesets: Streamlining Alert Routing Across Services

In the fast-paced world of organizations handling numerous microservices and projects, tackling the challenges that arise can be a daunting task. Because many of our customers run infrastructures with a large number of microservices, we set out to make it easier for them to streamline alert source management. Enter Global Event Rulesets (GER). This feature is designed to redefine the way you manage alerts.

Important Situations Every Business Owner Should Be Aware Of

As a business owner, you need to stay on top of all the important situations that can impact your business. From legal questions and financial concerns to customer disputes and HR matters, there is no shortage of scenarios that require careful thought, communication, and attention - often with the help of experts in their respective fields.

Whose fault was it anyway? On blameless post-mortems

No one wants to be on the receiving end of the blame game—especially in the wake of a major incident. Sure, you know you were the one who made the final change that caused the incident. And hopefully, it was a small one that didn’t cause any SEV-1s. Still, the weight of knowing you caused something bad should be enough, right? Unfortunately, sometimes fingers get pointed, your name gets called, and suddenly, everyone knows that you’re the person who created more work for everyone.

The Link Between Early Detection and Internet Resilience: A Lesson from Salesforce's Outage

Almost every study examining the hourly cost of outages leads to the same clear conclusion: outages are expensive. According to a 2016 study, the average cost of downtime was estimated at approximately $9,000 per minute. In a more recent study, 61% of respondents stated that outages cost them at least $100,000, with 32% indicating costs of at least $500,000 and 21% reporting expenses of at least $1 million per hour of downtime.

The Single Pane of Glass in Modern Observability

Recently I caught up with Jamie Allen on Episode 67 of the Slight Reliability podcast to discuss the idea of a single pane of glass (SPOG). Jamie had written an article titled The Single Pain of Glass, which coincidentally was also the title I gave Slight Reliability Episode 10. Given our shared use of puns and interest in this topic, I thought it was worth a conversation! So, what is a single pane of glass? Is it an idea with practical application? How does it fit into the world of modern observability?

Charmed Kubeflow 1.8 Beta is here

Have you heard the news? Charmed Kubeflow 1.8 is available in Beta. Kubeflow is the foundation of Canonical MLOps. The latest release brings improved capabilities to personalise different components of the platform, including the images that can be used in Notebooks. We are looking for data scientists, machine learning engineers, creators and AI enthusiasts to take Charmed Kubeflow 1.8 Beta for a test drive and share their feedback with us.

Harmonizing Digital Channels and Business Operations to Deliver a Good Customer Experience

In celebration of Customer Experience Day 2023, this post is part of a series on customer experience and the ways that Splunk strives to deliver superior customer experience at every level. Today, customers interact with brands through a variety of channels and platforms. In fact, 57% of customers prefer to engage with brands through digital channels first.

Simplifying Microsoft Teams Troubleshooting for IT Teams

Microsoft Teams has become the go-to platform for seamless collaboration and communication. However, like any technology, performance issues can arise, and these issues affect user experience and productivity. For IT teams tasked with Microsoft Teams troubleshooting, having access to comprehensive data is key. In this blog, we explore the challenges faced by IT teams and how harnessing more data can make the process significantly easier.