
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How to use Prometheus to efficiently detect anomalies at scale

When you investigate an incident, context is everything. Let’s say you’re working on-call and get pinged in the middle of the night. You open the alert and it sends you to a dashboard where you recognize a latency pattern. But is the spike normal for that time of day? Is it even relevant? Next thing you know, you’re expanding the time window and checking other related metrics as you try to figure out what’s going on. That’s not to say you won’t find the answers.

Splunking GenAI Applications for Observability Insights

Has your organization finally developed that game-changing generative AI application? Is your CTO, CIO, or CEO banking on it being a success? I bet they are! Now, here’s the big question: Are you prepared to monitor and troubleshoot your new application once users get engaged? Fear not, my boy Derek Mitchell has you covered with two incredible Splunk Lantern articles that go deep into how Splunk Observability Cloud allows you to instrument GenAI apps to gain critical observability insights.

Handling Kafka Partition Rebalancing Issues

If you’ve been working with Kafka long enough, you know its power for real-time data streaming. But, like any complex system, it comes with its own set of headaches, especially around partition rebalancing. One day your cluster is humming along, and the next, a rebalance kicks in, and suddenly you’re staring at a bunch of overloaded brokers and bottlenecked data flows.

Retail ITOps: Boost Operational Resilience with Business Service Observability

david.arrowsmith • Oct 03, 2024

In today’s competitive and fast-paced retail environment, service availability is paramount to delivering exceptional customer experiences. As an ITOps Manager or Site Reliability Engineer in a large retail enterprise, you're tasked with managing complex, interdependent systems that support vital business functions such as supply chain operations, point-of-sale (POS) systems, and inventory management.

Azure Cost Allocation to manage Azure spend and get the most out of it

As more companies move their operations to the cloud, cost management becomes a significant concern, and Azure cost allocation offers a way to address it. With Azure usage growing, it has become crucial to define cost allocation and resource management correctly.

SolarWinds Day | Observability Anywhere. Precision Everywhere.

SolarWinds is expanding its cloud-monitoring capabilities across its self-hosted and SaaS observability offerings. In this video, we'll explore new and expanded capabilities for these observability solutions and learn how this increased functionality lets IT teams and organizations decide for themselves how they monitor and manage their hybrid IT.