Don't get caught in the dark: Lessons from a Lumen & AWS micro-outage

While major outages like the recent CrowdStrike incident dominate headlines, those of us in the trenches ensuring Internet Resilience know that most of our issues are not global; they are localized by geography, autonomous system, or some other boundary. Micro-outages – those elusive, localized incidents – can pose the most persistent threat to observability.

How DPM monitoring helps you manage your metrics volume

At Sumo Logic, we’re committed to helping you scale without breaking your budget. As you may have heard, we recently launched Flex Licensing, a first-of-its-kind economic model that offers free, unlimited log data ingest so different teams can capture and analyze critical data across their enterprise in one place. We’re also committed to tackling related challenges raised by other data sources, such as metrics.

Understanding and Controlling AWS Transit Gateway Costs with Kentik

AWS Transit Gateway costs are multifaceted and can get out of control quickly. In this post, discover how Kentik can help you understand the network traffic driving those costs, analyze traffic patterns, optimize data flows, and keep your Transit Gateway spending in check.

All about span events: what they are and how to query them

If you’re already familiar with distributed tracing, you know that spans are the building blocks of traces. But are you sleeping on what span events can do for you? First, you may need a wake-up call as to what a span event even is. While spans represent units of work or operation within a trace, a span event is a unique point in time during the span’s duration.
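
The span versus span event distinction above can be sketched as a small data model. This is an illustrative sketch only, not the API of any tracing library: `Span`, `SpanEvent`, `add_event`, and `events_named` are hypothetical names, and timestamps are assumed to be plain floats in seconds.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SpanEvent:
    """A named point in time inside a span (hypothetical model)."""
    name: str
    timestamp: float  # assumed unit: seconds
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    """A unit of work with a start, an end, and zero or more events."""
    name: str
    start: float
    end: float
    events: List[SpanEvent] = field(default_factory=list)

    def add_event(self, name: str, timestamp: float, **attributes: Any) -> SpanEvent:
        # An event has no duration of its own; it must fall within the span.
        if not (self.start <= timestamp <= self.end):
            raise ValueError("event timestamp is outside the span's duration")
        event = SpanEvent(name, timestamp, dict(attributes))
        self.events.append(event)
        return event

    def events_named(self, name: str) -> List[SpanEvent]:
        # A simple "query": return every event on this span with a given name.
        return [e for e in self.events if e.name == name]


# Example: a 2.5-second span that records two moments of interest.
checkout = Span("checkout", start=0.0, end=2.5)
checkout.add_event("cache_miss", 0.4, key="cart:42")
checkout.add_event("retry", 1.1, attempt=2)
```

In this sketch the span carries the duration, while each event only marks when something happened and attaches a few attributes to that moment.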

BigPanda and ServiceNow improve IT service management

By breaking down the silos between observability, IT operations, and service management, teams can improve service delivery and enhance IT incident management. However, this is easier said than done. The average BigPanda customer uses more than 20 observability and monitoring data sources. Combining mountains of alert data with legacy event management systems can make it almost impossible to sift through the noise to find the most important alerts.

A CoPE's Guide to Alert Management

Alerts are a perennial topic, and a CoPE will need to engage with them. The bounds of this problem space are formed by two types of alerts. Understanding what these alerts are and how to configure them is one thing. Thinking about what they each do for your organization, and how using one or the other affects things, is another. The latter will be the focus of this article.

IBM partners with Elasticsearch to deliver Conversational Search with watsonx Assistant

To meet customer needs for scale, speed, and precision, IBM partners with Elasticsearch to deliver retrieval augmented generation (RAG) capabilities that can be seamlessly integrated into the IBM watsonx Assistant’s new Conversational Search feature. Customers using IBM watsonx Assistant and watsonx Orchestrate can now build conversational AI assistants that are grounded in their company data and use RAG for comprehensive search.

What is High Packet Loss & How to Fix It

Seamless, efficient data transmission matters to both businesses and individual users, and it affects a wide range of applications and services. One common issue that can disrupt this flow is packet loss, which can cause several kinds of network performance problems. High packet loss is particularly concerning because it can severely degrade the quality of VoIP calls, video conferencing, online gaming, and even basic web browsing.

Everything you need to know about Large AI Model Training

When looking back at the role artificial intelligence (AI) has played in revolutionizing tasks across industries that would typically require human intelligence, it is important to consider the next steps in this journey and how AI is starting to evolve. As the industry grows, the volume and complexity of data are becoming unmanageable for existing AI models.