
Amazon Cognito outage: How StatusGator notified customers 30 minutes before Amazon did

On December 12, 2024, Amazon Cognito experienced a significant outage in the US-EAST-1 (N. Virginia) region, impacting authentication for numerous applications. This operational issue, caused by the deployment of a configuration change, led to widespread “TooManyRequestsException” errors for several hours. Many Amazon Cognito users were left scrambling to figure out why their applications were down, why users couldn't authenticate, and how to get back up and running.

What is API Monitoring? Importance, Tools & Strategies

API monitoring is the process of continuously observing and testing APIs to ensure they perform as expected, stay available, and deliver the desired functionality. This includes tracking metrics such as availability (uptime), latency, and response times. Whether you’re dealing with a REST API, a web API, or a microservices architecture, monitoring is essential for detecting issues before they impact end users.
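To make those metrics concrete, here is a minimal sketch of a synthetic API check using only the Python standard library. It assumes a plain HTTP GET endpoint and treats any 2xx response as healthy; `probe`, `summarize`, and `CheckResult` are hypothetical names for illustration, not part of any monitoring tool. `probe` times one request, and `summarize` turns a batch of results into an availability percentage and a nearest-rank p95 latency.

```python
import math
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    ok: bool                # True if the API answered with a 2xx status
    status: Optional[int]   # HTTP status, or None if no response arrived
    latency_ms: float       # wall-clock time for the whole request


def probe(url: str, timeout_s: float = 5.0) -> CheckResult:
    """Issue one GET request and record its status and latency."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx
    except OSError:
        # URLError, timeouts, and connection resets are all OSError subclasses
        status = None
    latency_ms = (time.perf_counter() - start) * 1000
    ok = status is not None and 200 <= status < 300
    return CheckResult(ok, status, latency_ms)


def summarize(results: List[CheckResult]) -> dict:
    """Availability (% of successful checks) and nearest-rank p95 latency."""
    if not results:
        raise ValueError("no check results to summarize")
    availability = 100.0 * sum(r.ok for r in results) / len(results)
    latencies = sorted(r.latency_ms for r in results)
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank percentile index (1-based)
    return {"availability_pct": availability, "p95_ms": latencies[rank - 1]}
```

In practice you would run `probe` on a schedule (for example, once a minute per endpoint) and feed a sliding window of results into `summarize`, alerting when availability drops or p95 latency climbs past a threshold you choose.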

12 Ways We Sleighed Innovation This Year

As we wrap up an incredible year, it’s the perfect time to celebrate Cribl’s progress and innovation in 2024! This year brought many exciting features designed to solve real-world problems and make life easier for our customers. In the spirit of reflection and festivity, I’ll highlight twelve game-changing product features, releases, and enhancements, each a testament to listening, learning, and delivering value to you, our users.

AI Log Analysis - Shaping the Future of Observability

As digital applications and infrastructure grow more complex, managing and understanding log data has become vital to achieving practical observability, which enables organizations to detect, diagnose, and prevent issues across their systems. However, traditional log analysis methods often struggle with the volume and complexity of modern log data in cloud-native environments.

Full-Stack Observability with OpenTelemetry and DX Operational Observability

DX Operational Observability (DX O2) from Broadcom supports ingestion and retention of OpenTelemetry (OTel) data. Teams that have instrumented applications with OpenTelemetry SDKs and APIs can now send telemetry to DX O2 using the OpenTelemetry Collector, a core component of OpenTelemetry, together with the OTel Collector Exporter, which is available through early access in DX O2.
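To make the first hop of that pipeline concrete, here is a stdlib-only Python sketch of what an instrumented application hands to the OpenTelemetry Collector: a single span encoded as OTLP/JSON and POSTed to the Collector's OTLP/HTTP receiver. Assumptions: the Collector runs locally with an OTLP receiver on its default HTTP port (4318), and the service name, span name, and helper functions are hypothetical. A real application would use an OpenTelemetry SDK rather than building payloads by hand, and the Collector-side exporter configuration for DX O2 is not shown here.

```python
import json
import secrets
import time
import urllib.request

# Assumption: a local Collector with the OTLP receiver listening on its default HTTP port.
COLLECTOR_URL = "http://localhost:4318/v1/traces"


def build_span_payload(service_name: str, span_name: str, duration_ns: int) -> dict:
    """Build an OTLP/JSON trace export request containing one server span."""
    end = time.time_ns()
    start = end - duration_ns
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes, hex-encoded (32 chars)
        "spanId": secrets.token_hex(8),    # 8 random bytes, hex-encoded (16 chars)
        "name": span_name,
        "kind": 2,                         # SPAN_KIND_SERVER
        "startTimeUnixNano": str(start),   # 64-bit integers are sent as strings in JSON
        "endTimeUnixNano": str(end),
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "manual-sketch"}, "spans": [span]}],
        }]
    }


def send_to_collector(payload: dict, url: str = COLLECTOR_URL) -> int:
    """POST the payload to the Collector and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

With a Collector running, `send_to_collector(build_span_payload("checkout", "GET /cart", 5_000_000))` should return a 2xx status; from there, the Collector's configured exporter is what forwards the data to the backend.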

Essential Linux Patch Management: Tools And Best Practices For Success

Let’s talk about Linux patch management, a crucial practice in today’s cybersecurity landscape. According to Kaspersky, cybercriminals doubled their exploitation of Linux vulnerabilities in 2023 compared to the previous year. Although there has been a slight decrease in 2024, the trend persists due to the growing popularity of Linux systems. This growth makes Linux patch management more important than ever for businesses of all sizes.

Windows Patch Management: Tips & Tools For Total Protection

Let’s talk numbers for a second. Did you know that Microsoft Windows holds over 68% of the worldwide desktop operating system market (Statista)? With that kind of reach, effective Windows patch management to keep systems secure and running smoothly isn’t just important; it’s essential. And that’s exactly what we’re here to discuss. In this article, we’ll cover tips and tools for Windows patch management. Ready to take control of your Windows patching process?

Grafana Labs: Top 10 moments of 2024

2024 was a year of making connections. The open source community gathered in person for GrafanaCON for the first time in five years — meeting in Amsterdam to celebrate Grafana 11, Loki 3.0, a new open source project (cue Grafana Alloy), and more. TailCtrl, an early-stage company that specializes in adaptive trace sampling, joined Grafana Labs to advance our Adaptive Telemetry story (welcome, founder Sean Porter!).