
Using Amazon RDS for high availability: How monitoring ensures reliable failover

Database downtime can lead to significant disruptions, revenue loss, and frustrated users. Amazon Relational Database Service (RDS) provides a managed database solution with high availability and automated failover to reduce these risks. However, continuous monitoring remains crucial: detecting potential issues before they affect operations is what keeps failover reliable and downtime short.

Managing Multiple Service Instances with a Systemd Generator

When working with systemd services in Linux, you might encounter situations where multiple instances of a service need to be managed dynamically. When I had to develop a solution to monitor multiple Kubernetes clusters with Icinga for Kubernetes, I ran into exactly this challenge.

Why Context Matters: Mastering Serverless App Monitoring

Hi there, and welcome to the second video in this series on observing AWS serverless applications with Datadog. In this video, you’ll learn how important it is to add custom business context to the telemetry you send to Datadog and how you can use that inside APM to quickly diagnose and debug issues. You’ll walk away with an understanding of the importance of distributed tracing, as well as how you can add specific business context to the telemetry you send.

Netdata vs. Prometheus: Which Monitoring Tool is Right for You?

Netdata's founder, Costa Tsaousis, built Netdata with performance and efficiency in mind. The result? 8x less RAM usage, 30x less disk I/O, 40x more data retention, 40x more data stored, and up to 22x faster queries, all thanks to our innovative tiered storage system, which enables ultra-efficient long-term queries.

State of DevOps: 2024 DORA Report Insights with Google

Enjoy this exclusive webinar with Ben Good from Google as we explore the findings in the 2024 State of DevOps report. For over a decade, the DORA report has provided critical insights into the capabilities and practices that fuel high-performing technology organizations. This report highlights the significant impact of AI on software development, explores platform engineering’s promises and challenges, and emphasizes user-centricity and stable priorities for organizational success.

GTmetrix Alternatives: The Best Tools for Website Performance Testing

GTmetrix used to be the go-to tool for checking website speed, but, let's be honest, paying for one-off synthetic tests isn't worth it. If you're still relying on synthetic testing alone, you're missing a big part of the web performance picture. If you care about Core Web Vitals, SEO performance, and user experience, you need more than just lab data. The good news? There are better (and free) alternatives like PageSpeed Insights and WebPageTest for synthetic testing.

How to Implement OpenTelemetry in NestJS

Modern applications are becoming increasingly complex, and debugging distributed systems can feel like searching for a needle in a haystack. This is where OpenTelemetry (OTel) comes in. If you're using NestJS, integrating OpenTelemetry can provide deep insights into your application's behavior, helping you track performance, troubleshoot issues, and understand service interactions.

Pino Logger: The Fastest and Efficient Node.js Logging Library

Logging is an integral part of any production-ready Node.js application. Whether you're debugging issues, monitoring application performance, or setting up a centralized logging system, an efficient logger is crucial. Pino is one of the best choices available due to its speed, low overhead, and powerful features. This guide goes beyond the basics, providing an in-depth exploration of how to optimize Pino for your applications, use advanced features, and integrate it seamlessly with other tools.

Elasticsearch Reindex API: A Guide to Data Management

If you work with Elasticsearch long enough, you'll eventually need to reindex your data. Maybe you're changing mappings, upgrading versions, or restructuring your documents. That's where the Elasticsearch Reindex API comes in. In this guide, we'll walk through everything you need to know about the Reindex API: what it is, how it works, common use cases, performance optimizations, and potential pitfalls. Let's dive in.
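
As a rough illustration of what a reindex call looks like, the sketch below builds the JSON body and HTTP request for the `_reindex` endpoint using only the Python standard library. The index names, the base URL, and the helper names `build_reindex_body` and `build_reindex_request` are assumptions made for this example and do not come from the guide. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, skip_existing=False):
    """Build the JSON body for POST /_reindex (hypothetical helper)."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if skip_existing:
        # Only create documents that are missing in the destination,
        # and keep going when a version conflict is hit.
        body["dest"]["op_type"] = "create"
        body["conflicts"] = "proceed"
    return body


def build_reindex_request(base_url, body):
    """Wrap the body in an HTTP POST request without sending it."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example with assumed index names and a local cluster URL.
body = build_reindex_body("logs-v1", "logs-v2", skip_existing=True)
request = build_reindex_request("http://localhost:9200", body)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running cluster. Authentication and TLS settings are left out of this sketch.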

Fine-tune notifications with Alert sensitivity

We’re excited to introduce a new feature that gives you greater control over how and when you receive alerts from your website and ping monitors. With Alert sensitivity, you can now specify the number of retries before an alert is triggered, reducing false alarms and ensuring more reliable notifications.
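
The idea of retrying before alerting can be sketched in a few lines of Python. This is a minimal illustration and not the product's actual implementation: the function name `alert_indices` is hypothetical, and it assumes that `retries` means the number of failed retries tolerated after the first failed check.

```python
from typing import Iterable, List


def alert_indices(check_results: Iterable[bool], retries: int) -> List[int]:
    """Return the positions in check_results at which an alert would fire.

    check_results: True for a passing check, False for a failing one.
    retries: failed retries tolerated after the first failure (assumption).
    """
    alerts = []
    consecutive_failures = 0
    alerted = False  # avoid re-alerting during the same outage
    for position, passed in enumerate(check_results):
        if passed:
            # Any successful check resets the failure streak.
            consecutive_failures = 0
            alerted = False
            continue
        consecutive_failures += 1
        # Alert once the first failure plus all retries have failed.
        if consecutive_failures > retries and not alerted:
            alerts.append(position)
            alerted = True
    return alerts


checks = [True, False, True, False, False, False, True]
print(alert_indices(checks, retries=0))  # [1, 3]
print(alert_indices(checks, retries=2))  # [5]
```

With no retries, the single blip at position 1 raises an alert. With two retries, only the sustained failure is reported, which is the false-alarm reduction the feature describes.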