
The latest News and Information on Service Reliability Engineering and related technologies.

GDPR Log Management: A Practical Guide for Engineers

GDPR compliance for logs can be tricky—especially when you're trying to maintain system visibility and protect user data at the same time. For SREs and IT teams, it’s a balancing act between staying on the right side of privacy laws and not losing the context you need to troubleshoot. This guide walks through practical ways to handle personal data in logs, set up retention rules that make sense, and stay compliant without creating unnecessary friction.

Why Reliability Starts with the Network, even in the AI era, with Marino Wijay

In this episode, we explore how networking has shaped reliability as we know it. Marino Wijay, cloud networking expert and Staff Solutions Architect at Kong, shares how his journey began not as an SRE, but with cables, routers, and switches. Marino explains how the fabric holding systems together evolved through virtualization and software-defined networking, which is now a key element of resilient applications.

The New Rootly Ringtones: How Research-based On-Call Sounds

We set out to create a ringtone that wasn’t just loud, but the sound of a modern pager: something that wakes you up without triggering a full-blown adrenaline spike. In this video, go behind the scenes with sound engineer Gorjão as he crafts what a research-based on-call sound sounds like.

A Closer Look at Docker Build Logs for Troubleshooting

In the world of containerization, understanding what's happening under the hood during image builds can mean the difference between smooth deployments and frustrating debugging sessions. Docker build logs are your window into this process, offering crucial insights that help you optimize builds, troubleshoot errors, and maintain robust container infrastructure.

How to Connect ELK Stack with Grafana

In today’s distributed systems world, you need clear visibility into logs, metrics, and everything in between to keep systems healthy and reliable. That’s where the ELK Stack and Grafana work well together—each solving a different part of the observability puzzle. ELK handles the heavy lifting of log collection and processing. Grafana adds intuitive dashboards and powerful visualizations.

Log Consolidation Made Easy for DevOps Teams

Managing multiple systems that each generate their own alerts and logs can quickly become overwhelming. The challenge of scattered logs is a real headache, especially in the fast-paced world of DevOps. Log consolidation is not just a convenience; it's an essential practice that can save you from chaos and improve your operational efficiency. This guide covers everything you need to know about log consolidation, from understanding what it is and why it matters, to practical steps for making it work.

APM Observability: A Practical Guide for DevOps and SREs

Modern application architectures have evolved from simple monoliths to complex distributed systems spanning multiple environments. This evolution has transformed how we approach monitoring and troubleshooting. Traditional monitoring methods that focus solely on uptime and basic health checks are no longer sufficient for understanding system behavior in cloud-native environments.

Everything You Need to Know to Start Monitoring Postgres

Keeping your Postgres databases healthy is non-negotiable if you care about application performance and reliability. But monitoring Postgres the right way? That’s where things get tricky. Between the sheer volume of metrics and the noise that comes with them, it’s not always obvious what to pay attention to—or when. This guide breaks things down with a focus on what matters in real-world production setups.

Histogram Buckets in Prometheus Made Simple

Staring at a monitoring dashboard and still feeling like you're missing half the picture? Happens more often than you'd think. Especially when you're dealing with metrics like request durations or payload sizes—data that doesn’t behave nicely or fit into neat little averages. This is where Prometheus' histogram buckets step in. They're not just another metric type; they're a better way to track the messy, uneven world of performance data.