The latest news and information on Site Reliability Engineering (SRE) and related technologies.

Is self-healing the future? w/ Zscaler VP of SRE #ai #devops

Aug 11, 2025 By Rootly In Rootly

Log Format Standards: JSON, XML, and Key-Value Explained

Aug 6, 2025 By Faiz Shaikh In Last9

Your log format defines how your application records events. The structure you choose shapes how logs get parsed, indexed, and queried. It affects how quickly you can debug issues, build alerts, or control storage usage. In this guide, we'll take a look at the log formats developers typically use, the essential fields to include, and what trade-offs to consider before locking down a format for your system.

PostgreSQL Performance: Faster Queries and Better Throughput

Aug 5, 2025 By Faiz Shaikh In Last9

A PostgreSQL setup that performed well with 10,000 users starts to show strain at 100,000. Queries that once returned in under 50ms now take over 2 seconds. The connection pool regularly hits its limit during peak usage, leading to timeouts and degraded performance. This blog focuses on practical ways to reduce query latency by 50–80% and increase throughput for high-concurrency environments.

What are Application Metrics?

Aug 4, 2025 By Anjali Udasi In Last9

Application metrics are structured, quantifiable signals that reflect how your software behaves in production. They capture key aspects of performance, response times, error rates, throughput, and resource usage, giving you a real-time view into the health of your system. Tracking the right metrics helps detect regressions early, surface latent issues before they impact users, and guide optimization decisions based on hard data, not guesswork.

Jaeger Monitoring: Essential Metrics and Alerting for Production Tracing Systems

Aug 1, 2025 By Anjali Udasi In Last9

Your Jaeger setup is running. Traces are coming in, and the UI is helping you spot slow services or debug broken flows. But just like any part of your observability stack, Jaeger needs some basic monitoring to stay reliable. If the collector starts queueing spans or the agent runs out of buffer, it can lead to dropped traces, sometimes without any obvious sign in the UI. This blog focuses on the operational side of Jaeger.

Incident Management Software for 2025: Revolutionizing Efficiency in Crisis Handling

Jul 30, 2025 By Vishal Padghan In Squadcast

With the growing reliance on technology and complex IT infrastructures, having robust Incident Management software is no longer a luxury but a necessity. As we step into 2025, organizations are seeking more sophisticated, intuitive, and scalable solutions to streamline their Incident Response Workflows and ensure uninterrupted service delivery.

Why Observability Isn't Just for SREs (and How Devs Can Get Started)

Jul 30, 2025 By Elizabeth Mathew In SigNoz

Almost every other day, when I scroll past r/devops or r/sre, I see a post asking how a dev can get started with DevOps, observability, etc. This blog is an attempt to help anyone who feels lost find their way into observability, and a wake-up call for devs to think about observability more actively today than ever before. A dev’s observability playbook.

This Month in Datadog: Bits AI SRE, Datadog Data Observability, and more

Jul 30, 2025 By Datadog In Datadog

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit Cloud Monitoring as a Service | Datadog. This month, we chat with two guests about Bits AI SRE and Datadog Data Observability.

New in OTel: Auto-Instrument Your Apps with the OTel Injector

Jul 29, 2025 By Anjali Udasi In Last9

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, with no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Jul 29, 2025 By Faiz Shaikh In Last9

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.
