The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Automatically Catch Failed HTTP Requests in your Playwright Tests!

Jul 29, 2025 By Checkly In Checkly

In this video, Stefan (Playwright Ambassador) dives into his favorite topic, Playwright fixtures, and shows how to set up automatic network monitoring in your Playwright end-to-end tests so that failed HTTP requests (such as 404s and 500s) are caught and high quality standards are maintained.

View Video
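The video's own code isn't reproduced here. The following is only a minimal TypeScript sketch of the idea, built on stated assumptions. The names `ResponseLike`, `PageLike`, `trackFailedRequests`, and `assertNoFailedRequests` are hypothetical. The two interfaces are meant to mirror the `on("response", ...)` event and the `url()`/`status()` methods of Playwright's `Page` and `Response`. The sketch records every response with a 4xx or 5xx status and throws if any were seen.

```typescript
// Minimal structural types mirroring the parts of Playwright's Page and
// Response used here (assumption: a real Playwright Page satisfies them).
interface ResponseLike {
  url(): string;
  status(): number;
}

interface PageLike {
  on(event: "response", listener: (response: ResponseLike) => void): unknown;
}

interface FailedRequest {
  url: string;
  status: number;
}

// Treat any 4xx or 5xx status (e.g. 404, 500) as a failed request.
function isFailedStatus(status: number): boolean {
  return status >= 400 && status <= 599;
}

// Hypothetical helper: subscribes to every response on the page and
// records the failed ones. Returns the live list of failures.
function trackFailedRequests(page: PageLike): FailedRequest[] {
  const failures: FailedRequest[] = [];
  page.on("response", (response) => {
    const status = response.status();
    if (isFailedStatus(status)) {
      failures.push({ url: response.url(), status });
    }
  });
  return failures;
}

// Hypothetical helper: throws if any failed requests were recorded,
// listing each one as "<status> <url>".
function assertNoFailedRequests(failures: FailedRequest[]): void {
  if (failures.length > 0) {
    const summary = failures.map((f) => `${f.status} ${f.url}`).join("\n");
    throw new Error(`Failed HTTP requests detected:\n${summary}`);
  }
}
```

In a Playwright project, you could apply this automatically with a custom fixture created via `test.extend` (optionally marked `{ auto: true }`). That fixture would call `trackFailedRequests(page)` before `await use(page)` and call `assertNoFailedRequests(...)` afterwards, so every test gets the check without opting in. Keeping the status logic in plain functions also makes it easy to unit-test without a browser.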

Data Observability: Build confidence in the data life cycle

Jul 29, 2025 By Datadog In Datadog

Datadog Data Observability provides a complete solution with quality checks (e.g., volume, row changes, freshness), custom SQL-based monitors, anomaly detection, column-level lineage across systems like Snowflake and Tableau, full pipeline visibility, and targeted alerts when data issues arise.

View Video

Disposable Code Is Here to Stay, but Durable Code Is What Runs the World

Jul 29, 2025 By Charity Majors In Honeycomb

Every day I seem to run into yet another post with someone solemnly opining that “writing code has never been the hardest part of software engineering.” And hey, that’s smashing. As an engineer from the ops/infra/SRE side of the house, I feel like I’ve been saying this my whole career. (Is there anything more satisfying than being proven right in public? Not in my book.) So, which is it?

Read Post

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Jul 29, 2025 By Faiz Shaikh In Last9

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.

Read Post

New in OTel: Auto-Instrument Your Apps with the OTel Injector

Jul 29, 2025 By Anjali Udasi In Last9

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, with no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Read Post

Hands-On with Continuous Observability

Jul 29, 2025 By Johan Kraft (PhD) In Percepio

Ask any embedded developer about their worst debugging experience, and chances are you’ll hear stories of unreproducible bugs, late-night watchdog resets, or CI test failures with no trace. Traditional tools often leave us blind at the exact moment we need insight.

Read Post

What is Grafana Cloud? Fully Managed Observability Built on Open Standards | Grafana Labs

Jul 29, 2025 By Grafana In Grafana

Grafana Cloud helps teams detect, investigate, and resolve incidents faster—thanks to AI, open standards, and seamless integrations with OpenTelemetry, Prometheus, Salesforce, and more. See how it all works in this live demo of a simulated e-commerce outage.

View Video

Building an Effective Post-Mortem Culture: A Step-by-Step Guide

Jul 29, 2025 By Nuno Tomas In isDown

Post-mortems are the cornerstone of continuous improvement in incident management. When done right, they transform failures into learning opportunities and prevent future outages. Yet many teams struggle to build a culture where post-mortems are valued rather than feared.

Read Post

From Alert to Answer in Seconds: Accelerating Incident Response in Dynatrace

Jul 29, 2025 By Mezmo In Mezmo

It is 12 PM and you have just started eating lunch when your phone starts buzzing. A storm of monitoring and system-level alerts starts stacking up on your phone and in Slack. The incident response "war room" opens and downtime communications are being drafted for customers. Your team is under pressure to find the root cause, but you immediately hit roadblocks.

Read Post

Incident IQ integration is here!

Jul 29, 2025 By Colin Bartlett In StatusGator

We’re excited to launch one of our most highly requested integrations: StatusGator now connects directly with Incident IQ. This powerful new integration bridges the gap between real-time service monitoring and your internal support workflow. Now, whenever someone reports an outage on your public StatusGator page, a ticket is automatically created in Incident IQ—ensuring your IT team can respond quickly and efficiently.

Read Post