Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Argo Rollouts Canary Monitoring: Metrics, Gotchas, and Automated Gates with Last9

Argo Rollouts exposes Prometheus metrics on port 8090 — but the docs lie about which labels exist. Here's how to scrape them into Last9, build a canary dashboard, and use Last9 as an automated AnalysisTemplate gate, including the auth and base64 gotchas. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Customize preconfigured views for AWS, Azure, and Google Cloud with Cloud Provider Observability in Grafana Cloud

Part of what makes Cloud Provider Observability in Grafana Cloud really useful is that it gives you prebuilt dashboards and drill-downs for AWS, Azure, and Google Cloud. Out of the box you get service overviews, instance-level views, and quick links to explore your data. However, you might already have dashboards you trust, want a view tailored to your team’s workflow, or need to change which panels show up when you drill into a single instance.

Approaching the Parhelion

One early spring morning in 1535, the residents of Stockholm awoke to a most curious sight. Six suns lit up the sky, connected by bright halos, as immortalized in Vädersolstavlan, seen here. Today, we recognize these atmospheric effects as a parhelion (also referred to as ‘sun dogs’)—an illusion caused by light refracting off crystalline formations in the atmosphere.

What are operational maturity levels (OMLs) for MSPs?

Service Leadership, a leading company that works to measure IT and managed service provider (MSP) performance, defines the five levels of operational maturity for solution providers. Often referred to simply as operational maturity levels (OMLs), OMLs help managed service providers (MSPs) measure how consistently, intentionally, and effectively they run their businesses.

Best Practices for a Smooth ERP System Implementation Experience

ERP system implementation requires precise coordination between planning, data handling, and system configuration. Each stage must follow a defined structure to prevent delays and maintain operational accuracy. Clear timelines, assigned responsibilities, and validated processes help ensure that deployment progresses without disruption.

Code Agents Need Observability

For those of us using tools like Claude Code, Codex, or Gemini, we already know they’re powerful. They can write code, refactor functions, open PRs, even run commands. For a lot of developers, they’re already part of the daily workflow. But once you zoom out beyond the individual developer, the biggest problem isn’t productivity. It’s control. AI coding tools are powerful, but they introduce a new, unpredictable cost layer that most teams don’t fully understand.

What is AI SRE? The Complete Guide to AI-Assisted Site Reliability Engineering

It's 2:47 AM. PagerDuty fires. You open a Slack alert and see: p99 latency spike on checkout-service. You SSH into the host, check dashboards in four tabs, grep logs for the last 20 minutes, and eventually find a slow query introduced in a deploy six hours ago. It took 34 minutes. You resolved it, w Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Capturing HTTP Request and Response Bodies in .NET Traces with PHI Redaction

> Standard OTel.NET instrumentation captures headers, status codes, and timing — not request or response bodies. Here's how to add body capture to your traces while keeping PHI out of your observability backend. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.