
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Easy Guide for Connecting Redis to a Grafana Data Source

Redis is a widely used in-memory data store, commonly deployed as a cache, session store, message broker, or fast key-value database. Because Redis often sits on the critical path of an application, having visibility into its behavior (memory usage, client connections, command throughput, cache efficiency) is essential for troubleshooting and performance tuning.
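As a rough illustration of the metrics mentioned above, the sketch below parses the plain-text output of the Redis `INFO` command and derives a cache hit ratio from the `keyspace_hits` and `keyspace_misses` counters. The sample text is made up for the example, and the helper names `parse_info` and `cache_hit_ratio` are hypothetical; a real setup would read `INFO` from a live server or let a Grafana data source query it.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse `key:value` lines from Redis INFO output, skipping `# Section` headers."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cache_hit_ratio(fields: Dict[str, str]) -> Optional[float]:
    """Return hits / (hits + misses), or None if no lookups have been recorded."""
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Made-up sample resembling a fragment of INFO output (values are illustrative).
SAMPLE_INFO = """\
# Clients
connected_clients:12
# Memory
used_memory:1048576
# Stats
instantaneous_ops_per_sec:350
keyspace_hits:900
keyspace_misses:100
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE_INFO)
    print("clients:", info["connected_clients"])
    print("memory bytes:", info["used_memory"])
    print("ops/sec:", info["instantaneous_ops_per_sec"])
    print("hit ratio:", cache_hit_ratio(info))
```

The same four fields map onto the areas named in the teaser: memory usage, client connections, command throughput, and cache efficiency.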

Why Aging Networks Put Critical Infrastructure at Risk, and What It Means for Us

Everywhere around us, technology is evolving at lightning speed, yet the networks that underpin it often lag behind. This gap creates vulnerabilities that can affect everything from energy grids to emergency services. Forbes recently explored this urgent issue in an article featuring insights from our CEO Bruce McClelland, who shared an informed perspective on why modernization is essential, not optional. I encourage you to take a few minutes to read the full article.

How To Calculate Your OpenAI Cost Per API Call (And Why It Matters Now)

OpenAI doesn’t bill per feature, per customer, or per transaction. It bills per token, across multiple models, with usage patterns that can change by the hour. As a result, two API calls that support the same feature can have very different costs. Without a clear way to translate token-level pricing into something product, engineering, and finance teams can reason about, AI spend becomes difficult to forecast and harder to control.
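To make the token-to-cost translation concrete, here is a minimal sketch of the arithmetic, assuming input and output tokens are priced separately per million tokens. The prices and token counts below are placeholders chosen for illustration, not actual OpenAI rates, and `TokenPrice` and `cost_per_call` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per one million tokens (placeholder values, not real rates)."""
    input_per_million: float
    output_per_million: float


def cost_per_call(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Cost of one API call: each token count times its per-token price."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Hypothetical pricing for a single model; real rates differ by model and change over time.
HYPOTHETICAL_PRICE = TokenPrice(input_per_million=2.00, output_per_million=8.00)

# Two calls behind the same feature, with different prompt and response sizes.
short_call = cost_per_call(500, 200, HYPOTHETICAL_PRICE)
long_call = cost_per_call(6000, 1500, HYPOTHETICAL_PRICE)

if __name__ == "__main__":
    print(f"short call: ${short_call:.4f}")
    print(f"long call:  ${long_call:.4f}")
```

Under these assumed prices the longer call costs roughly nine times as much as the shorter one, which is the kind of per-call spread the teaser describes.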

Six FinOps Certifications And Courses To Set You Up For Success in 2026

FinOps is evolving fast, and 2026 is shaping up to be a big year for specialization. While these certifications are ranked from beginner to advanced to help you build skills in the right order, one course stands out as the hottest recommendation right now: FinOps for AI. AI spend is accelerating, ownership is getting murky, and teams are scrambling to keep up. That urgency is exactly why FinOps for AI is generating so much interest heading into 2026.

Let Your LLM Debug Using Production Recordings

Modern LLM coding agents are great at reading code, but they still make assumptions. When something breaks in production, those assumptions can slow you down—especially when the real issue lives in live traffic, API responses, or database behavior. In this post, I’ll walk through how to connect an MCP server to your LLM coding assistant so it can pull real production data on demand, validate its assumptions, and help you debug faster.

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

When a pod fails during a TensorFlow training job, the investigation usually starts with the obvious questions. The answers rarely come quickly, especially when the failure involves GPU hardware that most engineers don’t troubleshoot regularly. This scenario walks through an actual GPU hardware failure and shows how AI-augmented investigation changes both the time to resolution and the expertise required to handle it.

Cloud Strategy for 2026: the Year of Repatriation, Resilience, and Regional Rebalancing

This year is set to be pivotal for cloud strategy, with repatriation gaining momentum under shifting legislative, geopolitical, and technological pressures and a growing focus on data sovereignty. Together, these pressures have set the stage for 2026 to be the year of repatriation, resilience, and regional rebalancing. Here, Rob Coupland, Chief Executive Officer at Pulsant, offers his insights.

Speedscale vs. LocalStack for Realistic Mocks

API mocking plays a crucial role in modern software development, allowing developers to simulate external API endpoints. It’s an effective way to isolate your application for testing and ensure that code changes don’t inadvertently break critical dependencies. Essentially, API mocking helps you create robust, reliable software by letting you test how your application interacts with external services.

How to Do Full-Text Search Across All Application Traffic with Speedscale

Modern DevOps observability tools are excellent for monitoring system health, tracking distributed traces, and aggregating metrics. However, they lack the fidelity needed for full-text search across application traffic. While observability platforms excel at showing what happened and when, they often fall short when you need to find where a specific piece of data (like an email address, user ID, or transaction token) appears as it flows through your entire application stack.