
Automate flaky test fixes with the Bits AI Dev Agent and Test Optimization

Flaky tests are a significant source of inefficiency that impacts many engineering teams. Along with failing your build, they interrupt your entire development flow, generate excessive CI/CD noise, and, critically, compromise developer trust in the test suite itself. Datadog Test Optimization enables you to manage test suites at scale by pinpointing the flakiest tests, analyzing their history across hundreds of runs, and automatically surfacing the root cause.
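Test Optimization does this analysis for you, but the core signal behind flakiness detection is simple: a test that both passes and fails on the same commit changed no code, so the variance is the test's. A minimal sketch of that idea (the data shape here is illustrative, not Datadog's API):

```python
from collections import defaultdict

def find_flaky_tests(runs):
    """Identify tests with mixed pass/fail outcomes on an identical commit.

    `runs` is a list of (test_name, commit_sha, passed) tuples, e.g.
    scraped from CI history. Same commit + different outcomes = flaky.
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})

runs = [
    ("test_checkout", "abc123", True),
    ("test_checkout", "abc123", False),  # same commit, different outcome
    ("test_login", "abc123", True),
    ("test_login", "def456", True),
]
print(find_flaky_tests(runs))  # ['test_checkout']
```

A production-grade detector would additionally weight by run count and recency across hundreds of runs, which is what makes doing this at scale a product feature rather than a script.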

How To Calculate Your OpenAI Cost Per API Call (And Why It Matters Now)

OpenAI doesn’t bill per feature, per customer, or per transaction. It bills per token, across multiple models, with usage patterns that can change by the hour. As a result, two API calls that support the same feature can have very different costs. Without a clear way to translate token-level pricing into something product, engineering, and finance teams can reason about, AI spend becomes difficult to forecast and harder to control.
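The basic translation from tokens to dollars is straightforward once you pin down per-model rates. A sketch of the arithmetic (the prices below are illustrative placeholders; real rates vary by model and change over time, so always check OpenAI's current pricing page):

```python
# Illustrative per-1M-token prices in USD -- NOT current OpenAI rates.
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def cost_per_call(model, input_tokens, output_tokens):
    """Dollar cost of one API call, given its token counts."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Two calls behind the same feature can differ by orders of magnitude:
short_prompt = cost_per_call("gpt-4o-mini", 300, 150)    # 0.000135
long_context = cost_per_call("gpt-4o", 12_000, 800)      # 0.038
print(f"${short_prompt:.6f} vs ${long_context:.6f}")
```

The hard part is not this formula but attributing token counts back to features, customers, or transactions, which is where the forecasting difficulty comes from.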

Supercharge your LLM Using Production Data Context

Are your LLM coding agents (like Cursor or Claude Code) hallucinating fixes because they don't know what's actually happening in production? In this video, Matt from Speedscale shows you how to bridge the gap between your local IDE and live production traffic using the Model Context Protocol (MCP). Most observability tools just give you telemetry. Speedscale’s MCP server gives your agent the "inner workings" of actual API calls and payloads, so it can check its assumptions against reality. No more "vibe-coding" and hoping it works; let your agent find the 500 errors and rate limits for you.

The 54% Improvement Playbook: How Top Performers Integrate GenAI into ITSM

Don't just read the report—learn how to replicate its most impressive results. In our 2025 State of ITSM Report, a select group of top-performing organizations achieved a staggering 54.3% reduction in resolution time by strategically integrating GenAI. This live session moves beyond the data to share their playbook. We'll provide a step-by-step guide on how to pair GenAI with foundational ITSM practices and demonstrate how to weave these tools into your team's daily workflows to achieve maximum efficiency.

Agentic AI Essentials: Examining the Hype Around Agentic AI

In the first article of our Agentic AI Essentials series, we’ll establish what makes agentic AI distinct. We’ll look at the process of tool calling and examine how agentic systems convert intelligence into action. We’ll also explore the human fears, pressures, and ambitions that fuel the hype around agentic systems. By sorting the signal from the noise, IT decision-makers can take the first step toward making sound decisions around agentic AI adoption.
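At its core, tool calling is a dispatch loop: the model proposes a call in a structured format, the runtime executes it, and the result is fed back. A minimal sketch under assumed names (the message schema and the `get_disk_usage` tool are hypothetical; real provider APIs each define their own shape):

```python
import json

# Hypothetical tool registry: the model proposes calls, the runtime executes them.
TOOLS = {
    "get_disk_usage": lambda path: {"path": path, "used_pct": 87},
}

def handle_tool_call(message):
    """Execute a model-proposed tool call and return the result as a JSON string.

    `message` mimics the structured tool-call output many LLM APIs emit;
    exact field names vary by provider.
    """
    name = message["tool"]
    args = message["arguments"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(TOOLS[name](**args))

proposal = {"tool": "get_disk_usage", "arguments": {"path": "/var"}}
print(handle_tool_call(proposal))  # {"path": "/var", "used_pct": 87}
```

This is the point where "intelligence becomes action": everything outside the registry is text, and everything inside it has real-world effects, which is also where the governance questions for IT decision-makers begin.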

Operational Risk Management in High-Stakes Decision Environments

In high-stakes environments, every choice carries weight. Whether it is a complex financial process, a real-time cybersecurity response, or a tightly regulated operational workflow, small missteps can rapidly evolve into major failures. Organizations increasingly rely on integrated risk management strategies that blend human judgment with technology. The goal is simple: reduce uncertainty before it becomes costly. But the path to that goal is rarely straightforward.

Let Your LLM Debug Using Production Recordings

Modern LLM coding agents are great at reading code, but they still make assumptions. When something breaks in production, those assumptions can slow you down—especially when the real issue lives in live traffic, API responses, or database behavior. In this post, I’ll walk through how to connect an MCP server to your LLM coding assistant so it can pull real production data on demand, validate its assumptions, and help you debug faster.
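Mechanically, the connection usually amounts to registering the MCP server in the coding assistant's configuration so the agent can invoke its tools. The fragment below is an illustrative sketch only: the server key, command name, and environment variable are hypothetical, and the exact file location and schema depend on your assistant and the MCP server's own documentation.

```json
{
  "mcpServers": {
    "production-traffic": {
      "command": "production-mcp-server",
      "args": ["--read-only"],
      "env": { "API_KEY": "${PROD_MCP_API_KEY}" }
    }
  }
}
```

Once registered, the agent can call the server's tools mid-conversation to fetch real request/response pairs instead of guessing at payload shapes.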

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

When a pod fails during a TensorFlow training job, the investigation usually starts with the obvious questions. The answers rarely come quickly, especially when the failure involves GPU hardware that most engineers don’t troubleshoot regularly. This scenario walks through an actual GPU hardware failure and shows how AI-augmented investigation changes both the time to resolution and the expertise required to handle it.