
Humanized AI Text for Stronger DevOps and Operations Content

You create content for operations teams, DevOps engineers, SREs, and IT decision-makers. Topics include monitoring, incident management, cloud infrastructure, ITSM processes, and observability tools. AI generates initial drafts quickly. The results frequently come across as mechanical. Sentences follow predictable patterns. Technical explanations lose nuance. Readers in this field expect precise, practical language from experienced practitioners. They detect generated text easily. Engagement drops when content feels detached from real-world ops challenges.

How Fabrix.ai Agents Ensure Data Privacy & Security

As agentic AI moves into enterprise environments, IT and security leaders face a critical challenge: how to leverage advanced LLMs without exposing sensitive data, intellectual property, or proprietary configurations to the cloud. You cannot build a self-driving, autonomous IT infrastructure if your security team blocks the deployment. That is exactly why the Fabrix.ai platform features an Enterprise-Grade LLM Integration architecture anchored by our built-in Data Security layer.

Canonical and Ubuntu RISC-V: a 2025 retro and looking forward to 2026

2025 was the year that RISC-V readiness gave way to RISC-V adoption. It’s been quite a journey. What began years ago as early architectural exploration and enablement has matured into real silicon, systems, and deployments. In particular, RVA23 provides a stable and predictable baseline we can align on with our wider ecosystem of partners. At Canonical, we’re committed to making RISC-V a viable option for anyone who wishes to adopt it.

Why Evidence-Backed RCA in Edwin AI Starts With Logs

A step-by-step look at how Edwin AI uses native LogicMonitor logs, topology, and context to turn root cause analysis from alert-driven inference into evidence-backed investigation. Most root cause analysis today starts with alerts and ends with explanations that sound reasonable but can’t be verified. An alert is fed into a language model, and the output looks like an answer. It often isn’t.

The rise of agentic AI in production: Can observability systems run themselves?

Sometimes the biggest shifts in technology aren’t about collecting more data — they’re about who (or what) gets to act on it. In this episode of “Grafana’s Big Tent” podcast, host Tom Wilkie, Grafana Labs CTO, is joined by Spiros Xanthos, Founder & CEO of Resolve AI, Manoj Acharya, VP of Engineering for Observability at Grafana Labs, and Cyril Tovena, Principal Engineer on the Grafana Assistant team, to discuss agentic AI in observability.

From RCA to Autonomous Ops: The Future of AI in Observability | Big Tent S3E7

SREs are famously skeptical of AI — so how do you convince them to trust agents in production? In this episode of Grafana’s Big Tent, Tom Wilkie talks with Spiros Xanthos (Resolve AI), Manoj Acharya (Grafana Labs), and Cyril Tovena (Grafana Assistant team) about agent-first observability. They unpack knowledge graphs, LLM reasoning, autonomous debugging, pricing models, and the “Claude Code moment” for observability. Is autonomous production ops closer than we think?

How to Create an AI Chatbot for Your Website?

Chatbots are starting to look promising for businesses of all kinds. Customers today expect their issues to be resolved faster than ever, and every startup is tempted to adopt one. But before jumping on the bandwagon, you need to think about what type of chatbot to invest in. The decisive question is which model of conversational AI best aligns with the needs of your organization.

AI Agents in IT Operations: From Concept to Practical Value

Artificial intelligence has been a defining theme in IT operations for nearly a decade. Early AIOps initiatives focused on predictive analytics and anomaly detection, promising to reduce operational overhead and improve system reliability. While these capabilities delivered incremental value, they often fell short of transforming how operations actually functioned.

Talk to Your Logs: LLM-Powered Chat UI in DSDL 5.2.3

We are excited to announce the release of the Splunk App for Data Science and Deep Learning (DSDL) version 5.2.3. Since 2018, DSDL has served as an innovation hub for custom AI integrations within Splunk. In 2025, the release of DSDL 5.2.0 introduced customizable Large Language Model (LLM) integrations, bringing Retrieval Augmented Generation (RAG) and Agentic AI workflows to Splunk users.

Harness AI February 2026 Updates: Securing & Making the SDLC Reliable and Shipping Faster with Agents | Harness Blog

February is all about making AI in software delivery secure and easier to operate at scale. This month’s updates span enterprise-grade application security, API security via MCP, SRE automation, and a major upgrade to the DevOps Agent.

What is Site24x7 Event Correlation? Causal AI and autonomous IT operations explained

When your distributed system goes down, your team can spend days sorting through noise, and that is revenue walking out the door. In this video, Jasper Paul breaks down the event correlation engine built to eliminate alert fatigue and accelerate root cause analysis. Most monitoring tools still rely on basic time-window alert grouping: clustering alerts that fire at the same time and calling it correlation. But in a distributed system, outages are never isolated events, and grouping symptoms doesn't find root causes.
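To make the criticism of time-window grouping concrete, here is a minimal sketch of that naive baseline. It is not Site24x7's engine; the alert names and timestamps are hypothetical. Alerts are merged purely by how close together they fire, with no knowledge of which service depends on which.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    fired_at: float  # seconds since some reference time


def group_by_time_window(alerts: list[Alert], window_s: float) -> list[list[Alert]]:
    """Naive time-window grouping: an alert joins the current group if it fired
    within window_s of the previous alert, otherwise it starts a new group.
    Nothing here looks at topology or dependencies between services."""
    groups: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        if groups and alert.fired_at - groups[-1][-1].fired_at <= window_s:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


alerts = [
    Alert("db-latency-high", 0.0),
    Alert("checkout-5xx", 20.0),
    Alert("batch-host-disk-full", 30.0),  # unrelated, but close in time
]
```

With a 60-second window all three alerts land in one group, including the unrelated disk alert; with a 15-second window the database alert is split from the checkout errors it probably caused. Either way the grouping says nothing about which alert is the cause, which is the gap causal correlation aims to close.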

AI can do what now?! What an ethical hacker says about deepfakes and AI

Real-time camera deepfakes are no longer science fiction. High-fidelity, AI-generated impersonation may be advancing quickly — but that's not the only AI risk financial services companies should be thinking about. In this episode of AI Can Do What Now?!, Lisa Jones-Huff, director of security solutions architecture at Elastic, sits down with ethical hacker Freakyclown (FC) to explore what is technically possible today with AI, where reality still falls short of the hype, and what security teams should be worried about.

AI can do what now?! The real risks of AI in social engineering

What is the most immediate risk financial services companies face today? AI-enabled social engineering is already accelerating real-world attacks. Scale, personalization, speed, and automation are lowering the barrier for attackers while making fraud detection more complex for defenders. In this episode of AI Can Do What Now?!, Lisa Jones-Huff, director of security solutions architecture at Elastic, is joined by ethical hacker Freakyclown (FC) and principal solutions architect Joe Murin to explore what is actually happening right now, beyond the hype.

Managing AI Models and Datasets with Harness Artifact Registry | AI/ML Artifact Management

Building AI applications often means juggling multiple models, scattered datasets, and version chaos across local systems. But what if you could bring it all together, securely and efficiently, in one place? In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, demonstrates how Harness Artifact Registry makes it easy to manage and govern your AI/ML assets, from models and datasets to prompts and agents, with built-in support for registry types such as Hugging Face and generic.

Inside the architecture: How Upsun delivers 99.99% uptime for AI

For a CTO, "four nines" represents a commitment to keeping production revenue live with no more than 0.01% downtime per year, or roughly 52.6 minutes. As AI workloads move from pilot projects into core production services, the reliability requirements for infrastructure have shifted. AI agents, RAG pipelines, and automated LLM workflows depend on a consistent platform state.
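The downtime budget behind an availability target is simple arithmetic: the allowed fraction of unavailability times the length of the period. A minimal sketch, assuming a 365-day year:

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Maximum downtime (minutes) allowed by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra nine cuts the budget by a factor of ten: about 525.6 minutes a year at three nines, 52.6 at four, and a little over 5 at five.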

Stop Vibe Coding Everything: The Case for Spec-Driven Dev

Spec-driven development with AI coding agents could change how you build software. In this GitKon 2025 talk, Erik Hanchett, Senior Developer Advocate at AWS, breaks down why AI coding assistants perform dramatically better when they start with structured specifications instead of raw prompts. If you've been vibe coding your way through complex features and wondering why your AI keeps going off the rails, this is the video for you.

[Webinar] Conquering the Complexity of Self-Hosted Apps with Agentic AI SRE

Most enterprise SaaS products, like Komodor's Autonomous AI SRE Platform, require installing a remote agent on the customer's infrastructure, which varies significantly from one organization to another in architecture, configurations, permissions, processes, and more. This "unmanaged" model creates major blind spots, making daily operations, observability, debugging, and incident response challenging. When failures occur, limited visibility and bespoke systems make root-cause analysis slow, incomplete, or impossible.

AI-Powered LMS: Personalization, Analytics & Automation for Corporate Training

Corporate training systems change operationally once AI is embedded into their learning logic. In LMS environments used for onboarding and workforce development, AI shifts training from scheduled delivery toward continuous adjustment based on employee performance and role context. This shift affects how companies assign onboarding programs, detect skill gaps, and maintain compliance readiness across departments.

AI performance reviews for your app with the Flare CLI

The Flare CLI connects to your Flare performance monitoring data and uses AI to turn it into actionable insights, right from your terminal. In this video, you'll see how a single command pulls your real performance data from Flare, then generates a full review: identifying slow endpoints, spotting error trends, and suggesting concrete fixes.

Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

I was looking at our Claude Code spend in the Anthropic console the other day. Aggregate cost, aggregate tokens — no breakdown by developer, no breakdown by session. I knew my Hackathon team had been using it heavily on building out new features for the OpenTelemetry Distro Builder. But heavily how? I had no idea. Turns out Claude Code has been emitting OpenTelemetry signals the whole time. Per-session cost, token counts, every tool call it makes on your codebase.
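Once those OpenTelemetry signals land in a backend, a per-session breakdown is essentially a group-by on a session attribute. The sketch below assumes a simplified, hypothetical record shape; the metric name `claude_code.cost.usage` and the `session.id` attribute are assumptions to check against Anthropic's telemetry documentation, not a verified schema.

```python
from collections import defaultdict

# Hypothetical, simplified data points as they might look after export.
# The metric names and the "session.id" attribute are assumptions.
COST_METRIC = "claude_code.cost.usage"

points = [
    {"metric": COST_METRIC, "attributes": {"session.id": "s1"}, "value": 0.5},
    {"metric": COST_METRIC, "attributes": {"session.id": "s1"}, "value": 0.25},
    {"metric": COST_METRIC, "attributes": {"session.id": "s2"}, "value": 1.0},
    {"metric": "claude_code.token.usage", "attributes": {"session.id": "s2"}, "value": 1200},
]


def cost_by_session(points: list[dict], metric: str = COST_METRIC) -> dict[str, float]:
    """Sum the cost metric per session; points without a session id go to 'unknown'."""
    totals: dict[str, float] = defaultdict(float)
    for point in points:
        if point["metric"] == metric:
            session = point["attributes"].get("session.id", "unknown")
            totals[session] += point["value"]
    return dict(totals)
```

The same function, pointed at a token metric instead of the cost metric, gives per-session token counts; swapping the grouping key for a user attribute would give the per-developer view the aggregate console lacks.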

AI for App Resiliency: Automation Without Operational Chaos

Enterprise IT leaders face a persistent contradiction. Digital systems grow more complex each year, but operational stability and resilience do not improve at the same pace. Downtime costs are only the visible part of the problem. For large enterprises, unplanned outages can run into hundreds of thousands of dollars per hour in lost revenue, productivity, and remediation effort. The harder cost to quantify is the reputational damage when critical business services fail at the worst possible time.

Boosting Rust developer productivity with Cursor - Our journey at ilert

AI-assisted coding has evolved from a novelty into an industry standard. At ilert, we started our adoption in mid-2023, quickly realizing that success depends heavily on proper context and workflows. This is particularly acute with Rust. While the language is central to our backend infrastructure, its strict compiler rules and distinct idiomatic approaches make it notoriously difficult for modern LLMs to master.

AI infrastructure cost optimization for scaling teams

This post is also available in German and in French. The 2026 AI landscape has shifted from "Can we build it?" to "How much will it cost to run it?" For CTOs and engineering leaders, the challenge is no longer just model performance: it is the underlying infrastructure sprawl that silently erodes margins. When AI workloads scale, they often inherit the inefficiencies of legacy cloud models: over-provisioned instances, fragmented data pipelines, and a lack of unified context.

How to Implement an AI Governance Framework Using Safe, Ethical and Reliable AI Guardrails

In my time at Ivanti, I've witnessed firsthand how AI acts as a force multiplier across enterprise organizations. When deployed strategically, AI accelerates decision-making and operational execution at scale in a way that teams simply can't sustain manually. However, without clear and enforceable AI guardrails, implementing AI opens organizations up to serious new risks.

Secure by Design: Defend against AI-driven threats

After several zero-day attacks on leading security vendors left the industry reeling in 2024 and 2025, we at Ivanti redoubled our commitment to transparency, security-first product development, and community awareness. The attacks galvanized our Secure by Design framework so that we could accelerate our transformation to kernel-level security, compressing a three-year roadmap into just 18 months.

I let Claude investigate a production incident with Honeybadger's MCP server

In this demo, Kevin shows how you can use Honeybadger's MCP server with Claude to investigate a production incident — going from a natural language prompt to a complete incident dashboard in minutes. Honeybadger is an application health monitoring platform that helps developers catch errors, track performance, and stay on top of incidents. The MCP server lets AI assistants like Claude query your Honeybadger data directly, so you can investigate issues conversationally without digging through dashboards manually.

Technology Trends in the Mortgage Industry

The mortgage industry is changing rapidly due to technology. Many people still see homeownership as a key goal, and new tools are making it easier to go from application to closing. This tech advancement is simplifying the process and helping both consumers and businesses have a more seamless experience.

Top 10 ChatGPT SEO Agencies for 2026 (Manually Reviewed)

A funny shift has appeared in our conversations with marketing leaders over the last year. Teams still ask for SEO help. But more often, the question is: "Who can help us appear inside ChatGPT answers, and can they prove it without hand-waving?" People research inside ChatGPT, Perplexity, Gemini, and AI Overviews, then click only when they trust the source. If your brand is not cited, clearly understood as the right entity, and consistent across your site and the wider web, even strong pages can stay invisible when buyers are deciding.

When AI Writes the Code, Who Keeps Production Running?

The production environment has become a minefield of code nobody really understands. Here’s what’s happening: Development teams are using Claude Code, Cursor, and GitHub Copilot to ship features at 10x their previous velocity. Product managers are ecstatic. Business stakeholders are thrilled. And somewhere in a war room at 2:17 AM, an SRE is staring at a stack trace for code that was AI-generated three weeks ago, trying to figure out why the payment service just fell over.

Evaluating our AI Guard application to improve quality and control cost

This article is part of our series on how Datadog’s engineering teams use LLM Observability to build, monitor, and improve AI-powered systems. Organizations are building AI agents that help users automate work, analyze data, and interact with complex systems through natural language. As these agents become more capable, they also become more complex and exposed to risks such as prompt injection, data leaks, and unsafe code execution.

From Chef to Chief Architect: Navigating the Intersection of AI and Data Security | Harness Blog

In the world of enterprise software, the transition from traditional DevOps to modern AI-driven delivery is less like a flip of a switch and more like a high-stakes kitchen. As Devan Shah, Chief Architect at IBM, puts it: the ingredients have changed from food to code, but the need for a precise, governed process remains the same.

Getting started with Claude Code and CircleCI

AI-powered coding tools are changing how developers work. Tools like Claude Code can write functions, refactor code, and build features through natural conversation, often faster than you could type them yourself. But speed creates its own risks. AI-generated code can contain subtle bugs, reference packages that don’t exist, or misuse APIs in ways that only surface at runtime. That’s where continuous integration comes in. CI is a safety net that lets you move fast confidently.
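As an illustration of the kind of subtle bug CI can catch, consider a hypothetical pagination helper. Without an input guard, Python's negative slice indices make `page=-1` silently return items from the end of the list instead of failing. The function and tests below are a generic sketch; a CI job (for example, one that runs pytest, which collects `test_*` functions automatically) would execute tests like these on every commit.

```python
def paginate(items: list, page: int, page_size: int) -> list:
    """Return the 1-indexed `page` of `items`.

    Without the guard below, page=-1 computes start=-2*page_size, and slicing
    silently returns items from the end of the list instead of raising.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


def test_paginate_edges() -> None:
    items = list(range(10))
    assert paginate(items, 1, 3) == [0, 1, 2]
    assert paginate(items, 4, 3) == [9]   # last, partial page
    assert paginate(items, 5, 3) == []    # past the end
    try:
        paginate(items, -1, 3)
    except ValueError:
        pass
    else:
        raise AssertionError("negative page should be rejected")
```

The edge-case tests are the point: an unguarded version passes the first three assertions and returns `[4, 5, 6]` for page -1, exactly the kind of plausible-looking output that only surfaces at runtime.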

AI Assistant vs Skylar Advisor

What happens when AI understands your entire environment? With Skylar Advisor, you move beyond prompts and responses and get prioritized guidance based on real operational impact. Skylar Advisor identifies what matters most, explains why it matters, and provides clear next steps so even junior IT professionals can operate with confidence.

Getting started with Gemini and CircleCI

AI coding assistants like Gemini are changing how developers write code. They can generate entire functions, debug tricky issues, and help you move faster than ever before. But with that speed comes a new challenge: how do you make sure AI-generated code actually works? AI assistants are powerful, but they’re not perfect. They can introduce subtle bugs, miss edge cases, or generate code that breaks existing functionality. That’s where CI (continuous integration) comes in.

The path to self-healing: Re-architecting for massive scale on Kubernetes

In the world of network assurance, even a few seconds of delay can result in significant business losses. In this session from Civo Navigate India, Dr. Shivananda R Poojara (Head of Cloud Business Unit, Airowire Networks) explains how his team dismantled a massive monolithic service stack and rebuilt it for a high-performance, cloud-native era in just 75 days.

Why Nexthink Intelligence Is a Game-Changer for IT Teams

Nexthink Intelligence transforms digital employee experience (DEX) for modern enterprises. Learn how IT teams can leverage real-time analytics, proactive insights, and automation to improve user productivity, troubleshoot issues fast, and deliver better workplace tech experiences. Learn more at nexthink.com.

A 4-Month Bug Fixed in <10 Minutes with Olly

In today’s highly interconnected systems, the subtle relationships between services are rarely obvious. Modern, complex architectures generate telemetry that functions less as “flashing signs” and more as faint “breadcrumbs” to be followed across a vast network of signals. In 2025, about two-thirds of outages involved third-party systems like cloud platforms and APIs.

The limits of MCP and how Olly surpasses them

Model Context Protocol (MCP) servers act as adapter layers between clients and AI-based workloads. Installing MCP into an IDE such as Cursor brings a wealth of information directly into the developer's primary tool, minimizing context switching and, especially in observability, bringing telemetry closer to the code. But MCP is not without its limits. They seem trivial at first, but over time some limitations inherent to a basic MCP implementation become apparent.
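Under the hood, MCP messages are JSON-RPC 2.0, and a client asks a server to run one of its tools with a `tools/call` request carrying the tool's name and arguments. A minimal sketch of building that message; the tool name `query_traces` and its arguments are hypothetical, since real tool names come from the server's `tools/list` response.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "query_traces" and its arguments are hypothetical examples.
message = make_tool_call(1, "query_traces", {"service": "checkout", "lookback": "15m"})
```

The transport (stdio or HTTP) and the server's response format are separate concerns; this only shows the shape of a single request.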

Trends Shaping Cross-Border Tech Recruitment in 2026

Here's the reality: distributed engineering teams have moved from bold experiment to business-as-usual. The challenge? Hiring globally in 2026 has gotten messier than ever before. Compliance rules keep morphing beneath your feet. AI recruiting tools that promised to simplify your life have introduced surprising complications.
Sponsored Post

Cisco Live'26 - Amsterdam: Aligning with the AI-Driven Future

The energy at Cisco Live EMEA in Amsterdam (February 9-13, 2026) was driven primarily by groundbreaking AI announcements, and the event gave Fabrix.ai an opportunity to strengthen our strategic position alongside the Cisco and Splunk ecosystems. The event's focus on AI, highlighted by the recent Cisco AI Summit, points to a clear market direction in which Fabrix.ai is well poised to accelerate innovation.

AI SRE in Practice: Accelerating Engineer Onboarding with Contextual Expertise

Onboarding new engineers to complex Kubernetes environments is expensive. Junior engineers need to learn cluster architecture, understand organizational conventions, navigate internal documentation, and build relationships with senior team members who can answer questions. The process takes weeks or months, and during that time, senior engineers spend significant time mentoring instead of working on complex problems.

When Technology Failures Become Securities Litigation Risks

When a company's systems crash or a breach hits, it often looks like lawsuits appear out of nowhere. The real issue is that even a single tech failure can shake customers, stall revenue, and erode investor confidence. Many businesses downplay risks they already know about, leaving shareholders feeling misled when problems explode publicly. That gap between internal awareness and external disclosure is exactly what opens the door to securities litigation, turning tech troubles into legal and financial fallout almost instantly.

Cost Optimization for AI Workloads: From Visibility to Control

ITOps teams can achieve cost management of AI workloads with an observability platform that connects AI usage and performance with cloud spend for clear visibility and predictability. Behind the buzz around artificial intelligence, or AI, many companies are discovering the hidden and compounding costs of AI adoption.

How LogicMonitor Delivers AI Cost Optimization

LogicMonitor delivers AI cost optimization by unifying infrastructure telemetry, AI-specific signals, and cloud financial data into a single workflow, so teams can move from visibility to continuous, operationalized cost control. In Cost Optimization for AI Workloads: From Visibility to Control, we explored why AI workloads introduce new layers of cost complexity—from GPU-heavy compute and token-based pricing to distributed infrastructure that obscures true spend.
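Token-based pricing is one reason AI spend is hard to predict: cost scales with both input and output tokens, which providers typically price separately. A minimal sketch of the arithmetic; the per-million-token prices and traffic figures are placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    input_per_million: float   # placeholder USD per 1M input tokens
    output_per_million: float  # placeholder USD per 1M output tokens


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request under per-token pricing."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def monthly_cost(price: TokenPrice, requests_per_day: int,
                 avg_input: int, avg_output: int, days: int = 30) -> float:
    """Projected spend if traffic and average token counts stay constant."""
    return request_cost(price, avg_input, avg_output) * requests_per_day * days


price = TokenPrice(input_per_million=3.0, output_per_million=15.0)
```

At these placeholder rates, a request with 2,000 input and 500 output tokens costs about $0.0135, and 10,000 such requests a day come to roughly $4,050 a month. The output side dominates here despite being a quarter of the tokens, which is the kind of detail that only shows up when usage is tied to spend.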

Should You Use AI for Business Contracts?

AI is creeping into almost every corner of business life. It drafts emails, builds presentations, analyses data, and even creates marketing campaigns. So it is hardly surprising that some companies have started using it to draft business contracts too. At first glance, this might sound like an efficient and sensible use of resources. Faster turnaround. Lower cost. Instant templates. But when it comes to legal agreements, speed and convenience are not always the priority.

AI-Driven Automated Testing for Oracle Applications

As enterprises continue to change rapidly, businesses depend on Oracle-based ecosystems to track their finances, supply chains, HR, and customer operations. As digital transformation advances, these environments keep growing more complex. As a result, manual testing is no longer enough to keep pace with the ongoing updates, integrations, and customizations within an organization's systems. This is where AI-powered automated testing for Oracle applications changes how quality assurance is approached.

Is Generative AI Eroding Our Ability to Think?

In aviation, there's a well-documented issue known as "automation addiction." As cockpit systems became more advanced, pilots gradually shifted from actively flying aircraft to supervising automated controls. Everything worked smoothly, until a system malfunctioned. Investigations revealed a troubling pattern: even experienced pilots sometimes struggled with basic manual maneuvers. Their hands remembered less because their brains had practiced less.

How to Make AI-Generated Code Reliable with Runtime Context

AI coding assistants like Cursor and Claude Code are driving massive productivity gains, yet they have introduced a critical validation gap in the software delivery lifecycle. While these tools excel at generating syntax, they lack visibility into live production environments. This article explains how Runtime Context, the missing nervous system of AI development, secures production by moving from probabilistic guessing to deterministic, live code validation.

The AI infrastructure gap: why agents fail on fragmented stacks

The initial hype around AI agents is hitting a hard reality: a clever prompt is not a production strategy. As organizations move from experimentation to operationalizing AI in 2026, a systemic bottleneck has emerged. It is not the model's intelligence; it is the model's context and its access to the right tools. When an AI agent lacks access to live, grounded platform data, it guesses.

Use AI to turn any JSON API into a dashboard in minutes with the Infinity data source plugin and Grafana Assistant

The internet is full of fascinating data just waiting to be visualized and queried. And with the latest update to Grafana Cloud, you can start doing it in minutes. Through public APIs, you can access information about global earthquake activity, weather forecasts, music catalogs, and millions of other datasets. And then there's all the data that sits inside company APIs, partner services, and internal platforms that power everyday products and operations.

AI Merge Conflict Resolution + Commit Messages in GitKraken Desktop

AI-assisted merge conflict resolution is changing how developers handle Git workflows. Watch GitKraken Ambassador Kevin Bost demonstrate AI-powered features that eliminate merge conflict dread, clean up messy commit history, and generate contextual commit messages in seconds.

The Current State of Content Negotiation for AI Agents (Feb 2026)

The web was built for humans, but now the agents are taking over. Humans look at a web page and see content rendered by their browser. AI agents see 180,000 tokens of nav bars, footers, and div soup — burning through their context window on junk that makes them slower and stupider. The web needs to evolve, and we as developers are driving the shift. AI agents like Claude Code, Cursor, Codex, and Gemini are how we interact with documentation, CLIs, and products today.
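Content negotiation for agents rests on standard HTTP: the client lists acceptable media types in the `Accept` header, optionally weighted with `q` values, and the server picks a representation. A minimal sketch of server-side selection, assuming a site that can serve both `text/html` and `text/markdown`; it deliberately simplifies the HTTP rules (for example, it ignores the precedence of more specific media ranges).

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries: list[tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal weights keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: str, available: list[str]) -> Optional[str]:
    """Pick the representation to serve, or None if nothing is acceptable.

    `available` is in server-preference order (index 0 is the default).
    """
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0] if available else None
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None
```

An agent sending `Accept: text/markdown, text/html;q=0.8` gets Markdown from a site that offers it and HTML from one that doesn't, while a browser's usual `text/html` first preference is unaffected. A `None` result would typically map to a 406 response or a fallback to the default.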

The 2025 Wake-Up Call for Engineering Teams

For years, organizations tried to solve operational pain by collecting more data, adding more dashboards, and consolidating more tools. But 2025 exposed a deeper mismatch. Systems had become more distributed, AI-assisted, and interdependent than ever before, while teams had shrunk and on-call pressure had intensified. This wasn’t a tooling failure. It was an architectural and cognitive one.

Who Watches the Vibe Coder?

AI didn’t replace developers. It replaced the part where you were forced to understand what you just shipped. Now you can prompt your way to a feature, skim the diff, and merge something that “seems reasonable.” And then production does what production always does: finds the one weird browser + one slow network + one user flow that turns your “reasonable” code into a bonfire. So who watches the vibe coder?

AI Engineering at incident.io

Working on AI in incident management means there's no playbook. No million blogs. Just building at the forefront of what's possible with AI models. In this video, Martha, Product Engineer on our AI team, talks about what it's really like working with AI that helps engineers respond to incidents faster. This covers the shift from traditional engineering, learning the personalities of different AI models, and why you need to embrace constant change when new models drop all the time.

The Need for Clean in the AI Era

In the AI era, software and new models are being born at a breakneck pace—but they’re also bringing a lot of “baggage” into the world. While AI coding agents are busy accelerating innovation, they’re also excellent at generating a massive byproduct: “digital dust.” Between obsolete releases, orphaned dependencies, and massive model versions, your repository may soon start to look more like a digital junk drawer than a streamlined machine.

How Ecommerce Brands Are Using AI to Scale Faster and Spend Less

Running an ecommerce business has never been easy. Between managing inventory, writing product descriptions, handling customer service, and keeping up with marketing demands, the workload can feel endless. But something has changed in the last couple of years. Artificial intelligence has moved from being a buzzword in tech circles to becoming a practical, everyday tool that ecommerce brands of all sizes are using to grow without burning through their budgets.

8 AI Video Generators for Marketing Agencies Ranked by Iteration Speed

Tuesday morning. A brief lands: ten ads due Friday, no crew, no studio. Yesterday that meant panic; today you open one of the fastest-iterating AI video generators, type a prompt, and watch a finished clip appear before the coffee cools. Speed is the new creative currency. Yet platforms differ. Some surface concept loops in seconds, others render polished talking-head explainers in minutes. Which tool truly wins the sprint from prompt to publish?

5 key takeaways from the 2026 State of Software Delivery

AI has made it easier than ever to write code. Shipping it is a different story. Today we released the 2026 State of Software Delivery report, sponsored by Thoughtworks. In it, we analyzed more than 28 million CI/CD workflows across thousands of engineering teams. The picture that emerged is clear: teams are producing more code than ever, but fewer of them are able to turn that activity into software that actually reaches customers.

Introducing: Checkly Agent Skills

AI coding agents are excellent at writing code. Ask Claude Code, Codex, or Cursor to add a feature, and it just works. At Checkly, we were ready for the new agentic world from the start! Monitoring as Code means your entire monitoring setup lives in your repository. API Checks, Browser Checks, alert channels, status pages; everything is defined in code, managed with the Checkly CLI, and version-controlled like any other part of your stack.

Top 6 AI-Powered Procurement Platforms Transforming Supply Chain Management

The procurement landscape has undergone a seismic shift over the past few years. What once relied heavily on manual processes, spreadsheets, and gut instinct now operates in an environment where artificial intelligence analyses millions of data points in seconds, predicts supplier risks before they materialise, and optimises spending patterns with machine learning algorithms that continuously improve.

AI Is Changing Healthcare Faster Than Most Systems Are Ready For

Healthcare is shifting fast, and artificial intelligence is no longer a future concept sitting in research labs or pilot programs. It’s already embedded in clinical workflows, operational systems, and patient interactions, often in ways that feel subtle, uneven, and sometimes uncomfortable.

What is the Model Context Protocol (MCP)

Iron Man's J.A.R.V.I.S. is the artificial intelligence (AI) that almost everyone wants to see: a conversational technology that answers questions the way a friend would. The rise of large language models (LLMs) almost seems to deliver the friendly robotic sidekick that generations of children grew up dreaming about.

Teaching AI How to Refinery

At the beginning of February, we released v3.1 of Refinery, our advanced, tail-based sampling solution. The new version comes with more performance enhancements, bug fixes, and a few new pieces of telemetry. In tandem with the 3.1 release, we also released a new tool for our MCP server which helps your AIs understand Refinery, and how Honeycomb handles sampling.
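Tail-based sampling means the keep-or-drop decision is made after a trace's spans have been collected, so it can depend on what happened in the trace. The sketch below is a generic illustration of that idea, not Refinery's rules, configuration, or code; the thresholds and the `Span` shape are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool


def keep_trace(spans: list[Span], sample_rate: int = 10, slow_ms: float = 1000.0) -> bool:
    """Tail-based decision, made only after all spans of a trace have arrived.

    Keeps every trace with an error or a slow span, and roughly 1 in
    `sample_rate` of the rest. The thresholds are hypothetical defaults.
    """
    if not spans:
        raise ValueError("a trace needs at least one span")
    if any(span.is_error for span in spans):
        return True
    if max(span.duration_ms for span in spans) >= slow_ms:
        return True
    # Hash the trace id so every collector instance reaches the same decision
    # for the same trace, instead of rolling independent dice.
    digest = hashlib.sha256(spans[0].trace_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % sample_rate == 0
```

A head-based sampler deciding at the first span could not know a trace would later error or run slow; buffering until the trace completes is what makes the "keep all errors" rule possible, at the cost of holding spans in memory.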

Why Your AI Code is Breaking (And How to Fix It)

New data from CodeRabbit shows AI makes 70% more errors than humans, mostly in logic. Stop shipping "AI vibes" to production. Use the new testing pyramid: deterministic tests (validation), record and replay (mocking), and probabilistic checks (vibes). Don't let your agents break prod.
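The record-and-replay layer of that pyramid can be sketched in a few lines: call the real dependency once in record mode, store the responses, and serve them back verbatim in replay mode so tests are deterministic and never touch the network. This is a generic illustration, not Speedscale's product or API; the `Cassette` class and the request keys are hypothetical.

```python
import json
from typing import Any, Callable, Optional


class Cassette:
    """Record live responses once, then replay them in tests without the network."""

    def __init__(self, mode: str, recordings: Optional[dict[str, Any]] = None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.recordings: dict[str, Any] = dict(recordings or {})

    def call(self, key: str, live_call: Callable[[], Any]) -> Any:
        if self.mode == "record":
            result = live_call()  # hit the real dependency once
            self.recordings[key] = result
            return result
        if key not in self.recordings:
            raise KeyError(f"no recording for {key!r}; re-record against the real service")
        return self.recordings[key]  # deterministic: same key, same response

    def dumps(self) -> str:
        """Serialize recordings so they can be committed alongside the tests."""
        return json.dumps(self.recordings, sort_keys=True)

    @classmethod
    def load_for_replay(cls, data: str) -> "Cassette":
        return cls("replay", json.loads(data))
```

Failing loudly on a missing recording, rather than falling through to the live service, keeps replayed tests honest: an AI-generated change that starts calling a new endpoint shows up as a test failure instead of a silent network call.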

MCP: Why AI Needs Git Intelligence

GitKraken CTO Eric Amodio breaks down the Model Context Protocol (MCP) and explains why Git intelligence is critical for AI agents at GitKon 2025. In this session, Eric covers what MCP is and why every major AI company adopted it; why AI needs Git history, not just file system access; how GitKraken MCP removes Git pain safely; the future of agentic developer workflows; and how Commit Composer uses AI to organize commits without losing data.

AI SRE in Practice: Diagnosing AWS CNI IP Exhaustion Before Widespread Outage

IP address exhaustion in Kubernetes doesn’t announce itself with clear error messages. Pods fail to schedule, services degrade unpredictably, and the symptoms look like a dozen different problems before anyone realizes the cluster has run out of available IP addresses. By the time the root cause becomes clear, multiple services are affected and recovery requires coordination across infrastructure layers.
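
One place this kind of exhaustion shows up with the AWS VPC CNI in its default secondary-IP mode is a per-node ceiling: each pod takes an address from the node's elastic network interfaces (ENIs), so how many pods fit depends on the instance type. The sketch below applies the max-pods formula AWS documents for EKS, ENIs × (IPv4 addresses per ENI - 1) + 2, assuming prefix delegation and custom networking are turned off. The `m5.large` figures (3 ENIs, 10 IPv4 addresses each) are illustrative; check current AWS instance-type limits before relying on them. `NodeType`, `max_pods`, and `ip_headroom` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeType:
    name: str
    max_enis: int       # maximum ENIs the instance type supports
    ipv4_per_eni: int   # IPv4 addresses per ENI (primary plus secondaries)


def max_pods(node: NodeType) -> int:
    # Each ENI's primary address is not handed to pods, hence (ipv4_per_eni - 1).
    # The +2 accounts for host-network pods (e.g. aws-node, kube-proxy) that use the node IP.
    return node.max_enis * (node.ipv4_per_eni - 1) + 2


def ip_headroom(node: NodeType, running_pods: int) -> int:
    """Pod slots left before the node hits its IP-bound pod ceiling."""
    return max_pods(node) - running_pods


m5_large = NodeType("m5.large", max_enis=3, ipv4_per_eni=10)  # illustrative limits
print(max_pods(m5_large))         # 29
print(ip_headroom(m5_large, 27))  # 2
```

This covers only the per-node limit. A cluster can also run out of free addresses in its VPC subnets, which this sketch does not model.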

16 new integrations - powered by AIready Low Code Plugins

Today marks a big milestone in our mission to bring more data, more context, and more visibility into a single, unified view. We’re excited to announce 16 brand‑new integrations, extending the range of data sources you can connect with just a few clicks. But the integrations themselves are only half the story.

AI for nuclear safety: Predicting component remaining useful life

As industrial systems become more complex in 2026, the reliability of critical infrastructure depends on shifting from reactive to predictive strategies. In this session from Civo Navigate India, Muthukumar Ganesan, a scientist at the Indira Gandhi Centre for Atomic Research (IGCAR), explores the application of AI and machine learning in securing the future of nuclear energy.

How to Remove Watermark from AI Generated Images (Midjourney, DALL·E & Canva)

AI image tools have made it ridiculously easy to create artwork, mockups, social media visuals, and even product photography. But many platforms add watermarks - especially on free plans or previews. If you've downloaded an image and realized there's a logo, text overlay, or semi-transparent branding across it, you're not alone. Let's go through what actually works when it comes to removing watermarks from AI-generated images - and what usually makes things worse.

Boosting IT Productivity with AI-Driven Spreadsheet Automation

Modern IT teams operate under constant pressure. They are expected to deliver faster, reduce errors, maintain uptime, and extract meaningful insights from ever-growing volumes of operational data. Spreadsheets remain one of the most widely used tools in IT operations, even in organizations that rely heavily on cloud platforms, monitoring systems, and DevOps pipelines. However, manual spreadsheet work often becomes a productivity bottleneck.

The 10 Best AI Tools for Productivity in 2026

Since the launch of ChatGPT in November 2022, our personal, business, and creative lives have shifted dramatically. Although many of us use AI daily, its arrival has not been without problems. It has contributed to writers' strikes, raised worries about how AI handles our data and what that means for privacy, and left many people worried about AI taking over their jobs.

AI Is Everywhere, So Why Isn't It Delivering Business Value?

Enterprises have never had more access to artificial intelligence and less certainty about what it is delivering. Generative AI tools now sit inside everyday workflows, embedded across productivity software and operational systems employees rely on for critical work. They generate insight at scale, reveal patterns more clearly than before, and offer earlier visibility into potential risk.

AI-driven caching strategies and instrumentation

The things that separate a minimum viable product (MVP) from a production-ready app are polish, final touches, and the Pareto 'last 20%' of work. Most bugs, edge cases, and performance issues won't show up until after launch, when real users start hammering your application. If you're reading this, you're probably at the 80% mark, ready to tackle the rest.

How To Design AI-Native SaaS Architecture That Scales Without Killing Your Margins

AI-native SaaS products aren’t failing because the models are bad. They’re failing because the architecture can’t keep up with how AI actually behaves in production. What looks affordable in staging can erode your margins once real customers, workflows, and automation come into play. Designing AI-native SaaS architecture is now as much a margin decision as it is a technical one.

Rovo Dev Code Review in Bitbucket and GitHub | Bitbucket Blitz | Atlassian

The demo portion of a recent webinar I did shows how to set up, configure, and use Rovo Dev code review in both Bitbucket and GitHub. Learn how to add custom coding standards to your repositories and see Rovo Dev check for the specific things you care about during code reviews. Learn how to add acceptance criteria to your Jira work items and see Rovo Dev verify them during code reviews.

Context Management for Agentic RAG | Johan Jern, Co-founder & CTO at Realm

Some queries are hard to solve with "basic" RAG. When questions require multi-step reasoning, full-document understanding (not just chunks), or aggregating many results that match specific criteria, simple retrieve-and-generate pipelines break down, and we need agentic RAG. But this added capability comes at a cost: as agents plan, search, read, and iterate, they quickly use up a lot of context, which degrades answer quality and increases both cost and latency.
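
One simple way to keep an agent's accumulated context in check is to cap what goes back into the prompt on each iteration. The sketch below greedily keeps the highest-scoring retrieved chunks that fit a token budget. It is an illustration, not Realm's approach: `Chunk`, `fit_to_budget`, and the one-token-per-word estimate are assumptions, and a real pipeline would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # retriever relevance score; assumed higher = more relevant


def approx_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the most relevant chunks that still fit in the token budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = approx_tokens(chunk.text)
        if used + cost <= budget:  # skip chunks that would overflow, keep trying smaller ones
            kept.append(chunk)
            used += cost
    return kept


chunks = [
    Chunk("alpha beta gamma", 0.9),
    Chunk("one two three four five", 0.8),
    Chunk("x y", 0.1),
]
print([c.text for c in fit_to_budget(chunks, 5)])  # ['alpha beta gamma', 'x y']
```

Greedy selection is cheap and predictable; it can leave a little budget unused compared with an exact knapsack, which is usually an acceptable trade in an agent loop.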

#053 - The Road to Distributed AI and Kubernetes Infrastructure with Matt Butcher (Fermyon) & Ari...

They share their professional origins, highlighting how Kubernetes transitioned from a complex tool for experts to a foundational technology for global enterprises. Part of the conversation focuses on the history of Helm, explaining its growth from a simple hackathon project into a standard package manager. Another part takes on the future of distributed computing, specifically how Akamai is integrating infrastructure as a service to support modern workloads.

Top Legacy Application Modernization Companies

Here's the uncomfortable truth: most large enterprises are powered by technology older than their digital ambitions. Banks clear payments on legacy cores. Airlines coordinate fleets on systems built before cloud computing. Healthcare providers rely on infrastructure never designed for today's cybersecurity climate. According to multiple enterprise IT studies, organizations spend the majority of their technology budgets maintaining existing systems rather than building new ones. In some sectors, maintenance absorbs close to 70% of total IT spend.

How AI Is Empowering Frontline Workers to Become Brand Storytellers

Frontline workers are the face of every customer-facing organization. They interact with clients, solve problems in real time, and witness the moments that define a brand's reputation on the ground. Yet most organizations overlook their potential as authentic content creators and storytellers.

Claude outage - February 10, 2026

On February 10, 2026, Claude users around the world began reporting service failures affecting chat sessions, API integrations, and Claude Code workflows. The first verified outage report reached StatusGator at 19:33 UTC. StatusGator issued an Early Warning Signal at 20:24 UTC. Claude did not post an official “Investigating” update until 22:11 UTC. This incident clearly demonstrates the gap between real user impact and official status page updates.
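
The size of that gap follows directly from the three reported timestamps. A minimal sketch using Python's standard `datetime` module:

```python
from datetime import datetime

# Reported timestamps, all UTC, on February 10, 2026.
first_report = datetime(2026, 2, 10, 19, 33)     # first verified user report
early_warning = datetime(2026, 2, 10, 20, 24)    # StatusGator Early Warning Signal
official_update = datetime(2026, 2, 10, 22, 11)  # first official "Investigating" post

print(early_warning - first_report)    # 0:51:00
print(official_update - first_report)  # 2:38:00
```

In other words, users had been reporting failures for 2 hours and 38 minutes before the status page acknowledged the incident.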

Why Your Company Will Be Running OpenClaw Next Year

You’ve probably heard of OpenClaw. Maybe you’ve seen the demos where an AI agent opens a browser, navigates to your CRM, fills in a form, and files a support ticket. No API required. Maybe you thought “that’s cool but I’d never run that at work.” Your employees already are. According to Permiso’s research, 22% of enterprise customers have employees running OpenClaw without IT approval.

How AI Coding Is Breaking Synthetic Data Generation

Traditional synthetic data generation approaches, still called “Test Data Management” (TDM) by legacy vendors, were designed for a world where applications were monolithic, databases were the center of gravity, and change happened slowly. The world looks a lot different now. Modern systems are distributed, often event-driven, and increasingly powered by streaming data and AI agents. In this environment, batch-oriented synthetic data generation fails to capture how systems actually behave.

Troubleshooting & RCA with Olly

If troubleshooting still feels harder than it should, check two numbers: how many dashboards you have, and how many alerts fire every day. For most teams, it’s hundreds of dashboards and thousands of alerts, a sign of maturity, coverage, and good intentions. Yet when something actually breaks, that coverage rarely turns into clarity fast enough.

Will humans be replaced by AI? The truth

Agentic AI doesn’t replace analysts; it augments them. The real value comes from making teams more efficient, not smaller. This is the perspective most people miss. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

How to Generate a New Puppet Module with VS Code and GitHub Copilot

Revolutionize your infrastructure by leveraging AI tooling in the Puppet ecosystem. In this technical demonstration, we explore how to significantly reduce the time required to create new Puppet modules using Visual Studio Code, GitHub Copilot, and the Puppet Model Context Protocol (MCP) server.

Surging AI Costs Are Eroding Business Efficiency: New CloudZero Report

What do 475 senior leaders across software, financial services, cybersecurity, and other industries all have in common? They have little to no idea whether their AI investments are paying off. CloudZero just released FinOps in the AI Era: A Critical Recalibration, a report assessing the state of cloud and AI spending. Culled from hundreds of responses from people directly accountable for cloud spending, the report shows that while FinOps maturity is accelerating, cloud efficiency is plummeting.

The AI-nigma: FinOps Is Maturing - So Why Is Cloud Efficiency Falling?

Q: What do you call it when FinOps maturity surges but cloud efficiency plummets? A: An AI-nigma. I don’t claim to be a comedian. But I do claim to be Fred FinOps, so the paradoxical findings from CloudZero’s new report titled FinOps in the AI Era: A Critical Recalibration, created in partnership with B2B SaaS benchmarking firm Benchmarkit, had me scratching my head. The good news: These numbers tell a story of cloud cost maturity and control. But then there’s the bad news.

Sustainable AI Investment: A Systems Thinking Approach

According to our new report, FinOps in the AI Era: A Critical Recalibration, 40% of companies now spend $10M or more annually on AI. Most can’t tell you if it’s working. That’s not a budgeting problem. It’s a systems problem. And Donella Meadows wrote the playbook for understanding it.

Turning Data Into Decisions with the xMatters Incident AI Agent

When an incident hits, the gap between awareness and action can make all the difference. Responders know the pain: endless tool-switching, chasing updates, and fragmented data. It’s not a lack of capability that slows response; it’s the lack of context and connection. That’s why we built the xMatters Incident AI Agent, a purpose-built, conversational assistant that brings intelligence and automation directly into the heart of incident response.

Silent Failures: Why AI Code Breaks in Production

You ship a small “safe” change on Friday. The diff is tiny, the tests are green, and the AI assistant was confident. An hour after deploy, your on-call channel lights up. A downstream service is rejecting responses that look fine in code review. Now you’re rolling back and rewriting a fix that should have been obvious if you had real traffic in the loop. This isn’t a hypothetical.

Investigate Issues in Slack: Grafana Cloud Slack App with AI

The Grafana Cloud app for Slack brings observability and incident response closer to where you and your teams already collaborate. Ask questions about system health, alerts, on-call schedules, and Grafana Cloud features; manage incidents and alerts; and collaborate with full context.

Agent vs Assistant: The key distinction between Olly and the competition

The market is saturated with agents and assistants, making it difficult to tell them apart. However, the difference between these two approaches is significant. They offer radically distinct levels of impact, reflecting major differences in both their technical complexity and the quality of their inferences. Let’s figure out the distinction.

Sentry acquires XcodeBuildMCP

Today we're announcing that Sentry has acquired XcodeBuildMCP, an open source MCP server that gives AI agents the ability to build, test, and debug native iOS and macOS apps. XcodeBuildMCP has become a go-to tool for agentic Apple-platform development, with more than 4,000 GitHub stars and an active community. It unlocks the full developer loop: build, run, debug, interact, and verify, allowing users to stay in their preferred agentic development environment.

A Step-by-Step Look at how Agentic, Autonomous ITOps Resolves Incidents

Agentic, autonomous ITOps improves incident response by carrying context from detection through resolution, reducing noise, delay, and manual coordination. Most IT incidents don’t fail due to missing data. Monitoring systems generate more than enough signals. The problem is that understanding those signals—and deciding what to do with them—happens in fragments. Engineers move between dashboards, logs, tickets, and chat threads, stitching together context by hand.

What Agentic AI Is Really Made Of (Most People Miss This)

Agentic AI isn’t just an LLM. Without the right context, it gives generic answers. Context is the component that makes its decisions actually useful.

AI Query Assist for SolarWinds Database Performance Analyzer

Is your database slow? Let AI do the heavy lifting. Watch how SolarWinds DPA’s AI Query Assist transforms query tuning from a manual headache into a streamlined process. This demo shows you how to get instant, AI-powered recommendations for your worst-performing queries while maintaining the control to review and verify every fix. It’s not just about finding the problem—it’s about fixing it faster.

Scaling Creative Output Without Sacrificing Quality

Scaling creative output often breaks when volume blurs what made the work recognizable. As calendars fill, small shifts in tone, visuals, and judgment start to stack up. In editorial terms, scaling means increasing volume and coverage without lowering standards. That only works when roles stay clear and each asset has a defined path to "done." More posts do not equal scalable content production. Scale comes from repeatable systems that protect brand consistency and quality control across busy weeks, staff changes, and multiple contributors.

AI Medical Scribe in Action: How AI-Powered Medical Scribes Are Transforming Clinical Documentation

In an era where healthcare systems are stretched thin and clinicians are inundated with administrative work, technological innovation has become essential, not optional. Among the most impactful advances is the emergence of the AI medical scribe, a tool that is not just streamlining clinical documentation but is fundamentally reshaping how patient care is delivered and recorded. This editorial dives deep into the transformative power of AI-powered medical scribes, offers real-world examples, and provides expert-level analysis of why this technology is pivotal for modern medicine.

From Concept to Screen: A Pro's Guide to Multi-Shot Storytelling in Seedance 2.0

In the early days of AI video, the medium was largely defined by "one-hit wonders": single, impressive clips that existed in a vacuum. You could generate a beautiful shot of a dragon or a futuristic city, but trying to tell a cohesive story with a beginning, middle, and end was nearly impossible. The characters would change, the art style would drift, and the logical flow between shots would crumble.

We Measured AI Impact for 12 Months. Here's What Actually Happened.

When we rolled out AI coding tools across our engineering team, the first few weeks felt great. Developers were enthusiastic. Acceptance rates looked healthy. Everyone said they felt more productive. Then my CEO asked me a simple question: “Is it working?” And I realized I didn’t have a good answer. Feeling productive and being productive are not the same thing.

Introducing Skylar Advisor: You Need an Advisor, Not an AI Assistant

Skylar Advisor is a next-generation experience powered by Skylar AI, built to help IT teams focus on what matters right now. In this video, ScienceLogic Chief Product Officer Michael Nappi shares how Skylar Advisor proactively curates and summarizes key signals across monitoring tools, logs, and streaming telemetry into clear advisories your team can act on in seconds.

"Crown Jewels In, Crown Jewels Out" - The Hidden Risk of AI

How do you secure data in the age of Agentic AI? In this episode of ShipTalk, Dewan Ahmed sits down with Devan Shah, Chief Architect of Data Security at IBM, to explore the massive shift from traditional DevOps to AI-infused software delivery. Devan shares his journey from being a chef to leading an "army" of 450+ developers at IBM. They dive deep into the technical bedrock of IBM’s "OnePipeline" (built on Tekton and Argo CD), the rise of Data Security Posture Management (DSPM), and the architectural principles required to ship AI features without compromising security or compliance.

SRE Report: AI optimism and the economics of effort

For eight years, the survey behind the SRE Report has used a consistent methodology. That consistency allows us to track how reliability work evolves over time, rather than relying on snapshots. One of the most stable questions in the survey asks respondents to estimate how much of their work, on average, is spent on toil. Between 2020 and 2024, responses showed a gradual decline in reported toil.

Build, buy, or open source? Understanding your options with Grafana's AI-powered observability

Some questions in engineering never go away. Here’s one that every team eventually confronts: Do we roll up our sleeves and build the tooling ourselves, or do we buy something built for us? It’s a choice that has the power to speed teams up or hold them back. With the rise of AI-powered observability, this familiar software dilemma has re-emerged with higher stakes and faster-moving technology.

What problem is agentic AI trying to solve?

Agentic AI isn’t limited to security operations. It’s already improving hospitals, financial systems, and service industries by reducing overload and filling skill gaps. Here’s the problem it was actually built to solve.

The rise of the agentic future: scaling AI workflows with relaxAI and n8n

This blog is based on the webinar “From idea to agent: Building AI workflows with relaxAI and n8n”. AI isn’t slowing down. We’re moving from “ask a chatbot” to agents that run multi-step workflows, use tools, and are built for real business processes. Most teams aren’t blocked by ideas. They’re blocked by three things: complexity, cost, and control.

AI Vendor Lock-In: How AI Is Creating A New Dependency Problem

Like most SaaS companies, you’re under pressure to ship AI-powered features faster, smarter, and at scale. For many teams, that pressure leads to relying on external AI platforms, managed models, and third-party APIs instead of building everything from scratch in-house. At first, it feels like a win. Your team ships an AI-powered feature in weeks instead of months. No GPU clusters to manage. No models to train. No infrastructure to babysit.

Beyond boundaries: How global collaboration defines AI in 2026

As we move through 2026, the global conversation around AI is shifting from simple adoption to a deeper focus on true openness and sovereignty. In this session from Civo Navigate India 2025, OpenUK CEO Amanda Brock explores the evolving state of AI openness and shares a significant milestone: India is now the world’s number one open-source contributing community.

Agentic AI in DevOps: The Architect's Guide to Autonomous Infrastructure | Harness Blog

For the last decade, the holy grail of DevOps has been Automation. We spent years writing Bash scripts to move files, Terraform to provision servers, and Ansible to configure them. And for a while, it felt like magic. But any seasoned engineer knows the dirty secret of automation: it is brittle. Automation is deterministic. It only does exactly what you tell it to do. It has no brain. It cannot reason.

AI NetOps: How AI and Machine Learning Transform Network Operations

AI is changing network operations (NetOps) from static automation into adaptive, data-driven systems that can summarize incidents, retrieve knowledge, and guide remediation with human oversight. In this talk, Phil Gervasi breaks down what “AI for NetOps” really means in practice, including the difference between classical ML and large language models (LLMs), why data pipelines matter more than model tuning, and how patterns like RAG (retrieval augmented generation), text-to-SQL, and agentic workflows turn raw telemetry into decisions.

What is agentic AI? (explained in 60 seconds)

Agentic AI is the next evolution of artificial intelligence. Unlike traditional AI, it can act autonomously and make decisions on its own. Here’s what that actually means, without the hype.

AI SRE in Practice: Tracing Policy Changes to Widespread Pod Failures

Policy changes in Kubernetes are supposed to improve security, enforce standards, or optimize resource usage. But when a policy change triggers cascading pod failures across multiple namespaces, the investigation becomes a race to identify what changed before more workloads are affected.

How To Cut Your LLM Costs for Startups (Without Slowing Product)

In February 2026, most startups don't "adopt AI" in a neat, planned way. LLM usage spikes the week you ship a new feature, add an agent, or connect tools. Budgets don't spike with it. The good news is that the biggest savings usually come from smarter routing, caching, and workload design, not from ripping out your stack or rewriting everything.
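
Routing and caching can both live in a thin layer in front of your model provider. The sketch below is a hypothetical gateway, not any specific product's API: `LLMGateway`, the model tier names, the 200-word threshold, and the injected `call_model` function are all placeholders. It sends short prompts that don't need heavy reasoning to a cheaper tier and serves exact repeats from an in-memory cache instead of paying for another call.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical model tiers; substitute whatever models your provider offers.
CHEAP_MODEL = "small-model"
LARGE_MODEL = "large-model"


class LLMGateway:
    """Routes prompts to a model tier and caches exact-repeat completions."""

    def __init__(self, call_model: Callable[[str, str], str], max_cheap_words: int = 200):
        self.call_model = call_model            # injected client: (model, prompt) -> completion
        self.max_cheap_words = max_cheap_words  # illustrative routing threshold
        self.cache: Dict[str, str] = {}
        self.calls = 0                          # paid model calls actually made

    def route(self, prompt: str, needs_reasoning: bool) -> str:
        # Long or reasoning-heavy prompts go to the larger model; everything else is cheap.
        if needs_reasoning or len(prompt.split()) > self.max_cheap_words:
            return LARGE_MODEL
        return CHEAP_MODEL

    def complete(self, prompt: str, needs_reasoning: bool = False) -> str:
        model = self.route(prompt, needs_reasoning)
        key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
        if key in self.cache:                   # cache hit: no model call, no cost
            return self.cache[key]
        self.calls += 1
        result = self.call_model(model, prompt)
        self.cache[key] = result
        return result


# Usage with a stub client standing in for a real provider SDK.
gateway = LLMGateway(lambda model, prompt: f"{model}: ok")
gateway.complete("Summarize this alert")
gateway.complete("Summarize this alert")  # served from the cache
print(gateway.calls)                      # 1
```

An exact-match cache only pays off for repeated prompts; semantic caching and cache eviction are left out to keep the sketch short.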

CX Trends 2026: How AI Reads Emotions, and Why 92% Still Want Humans When Things Go Wrong

By 2029, AI will autonomously handle 80% of customer interactions. That leaves your human agents handling only the remaining 20%: the complex, emotionally charged moments that determine whether customers stay or leave. You're building a Ferrari for highway driving while your emergency braking system still relies on hope and 2019 escalation protocols.

DraftOut: Bypassing AI Detection in Modern Academia

Students today often face the pressure of producing high-quality academic writing quickly. They want speed, efficiency, and clear results, but many AI-generated drafts are immediately flagged by detection tools such as Turnitin or GPTZero. These detectors are designed to identify repetitive patterns, formulaic phrasing, and a lack of nuanced reasoning, all common in generic AI outputs.

AI Search Technologies and the Future of Intelligent Information Retrieval

The transformative power of artificial intelligence (AI) is evident across a multitude of sectors, none more so than in AI search technologies, a pivotal advancement reshaping the future of internet navigation and information retrieval.

AI Is Forcing A Return To Hybrid And Multi-Cloud (Here's What To Do Now)

For most of the last decade, the direction of cloud strategy was clear: standardize, consolidate, and reduce sprawl. Engineering teams worked to pick a primary cloud, reduce vendor dependencies, and simplify their stacks. FinOps teams unwound years of fragmentation. Platform teams built guardrails to make sure it didn’t happen again. Then AI arrived, and it’s a fundamentally different class of workload. AI demands specialized hardware and, increasingly, diverging providers.

My AI Agent Stole My Crypto

I thought I found the ultimate coding shortcut: an autonomous AI agent. Turns out, I just bought a one-way ticket to a digital nightmare. A friendly reminder to my fellow devs: Validation isn't optional—it's survival. Your laptop shouldn't have a higher calling than your production environment. Validate now: speedscale.com.

How we built Grafana Assistant - a conversation about AI development for observability

This conversation with Grafana Labs engineers Mat Ryer, Cyril Tovena, and Sven Großmann dives deep into the engineering behind Grafana Assistant, exploring how agentic AI is transforming the observability landscape. From hackathon origins to sophisticated backend agents, the team shares candid lessons on building, scaling, and refining AI tools for engineers.

How AI is democratizing video and what it means for your brand

Video stopped being optional years ago. In 2026, 95% of marketers say video increases brand awareness, and 60% report it directly drives sales. But for small businesses and solo entrepreneurs, there's always been a gap between knowing video matters and actually making it. The costs, the learning curve, the time: it adds up fast.

Operational Risks and Controls When Deploying Legal AI

A law firm recently found that its AI tool had misread "limitation of liability" clauses for 6 months. No one noticed the mistake. The error only came to light when a client faced a huge insurance claim that the firm had promised was capped. The cost? That firm is now dealing with a malpractice lawsuit and a damaged reputation. Using AI in a law office poses risks beyond simple computer bugs. These tools mix technical errors with professional responsibility. As AI becomes a standard part of the job, firms without strict rules will face quality issues and legal trouble.

The AI-Empowered Site Reliability Engineer: Automating the Balance of Risk and Velocity

You might expect an AI-SRE agent to target 100% reliable services, ones that never fail. It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a non-linear cost: maximizing stability limits how fast new features can be developed, dramatically increases the operational cost, and reduces the features a team can afford to offer.
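
The usual SRE way to put a number on "reliable enough" is an error budget: an availability SLO over a window allows (1 - SLO) × window of unavailability, so each extra nine shrinks the budget tenfold. The sketch below shows that arithmetic plus a simple ship-or-freeze check of the kind an AI-SRE agent could automate. The 30-day window and the `can_ship` policy are assumptions for illustration, not the article's method.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    if not 0 < slo < 1:
        raise ValueError("SLO must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def can_ship(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """Hypothetical policy: keep releasing while budget remains, freeze once it is spent."""
    return downtime_minutes < error_budget_minutes(slo, window_days)


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} SLO -> {error_budget_minutes(slo):.1f} min per 30 days")
# 99.00% -> 432.0, 99.90% -> 43.2, 99.99% -> 4.3
```

Going from 99.9% to 99.99% cuts the monthly budget from about 43 minutes to about 4, which is why the cost of each additional nine climbs so steeply.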

Agentic AI Essentials: The Dashboard and Changing IT Roles

Dashboards provide a useful prism through which we can study the broader evolution of the IT professional’s role in the era of agentic AI. For years, dashboards have been the centerpiece of IT work, serving as the interface where teams interpret system behavior, diagnose issues, and plan actions. Dashboards epitomize the relationship between humans and their systems: humans observe, interpret, and act. As agentic AI enters the picture, that relationship begins to change. Let’s explore how.

From Blueprint to Production: Building a Kubernetes MCP Server

As Large Language Models (LLMs) evolve from simple chatbots into agentic workflows, the need for a standardized way to connect them to external data and infrastructure has become critical. In a recent workshop hosted by Nir Adler, Innovation Engineer at Komodor, we explored how to bridge this gap using the Model Context Protocol (MCP).

Why MCP is becoming part of your product surface

AI assistants are quickly becoming a primary interface for how people interact with software. Developers ask them how to integrate APIs. Users ask them how products work. Buyers ask them how tools compare. Increasingly, the first explanation someone receives about your product does not come from your website, your documentation, or your sales team. It comes from an AI assistant. That shift has an important consequence that many organizations are only starting to notice.

Upsun's AI story: the 5% path from pilots to production value at scale

Here’s the uncomfortable truth: most companies do not have an AI problem. They have a delivery problem wearing an AI costume. MIT’s Project NANDA research has been widely cited for a brutal headline statistic: roughly 95% of corporate generative AI pilots fail to produce measurable business impact or returns, while only about 5% break through to meaningful outcomes. (Yahoo Finance) The models are impressive. The demos are dazzling. The budgets are real.

Intelligent FinOps: AI-Informed, AI-Enabled

AI is the new frontier for FinOps maturity. It introduces fresh spend patterns and new opportunities for value. As GPUs, inference, and retraining reshape costs, FinOps maturity grows through visibility, forecasting, and a shared mindset about how these workloads drive business impact. In this 2025 post, I gave my guidelines for implementing AI tagging to give business context and clarity to vague AI invoices. Now, I’m sharing the next level up: how to drive FinOps in AI with AI.

(Tech Talk) Shipping with Context: Knowledge Graphs as the Backbone of AI-First Software Delivery

Knowledge graphs are essential to solving the context bottleneck in AI-First software delivery, which occurs because workflows, policies, and dependencies are siloed and invisible to AI agents. In this Tech Talk, Prateek Mittal (Product Director of AI Core and Data Platform at Harness) discusses the key concepts, starting with knowledge graphs vs. observability: observability tells you "what is happening," while knowledge graphs tell you "what does that mean" by modeling structured relationships. They work together to link live signals to affected services or SLAs.
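
To make the "what does that mean" side concrete, here is a toy dependency graph that links a live signal on one service to the downstream services and SLAs it may affect. It is a hypothetical sketch, not Harness's knowledge-graph implementation: the service names, the `DEPENDENTS` and `SLAS` maps, and `affected` are invented for illustration, and a real graph would also model policies, owners, and workflows.

```python
from collections import deque

# Hypothetical graph: each service maps to the services that depend on it.
DEPENDENTS = {
    "postgres": ["orders-api"],
    "orders-api": ["checkout-web"],
    "checkout-web": [],
}
# Hypothetical mapping from services to the SLAs they back.
SLAS = {
    "orders-api": ["order-latency-p99"],
    "checkout-web": ["checkout-availability-99.9"],
}


def affected(signal_source: str) -> tuple[set[str], set[str]]:
    """Breadth-first walk over dependents, starting at the service emitting the signal."""
    seen, queue = {signal_source}, deque([signal_source])
    while queue:
        service = queue.popleft()
        for dependent in DEPENDENTS.get(service, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    slas = {sla for service in seen for sla in SLAS.get(service, [])}
    return seen, slas


services, slas = affected("postgres")  # e.g. an alert firing on the database
print(sorted(services))  # ['checkout-web', 'orders-api', 'postgres']
print(sorted(slas))      # ['checkout-availability-99.9', 'order-latency-p99']
```

The observability side supplies the starting node (which service is misbehaving); the graph supplies the meaning (which customer-facing commitments are now at risk).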

We Built an MCP Server

When I joined Kubex last year, the company was already well aware of the growing power of Large Language Models. As a company focused on intelligent resource optimization for Kubernetes, GPUs, and cloud infrastructure, generative AI didn’t feel like a threat so much as a natural extension of where the industry was heading. Kubex had already invested heavily in machine learning, but it was becoming clear that foundation models could unlock an entirely new class of capabilities for our customers.

Voice AI: The Missing Link in Your Agentforce Strategy

Despite the enterprise-wide pivot toward digital deflection, voice remains the primary escalation channel for high-complexity customer issues. Yet, while organizations rigorously optimize digital touchpoints, telephony frequently remains a siloed legacy endpoint, disconnected from the broader CRM architecture. This integration gap creates a strategic blind spot that fundamentally undermines your digital roadmap.

The Human-Centric Stack: Why Logs Are the Great Equalizer in the Age of AI

In 2026, we are seeing incredible feats of engineering, from agentic AI to metrics and distributed traces that map thousands of microservices. Our systems have never been more intelligent or more complex. However, as observability becomes more intelligent, fewer employees know how to manage and troubleshoot these complex systems. The employees who often bear the brunt of an error’s impact may need to rely on specialists to interpret the system.

Kiro Can Now Reason With Lightrun's Live Runtime Context

AI code generation is fast. Making it reliable requires runtime context. Today, Kiro gains live runtime visibility with the Lightrun MCP. This grounds AI-assisted development in how code actually behaves at runtime. Kiro, the AI coding assistant from the teams at AWS, is built for velocity and intuition. It moves from specification to production with speed and structure, helping teams turn intent into working code. But until now, like every AI coding assistant, Kiro had a major blind spot.

How Honeycomb Supercharges OpenTelemetry for AI

It has become common knowledge that the nature of software development has changed as AI code generation and agent-based features gain adoption. In a perhaps more subtle shift, the fundamentals of software instrumentation are changing too. As OpenTelemetry becomes the standard instrumentation layer across enterprises, with thousands of developers (many from Honeycomb) actively contributing to it, the telemetry data itself is evolving to meet the growing demand for rich context.

Top 9 Observability Tools for AI-Assisted Development & Deployment

AI-assisted development is rapidly becoming the default way software is built. Code generation, AI copilots, agentic pull requests, and automated refactoring are now embedded directly into engineering workflows. While this shift dramatically increases delivery speed, it also introduces a new operational reality: production systems are changing faster than humans can fully reason about them. This is where observability becomes mission-critical.

What AI Has Never Seen: The Context Gap in Code Generation

Your AI coding assistant has read the entire internet. It knows every programming language, every framework, every best practice documented in Stack Overflow answers and GitHub repositories. In seconds, it can generate a REST API handler that looks perfect: clean code, proper error handling, all the right patterns. But here’s what it’s never seen: your production traffic. Data from a real API request. Someone filling out a form with messy or incomplete data.

Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

As agentic AI applications are used more broadly in production, they introduce new operational models, combining multi-step reasoning, tool execution, and autonomous decision-making into a single workflow. SRE teams need visibility into how these agents behave, where they fail, and how they perform over time.

The Dangerous Power of Local AI Agents

I’ve been testing OpenClaw, a fully autonomous agent that lets you remotely control your entire system via Signal. It’s incredibly powerful to text your computer from a coffee shop and have it execute tasks, but you’re essentially handing the keys to your digital kingdom to an LLM. The Golden Rule: Trust, but verify. I’m using Proxymock to sniff every single API call going in and out of the agent. If there’s a data leak or a "hallucination" that tries to wipe my drive, I see it first.

Qwiet AI Is Now Harness SAST and SCA | Harness Blog

Modern application security is struggling to keep up with AI-driven development and cloud-native scale, especially when security feels bolted onto CI/CD instead of built in. Harness SAST and SCA bring AI-powered application security testing natively into the Harness platform, reducing noise and alert fatigue. By identifying only vulnerabilities that are actually reachable in production code, teams get findings they can trust and act on faster.

The Grok-to-AI Evolution: Why Modern SREs Are Moving Beyond Manual Parsing

Grok structures logs. Context engineering connects systems. AI explains behavior. For years, Grok patterns have been the workhorse of the SRE world. Built on regular expressions, Grok helps teams extract structure from unstructured logs. As we explored in "Do You Grok It?", Grok is the key to turning messy log lines into usable fields. It's why our Grok Pattern Reference remains one of our most-visited resources — SREs are hungry for structure.
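To show what "built on regular expressions" means in practice, here is a minimal Python sketch of the core Grok mechanic: each `%{PATTERN:field}` token expands into a named regex capture group, and a matching log line becomes a dictionary of fields. The base patterns below are simplified stand-ins chosen for illustration, not the definitions shipped with Logstash or any other product, and the sample log line is hypothetical.

```python
import re

# Simplified, illustrative base patterns (assumption: these approximate,
# but are not identical to, the pattern library shipped with Logstash).
BASE_PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?",
    "LOGLEVEL": r"DEBUG|INFO|WARN|ERROR|FATAL",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NUMBER": r"\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}

# Matches Grok tokens such as %{IP:client} or %{GREEDYDATA}.
GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand %{NAME:field} tokens into Python named capture groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = BASE_PATTERNS[name]
        # Named group when a field is given, otherwise a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    # A function replacement is inserted literally, so backslashes survive.
    return re.compile(GROK_TOKEN.sub(expand, pattern))


def parse(line, pattern):
    """Return the extracted fields as a dict, or None if the line doesn't match."""
    m = grok_to_regex(pattern).match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    pattern = "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{IP:client} %{NUMBER:latency_ms}ms %{GREEDYDATA:msg}"
    line = "2026-01-15T10:32:07Z ERROR 10.0.4.17 512ms upstream timeout calling payments-svc"
    print(parse(line, pattern))
```

This is also where manual parsing hits its limits: every new log format needs a new pattern, and the extracted fields still carry no knowledge of how `payments-svc` relates to anything else in the system.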

Scalable AI governance: why your policy needs a platform, not just a PDF

Most IT teams don’t lack AI policies. They lack policies that survive a Git push. In many organizations, AI governance is a paper tiger. There are comprehensive documents outlining data usage, approved models, and risk management. On an auditor's desk, these policies look complete. But inside the workflow, the reality is different. AI tools are being embedded directly into IDEs, CI pipelines, and internal automation scripts.

What mid-market IT teams wish they knew before deploying AI agents

AI agents are quickly shifting from experimentation into day-to-day operations. That shift is showing up in the data. McKinsey’s latest State of AI research highlights both broader AI use and the growing focus on “agentic AI,” even as many organizations still struggle to scale safely. For mid-market IT teams, agents can feel like the unlock: automate repetitive workflows, reduce backlog pressure, and deliver more output without expanding headcount.

AI Agent Governance: How to Keep Agentic ITOps Workflows Safe

The future of ITOps automation is better control over what AI agents can see, share, and do. AI automation in ITOps is expected to resolve incidents, reduce operational load, and operate with limited human involvement. Those outcomes depend on systems that can take action, not just surface insight. Agentic AI enables that shift. AI agents can correlate signals across tools, update tickets, trigger remediation, and coordinate workflows without waiting for instruction.

Building Trust in the Machine: A Guide to Architecting Agentic AI for SRE

The promise of Artificial Intelligence in Site Reliability Engineering (SRE) is seductive: an autonomous system that never sleeps, instantly detects anomalies, and fixes broken infrastructure while humans focus on high-value work. However, the gap between a demo-ready chatbot and a production-grade Autonomous AI SRE is vast. In complex, noisy environments like Kubernetes, a “naive” implementation of Large Language Models (LLMs) is not just ineffective; it can be dangerous.

AI Tags: Why Cloud Tagging Breaks Down For AI Workloads (And What To Use Instead)

Tags have long been the backbone of cloud cost visibility and governance. They help teams understand who owns what, where spend comes from, and how infrastructure maps back to the value the business delivers. However, AI workloads have altered that model and exposed the limitations of traditional cloud tags in the process. In fact, many of the most expensive AI operations don’t run on taggable cloud resources at all.

AI meets SQL Server 2025 on Ubuntu

Since 2016, when Microsoft announced its intention to make Linux a first-class citizen in its ecosystem, Canonical and Microsoft have been working hand in hand to make that vision a reality. Ubuntu was among the first distributions to support the preview of SQL Server on Linux. Ubuntu was the first distribution offered in the launch of Windows Subsystem for Linux (WSL), and it remains the default to this day. Ubuntu was also the first Linux distribution to support Azure’s Confidential VMs.

You Need an Advisor. Not an AI Assistant.

Complex environments don’t fail because teams lack data. They fail when teams can’t trust what the data is telling them. There are too many signals, too little time, and too much risk riding on every decision. That’s the reality Skylar Advisor is built for: delivering guidance teams can verify, so they can act faster without gambling on opaque, black-box answers.

Are We Letting AI Think for Us? | SolarWinds TechPod #105

We’re more dependent on technology than ever—and AI is changing how we make decisions. But what happens when the systems fail? Or when bad actors decide to “pull the plug”? This clip dives into a scary but necessary question: Are we losing our ability to critically think and problem-solve by relying too much on AI? Is AI leveling the playing field—or quietly taking over human decision-making? A must-watch conversation about innovation, outages, AI risk, and why having a backup plan matters more than ever.

Grafana Assistant: Why you can trust our agent (and yourself) in an era of AI hallucinations

Let’s be real: AI can hallucinate. And in observability, that feels risky. No one wants an assistant that sends your SREs chasing ghosts. At best, that burns expensive engineering time. At worst, it slows incident response in production and pushes teams toward the wrong remediation path. So here’s the big question: What makes Grafana Assistant different, and why should you trust it? Let’s start by acknowledging the fear. AI hallucinations are a real issue.

Properly securing OpenClaw with authentication

OpenClaw (née MoltBot, née ClawdBot) is taking over the world. Everyone is spinning up their own, either on a VPS or on their own Mac mini. But here's the problem: OpenClaw is brand new, and its security posture is mostly unknown. Security researchers have already found thousands of publicly available instances exposing everything from credentials to private messages.

Skylar Advisor: Proactive Guidance for Modern Operations

Meet Skylar Advisor, bringing trusted and verifiable guidance to IT operations by connecting real-time observability with your data and knowledge. Built AI-native, it helps teams cut through alert floods, understand what matters most and why, and take the next best steps with confidence. Every recommendation is evidence-backed and traceable to the exact data and sources used, so guidance is clear, explainable, and defensible when the stakes are high.

Elastic 9.3: Chat with your data, build custom AI agents, automate everything

Today, we are pleased to announce the general availability of Elastic 9.3 as the latest version of the Elasticsearch Platform — the world’s most popular open source platform for transforming both structured and unstructured data into trusted answers and outcomes. In addition to including new features that help developers with context engineering and agent building, Elastic 9.3 introduces a broad set of new capabilities to Elastic Search & AI, Elastic Observability, and Elastic Security.

Protect agentic AI applications with Datadog AI Guard

Organizations are increasingly using agentic AI applications powered by large language models (LLMs) to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways.

Tool Consolidation Is Dead. Long Live Agentic AI.

It’s 2026, and developers have more tools at their disposal than at any point in the industry’s history: CI/CD platforms are richer; observability stacks are deeper; security, data, and AI tooling have exploded into crowded, competitive ecosystems. And yet, delivery is still slow, incidents are still noisy, workflows are still brittle. The problem is no longer tool scarcity or feature depth. It’s integration debt.

8 themes shaping engineering in the age of AI

We know that AI has been transformational for engineering and it will continue to be, so stop me if this sounds familiar. Imagine an engineering lead opening a pull request for a critical security patch and finding five hundred lines of AI-generated code. While the solution is (mostly) usable, it follows a pattern no one on the team recognizes. This shift away from manually writing every line of logic has introduced a unique level of complexity for teams.

Komodor AI SRE vs. OSS AI Agent: A Technical Comparison of Agentic AI for Kubernetes Troubleshooting

Gartner predicts that AI agents will be implemented in 60% of all IT operations tools by 2028, up from fewer than 5% at the end of 2024. This acceleration has sparked an explosion of AI SRE solutions, from enterprise platforms to open-source alternatives, all promising faster root cause analysis and reduced MTTR.

How to Build AI-Powered Search with Elasticsearch [2 Min Live Demo]

In this demo, we show how Elasticsearch enables production-ready GenAI and AI-powered search applications, from indexing and embedding your data to grounding large language models with RAG. You’ll see how developers can go from raw data to a fully functional GenAI search experience, fast.

Automating Infrastructure as Code changes with an AI agent

The infrastructure management landscape is undergoing a fundamental transformation. Infrastructure as Code has already revolutionized how we provision and manage cloud resources by treating infrastructure as software. The next evolutionary step involves intelligent automation that can understand, adapt, and optimize these configurations independently.

Everything you need to know about ITIL 5, AI and incident management

ITIL 5 launched in January 2026, and for the first time in the framework's 40-year history, AI governance is front and center. If you're running incident management, on-call rotations, or building operational tooling, this matters: the gap between AI adoption and AI governance is about to become a compliance and operational risk issue. I’m not usually a big ITIL fan, but this guidance has some genuinely useful framing and questions.