Modernising Middleware and B2B Integration with Assurance

Modernising enterprise middleware is now a strategic necessity for cost efficiency, AI-readiness, and operational clarity. Hybrid estates of IBM MQ, Apache Kafka, and other brokers hide inefficiencies that drain profitability, but an operating model built on Assurance and Optimisation restores transparency and control. By unifying data, rebalancing workloads, and enabling safe AI autonomy, organisations can build a resilient “Confidence Economy.”

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA), a transformative leap forward for engineering and operations teams that is included in your existing subscription at no additional charge. We are paving the way for a new era of observability, moving beyond passive, reactive monitoring to proactive, AI-driven observability.

Agentic AI and the End of Traditional IT (w/ Robb Wilson)

In a wide-ranging conversation, Robb Wilson—CEO and co-founder of OneReach.ai and author of The Age of Invisible Machines—joins Tim and Tom to explore the rise of agentic AI and its seismic implications for IT, organizations, and society. Robb breaks down the concept of agent runtimes, why conversational interfaces matter more than ever, and how adaptive, self-orchestrating systems will reshape work far beyond today’s service models.

A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

About two months ago, an incident at Grafana Labs was kicked off in typical fashion: A series of alerts was triggered, our on-call engineer acknowledged them on Slack, and the rest of the team quickly began hypothesizing about the potential culprit. But the way the incident was resolved was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

AI API Aggregation: Managing Costs And Complexity Across Multiple LLMs

Running multiple LLMs without aggregation can feel like managing five different clouds with no dashboard. Sure, you can make it work, but you won’t like the bill. And most SaaS teams didn’t start with a multi-LLM strategy. It just happened. You added one model for reasoning, another for summarization, or maybe a fine-tuned version for customer support. Fast-forward six months, and your AI stack looks like a tangle of APIs. And each charges tokens on its own terms.

Prioritize errors and create tickets using Rollbar's MCP Server

Production errors can feel overwhelming. Your Rollbar dashboard is filling up with alerts, your team is scrambling to understand what needs immediate attention, and critical revenue-impacting issues might be buried among less urgent problems. In this post, we'll walk you through a workflow that transforms production error chaos into organized, prioritized action items. We'll cover everything from analyzing Rollbar errors to creating properly linked Linear tickets.

MachineGPT: Speaking the Language of Machines to Shape the Future of AI

At .conf25, we took a bold step forward, introducing the concept of MachineGPT, which brings the power of generative AI to one of the most overlooked resources: machine data. MachineGPT speaks the language of machines. Just like ChatGPT learned the grammar of words and sentences to understand questions and respond in human language, MachineGPT can learn the hidden “grammar” of how systems behave through machine data.

The AI Workload Punishes Bad Habits

The AI workload presents the ultimate challenge, highlighting the structural limitations of the traditional hyperscaler model. In this segment from a Civo Navigate London 2025 session, Kelsey Hightower explains exactly why AI adoption forces enterprises to confront flawed architecture and astronomically rising costs. When specialized hardware is scarce and rented GPUs sit idle at a premium, it’s clear that traditional cloud providers were not built for this era. Data that doesn't move is forcing organizations to move compute back to where the data lives.

5 Skills Intelligence Platforms to Watch in 2025, Reviewed & Ranked

Businesses need to build strong teams and leaders within their organizations to keep driving productivity and efficiency. Doing so offers several other benefits as well, such as improving employee morale and retention, enhancing the employer brand, and helping run a more cost-effective business. Skills intelligence platforms are a vital part of this. They give companies affordable and effective ways to engage employees as they take their careers to the next level.

How AI Is Transforming Field Service Routing and Operational Efficiency

Field service operations used to depend on set schedules, hand-planned routes, and local dispatchers. That approach is familiar, but routing based on intuition is becoming less effective as service networks grow more complex, customer expectations rise, and operating expenses shift. How can companies with a large fleet of service vehicles efficiently arrange personnel, vehicles, and parts to meet service level agreements while minimizing costs and downtime?