Operations | Monitoring | ITSM | DevOps | Cloud

ODBC Driver for MySQL: Open-Source vs Commercial (2026)

The MySQL ODBC driver is what keeps BI tools, reporting systems, and ETL pipelines connected to MySQL without errors. Teams have depended on it for years, and it’s still vital today, especially with MySQL ranked worldwide in February 2026. However, not all ODBC drivers are built alike. There are two categories: open-source options and commercial ones. While both connect applications to MySQL, they differ in areas like stability, performance, security, and support.

Humanized AI Text for Stronger DevOps and Operations Content

You create content for operations teams, DevOps engineers, SREs, and IT decision-makers. Topics include monitoring, incident management, cloud infrastructure, ITSM processes, and observability tools. AI generates initial drafts quickly. The results frequently come across as mechanical. Sentences follow predictable patterns. Technical explanations lose nuance. Readers in this field expect precise, practical language from experienced practitioners. They detect generated text easily. Engagement drops when content feels detached from real-world ops challenges.

Mastering Temporal LEFT OUTER JOINs for Historical State Analysis in SQL Server

Short Summary: This guide shows how to use time-based LEFT OUTER JOINs with SQL Server temporal tables, step by step. You’ll see how dbForge tools help you fine-tune these queries so you can get accurate reports for specific points in time, fully understand how your data changes, and confirm that your logic is correct.

IT Cost Optimization Strategy: Eliminating Guesswork with Observability

IT organizations are being asked to reduce costs, manage risk, and maintain performance at the same time. Meanwhile, infrastructure complexity continues to grow, and vendor pricing changes are reshaping budget assumptions. Too often, an IT cost optimization strategy is shaped by incomplete data around sizing, licensing, refresh timing, and platform decisions. That uncertainty leads to overprovisioning, budget surprises, and reactive operations. Observability changes that equation.

How to Create an AI Chatbot for Your Website?

Chatbots are starting to look fairly promising for businesses of all kinds. Customers today are keen to get things resolved faster than ever, and every startup is tempted to adopt one. But before jumping on the bandwagon, you need to think about what type of chatbot you should invest in. The decisive question is which model of conversational AI best aligns with the needs of your organization.

Why measuring things openly is the first step toward a stronger engineering culture

Most engineering leaders know they should be measuring more. What holds many of them back is a quieter concern about whether the organization is actually ready to see the numbers. This tension, however, did not keep Ganesh Datta, our co-founder and CTO, and Randy Shoup, SVP of Engineering at Thrive Market, from diving down this rabbit hole on the Braintrust podcast.

Escalation policies for critical incidents

When a critical incident triggers, there’s no time to figure out who to call. That decision needs to be made well before the incident arrives. A dedicated escalation policy for critical incidents gives your team a clear path to follow the moment things go wrong, rather than leaving it to whoever happens to be around. This guide covers the key decisions involved in building that policy.

Feature Friday: New Bird's Eye Report

Stop squinting at data and start driving engineering excellence. In this week’s Feature Friday, Christine from the Cortex Product Team introduces the all-new Birdseye Report (now in private beta). See how to master your engineering standards at scale. We’ve redesigned Birdseye to give you a true "top-down" view of scorecard performance across your entire organization—from the CTO level down to a single service.

The Complexity Myth in Test Data Management

This is a guest post from James Hemson. For years, the test data management market has told smaller companies the same story. Test data is complex. You need consultants. Compliance is expensive. Expect a six-month implementation before you see any value. At Redgate we think that's wrong. And we think it's wrong by design. Complexity creates services revenue. It creates switching costs. Most vendors have built their businesses around this.

Understanding L1, L2, L3 escalation policy

L1, L2, L3 is one of the most common ways to structure an escalation policy. The idea is simple: an incident triggers and lands with a first responder. If it needs more attention, it moves up the chain to someone with more expertise. This guide explains how each tier works, when this structure makes sense, and what to keep in mind when setting one up.

Canonical and Ubuntu RISC-V: a 2025 retro and looking forward to 2026

2025 was the year that RISC-V readiness gave way to RISC-V adoption. It’s been quite a journey. What began years ago as early architectural exploration and enablement has matured into real silicon, systems, and deployments. In particular, RVA23 provides a stable and predictable baseline we can align on with our wider ecosystem of partners. At Canonical, we’re committed to making RISC-V a viable option for anyone who wishes to adopt it.

On-Demand Vs. Spot Instances: What's The Difference?

Whether you’re in finance or engineering, you know keeping your customers happy is the key to success. That means your SaaS product or service needs to be available, reliable, and cost-effective virtually all the time. How stable and high-performing your service is can depend on whether you use On-Demand or Spot Instances. Pricing, capacity, and flexibility will also vary depending on which of the two you choose.

From Chaos Engineering to Resilience Testing: Why We're Expanding How Teams Validate Reliability | Harness Blog

At Harness, we’re committed to helping teams build and deliver software that doesn’t just work – it thrives under pressure, scales reliably, and recovers swiftly from the unexpected. Today, we’re taking the next step in that mission by evolving our Chaos Engineering module into Resilience Testing. This evolution reflects how reliability is tested in practice today.

Engineering Metrics Success: Communicate Speed, Quality, and Business Outcomes | Harness Blog

Engineering metrics tools won’t solve problems if there isn’t communication about expectations in place. Learn how leaders are connecting engineering metrics with business outcomes. Engineering organizations are waking up to something that used to be optional: measurement. Not vanity dashboards. Not a quarterly “engineering metrics review” that no one prepares for. Real measurement that connects delivery speed, quality, and reliability to business outcomes and decision-making.

Securing 80,000 transactions per second at Infobip with HAProxy Enterprise WAF

The average cost of a security breach reached nearly $4.4 million in 2025, according to the Cost of a Data Breach Report. To proactively address this substantial financial and security risk, Infobip, a global cloud communications platform, used HAProxy Enterprise to implement a security and uptime framework that is both highly modular and highly performant.

Ghosts of Servers Past: The Bare-Metal Comeback Story

Bare-metal. Just reading that word might trigger a physical reaction for some of us. Dusty closets, old server rooms, and loud rigs that never seemed to work quite right. Remember waiting days for IT to provision a server, only to realize your ticket got lost in the shuffle? Or the classic "well, it worked on my machine" excuse right before a production push? Ah, the good old days.

What is an escalation policy? (And why every team needs one)

An escalation policy is the route an incident takes after it triggers. It lays out who gets alerted first and sets a wait time. If nobody responds, it moves the incident forward to the next person. The word “escalation” is worth pausing on. When an incident triggers and the first person doesn’t respond, the incident doesn’t sit and wait. It moves to the next person and keeps moving until someone picks it up. That forward movement is the escalation.
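The forward movement described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `Step` class, the responder names, and the wait times are all hypothetical, and the acknowledgement check is passed in as a plain function so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    responder: str     # hypothetical on-call name
    wait_minutes: int  # how long to wait for an acknowledgement


def escalate(steps: list[Step],
             acknowledges: Callable[[str], bool]) -> tuple[Optional[str], int]:
    """Walk the chain in order; return (who acknowledged, minutes elapsed)."""
    elapsed = 0
    for step in steps:
        if acknowledges(step.responder):
            return step.responder, elapsed
        # No response within the wait time: move the incident forward.
        elapsed += step.wait_minutes
    return None, elapsed  # nobody picked it up


# Hypothetical three-step policy.
policy = [Step("primary", 5), Step("secondary", 10), Step("manager", 15)]
owner, minutes = escalate(policy, lambda name: name == "secondary")
print(owner, minutes)
```

In this run the primary stays silent for five minutes, so the incident moves to the secondary, who picks it up. If nobody in the chain responds, the function returns `None` along with the total time spent waiting.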

[Webinar] Conquering the Complexity of Self-Hosted Apps with Agentic AI SRE

Most enterprise SaaS products, like Komodor’s Autonomous AI SRE Platform, require installing a remote agent on the customer’s infrastructure, which varies significantly from one organization to another, in terms of architecture, configurations, permissions, processes, and more. This “unmanaged” model creates major blind spots, making daily operations, observability, debugging, and incident response challenging. When failures occur, limited visibility and bespoke systems make root-cause analysis slow, incomplete, or impossible.

A compass for designing your escalation policy

The first time you sit down to design an escalation policy, it can feel a little like a crossroads. You know incidents need to reach the right people. You just aren’t sure which structure makes the most sense. Should you route by severity? By who’s available? Or by team? There’s no single right answer. Think of this guide as a compass. A compass doesn’t tell you exactly where to go. It helps you orient yourself based on where you already are.

The benefits of leadership coaching in the tech industry - with Cindy Gross

Steve is joined by Cindy Gross to discuss leadership coaching in the technology industry – what it is, how it works, and the many benefits it brings. Recorded on-site at PASS Data Community Summit 2025 in Seattle. Cindy is an executive coach and Adaptive Leadership Expert with 25 years in the US tech industry. As a former SQL Server Master (MCM) and expert in cross-organizational navigation at Microsoft, she transformed her focus from complex technical systems to empowering leaders within systemic chaos.

Harness AI February 2026 Updates: Securing & Making the SDLC Reliable and Shipping Faster with Agents | Harness Blog

February is all about making AI in software delivery secure and easier to operate at scale. This month’s updates span enterprise-grade application security, API security via MCP, SRE automation, and a major upgrade to the DevOps Agent.

Azure Tagging In 2026: A Complete Guide to Organizing Resources, Costs, and Governance

Azure tags are like sticky notes for your cloud resources. They help you label and organize infrastructure in ways that make sense to your organization. Tags enable you to assign categories to resources, making it easy to group, monitor, track, and filter them across any environment. So, how do tags and tagging work in Azure?
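As a rough illustration of the idea, tags can be modeled as name/value pairs attached to each resource. The sketch below uses plain Python dictionaries rather than the Azure SDK, and every resource name, tag name, and helper function in it is hypothetical.

```python
# Hypothetical inventory: each resource carries a dict of tag name/value pairs.
resources = [
    {"name": "vm-web-01", "tags": {"env": "prod", "team": "web"}},
    {"name": "vm-web-02", "tags": {"env": "dev", "team": "web"}},
    {"name": "sql-orders", "tags": {"env": "prod", "team": "data"}},
    {"name": "stg-logs", "tags": {}},
]


def filter_by_tags(items, **wanted):
    """Return names of resources whose tags match every requested pair."""
    return [
        r["name"]
        for r in items
        if all(r["tags"].get(k) == v for k, v in wanted.items())
    ]


def group_by_tag(items, tag):
    """Group resource names by the value of one tag; missing tags go to 'untagged'."""
    groups = {}
    for r in items:
        groups.setdefault(r["tags"].get(tag, "untagged"), []).append(r["name"])
    return groups


print(filter_by_tags(resources, env="prod"))
print(group_by_tag(resources, "team"))
```

The "untagged" bucket is the part worth noticing: any resource without the tag you group by drops out of per-team or per-environment views, which is why consistent tagging matters for cost tracking.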

Smarter Custom Metrics for Redgate Monitor: Additional Alert Text Query

This is a guest post from Nick Coombe. Redgate Monitor's built-in metrics cover the most common database pressure points out of the box. However, every estate has a few KPIs and metrics that are specific to the business, and users can create custom metrics to track those signals and receive an alert when they cross a threshold.

Cisco and Megaport: Redefining the Edge of Modern Networking

Explore four Cisco and Megaport Virtual Edge solutions that bring secure, high-performance networking closer to users and clouds. Enterprises have quickly outgrown the constraints of legacy WAN architectures. As cloud and SaaS dominate and users grow more distributed, rigid infrastructure models—complete with backhaul-heavy designs and unpredictable internet performance—no longer work.

Escalation policies for low-priority incidents

Teams put a lot of thought into how critical incidents are handled. Low-priority incidents usually don’t get the same attention. And without a proper escalation policy, they just land in a shared channel, waiting for someone to acknowledge. Setting up a clear policy for them is worth doing. Not because they need the same urgency as a critical incident, but because having a defined path for every incident makes the whole system more reliable.

Managing AI Models and Datasets with Harness Artifact Registry | AI/ML Artifact Management

Building AI applications often means juggling multiple models, scattered datasets, and version chaos across local systems. But what if you could bring it all together, securely and efficiently, in one place? In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, demonstrates how Harness Artifact Registry makes it easy to manage and govern your AI/ML assets, from models and datasets to prompts and agents, with built-in support for registry types such as Hugging Face and generic.

Unmasking the Resolute Raccoon

You’ve almost certainly seen them… In the forest, rummaging through a dumpster, in poorly aging millennial memes. Raccoons are ubiquitous and endlessly entertaining creatures. YouTube and TikTok are full of videos documenting their clever antics and escapades. One such intrepid raccoon gained fame for making their way to the most unlikely places, from liquor stores to karate studios.

Inside the architecture: How Upsun delivers 99.99% uptime for AI

For a CTO, "four nines" represents a commitment to keeping production revenue live with less than 0.01% downtime per year. As AI workloads move from pilot projects into core production services, the reliability requirements for infrastructure have shifted. AI agents, RAG pipelines, and automated LLM workflows depend on a consistent platform state.

Stop Vibe Coding Everything: The Case for Spec-Driven Dev

Spec-driven development with AI coding agents could change how you build software. In this GitKon 2025 talk, Erik Hanchett, Senior Developer Advocate at AWS, breaks down why AI coding assistants perform dramatically better when they start with structured specifications instead of raw prompts. If you've been vibe coding your way through complex features and wondering why your AI keeps going off the rails, this is the video for you.

Millions of Metrics. Zero Clarity.

Millions of metrics. Zero clarity. That’s the reality many IT teams are facing today. As environments grow more complex, telemetry explodes. Millions of records generated every hour. Dozens of specialized tools for network, storage, Kubernetes, cloud, AI workloads. Each tool is good at its domain. But none of them answers the real question: Where should I focus right now? Fragmented visibility creates predictable failure modes.

Keeping it boring: the incident.io technology stack

At incident.io we run a deliberately simple technology stack. Keeping things boring has allowed us to scale from a few hundred customers to several thousand, while having only two platform engineers. In this post I'll walk through the stack, explain some of the choices we've made, and touch on the challenges we're facing as we grow.

The Ultimate Kubernetes Cost Monitoring And Management Guide

While Kubernetes enables teams to deliver more value faster, understanding and controlling Kubernetes costs remains challenging. You have disposable, replaceable compute resources constantly coming and going across a range of infrastructure types. Yet at the end of the month, you only get a billing line item for EKS costs and several EC2 instances.

Scaling Argo CD Past 50 Clusters: GitOps, Pipelines, & Governance

Is your engineering team hitting the "Argo Ceiling"? Argo CD is incredible at syncing state, but as you scale past 20, 50, or 100 clusters, the maintenance tax skyrockets. In this webinar, we break down why the "hub and spoke" model of GitOps creates isolated silos, leading to "tab fatigue," massive security blast radiuses, and the need for thousands of lines of brittle CI "glue code" just to handle basic release orchestration.

Resolve Webinar: Automating Joiner, Mover, and Leaver Workflows with Agentic Orchestration

Still managing Joiner, Mover, Leaver workflows with tickets and manual handoffs? It’s time to automate them. In this fast-paced session, we show how enterprises use agentic AI and orchestration to eliminate repetitive JML tickets, enforce policy automatically, and deliver secure access from day one. Powered by Resolve’s Agentic Resolution Fabric, AI agents coordinate knowledge, automation, and technician assist to provision, modify, and revoke access without manual triage. Faster onboarding. Zero-delay changes. Audit-ready offboarding.

Resolve's Agents of IT podcast - Ep. 13 - Ari's Secret Fortune (500)

In this episode of Agents of IT, we reflect on scale, strategy, and what it really takes to transform IT inside the world’s largest enterprises. Ari shares lessons from his work with Fortune 500 organizations, breaking down what separates automation pilots from true operational change. From navigating enterprise complexity to driving executive alignment, this conversation explores how large organizations move from reactive ticket management to intelligent, agent-driven operations.

The story behind Konstruct: Lessons learned scaling GitOps

In 2024, Civo acquired Konstruct (formerly Kubefirst) to reinforce our commitment to simplifying cloud computing complexities. When this acquisition was made, it began a whole new chapter for the team behind Konstruct. Over the years, we assembled our team by working with a community of thousands of engineers in what can be a very complex cloud native environment. We were fortunate to join forces with Civo as they aligned with our cloud native and portable vision.

Native Nix Support in Artifactory: The Binary Cache for the Enterprise

The “works on my machine” era is officially over. Nix is changing the way we think about software by treating packages as functional, immutable values, ensuring that a build works exactly the same way every time, on every machine. But while Nix excels on a local laptop, scaling that level of reproducibility across a global enterprise has historically been a challenge.

Getting started with Windsurf and CircleCI

AI coding assistants are transforming how developers write software. Tools like Windsurf can generate entire modules, refactor complex code, and fix bugs in seconds. But speed comes with a tradeoff: AI-generated code can introduce subtle bugs, security vulnerabilities, or breaking changes that slip past even experienced developers. That’s where continuous integration comes in. CI acts as a safety net, automatically testing every change before it reaches production.

What Is VPS Hosting and Do You Need It?

Choosing website hosting feels overwhelming when you're not a tech expert. Shared hosting, VPS, dedicated servers, cloud hosting - the options multiply and the terminology gets confusing. VPS sounds complicated, but the concept is actually simple. Understanding what it is and when you need it can save you from overpaying for hosting you don't need or suffering with hosting that's too limited. Let's break down VPS hosting in plain English and help you figure out if it's right for your website.

Zero Ticket Video Series - Slow Computer Troubleshooting

“My computer is running slow.” It’s one of the most common service desk tickets. It’s also one of the most automatable. In this episode of the Zero Ticket Video Series, we demonstrate how RITA, Resolve’s AI-powered IT agent, autonomously diagnoses and resolves a slow laptop issue in real time, directly within Microsoft Teams. Watch as RITA: No ticket queues. No manual triage. No escalations.

Reimagining Artifact Management for DevSecOps: Harness Artifact Registry GA | Harness Blog

Today, Harness is announcing the General Availability of Artifact Registry, a milestone that marks more than a new product release. It represents a deliberate shift in how artifact management should work in secure software delivery. For years, teams have accepted a strange reality: you build in one system, deploy in another, and manage artifacts somewhere else entirely. CI/CD pipelines run in one place, artifacts live in a third-party registry, and security scans happen downstream.

Kubernetes Node Vs. Pod Vs. Cluster: What's The Difference?

Kubernetes is increasingly the standard for deploying, running, and maintaining cloud-native applications running in containers. Kubernetes (K8s) automates most container management tasks, empowering engineers to manage high-performing, modern applications at scale. Meanwhile, surveys from VMware and Gartner reveal that insufficient Kubernetes expertise prevents many organizations from fully adopting containerization. Understanding how Kubernetes components work removes this barrier.

Secure access at the speed of incident response

Picture this: it's 2am, your pager goes off, and you're staring at a production database that's on fire. You know exactly what's wrong. You know exactly how to fix it. But you can't touch anything because you're waiting on someone to approve your access request. Meanwhile, your customers are down, your SLAs are bleeding out, and you're refreshing Slack hoping someone in security is awake to click "approve." This is the incident response tax that too many teams pay.

Why business context is the missing link in engineering performance

Think about the last time your team shipped something impressive. It was probably on time, with clean code and great metrics. And yet somewhere along the way, the business priorities had shifted, and what the team delivered was no longer the top priority. The work was solid, but the direction just wasn't quite right anymore. This is usually what happens when engineers are disconnected from business context.

How to Reduce Latency in Your Multicloud Environment

Learn what causes high multicloud latency, and how you can reduce it with a few simple methods – no hardware deployment required. Latency is usually one of those problems that shows up before anyone has time to go looking for it – and troubleshooting it can feel like you’re aiming for a moving target.

AI infrastructure cost optimization for scaling teams

The 2026 AI landscape has shifted from "Can we build it?" to "How much will it cost to run it?" For CTOs and engineering leaders, the challenge is no longer just model performance: it is the underlying infrastructure sprawl that silently erodes margins. When AI workloads scale, they often inherit the inefficiencies of legacy cloud models: over-provisioned instances, fragmented data pipelines, and a lack of unified context.

Building quantum-safe telecom infrastructure for 5G and beyond

At MWC Barcelona 2026, coRAN Labs and Canonical are presenting a working demonstration of a cloud-native, quantum-safe telecom platform for 5G and beyond 5G networks. This is not a conceptual exercise. It is a full 5G System (5GS) deployment with post-quantum cryptography embedded across the stack – from radio access to core, from transport interfaces to orchestration and public key infrastructure (PKI).

PostgreSQL Explain Plans in AWS Aurora

I recently wrote about a project I created on AWS Aurora PostgreSQL where I'm capturing APRS data from a radio. I focused on the ease of use: getting a database, some Lambda Functions, and a few schedulers working together with a web page. It was easy. However, I'd like to focus on a slightly different area now: performance.

JFrog Takes Software Resilience to the Next Level with 99.99% Uptime SLA

Software delivery is no longer a back-office function; it’s the heartbeat of the modern enterprise. While a 99.9% uptime SLA for essential software delivery services works for many, the acceleration of software velocity has made the “three-nines” benchmark a possible liability. For high performing software organizations, and those delivering critical services, nine hours of annual downtime represents a dangerous gap in productivity and security.
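The gap between "three nines" and "four nines" is easy to check with a little arithmetic. The sketch below assumes a 365-day year and treats the SLA as a simple percentage of total hours; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Maximum downtime per year permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


for sla in (99.9, 99.99):
    hours = annual_downtime_hours(sla)
    print(f"{sla}% uptime allows {hours:.2f} hours ({hours * 60:.0f} minutes) of downtime per year")
```

Under these assumptions, 99.9% works out to about 8.76 hours a year, which matches the "nine hours" figure above, while 99.99% leaves roughly 53 minutes.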

Code Reviews Done Right: The Framework That Stops Bugs Before Production

Learn code review best practices from experienced developer Shashi Lo at GitKon 2025. Discover how to review pull requests effectively, give constructive feedback using the nit vs. non-nit framework, and leverage AI tools like CodeRabbit and GitHub Copilot to catch bugs humans miss. Shashi Lo shares 20+ years of code review philosophy, demonstrating real PR reviews on his Secret Santa app and showing exactly what makes thorough code review essential for shipping production-ready code.

Best SQL Server ODBC Drivers 2026

Most database problems get blamed on queries, schemas, or infrastructure, and rarely on SQL Server ODBC drivers. Fair enough, those components often break. But in many cases, the real culprit is the connection layer itself. Poor drivers can lead to security gaps, performance hits, cloud connection struggles, Unicode issues, or incompatibility across platforms and modern SQL Server versions.

Getting started with Claude Code and CircleCI

AI-powered coding tools are changing how developers work. Tools like Claude Code can write functions, refactor code, and build features through natural conversation, often faster than you could type them yourself. But speed creates its own risks. AI-generated code can contain subtle bugs, reference packages that don’t exist, or misuse APIs in ways that only surface at runtime. That’s where continuous integration comes in. CI is a safety net that lets you move fast confidently.

Getting started with Gemini and CircleCI

AI coding assistants like Gemini are changing how developers write code. They can generate entire functions, debug tricky issues, and help you move faster than ever before. But with that speed comes a new challenge: how do you make sure AI-generated code actually works? AI assistants are powerful, but they’re not perfect. They can introduce subtle bugs, miss edge cases, or generate code that breaks existing functionality. That’s where CI (continuous integration) comes in.

The path to self-healing: Re-architecting for massive scale on Kubernetes

In the world of network assurance, even a few seconds of delay can result in significant business losses. In this session from Civo Navigate India, Dr. Shivananda R Poojara (Head of Cloud Business Unit, Airowire Networks) explains how his team dismantled a massive monolithic service stack and rebuilt it for a high-performance, cloud-native era in just 75 days.

How likely is a man-in-the-middle attack?

Security vendors love the man-in-the-middle attack. It’s the boogeyman of every TLS marketing page. Some shadowy figure intercepting your traffic, reading your secrets, stealing your data. A man-in-the-middle attack is when an attacker positions themselves between two parties on a network to intercept the traffic flowing between them. In the context of TLS, that means an attacker who can present a valid certificate can read everything in plaintext and proxy it on to the real server.

When AI Writes the Code, Who Keeps Production Running?

The production environment has become a minefield of code nobody really understands. Here’s what’s happening: Development teams are using Claude Code, Cursor, and GitHub Copilot to ship features at 10x their previous velocity. Product managers are ecstatic. Business stakeholders are thrilled. And somewhere in a war room at 2:17 AM, an SRE is staring at a stack trace for code that was AI-generated three weeks ago, trying to figure out why the payment service just fell over.

Database Cost Management: How To Control Rising Database Spend

According to CloudZero’s Cloud Economics Pulse, databases are often among the largest and most persistent cloud cost categories. Database costs are notoriously difficult to predict and control. Unlike stateless infrastructure that scales predictably with traffic, databases run continuously and expand behind the scenes, causing costs to rise even when usage appears stable.

From Chef to Chief Architect: Navigating the Intersection of AI and Data Security | Harness Blog

In the world of enterprise software, the transition from traditional DevOps to modern AI-driven delivery is less like a flip of a switch and more like a high-stakes kitchen. As Devan Shah, Chief Architect at IBM, puts it: the ingredients have changed from food to code, but the need for a precise, governed process remains the same.

AI SRE in Practice: Accelerating Engineer Onboarding with Contextual Expertise

Onboarding new engineers to complex Kubernetes environments is expensive. Junior engineers need to learn cluster architecture, understand organizational conventions, navigate internal documentation, and build relationships with senior team members who can answer questions. The process takes weeks or months, and during that time, senior engineers spend significant time mentoring instead of working on complex problems.

Database Partitioning: Types, Strategies, and When to Use Each

How database partitioning works in PostgreSQL and MySQL. Range, list, and hash partitioning with SQL examples and guidance on when to partition vs shard. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

AWS vs Google Cloud vs Azure for Cloud-Native and Kubernetes

Cloud adoption is no longer about “moving to the cloud.” It’s about building cloud-native platforms that are scalable, observable, automated, and Kubernetes-driven. This guide provides a deep comparison of AWS, Google Cloud, and Azure, with a focus on Kubernetes, platform engineering, DevOps, and modern workloads, aligned with standards pioneered by the Cloud Native Computing Foundation.

Database Sharding: How It Works and When You Actually Need It

How database sharding works, common strategies (hash, range, directory), shard key selection, and the operational cost of running a sharded database in production. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
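To make the three strategies concrete, here is a minimal Python sketch of how a router might pick a shard under each one. It is an illustration under stated assumptions, not code from the article: the shard names, range boundaries, and tenant directory are all hypothetical.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def hash_shard(key: str) -> str:
    """Hash strategy: a stable digest of the key, modulo the shard count."""
    # hashlib is used because Python's built-in hash() is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Range strategy: ids below 1000 go to shard-0, 1000..1999 to shard-1, the rest to shard-2.
RANGE_BOUNDS = [1000, 2000]


def range_shard(user_id: int) -> str:
    return SHARDS[bisect_right(RANGE_BOUNDS, user_id)]


# Directory strategy: an explicit lookup table maps each tenant to its shard.
DIRECTORY = {"acme": "shard-2", "globex": "shard-0"}


def directory_shard(tenant: str) -> str:
    return DIRECTORY[tenant]


print(hash_shard("user-42"), range_shard(1500), directory_shard("acme"))
```

The trade-offs show up even at this size. The hash version spreads keys evenly but remaps most of them if the shard count changes, the range version keeps neighboring ids together at the risk of hot spots, and the directory version is the most flexible but adds a lookup table that must itself be stored and kept available.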

Expert Insight: Why Local Internet Traffic Matters More Than You Think

Imagine sending a letter to your neighbour across the street, only for it to be routed through London or even Amsterdam before landing in their letterbox. This is effectively what happens to much of Scotland's internet traffic. Despite physical proximity between users, businesses and services, digital data is frequently sent on needlessly long journeys, often leaving the country before reaching its destination. This approach is inefficient, costly and poses questions about privacy, resilience and digital sovereignty.

Platform Engineering 101: What It Is, How It Differs from SRE and DevOps, & Why It Matters for Incident Response

Platform engineering has emerged as a response to the growing complexity of modern software delivery. As organizations adopt Kubernetes, microservices, CI/CD pipelines, and infrastructure as code, they are creating dedicated teams responsible for building and operating the internal platforms that power developer workflows.

What feels different about enterprise IT operations today compared to even 3-5 years ago?

Speed isn’t the problem. Speed without shared visibility is. AI compressed release cycles, multiplied dependencies, and pushed accountability to teams who no longer own the full stack. The result? Faster change. Slower resolution. Higher risk. This is why MTTR is moving the wrong way, and why observability has to evolve. (Amit Rathi)

Rancher Live: Konveyor's Cloud Native Modernisation Blueprint

Join Divya Mohan as she hosts Savitha Raghunathan, Konveyor maintainer & Red Hat Senior Software Engineer to learn more about the CNCF Sandbox project, Konveyor. Dive into some of the open source strategies for legacy app migration to Kubernetes using the 6 Rs: Rehost, Replatform, Refactor & learn blueprint tools for analysis, containerization & AI-powered refactoring.

Kubernetes Namespaces: What They Are, How They Work, And What They Don't Solve

Using Kubernetes to manage containerized applications has its fair share of challenges. One of those challenges is managing complexity. Using namespaces can help minimize that complexity. Yet, a common misconception is that using multiple namespaces in a single Kubernetes cluster can degrade performance. Another issue: Kubernetes namespaces can reduce visibility into costs. There’s more to it than that.

Anbox Cloud 1.29.0: what's new?

In this video, the Anbox team covers new features and changes in their latest 1.29.0 release. What is Anbox Cloud? Anbox Cloud lets you run virtualized Android environments securely, at any scale, to any device, letting you focus on your use case. Run Android in system containers, not emulators, on AWS, OCI, Azure, GCP or your private cloud with ultra low streaming latency. Trademark notice: Android is a trademark of Google LLC. Anbox Cloud uses assets available through the Android Open Source Project.

Release v2.9: OTEL Logs, Database Functions, SNMP Functions and more.

What’s new in Netdata v2.9: in this video, we walk through the biggest updates in Netdata v2.9, including: Top Tab Database Functions to analyze slow queries and performance bottlenecks without logging into your database; SNMP Network Interfaces Function for real-time visibility into network interfaces; Microsoft SQL Server Collector with richer MSSQL metrics; and OpenTelemetry Logs Ingestion to correlate logs and metrics in one place.

AI-Driven Automated Testing for Oracle Applications

As enterprises continue to change rapidly, businesses depend on Oracle-based ecosystems to track their finances, supply chains, HR, and customer operations. With the increase of digital transformation in companies, these environments continue to become more complex. As a result, manual testing is no longer enough for maintaining pace with ongoing updates, integrations and customizations that occur within an organization's systems. This is where AI-powered automated testing for Oracle applications revolutionizes how quality assurance is approached.

CloudZero's FinOps Cost-Per-Unit Glossary

This glossary is a bookmarkable reference for cost-per-unit metrics in FinOps unit economics. It’s designed for engineering, finance, and FinOps teams that need a shared language for understanding how cloud costs behave as usage, customers, and products scale. The terms are organized by category and include real-world context.
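At its simplest, a cost-per-unit metric divides a cost total by the number of units it served. The sketch below assumes a hypothetical monthly bill and customer count chosen purely for illustration; they are not figures from the glossary.

```python
def cost_per_unit(total_cost: float, units: float) -> float:
    """Divide a cost total by the number of units (customers, requests, etc.) it served."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Hypothetical inputs: a 12,000 USD monthly cloud bill spread over 400 active customers.
monthly_cost_usd = 12_000.0
active_customers = 400
print(cost_per_unit(monthly_cost_usd, active_customers))  # 30.0 USD per customer
```

Tracking the same ratio over time shows whether costs grow faster or slower than the unit they are measured against.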

Open Source Liquibase MongoDB Native Executor by Harness | Harness Blog

Harness is strengthening the open Liquibase ecosystem by introducing a native MongoDB executor that removes long-standing limitations for Community Edition users. It enables teams to run MongoDB scripts, generate changelogs, and integrate database workflows into CI/CD without relying on paid extensions. The initiative reinforces open collaboration while making MongoDB-based database DevOps more accessible, consistent, and production-ready.

Don't Panic: A Low-Risk Strategy for Ingress NGINX Retirement

The Ingress NGINX project is winding down. For many organizations, this means planning a migration for critical infrastructure. While the HAProxy Kubernetes Ingress Controller is the natural successor for these workloads, a "rip and replace" strategy isn’t always viable. You might have complex configurations, customized annotations, or deployment freezes that make a sudden switch risky. There's a lower-risk path: Place HAProxy in front of your existing Ingress NGINX deployment.

AI Merge Conflict Resolution + Commit Messages in GitKraken Desktop

AI-assisted merge conflict resolution is changing how developers handle Git workflows. Watch GitKraken Ambassador Kevin Bost demonstrate AI-powered features that eliminate merge conflict dread, clean up messy commit history, and generate contextual commit messages in seconds.

Simultaneous multi-cloud deployment to AWS and GCP with CircleCI

AWS recently experienced a significant outage. The outage took down major services, including parts of McDonald’s mobile ordering system, some Netflix features, and many other applications that relied solely on AWS infrastructure. This event perfectly illustrates why relying on just one cloud platform can be risky.

Self-Driving Data Highways: Realizing the Strategic Advantages of Autonomous IP Optical Networks at OFC26

In the telecom industry, leaders are under pressure to deliver more—more capacity, more agility, more reliability—while managing rising complexity with fewer resources. The network is the circulatory system of the modern telco, yet it’s still often operated like a patchwork of manual roads, each requiring constant human intervention. This model worked when traffic was predictable and growth was linear.

The Need for Clean in the AI Era

In the AI era, software and new models are being born at a breakneck pace—but they’re also bringing a lot of “baggage” into the world. While AI coding agents are busy accelerating innovation, they’re also excellent at generating a massive byproduct: “digital dust.” Between obsolete releases, orphaned dependencies, and massive model versions, your repository may soon start to look more like a digital junk drawer than a streamlined machine.

Getting started with Amazon Q Developer and CircleCI

AI coding assistants like Amazon Q Developer are transforming how you write software. They can generate entire functions, explain complex code, and help you move faster than ever. But there’s a catch: AI-generated code isn’t always correct. It can introduce subtle bugs, security vulnerabilities, or break existing functionality in ways that aren’t immediately obvious. That’s where continuous integration comes in.

The AI infrastructure gap: why agents fail on fragmented stacks

The initial hype of AI agents is hitting a hard reality: a clever prompt is not a production strategy. As organizations move from experimentation to operationalizing AI in 2026, a systemic bottleneck has emerged: It is not the model's intelligence; it is the model’s context and its access to the right tools. When an AI agent lacks access to live, grounded platform data, it guesses.

Database Security Failures Don't Start in Security Teams

When a database security incident happens, everyone turns to the security team. We look for a simple root cause analysis, and then we add a control, tighten a policy, and maybe even buy a silver bullet tool. We feel progress! But the incident didn’t start there. It started years earlier, when the organization made a series of perfectly reasonable decisions that quietly expanded the surface area and weakened the consistency of control.

2026 - Redgate Flyway - Starting strong with Oracle

Deploying changes to Oracle databases can be complex: it means working across multiple schemas, handling dependencies, and accounting for environment differences. Flyway has been helping teams bring order and automation to Oracle development for over 15 years, and in 2026 we’re accelerating that investment even further. Here’s a look at the latest enhancements available today and what’s coming next for Oracle users.

AWS EC2 Vs. Azure VMs Vs. GCE: Understanding The Real Cost Of Cloud VMs

AWS EC2, Azure Virtual Machines, and Google Compute Engine (GCE) appear similar on paper but produce different bills due to how each provider prices capacity, discounts, idle time, and commitment terms. The same VM configuration can cost 20-40% more or less depending on which cloud you choose and how your workload runs.

5 key takeaways from the 2026 State of Software Delivery

AI has made it easier than ever to write code. Shipping it is a different story. Today we released the 2026 State of Software Delivery report, sponsored by Thoughtworks. In it, we analyzed more than 28 million CI/CD workflows across thousands of engineering teams. The picture that emerged is clear: teams are producing more code than ever, but fewer of them are able to turn that activity into software that actually reaches customers.

Build and test your first Kubernetes operator with Go, Kubebuilder, and CircleCI

Kubernetes operators extend the Kubernetes API with custom logic, automating tasks like provisioning, configuration, and policy enforcement. Instead of managing these tasks manually or with ad hoc scripts, operators codify your workflows into controllers that run natively inside the cluster. In this tutorial, you’ll build a simple operator using Go and Kubebuilder, a framework that scaffolds much of the boilerplate so you can focus on core logic.

GPU-as-a-Service: The network's critical role in accelerated computing

The explosion of AI has created a continuous demand for computing power. At the heart of this need sits one critical resource: GPUs. They have become the hardware of choice for AI and machine learning, particularly deep learning workloads that operate on enormous data sets. However, as organisations race to train larger models and deliver faster inference, many are discovering that owning and operating GPU infrastructure isn’t always practical.

Predict, compare, and reduce costs with our S3 cost calculator

Previously I have written about how useful public cloud storage can be when starting a new project without knowing how much data you will need to store. However, as datasets grow over time, the costs of public cloud storage can become overwhelming. This is where an on-premises, or co-located, self-hosted storage system becomes advantageous: it provides the greatest range of benefits, including cost, performance, security, and data sovereignty.

The 9 Essential NOC Metrics to Master for Operational Excellence

In today's fast-paced IT landscape, modern Network Operations Centers (NOCs) are the backbone of reliable infrastructure for businesses of all sizes. For MSPs, leveraging managed NOC services can dramatically improve uptime, security, and overall client satisfaction. The global NOC as a Service market is projected to grow from about $3.7 billion in 2025 to over $9 billion by 2034, underscoring rising demand for expert, always-on network oversight.

Cloud Hosting for Crypto: Flexible, Scalable, and Battle-Tested Solutions for the Future

Crypto cloud hosting revolutionizes how projects handle unpredictable market surges, offering elastic pay-per-use scaling, serverless functions triggered by on-chain events, and hybrid public/private cloud architectures that meet strict compliance needs. hosting-bitcoin.com operates Kubernetes 1.30 clusters enhanced with Cilium eBPF networking and Multus CNI for multi-homed pods, effortlessly supporting Solana validators processing 65k transactions per second (TPS) alongside Ethereum Layer-2 sequencers delivering sub-block finality.

How VETRO's Network Inventory Management Software Solves Data Chaos for Fiber Providers

Fiber network operators often grapple with fragmented data, leading to inefficiencies and operational challenges. In many organizations, critical information is scattered across spreadsheets, legacy systems, and proprietary databases. These silos make it difficult for teams to access the latest network information, leading to outdated records, miscommunication between departments, and delays in service delivery. Moreover, reliance on manual data entry and inconsistent data formats can introduce errors, further complicating the maintenance of data accuracy and integrity.

What is Infrastructure as Code (IaC)? Best Practices, Tools, Examples & Why Every Organization Should Be Using It

Infrastructure as code (IaC) is the act of writing infrastructure configurations as code so they can be understood, repeated, and enforced with less manual effort. IaC is also a powerful way to convert institutional knowledge into technical knowledge. It’s a far-reaching and essential part of managing infrastructure at scale, with benefits that have expanded to platform engineering, security and compliance, network administration, and so much more.

NVIDIA Rubin (R100) vs. NVIDIA Blackwell (B200) GPU

Since 1999, when NVIDIA invented the GPU (graphics processing unit), the demand has “skyrocketed”. At CES 2026, CEO Jensen Huang announced their latest GPU, named after Vera Rubin. This follows on from the announcement of their Blackwell lineup only two years ago. Through this blog, we’ll explore what the industry knows about the Vera Rubin so far. Plus, we will take a look at some specs in comparison to the NVIDIA B200 from the Blackwell lineup.

Breaking up with backstage: Why "free" open source isn't always free

We’ve all had that moment where it seems like you've solved your company's biggest engineering challenges after a weekend of hacking something together. Your prototype is so good, you feel, that the obvious next steps are to build a slide deck, rally the team around your work, and prepare the ticker tape parade for your hero's welcome. Jeff Schnitter, a Solution Architect at Cortex, knows this roller coaster of experience all too well after his time at Workday.

A year of documentation-driven development

For many software teams, documentation is written after features are built and design decisions have already been made. When that happens, questions about how a feature is understood or used often don’t surface until much later. A little over one year ago, our team began to recognize this pattern in our own work. Features generally functioned as intended but were difficult to use or explain. Documentation lagged behind releases.

Why Your AI Code is Breaking (And How to Fix It)

New data from CodeRabbit shows AI makes 70% more errors than humans, mostly in logic. Stop shipping "AI Vibes" to production. Use the new Testing Pyramid: Deterministic (Validation), Record & Replay (Mocking), and Probabilistic (Vibes). Don't let your agents break prod.

MCP: Why AI Needs Git Intelligence

GitKraken CTO Eric Amodio breaks down the Model Context Protocol (MCP) and explains why Git intelligence is critical for AI agents at GitKon 2025. In this session, Eric covers: what MCP is and why every major AI company adopted it; why AI needs Git history, not just file system access; how GitKraken MCP removes Git pain safely; the future of agentic developer workflows; and how Commit Composer uses AI to organize commits without losing data.

AWS Data Exchange Guide: Use Cases, Pros, Cons, And Pricing

Third-party data now drives forecasting, analytics, and machine learning across modern cloud teams. But acquiring it has long meant custom contracts, delayed access, and limited visibility into how data costs scale inside analytics workflows. AWS Data Exchange reduces much of that friction by integrating third-party data into the AWS ecosystem.

The 6 Best Performance Testing Tools Guide

In software development, load testing plays a critical role in ensuring that applications perform optimally under any imaginable load condition. To do this, developers subject applications to several types of load tests, including scalability, spike, endurance, and stress testing. The ultimate goal of these performance tests is to pinpoint potential bottlenecks and ensure the reliability of the overall system on which the application runs before it reaches production.

A Technical Guide to Controls Engineering

The modern world runs on mission-critical software. It moves our money, drives our cars, diagnoses our illnesses, and fundamentally improves our lives. But, organizations building this critical software face a paradox: they need to move fast to stay competitive, but they also need rigorous governance to manage risk. This has created a lot of tension in regulated industries.

Install and Activate dotConnect in Visual Studio & Any IDE

Learn how to install and activate dotConnect products step by step. In this video, we show all available installation options—including NuGet Package Manager, command-line installation in any IDE, and the Windows installer for full Visual Studio integration. You’ll also see how to activate dotConnect using a license key, including how to get a free 30-day trial key from the Devart Customer Portal. This guide helps you.

Code Is Cheap, Reliability Isn't: Owning Production in the AI era w/ Swizec Teller

In this episode, Swizec Teller, author of the bestselling Scaling Fast, makes a bold claim: code is cheap, reliability is not. As AI coding tools accelerate feature development, the real competitive advantage shifts to operating systems reliably in production. We explore the hidden complexity of SRE work, the addictive nature of agentic coding, and why ownership — not automation — remains at the core of modern software engineering.

Introducing Konstruct: GitOps-powered IDP in minutes

"I wish I knew about this a couple years ago..." Over my seven years as a cofounder, I've heard some version of that line more than any other. Usually, it comes at the end of a demo to someone who has spent a year getting to something not even close to what they're seeing on my screen. The story is always the same. An organization adopts Kubernetes and arrives at the inevitable conclusion that they need a platform.

BygoneSSL happened to us

A few months ago I wrote about BygoneSSL and the 1.5 million domains with valid certificates owned by someone else. Domains change hands but certificates don’t know. The old owner keeps their private key, and the certificate keeps working. It’s an industry problem, but it turns out it’s our problem too. We purchased certkit.dev for internal development and demos.

AI SRE in Practice: Diagnosing AWS CNI IP Exhaustion Before Widespread Outage

IP address exhaustion in Kubernetes doesn’t announce itself with clear error messages. Pods fail to schedule, services degrade unpredictably, and the symptoms look like a dozen different problems before anyone realizes the cluster has run out of available IP addresses. By the time the root cause becomes clear, multiple services are affected and recovery requires coordination across infrastructure layers.
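A rough capacity check makes the failure mode easier to reason about. The sketch below assumes each pod consumes one address from the subnet and uses AWS's documented rule that five addresses per subnet are reserved; the CIDR and usage figures are hypothetical.

```python
import ipaddress

# AWS reserves the first four addresses and the last address in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses in the subnet that AWS will actually hand out."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def remaining_ips(cidr: str, in_use: int) -> int:
    """Addresses still free, given how many are already assigned (an assumed input)."""
    return usable_ips(cidr) - in_use


# Hypothetical subnet: a /24 with 240 addresses already taken by nodes and pods.
print(usable_ips("10.0.0.0/24"))          # 251
print(remaining_ips("10.0.0.0/24", 240))  # 11
```

Alerting when the remaining count falls below a threshold is one way to notice exhaustion before pods start failing to schedule.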

How to eliminate DevOps toil in regulated SaaS

In regulated industries like fintech, healthcare, and government, DevOps teams often find themselves acting as human compliance gateways. The pressure to maintain strict security standards while accelerating release cycles creates a compliance tax: a heavy burden of manual environment setups, security review tickets, and the inevitable scramble for evidence before an audit. This manual labor, or toil, is more than a drain on productivity. It creates a dangerous gap between policy and actual operations.

Introducing Megaport High-Speed Cross-Cloud Encryption

Secure cloud traffic at line rate, without slowing your workloads down, thanks to Megaport’s new encryption solution. Securing data in motion is often a trade-off between performance and privacy, but most traditional encryption models struggle to find this balance – especially when you’re moving large volumes of data between clouds or across regions.

Control your dependencies in Flyway Desktop for Oracle and SQL Server Databases

This is a guest post from Stephanie Herr. One of Flyway’s biggest strengths is its ability to track your database schema as individual SQL DDL scripts on disk. This gives you full version control over every object, along with a complete audit trail of what changed, who changed it, when it changed, and why. For teams working with Oracle and/or SQL Server, this level of transparency is essential and this latest release provides even more support for how you handle changes across dependent objects.

Redgate Test Data Manager Product Updates - January 2025

This is a guest post from James Hemson. The January release of Redgate Test Data Manager brings the launch of our free trial, a completely redesigned installation experience, and powerful new workflow capabilities. These updates make it faster to get started and easier to automate your data provisioning pipelines.

Compliant Test Data Used to Be Hard. It Isn't Anymore.

This is a guest post from Saskia Parks. If you're exploring test data management (TDM) solutions, you probably know your current practices aren't ideal, but you're skeptical that investing in a solution is worth the budget and effort. We hear the same concerns. The perception is that proper TDM is expensive, complicated, and takes months of painful implementation.

AWS Elastic Beanstalk 101: A Beginner's Guide To App Deployment On AWS

Imagine you want to launch an application without first building and managing the servers that run it. You write the code, pick how it should run, and then let a platform take care of the rest. That’s the core promise of AWS Elastic Beanstalk. In this snackable guide, you’ll understand AWS Elastic Beanstalk well enough to decide if it belongs in your AWS architecture.

AI for nuclear safety: Predicting component remaining useful life

As industrial systems become more complex in 2026, the reliability of critical infrastructure depends on shifting from reactive to predictive strategies. In this session from Civo Navigate India, Muthukumar Ganesan, a scientist at the Indira Gandhi Centre for Atomic Research (IGCAR), explores the application of AI and machine learning in securing the future of nuclear energy.

Scaling Infrastructure Teams: The Increasing Need for Rust Engineers

Infrastructure teams have had to continuously improve existing systems to make them faster, safer, and more reliable. With the growth of cloud services, the complexity of applications, and the demand for low-latency processing, engineering teams still need the best tools and languages to build these systems. C, C++, and Java have powered system-level development for decades and have long been the standard. But as software complexity grows, their limitations in safety, memory management, and concurrency are becoming increasingly obvious.

Oracle JDK to OpenJDK: A Guide to Reliable Migration Testing

One of the most common infrastructure changes Java developers and operators are dealing with today is the migration from Oracle Java to OpenJDK. The reason is the licensing changes made by Oracle and the maturity of the OpenJDK distributions. The migration process is quite simple: replace the JDK, recompile the code, and redeploy the application. However, the differences between the two runtimes can lead to unexpected issues that are not caught by unit tests.

Speedscale Named in Gartner Market Guide for API Testing

In the highly dynamic environment of modern engineering, an appropriate strategy for API quality is more important than ever. We are pleased to announce that Speedscale has been named in the latest “Market Guide for API and MCP Testing Tools” report from Gartner. As software development is shifting towards complex distributed architectures, integrating Model Context Protocol (MCP) for AI-based workflows, the need for realistic testing has never been higher.

dotConnect for Dynamics 365 | Secure C# ADO.NET connection with ORM support

Integrate Microsoft Dynamics 365 into your C# .NET applications with dotConnect for Dynamics 365, a powerful, high-performance data provider built for secure enterprise connectivity. From workflow automation to CRM data synchronization, dotConnect enables reliable access with seamless Visual Studio integration and advanced ORM support for modern development workflows. Try it free for 30 days and see why thousands of developers trust dotConnect for Dynamics 365 in mission-critical environments.

#053 - The Road to Distributed AI and Kubernetes Infrastructure with Matt Butcher (Fermyon) & Ari...

They share their professional origins, highlighting how Kubernetes transitioned from a complex tool for experts to a foundational technology for global enterprises. Part of the conversation focuses on the history of Helm, explaining its growth from a simple hackathon project into a standard package manager. Another part takes on the future of distributed computing, specifically how Akamai is integrating infrastructure as a service to support modern workloads.

Redgate Flyway, Oracle, and the Case of the Lowercase Schema

If you’ve spent any time working with Oracle databases, you’ve probably internalized a few expectations so deeply that you don’t even think about them anymore. One of the biggest is this: Database users, tables, views, and metadata are in UPPERCASE. So, when you query the database directly, whether you use SQL*Plus, SQLcl, SQL Developer, or any other tool, you’re going to format your query in a common way.
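The uppercase expectation follows from how Oracle treats identifiers: unquoted names are folded to uppercase, while double-quoted names keep their exact case. The function below is a simplified model of that rule for illustration only. Its name is hypothetical, and it is not code from Oracle or Flyway.

```python
def oracle_stored_name(identifier: str) -> str:
    """Simplified model of the name Oracle records for an identifier.

    Unquoted identifiers are folded to uppercase; identifiers wrapped in
    double quotes keep their case exactly as written.
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1]  # quoted: case preserved
    return identifier.upper()    # unquoted: folded to uppercase


print(oracle_stored_name("hr"))     # HR
print(oracle_stored_name('"hr"'))   # hr
```

A schema created with a quoted lowercase name is therefore a different object from the unquoted one, which is how a lowercase schema can surprise tooling that assumes uppercase.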

Why Security and Stability Matter in Infrastructure Management

In the high-stakes world of modern infrastructure management, "move fast but break things" is not a viable strategy. As organizations scale their digital footprints, the competing demands of velocity and vulnerability have created a new operational reality. Today, the integrity of your infrastructure is synonymous with the integrity of your business. For system administrators and DevOps engineers, the landscape has shifted. It is not enough to simply provision servers and deploy applications.

4 on-call burnout signs (and how to address them)

Being on-call can sometimes feel overwhelming. If that feeling goes unnoticed for too long, it often translates into burnout. Early burnout signs usually show up in how people respond to incidents or how they feel about the schedule. This guide walks through four such signs that can be useful to watch for before on-call burnout sets in.

The foundations of software: open source libraries and their maintainers

Open source libraries are repositories of code that developers can use and, depending on the license, contribute to, modify, and redistribute. Open source libraries are usually developed on a platform like GitHub, and distributed using package registries like PyPI for Python and npm for JavaScript. These repositories contain pre-written, re-usable code that developers use to add elements or features within their software projects.

Environment support in Terraform Provider for Kosli - v0.2.0

We’re excited to announce support for physical environments in the Terraform Provider for Kosli! What’s included: Environment management, with full lifecycle support for creating, updating, and managing physical environment types (K8S, ECS, S3, docker, server, and lambda). Manage legacy environments as IaC: import your existing physical environments to have Terraform manage them.

How To Design AI-Native SaaS Architecture That Scales Without Killing Your Margins

AI-native SaaS products aren’t failing because the models are bad. They’re failing because the architecture can’t keep up with how AI actually behaves in production. What looks affordable in staging can erode your margins once real customers, workflows, and automation come into play. Designing AI-native SaaS architecture is now as much a margin decision as it is a technical one.

Rovo Dev Code Review in Bitbucket and GitHub | Bitbucket Blitz | Atlassian

The demo portion of a recent webinar I did shows how to set up, configure, and use Rovo Dev code review in both Bitbucket and GitHub. Learn how to add custom coding standards to your repositories and see Rovo Dev check for the specific things you care about during code reviews. Learn how to add acceptance criteria to your Jira work items and see Rovo Dev verify them during code reviews.

Syslog Checks: How to find Insights in the Data Flood

Every SysAdmin knows the feeling. They are swimming in logs, terabytes of them. Every daemon, service, and kernel subsystem religiously writes its activities to syslog. The data exists. The signals are there. Yet, somehow, incidents are still unpredictable. How is this even possible? Here's why this happens: Traditional syslog infrastructure was designed for storage and retrieval, not detection and response.

Why Your Company Will Be Running OpenClaw Next Year

You’ve probably heard of OpenClaw. Maybe you’ve seen the demos where an AI agent opens a browser, navigates to your CRM, fills in a form, and files a support ticket. No API required. Maybe you thought “that’s cool but I’d never run that at work.” Your employees already are. According to Permiso’s research, 22% of enterprise customers have employees running OpenClaw without IT approval.

How AI Coding Is Breaking Synthetic Data Generation

Traditional synthetic data generation approaches, still called “Test Data Management” (TDM) by legacy vendors, were designed for a world where applications were monolithic, databases were the center of gravity, and change happened slowly. The world looks a lot different now. Modern systems are distributed, often event-driven, and increasingly powered by streaming data and AI agents. In this environment, batch-oriented synthetic data generation fails to capture how systems actually behave.

DLP, Traffic Replay, and the Missing Link to Software Quality

In Part 1 and Part 2 we explored why testing modern software is so difficult. Production data is the most valuable input for testing, but it’s locked away because it contains PII and sensitive context. Traditional Synthetic Data Generation (SDG) was built for batch databases, not streaming systems. And AI coding agents amplify every weakness in existing test strategies because they need current, realistic data or they generate buggy code based on outdated assumptions.

How to Generate a New Puppet Module with VS Code and GitHub Copilot

Revolutionize your infrastructure by leveraging AI tooling in the Puppet ecosystem. In this technical demonstration, we explore how to significantly reduce the time required to create new Puppet modules using Visual Studio Code, GitHub Copilot, and the Puppet Model Context Protocol (MCP) server.

Integrating DCIM and ServiceNow: 4 Customer Success Stories

Managing assets, tickets, and workflows across multiple data center sites can be complex and time-consuming. When IT service management (ITSM) and DCIM tools operate in separate silos, teams often face incomplete information, duplicated effort, and limited visibility into the physical infrastructure. Integrating Data Center Infrastructure Management (DCIM) software with ITSM platforms like ServiceNow ensures that asset, configuration, and ticket data remain aligned across systems.

What Are the Benefits of Integrating DCIM with Your Existing Tools?

Modern data centers rely on a growing number of specialized tools—CMDBs, ITSM platforms, network and server management systems, virtualization platforms, and more. Each solves a specific operational problem, but when these systems operate in isolation, teams face inconsistent data, manual updates, and slower decision-making. Integrating Data Center Infrastructure Management (DCIM) software with your existing tools solves these challenges by consolidating information into a single pane of glass.

Surging AI Costs Are Eroding Business Efficiency: New CloudZero Report

What do 475 senior leaders across software, financial services, cybersecurity, and other industries all have in common? They have little to no idea whether their AI investments are paying off. CloudZero just released FinOps in the AI Era: A Critical Recalibration, a report assessing the state of cloud and AI spending. Culled from hundreds of responses from people directly accountable for cloud spending, the report shows that while FinOps maturity is accelerating, cloud efficiency is plummeting.

FinOps Maturity Has Never Been Higher. So Why Is Cloud Efficiency Plummeting?

Whoever thought we’d see the day when cloud cost management (CCM) seemed easy? CloudZero just released FinOps In The AI Era: A Critical Recalibration, an annual report on the state of cloud and AI costs. The report surfaced what looks like a paradox: FinOps maturity is accelerating, but organizational cloud efficiency is plummeting. 72% of organizations now have formal cloud cost management (CCM) programs. That’s nearly double what we saw in our last survey (39%).

The AI-nigma: FinOps Is Maturing - So Why Is Cloud Efficiency Falling?

Q: What do you call it when FinOps maturity surges but cloud efficiency plummets? A: An AI-nigma. I don’t claim to be a comedian. But I do claim to be Fred FinOps, so the paradoxical findings from CloudZero’s new report titled FinOps in the AI Era: A Critical Recalibration, created in partnership with B2B SaaS benchmarking firm Benchmarkit, had me scratching my head. The good news: These numbers tell a story of cloud cost maturity and control. But then there’s the bad news.

Sustainable AI Investment: A Systems Thinking Approach

According to our new report, FinOps in the AI Era: A Critical Recalibration, 40% of companies now spend $10M or more annually on AI. Most can’t tell you if it’s working. That’s not a budgeting problem. It’s a systems problem. And Donella Meadows wrote the playbook for understanding it.

The PaaS Graveyard: Why Platforms Keep Dying and Developers Keep Migrating

I've been in this industry since before the word "PaaS" existed. I founded Cloud 66 in 2012 — the same year Heroku was peaking, dotCloud was pivoting to become Docker, and the idea of "just git push and forget about servers" felt like the future. It was the future. Partly. The deployment experience was revolutionary. The business model wasn't. Last week, Heroku announced its transition to "sustaining mode" — no new features, no new enterprise contracts.

Heroku Moves to Sustaining Mode: What It Means and What You Can Do About It

Last week, Heroku announced it is transitioning to a "sustaining engineering model." In plain English: no new features, no new enterprise contracts for new customers, and Salesforce is redirecting its investment elsewhere. The platform will be maintained for security and stability, but that's it. If you've been in this industry long enough, you know what "sustaining mode" means.

How frictionless development created a trillion dollar mistake

We've all heard from an engineering leader about the exact moment they realized their architecture had gotten too complex. It usually happens when they look at a service map and realize it looks like a box of tangled Christmas lights. This cognitive overload is exactly what Steve Evans, the former SVP of engineering at Chegg, reflected on in a recent post on LinkedIn. He argued that microservices were a trillion dollar mistake because we often over-build for future problems that never actually arrive.

Migration blueprint for moving your application without rewriting

The decision to migrate a production application is rarely about the destination. It is about the friction of the journey. For most engineering leaders, the word "migration" is a synonym for "refactor." The industry has conditioned us to assume that moving to a modern cloud platform requires throwing away years of stable configuration, learning a new proprietary DSL, and rewriting core application logic to fit a specific container or serverless model.

Why Upsun is the multi-cloud PaaS technical leaders are choosing in 2026

In a recent technical evaluation by Journal du Net (JDN), Upsun (formerly Platform.sh) was recognized for its ability to "pull ahead" (tire son épingle du jeu) in a fiercely competitive market dominated by cloud giants and specialized pure players. While hyperscalers offer raw power, Upsun’s strategic fusion of enterprise reliability and AI-ready agility has redefined expectations for modern PaaS.

5 Offbeat on-call rotations that work

Most teams choose standard on-call patterns like weekly or daily rotations. But sometimes a less conventional rotation can solve a specific problem or just fit better with how your team works. This guide walks you through five offbeat on-call rotations. For each, we look at why it might work for you and the challenges involved. This helps you see the full picture before you decide to try them out. Let’s dive in!

Follow-the-sun and other on-call models

Most teams run on-call using rotation-based schedules where responsibility shifts every few days or weeks. But some situations call for different models that change who responds based on time zones, expertise, or the type of incident that triggers. This guide walks you through six on-call models that work outside the standard rotation patterns.

6 Underused Git Commands That Save Time

Git is full of underused powers that most developers never discover. In this GitKon 2025 session, GitKraken Senior Product Manager Jonathan Silva reveals 6 lesser-known Git commands that solve real workflow pain points, from recovering lost commits to managing stashes strategically. Learn how to undo commits without losing work, recover deleted branches with git reflog, cherry-pick without immediate commits, target specific stashes, see contributor breakdowns, and more. Jonathan also demonstrates how GitKraken Desktop makes these workflows visual and intuitive.

Silent Failures: Why AI Code Breaks in Production

You ship a small “safe” change on Friday. The diff is tiny, the tests are green, and the AI assistant was confident. An hour after deploy, your on-call channel lights up. A downstream service is rejecting responses that look fine in code review. Now you’re rolling back and rewriting a fix that should have been obvious if you had real traffic in the loop. This isn’t a hypothetical.
Sponsored Post

Kubernetes Load Testing Made Easy with Speedscale

Everybody knows working with Kubernetes is really hard. It's highly complicated. You have to know how to work with YAMLs, there's lots of stuff to deal with. The classic developer experience with YAML. But what if you could get complete visibility into your Kubernetes workloads and run realistic load tests without touching a single YAML file or running kubectl commands? In this walkthrough, I'll show you how Speedscale makes Kubernetes observability and performance testing as simple as point-and-click.

Top Kubernetes interview questions of 2026: A beginners guide

Having been around for a decade, the world's most popular container orchestrator has set a standard for how we run containers at scale. According to the CNCF, cloud-native adoption has reached 98% across organizations, showing that Kubernetes adoption is not slowing down. Whether you are looking to land your first Kubernetes role or you are experienced and looking to brush up on your knowledge, we’ve put together the top questions to learn more about Kubernetes.

Perspectives from the Edge: Data Sovereignty with KPMG

Data sovereignty isn’t a checkbox – it’s now a board-level priority. Data sovereignty is everywhere right now, but for many organisations, it still feels abstract. In this first episode of Perspectives from the Edge, Assad Noori, Head of Digital Infrastructure Advisory for the UK at KPMG, speaks with Pulsant's Wendy Shearer, about why sovereignty has become a board-level issue, how AI and hybrid infrastructure are reshaping long-held assumptions, and why decisions about where data lives, moves, and is accessed now carry far wider implications than most organisations expect.

Why Monitoring Matters for Modern Hosting Platforms

With all the discussion in the dev community lately about changes made at Heroku, we wanted to use this moment to talk about PaaS (Platform as a Service) providers and how AppSignal can help you get the most from your app's hosts, from optimal performance to lower usage bills.

What is RDMA?

Modern data centres are hitting a wall that faster CPUs alone cannot fix. As workloads scale out and latency budgets shrink, the impact of moving data between servers is starting to become the most significant factor in overall performance. Remote Direct Memory Access, or RDMA, is one of the technologies reshaping how that data moves, and it forces a rethink of some long-held assumptions in data centre networking. This article is the first in a short series.

Why my Azure bill keeps spiking (and how to fix it)

Noticed a sudden spike in your Azure bill? Unexpected Azure cost increases are often caused by hidden usage, overprovisioned resources, scaling changes, or limited cost visibility. In this video, we explain why Azure costs spike, how to identify Azure cost anomalies early, and what steps you can take to prevent budget surprises. Take control of your Azure spend with smarter, proactive cost management.

What "Open Source" actually means in 2026

What does "Open Source" really mean in the age of AI? In the conclusion of her session at Civo Navigate India, OpenUK CEO Amanda Brock shares a fundamental truth for the global tech community. True openness is not about being local; it's about global collaboration and ensuring that technology is accessible for any purpose, without friction. As we build the next decade of innovation, the goal is to build better, together, across the planet.

Top Continuous Integration Metrics Every Platform Engineering Leader Should Track | Harness Blog

Track build duration, queue time, success rate, and cost per build to directly improve developer productivity, control costs, and enhance delivery reliability. Standardize pipeline metadata and automate metric collection to turn raw CI data into actionable insights across teams, services, and cost centers. Pair metrics with intelligent caching, optimized testing, and build acceleration to reduce build times and operational costs while maintaining security standards.
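
As a rough illustration of the four metrics named above, here is a minimal Python sketch. The `Build` record, its field names, and the sample values are hypothetical; a real CI system would export this data in its own schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Build:
    """One CI run. All fields are hypothetical, for illustration only."""
    queued_s: float    # seconds spent waiting for a runner
    duration_s: float  # seconds spent executing the pipeline
    succeeded: bool    # True if the build finished green
    cost_usd: float    # assumed per-build cost attributed to this run


def summarize(builds):
    """Return average duration, average queue time, success rate, and cost per build."""
    if not builds:
        raise ValueError("need at least one build")
    return {
        "avg_duration_s": mean(b.duration_s for b in builds),
        "avg_queue_s": mean(b.queued_s for b in builds),
        "success_rate": sum(b.succeeded for b in builds) / len(builds),
        "cost_per_build_usd": sum(b.cost_usd for b in builds) / len(builds),
    }


# Made-up sample data, not taken from any real pipeline.
builds = [
    Build(queued_s=30, duration_s=300, succeeded=True, cost_usd=0.50),
    Build(queued_s=90, duration_s=420, succeeded=False, cost_usd=0.70),
    Build(queued_s=60, duration_s=360, succeeded=True, cost_usd=0.60),
    Build(queued_s=20, duration_s=320, succeeded=True, cost_usd=0.40),
]
summary = summarize(builds)
print(summary)
```

Grouping the same records by team or service before calling `summarize` would give the per-team and per-cost-center view the excerpt mentions.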

Unit Testing in CI/CD: How to Accelerate Builds Without Sacrificing Quality | Harness Blog

Smart test selection, parallel test runs, and intelligent caching can all speed up builds without sacrificing code quality. Fast, focused, and separate unit tests are very important for quick development. They give you feedback right away and make it easier to refactor with confidence. Unit tests are a quick and cheap way to find logic errors, but they can't check how different parts work together. For full coverage, use them with integration tests and end-to-end tests.

When ConfigMaps Hit Limits: Migrating to CRDs

Over the past few years, Kubex has evolved from a cloud optimization product into a Kubernetes-centric solution, shifting its focus from cost and waste visibility to fully automated resource optimization. As that evolution happened, one of the earliest design decisions we had made began to show its limits: how the product was configured.

Meet the Maintainers: Ned Batchelder (coverage.py) - the story behind Python code coverage

Ned Batchelder (nedbat), creator and long-time maintainer of coverage.py, joins Push to Talk | Meet the Maintainers to share his path into programming and open source and the real story behind one of Python’s most popular testing tools. We talk about the journey to coverage.py, the turning points that shaped it, and why the measurement of the library is only 94%. What's inside? Surprise us: how are you using coverage.py?

Resolve's Agents of IT podcast - Ep. 12 - Bob Strong, AVP of Service Operations at Assurant

What does it really take to move from reactive IT firefighting to intelligent, AI-driven operations? In this episode of Agents of IT, Resolve CCO Sean Heuer and COO Ari Stowe sit down with Bob Armstrong, AVP of Service Operations at Assurant, to unpack what happens when a 25-year service operations leader embraces AI, automation, and agentic workflows.

dotConnect Update: .NET 10 Support, OAuth Integration & Faster Batch Updates

The latest dotConnect release is here, bringing major improvements in performance, security, and cloud connectivity for .NET developers. In this video, we walk through the most important updates. This release is designed for developers who rely on high-performance, secure, and dependable data connectivity in modern .NET environments.

Hyperview DCIM 5.4 Software Release

Hyperview 5.4 brings a host of powerful updates designed to give users more control, better insights, and a smoother experience across the platform. The release introduces per-sensor access control, allowing teams to restrict visibility to specific sensors for tighter data governance. New location dashboard widgets provide at-a-glance insights into rack power rankings, facility power usage, average temperature, and humidity over time. BACnet/IP monitoring has been upgraded to support more complex network topologies, while search functionality has been improved with exact/fuzzy match toggles and advanced filtering options.

Cortex and Semgrep partner to strengthen application security and drive continuous improvement

At Cortex, our mission is to help engineering organizations deliver reliable, secure, efficient software, faster. With Cortex, teams can standardize against best practices and create a culture of continuous improvement to achieve this. Today, we’re excited to announce a formalized partnership with Semgrep, a leader in modern static analysis and code security.

What is sovereignty washing? When cloud control is more marketing than reality

In 2025, European Commission President Ursula von der Leyen announced plans for an EU Cloud and AI Development Act, prioritizing digital sovereignty amidst growing concerns over data security and privacy. These concerns have been fueled by Edward Snowden's 2013 revelations about US surveillance and further intensified by the Trump administration's actions and rhetoric, including its criticism of EU digital regulations and threats to US tech companies.

Your Cloud Economics Pulse For February 2026

Welcome to February’s Cloud Economics Pulse, CloudZero’s monthly look at cloud spend as AI moves from experiment to expectation. Last month, we closed out 2025 with a settling: provider shares locked in, compute softened, and AI claimed more of the mix (big surprise there). January confirmed those patterns weren’t year-end hustle and bustle. They signify a new baseline. Also, the Big Three (AWS, GCP, Azure) barely moved. They’re as entrenched as can be.

Kubernetes Vs. OpenStack: How They Differ, How They Work Together, And When To Use Each

Kubernetes and OpenStack are not competitors. They operate at different layers of the stack and are often used together. OpenStack manages cloud infrastructure such as compute, storage, and networking. Kubernetes runs on top of that infrastructure to deploy, scale, and manage containerized applications. Teams often compare them as alternatives, but in practice, Kubernetes frequently runs on OpenStack.

Are Businesses Leaving the Cloud?

Learn the truth about cloud repatriation, the motivations behind it, and whether it’s really happening as much as you think. For years, the cloud has been the default solution for businesses wanting speed of deployment with quick and easy scalability. And while the cloud promises endless resources at your fingertips, a lot of network teams are having the conversation about whether to pull their workloads back out of the public cloud and run them on their own hardware or private cloud again.

We Measured AI Impact for 12 Months. Here's What Actually Happened.

When we rolled out AI coding tools across our engineering team, the first few weeks felt great. Developers were enthusiastic. Acceptance rates looked healthy. Everyone said they felt more productive. Then my CEO asked me a simple question: “Is it working?” And I realized I didn’t have a good answer. Feeling productive and being productive are not the same thing.

"Crown Jewels In, Crown Jewels Out" - The Hidden Risk of AI

How do you secure data in the age of Agentic AI? In this episode of ShipTalk, Dewan Ahmed sits down with Devan Shah, Chief Architect of Data Security at IBM, to explore the massive shift from traditional DevOps to AI-infused software delivery. Devan shares his journey from being a chef to leading an "army" of 450+ developers at IBM. They dive deep into the technical bedrock of IBM’s "OnePipeline" (built on Tekton and Argo CD), the rise of Data Security Posture Management (DSPM), and the architectural principles required to ship AI features without compromising security or compliance.

Building new revenue streams: 3 strategic cloud opportunities for telcos in 2026

The telecommunications industry is at a turning point: telcos are seeking ways to turn innovation into new opportunities. Looking at the data, the desire is easy to understand. In 2023, PWC projected that the sector’s annual growth rate would slow significantly between 2024 and 2028.

Powering Harness Executions Page: Inside Our Flexible Filters Component | Harness Blog

Filtering data is at the heart of developer productivity. Whether you’re looking for failed builds, debugging a service or analysing deployment patterns, the ability to quickly slice and dice execution data is critical. At Harness, users across CI, CD and other modules rely on filtering to navigate complex execution data by status, time range, triggers, services and much more.

Agentic AI in DevOps: The Architect's Guide to Autonomous Infrastructure | Harness Blog

For the last decade, the holy grail of DevOps has been Automation. We spent years writing Bash scripts to move files, Terraform to provision servers, and Ansible to configure them. And for a while, it felt like magic. But any seasoned engineer knows the dirty secret of automation: it is brittle. Automation is deterministic. It only does exactly what you tell it to do. It has no brain. It cannot reason.

AI SRE in Practice: Tracing Policy Changes to Widespread Pod Failures

Policy changes in Kubernetes are supposed to improve security, enforce standards, or optimize resource usage. But when a policy change triggers cascading pod failures across multiple namespaces, the investigation becomes a race to identify what changed before more workloads are affected.

NoSQL Change Control for Compliance | Harness Blog

NoSQL change control must be integrated into CI/CD to ensure governance, traceability, and deployment safety, while automated versioning, testing, and rollbacks reduce compliance risk and preserve release velocity, enabling structured database DevOps to scale innovation without compromising reliability or audit readiness.

Secure OAuth is easy to demo and hard to operate at scale

Most teams think about OAuth the same way they think about logging. It is necessary, familiar, and supposedly solved. Then it hits production. Suddenly, it is not just one authentication flow. It is a complex web of two or more applications, multiple environments, cookies, redirects, secrets, and route boundaries. The uncomfortable truth is that OAuth security is not just an implementation detail. It is an operational system, and that system is only as strong as the platform it runs on.

The rise of the agentic future: scaling AI workflows with relaxAI and n8n

This blog is based on the webinar, “From idea to agent: Building AI workflows with relaxAI and n8n”. AI isn’t slowing down. We’re moving from “ask a chatbot” to agents that run multi-step workflows, use tools, and are built for real business processes. Most teams aren’t blocked by ideas. They’re blocked by three things: complexity, cost, and control.

AI Vendor Lock-In: How AI Is Creating A New Dependency Problem

Like most SaaS companies, you’re under pressure to ship AI-powered features faster, smarter, and at scale. For many teams, that pressure leads to relying on external AI platforms, managed models, and third-party APIs instead of building everything from scratch in-house. At first, it feels like a win. Your team ships an AI-powered feature in weeks instead of months. No GPU clusters to manage. No models to train. No infrastructure to babysit.

Why leaders are reassessing the role of big tech in 2026

This session highlights a major strategic shift where sovereignty has moved from a technical detail to a top boardroom priority. The data reveals that 84% of leaders are concerned about geopolitical threats to their data access. 82% of respondents are ready to reassess their big tech partnerships specifically to regain data control. This shift is further evidenced by the 71% of decision-makers who now place sovereignty at the heart of their tech partner choices moving forward. The era of "sovereign-by-design" infrastructure is here. Are you ready to build for a more resilient future?

Beyond boundaries: How global collaboration defines AI in 2026

As we move through 2026, the global conversation around AI is shifting from simple adoption to a deeper focus on true openness and sovereignty. In this session from Civo Navigate India 2025, OpenUK CEO Amanda Brock explores the evolving state of AI openness and shares a significant milestone: India is now the world’s number one open-source contributing community.

Reference architecture: The blueprint for safe and scalable autonomy in SRE and DevOps

Everyone wants autonomous incident response. Most teams are building it wrong. The ultimate goal of autonomy in SRE and DevOps is the capacity of a system to not only detect incidents but to resolve them independently through intelligent self-regulation. However, true autonomy isn't born from automating random, isolated tasks. It requires a stable foundation: a Reference Architecture.

Zero crashes, zero compromises: inside the HAProxy security audit

An in-depth look at the recent audit by Almond ITSEF, validating HAProxy’s architectural resilience and defining the shared responsibility of secure configuration. Trust is the currency of the modern web. When you are the engine behind the world’s most demanding applications, "trust" isn't a marketing slogan—it’s an engineering requirement.

SharePoint Preservation Hold Library: Hidden Cost Trap

Most executives assume that moving to Microsoft 365 simplifies cost control. Storage is “in the cloud”, usage is elastic, and governance is handled through policy. In reality, many organisations face a very different experience. They invest heavily in retention policies to meet legal and regulatory requirements, yet their SharePoint storage costs continue to rise year after year, even after large cleanup programs.

6 Underused Git Commands That Solve Real Developer Problems

Most developers spend hours each week wrestling with Git. Not because they’re bad at their jobs, but because Git doesn’t actively teach you its most powerful features. At GitKon 2025, our Senior Product Marketing Manager Jonathan Silva revealed 6 underused Git commands that solve the workflow problems developers face every day: botched rebases, lost commits, and merge conflict chaos. These aren’t advanced techniques.

API Testing Tools Best Practices Guide

Today’s software testing trends show the growing demand for more efficient and automated API testing. Manual testing is not only time-intensive for internal testing teams, it can also lead to poor customer experiences. When manual testing processes cannot proactively discover issues, your customers may inevitably be the ones finding them. Many of the current test automation solutions today focus on the UI, while most API-level testing is still done manually.

VirtualMetric DataStream + Google SecOps Integration: Pre-Ingest UDM Normalization at Scale

Google SecOps (formerly Chronicle) is widely used for large-scale security analytics, long-term telemetry retention, and detection across diverse environments. Its Unified Data Model (UDM) enables correlation across sources and supports analytics that operate over long time horizons. To take full advantage of these capabilities, security data must arrive in a consistent and well-structured UDM format. In practice, this is rarely the case.

Kubex and Tangoe Partner to Deliver Unified Cloud, Kubernetes, and FinOps Optimization

Enterprises operating at cloud scale today face a growing reality: managing infrastructure performance and cost in silos no longer works. Kubernetes, multi cloud environments, and GPU accelerated workloads deliver immense agility and capability, but they also introduce complexity that outpaces traditional monitoring and cost governance approaches.

Weekly vs. split-week on-call rotations: A guide to finding the right rhythm

When you move past daily rotations but find anything longer than a week feels too stretched out, you often end up choosing between weekly and split-week rotations. Weekly rotations give you a full seven days before handing off. Split-week rotations break that time into smaller chunks like 2-day, 3-day, or 4-day shifts. Each approach creates a different rhythm for your team. This guide compares both patterns across three key criteria.
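
To make the two rhythms concrete, here is a small Python sketch that generates both schedules from the same roster. The function name, the roster, and the start date are all hypothetical, and the 3-day/4-day split is just one of the splits mentioned above.

```python
from datetime import date, timedelta
from itertools import cycle


def build_rotation(people, start, shift_lengths, total_days):
    """Return a list of (person, first_day, last_day) shifts covering total_days.

    shift_lengths is cycled: [7] gives a weekly rotation,
    [3, 4] gives a split week (a 3-day shift, then a 4-day shift).
    """
    shifts = []
    day = 0
    for person, length in zip(cycle(people), cycle(shift_lengths)):
        if day >= total_days:
            break
        # Trim the final shift so the schedule ends exactly on total_days.
        length = min(length, total_days - day)
        first = start + timedelta(days=day)
        last = first + timedelta(days=length - 1)
        shifts.append((person, first, last))
        day += length
    return shifts


roster = ["ana", "ben", "chi"]  # hypothetical team
start = date(2026, 3, 2)        # arbitrary start date

weekly = build_rotation(roster, start, [7], total_days=14)
split = build_rotation(roster, start, [3, 4], total_days=14)

for name, shifts in (("weekly", weekly), ("split-week", split)):
    print(name)
    for person, first, last in shifts:
        print(f"  {person}: {first} to {last}")
```

Over the same two weeks, the weekly schedule has two handoffs' worth of shifts while the split-week schedule has four, which is the difference in rhythm the guide describes.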

Closing the Year Strong: Harness Q4 2025 Continuous Delivery & GitOps Update | Harness Blog

Q4 2025 delivered major upgrades across Harness Continuous Delivery, GitOps, and Continuous Verification, focused on safer rollouts, stronger infrastructure integrations, and workflows that scale. Here’s a curated roundup of what shipped and where to learn more. Welcome back to the quarterly update series! Catch up on the latest Harness Continuous Delivery innovations and enhancements with this quarter's Q4 2025 release. For full context, check out our previous updates.

AI Is Forcing A Return To Hybrid And Multi-Cloud (Here's What To Do Now)

For most of the last decade, the direction of cloud strategy was clear: standardize, consolidate, and reduce sprawl. Engineering teams worked to pick a primary cloud, reduce vendor dependencies, and simplify their stacks. FinOps teams unwound years of fragmentation. Platform teams built guardrails to make sure it didn’t happen again. Then AI arrived, and it’s a fundamentally different class of workload. AI demands specialized hardware and, increasingly, diverging providers.

SQL Server 2025 is generally available on Ubuntu 24.04 LTS

Microsoft has announced the General Availability (GA) of SQL Server 2025 on Ubuntu 24.04 LTS, starting with the CU1 release. This milestone allows enterprises to deploy mission-critical workloads on our latest Long Term Support release, benefiting from predictable stability and up-to-date kernels.

Event Intelligence Solutions Part Three: Best Practices for Successful Adoption

As Event Intelligence Solutions (EIS) move from early adoption to operational necessity, many enterprises are realizing that success depends on more than selecting the right technology. For Banking and Financial Services organizations, effective adoption requires a clear strategy, disciplined execution, and strong alignment with business priorities, regulatory demands and, not least, customer expectations.

Using Meraki and Megaport Virtual Edge for Multicloud Networking

SD-WAN with Megaport SDCI is now generally available. Here’s how you can use them to optimize your network’s middle mile. Building on our successful collaboration with Cisco Catalyst SD-WAN (formerly Viptela) and Cisco Secure Firewall Threat Defense Virtual, Megaport is thrilled to announce the general availability of our integration with Cisco Meraki.

My AI Agent Stole My Crypto #speedscale #openclaw #aicoding #codingagent #security

I thought I found the ultimate coding shortcut: an autonomous AI agent. Turns out, I just bought a one-way ticket to a digital nightmare. A friendly reminder to my fellow devs: Validation isn't optional—it's survival. Your laptop shouldn't have a higher calling than your production environment. Validate now: speedscale.com.

Getting Started with Harness Database DevOps using Flyway

Database changes shouldn’t be manual, risky, or disconnected from your release process. This walkthrough shows how to get started with Database DevOps using Flyway and Harness, bringing versioning, automation, and CI/CD discipline to your database migrations. Flyway enables SQL-first, versioned migrations, while Harness provides governance, pipeline orchestration, and environment visibility so teams can deploy database changes with the same confidence as application code.

The AI-Empowered Site Reliability Engineer: Automating the Balance of Risk and Velocity

You might expect an AI-SRE agent to target 100% reliable services, ones that never fail. It turns out that past a certain point, however, increasing reliability is worse for a service (and its users) rather than better! Extreme reliability comes at a non-linear cost: maximizing stability limits how fast new features can be developed, dramatically increases the operational cost, and reduces the features a team can afford to offer.
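
One common way to reason about this trade-off is an error budget: the downtime an availability target still permits. The excerpt does not use that term or give a formula, so the Python sketch below is an assumption for illustration, including the 30-day window and the function name.

```python
def allowed_downtime_minutes(slo, window_days=30):
    """Minutes of downtime permitted by an availability target over a window.

    slo is a fraction such as 0.999 for "three nines".
    The 30-day default window is an assumption, not taken from the article.
    """
    if not 0 < slo <= 1:
        raise ValueError("slo must be in the range (0, 1]")
    return (1 - slo) * window_days * 24 * 60


for target in (0.99, 0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} availability allows about {minutes:.1f} minutes of downtime in 30 days")
```

Each additional nine shrinks the budget tenfold, which leaves less room for the risk that shipping new features carries.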

Helping Businesses Manage Blocked Calls: How SIP 603+ improves transparency in troubleshooting Call Failures

Imagine pulling up to a gas pump, inserting your credit card, and having the display on the pump say “denied”. You call your credit card company, and they say, “Oh, we don’t know, maybe it’s the merchant’s fault, or the card reader is bad…, we can look into it and get back to you in a few weeks.” Most of us would be pretty upset with that response.

What is the Open Container Initiative?

In this video, we explain the Open Container Initiative (OCI) and how open, vendor-neutral standards make containers portable and interoperable across platforms, tools, and environments. We cover what OCI is, why OCI compliance matters, and how OCI defines the core building blocks of the container ecosystem: container images, runtimes, and distribution.

Architecting Trust: The Blueprint for a "Golden Standard" Software Supply Chain | Harness Blog

We’ve all seen it happen. A DevOps initiative starts with high energy, but two years later, you’re left with a sprawl of "fragile agile" pipelines. Every team has built their own bespoke scripts, security checks are inconsistent (or non-existent), and maintaining the system feels like playing whack-a-mole. This is where the industry is shifting from simple DevOps execution to Platform Engineering.

From Blueprint to Production: Building a Kubernetes MCP Server

As Large Language Models (LLMs) evolve from simple chatbots into agentic workflows, the need for a standardized way to connect them to external data and infrastructure has become critical. In a recent workshop hosted by Nir Adler, Innovation Engineer at Komodor, we explored how to bridge this gap using the Model Context Protocol (MCP).

Backstage Alternatives: IDP Options for Engineering Leaders | Harness Blog

Backstage alternatives fall into three real choices: build and own a framework, buy a fully managed IDP product, or choose a hybrid path that reduces maintenance but keeps Backstage at the core. The trade-off is not "free vs paid" but engineering headcount, governance maturity, time to value, and how actionable your portal is across CI/CD, IaC, and environments. The best commercial IDPs go beyond catalog and documentation.

Your Boss Doesn't Understand Your Work (Here's Why)

Developer productivity metrics create unique anxiety. If your company rolled out tracking systems like DORA metrics or velocity dashboards, you're probably wondering what these numbers mean and how they'll evaluate your work. At GitKon 2025, we assembled senior engineers from GitHub, Cloudflare, Kong, and GitKraken to discuss "Your Boss is Measuring You, Now What?" The panel included both individual contributors and engineering leaders, creating an honest conversation about measurement from both perspectives.

Why MCP is becoming part of your product surface

AI assistants are quickly becoming a primary interface for how people interact with software. Developers ask them how to integrate APIs. Users ask them how products work. Buyers ask them how tools compare. Increasingly, the first explanation someone receives about your product does not come from your website, your documentation, or your sales team. It comes from an AI assistant. That shift has an important consequence that many organizations are only starting to notice.

Why preview environments only work when the platform owns them

Deployments are one of the few moments where software development still feels risky. Teams may have tests, a staging environment, and careful review processes, yet the final step still carries uncertainty. Will this change behave the same way in production? Will it interact cleanly with existing data, traffic, and infrastructure? Will it introduce regressions no one anticipated? Preview environments exist to reduce that uncertainty.

Upsun's AI story: the 5% path from pilots to production value at scale

Here’s the uncomfortable truth: most companies do not have an AI problem. They have a delivery problem wearing an AI costume. MIT’s Project NANDA research has been widely cited for a brutal headline statistic: roughly 95% of corporate generative AI pilots fail to produce measurable business impact or returns, while only about 5% break through to meaningful outcomes. (Yahoo Finance) The models are impressive. The demos are dazzling. The budgets are real.

Intelligent FinOps: AI-Informed, AI-Enabled

AI is the new frontier for FinOps maturity. It introduces fresh spend patterns and new opportunities for value. As GPUs, inference, and retraining reshape costs, FinOps maturity grows through visibility, forecasting, and a shared mindset about how these workloads drive business impact. In this 2025 post, I gave my guidelines for implementing AI tagging to give business context and clarity to vague AI invoices. Now, I’m sharing the next level up: how to drive FinOps in AI with AI.

From Chaos To Clarity: How Forcepoint Scaled FinOps Across The Organization

When Anthony Leung talks about FinOps, he’s speaking from operating at real scale — not theory. As VP of Engineering Platforms and Security Research at Forcepoint, he led a transformation that cut cloud spend in half while improving availability, and built a culture where engineers own their economics.

(Tech Talk) Shipping with Context: Knowledge Graphs as the Backbone of AI-First Software Delivery

Knowledge graphs are essential to solving the context bottleneck in AI-First software delivery, which occurs because workflows, policies, and dependencies are siloed and invisible to AI agents. In this Tech Talk, Prateek Mittal (Product Director of AI Core and Data Platform at Harness) discusses the key concepts. Knowledge Graphs vs. Observability: Observability tells you "what is happening," while knowledge graphs tell you "what does that mean" by modeling structured relationships. They work together to link live signals to affected services or SLAs.

Introducing Harness Artifact Registry | Unified. Secure. Built for the Future Artifact Management

Managing build artifacts today is harder than it should be. Fragmented tools, security blind spots, and disconnected developer workflows make it difficult to keep builds safe, consistent, and production-ready. In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, shows how Harness Artifact Registry unifies artifact management across the entire software delivery lifecycle — from creation to deployment — while improving security and developer experience.

We Built an MCP Server

When I joined Kubex last year, the company was already well aware of the growing power of Large Language Models. As a company focused on intelligent resource optimization for Kubernetes, GPUs, and cloud infrastructure, generative AI didn’t feel like a threat so much as a natural extension of where the industry was heading. Kubex had already invested heavily in machine learning, but it was becoming clear that foundation models could unlock an entirely new class of capabilities for our customers.

How Dartmouth avoided vendor lock-in and implemented LBaaS with HAProxy One

History is everywhere at Dartmouth College, and while the campus is steeped in tradition, its IT infrastructure can’t afford to get stuck in the past. In an institution where world-class research and undergraduate studies intersect, technology must be fast, invisible, and – above all – reliable. That reliability was put to the test when Dartmouth’s load balancing vendor was acquired twice in five years, as Avi Networks moved to VMware and VMware moved to Broadcom.

Custom Dashboard Creation: Step-by-Step Tutorial

Creating a custom dashboard is the best way to monitor metrics that matter most to your systems. Tools like MetricFire make this process straightforward by combining hosted Grafana and Graphite, eliminating the need for self-hosted solutions. Here's how you can build dashboards tailored to your needs.

Top 9 Observability Tools for AI-Assisted Development & Deployment

AI-assisted development is rapidly becoming the default way software is built. Code generation, AI copilots, agentic pull requests, and automated refactoring are now embedded directly into engineering workflows. While this shift dramatically increases delivery speed, it also introduces a new operational reality: production systems are changing faster than humans can fully reason about them. This is where observability becomes mission-critical.

What AI Has Never Seen: The Context Gap in Code Generation

Your AI coding assistant has read the entire internet. It knows every programming language, every framework, every best practice documented in Stack Overflow answers and GitHub repositories. It can generate a REST API handler in seconds that looks perfect: clean code, proper error handling, all the right patterns. But here’s what it’s never seen: your production traffic. Data from a real API request. Someone filling out a form with messed-up or incomplete data.

#052 - The "Short Long Path": Mastering Abstraction, Culture, and Kubernetes Scale with Shemer M...

In this episode, Itiel joins forces with Shemer, Director of Platform Solutions at the gaming giant Playtika, and Scott Rosenberg, Lead Architect at TeraSky, to discuss the realities of platform engineering at a massive scale. The trio dissects Playtika’s multi-year journey from a legacy, homegrown Kubespray infrastructure to a modern, holistic platform built on Spectro Cloud, all while running strictly on-premise to support 25+ games and high-volume traffic.

Your Test Data Environment: Build vs Buy - a conversation we need to have

After three decades of working with databases, one thing I’ve seen over and over is this: we don’t treat our development and test environments with the same respect we do our production systems. Not because people don’t care. Far from it. It’s usually because teams are under pressure, everyone’s juggling multiple priorities, and the quickest path forward often wins the day.

2-day vs. 4-day on-call rotations: Which one fits your team

Teams that find a weekly rotation too long and a daily rotation too short often end up choosing between 2-day and 4-day rotations. This guide compares both rotations across three key criteria. For each criterion, we discuss how it works for 2-day and 4-day rotations and recommend which to choose when. To make it easy, we also included a comparison table for a quick overview. This gives you all the information you need at a glance. Let’s dive in!

Scalable AI governance: why your policy needs a platform, not just a PDF

Most IT teams don’t lack AI policies. They lack policies that survive a Git push. In many organizations, AI governance is a paper tiger. There are comprehensive documents outlining data usage, approved models, and risk management. On an auditor's desk, these policies look complete. But inside the workflow, the reality is different. AI tools are being embedded directly into IDEs, CI pipelines, and internal automation scripts.

What mid-market IT teams wish they knew before deploying AI agents

AI agents are quickly shifting from experimentation into day-to-day operations. That shift is showing up in the data. McKinsey’s latest State of AI research highlights both broader AI use and the growing focus on “agentic AI,” even as many organizations still struggle to scale safely. For mid-market IT teams, agents can feel like the unlock: automate repetitive workflows, reduce backlog pressure, and deliver more output without expanding headcount.

Building Trust in the Machine: A Guide to Architecting Agentic AI for SRE

The promise of Artificial Intelligence in Site Reliability Engineering (SRE) is seductive: an autonomous system that never sleeps, instantly detects anomalies, and fixes broken infrastructure while humans focus on high-value work. However, the gap between a demo-ready chatbot and a production-grade Autonomous AI SRE is vast. In complex, noisy environments like Kubernetes, a “naive” implementation of Large Language Models (LLMs) is not just ineffective, it can be dangerous.

AI Tags: Why Cloud Tagging Breaks Down For AI Workloads (And What To Use Instead)

Tags have long been the backbone of cloud cost visibility and governance. They help teams understand who owns what, where spend comes from, and how infrastructure maps back to the value the business delivers. However, AI workloads have altered that model, and exposed the limitations of traditional AI tags in the process. In fact, many of the most expensive AI operations don’t run on taggable cloud resources at all.

AI meets SQL Server 2025 on Ubuntu

Since 2016, when Microsoft announced its intention to make Linux a first-class citizen in its ecosystem, Canonical and Microsoft have been working hand in hand to make that vision a reality. Ubuntu was among the first distributions to support the preview of SQL Server on Linux. Ubuntu was the first distribution offered in the launch of Windows Subsystem for Linux (WSL), and it remains the default to this day. Ubuntu was also the first Linux distribution to support Azure’s Confidential VMs.

The Dangerous Power of Local AI Agents

I’ve been testing OpenClaw, a fully autonomous agent that lets you remote control your entire system via Signal. It’s incredibly powerful to text your computer from a coffee shop and have it execute tasks, but you’re essentially handing the keys to your digital kingdom to an LLM. The Golden Rule: Trust, but verify. I’m using Proxymock to sniff every single API call going in and out of the agent. If there’s a data leak or a "hallucination" that tries to wipe my drive, I see it first.

Qwiet AI Is Now Harness SAST and SCA

Modern application security is struggling to keep up with AI-driven development and cloud-native scale, especially when security feels bolted onto CI/CD instead of built in. Harness SAST and SCA bring AI-powered application security testing natively into the Harness platform, reducing noise and alert fatigue. By identifying only vulnerabilities that are actually reachable in production code, teams get findings they can trust and act on faster.

The economics of a sovereign cloud

BCG recently released a report on the cost of cloud. The findings? Hyperscalers are charging up to 30% more for their sovereign-cloud offerings. That supports the earlier notion that if you want control, compliance, and jurisdictional certainty, you have to pay a premium. At Civo, we think that is broken. As data volumes grow and AI workloads become central to business strategy, the economics of cloud computing are being re-examined.

How Civo is building the "cloud the way you want it"

As we move through 2026, the global cloud landscape is being reshaped by the drive for digital independence first discussed at Civo Navigate India 2025. This keynote featuring Mark Boost, Dinesh Majrekar, Josh Mesout, and Ben Norris laid the groundwork for a future where organizations no longer have to choose between the scale of the public cloud and the security of a private environment.

Best PostgreSQL ODBC Drivers in 2026: How to Choose

PostgreSQL ODBC drivers are no longer background components. For teams running BI, reporting, and ETL on PostgreSQL, the drivers directly affect how fast queries run, how reliably dashboards refresh, and whether data pipelines remain stable as usage grows. As PostgreSQL moves deeper into analytics stacks, these capabilities are driving the demand for these tools, a trend reflected in broader ODBC market growth.

How To Calculate Customer Retention Cost in 2026: The Hidden SaaS Metric

You may have heard that keeping an existing customer is five times cheaper than acquiring a new one. But that isn’t always true. “Hidden costs” often accompany customer retention, loyalty, and increasing “share of customer”. Could you be spending more on customer retention than on winning new customers? This quick guide will walk you through the meaning of Customer Retention Cost (CRC), why it’s important to calculate it, and how to calculate it.

Kosli and Team Topologies - A Strategic Partnership for SDLC Governance

We’re delighted to announce a strategic partnership between Kosli and Team Topologies - a collaboration that brings together SDLC Governance automation with the world’s leading framework for organizing business and technology for fast flow of value.

The hidden cost of "just using Kubernetes"

Kubernetes has become the default foundation for a lot of modern application infrastructure. It’s powerful, flexible, and widely supported, which makes it an obvious starting point for many teams building a cloud-native application platform (a standardized way for teams to deploy, run, secure, and operate applications in production). But there’s a distinction that often gets lost early in the decision process: Kubernetes is a framework. It is not a platform.

How to choose the right on-call rotation

Choosing an on-call rotation is about finding a rhythm that balances your team’s well-being and your system’s reliability. The right on-call rotation helps prevent burnout and makes on-call duties sustainable over the long run. This guide walks you through different on-call rotation patterns, from daily rotation to after-hours rotations. We’ll look at why you might choose a particular rotation and the challenges that often come with it.

Why a month is too long to be on-call

There is often a temptation to stretch on-call shifts to a month or longer, especially when incident volume is low. The logic seems sound. If the phone rarely rings, it feels unnecessary to hand off on-call duties every week. But looking strictly at incident volume often misses the human side of the equation. Being on-call isn’t just about answering pages. It is also a state of mind. Even when it is quiet, simply being on-call could create fatigue of its own.

AWS IoT Greengrass comes to Ubuntu Core

London, February 3, 2026 — Canonical and AWS are pleased to announce the release of the new snap for AWS IoT Greengrass, making the deployment of your IoT solutions easy and seamless all the way from silicon to the cloud. With the AWS IoT Greengrass agent now available as a snap package from the Canonical Snap Store, Ubuntu Core has become the ideal operating system for all your AWS IoT edge workloads and data ingress.

8 themes shaping engineering in the age of AI

We know that AI has been transformational for engineering and it will continue to be, so stop me if this sounds familiar. Imagine an engineering lead opening a pull request for a critical security patch and finding five hundred lines of AI-generated code. While the solution is (mostly) usable, it follows a pattern no one on the team recognizes. This shift away from manually writing every line of logic has introduced a unique level of complexity for teams.

The 5 Automation Implementation Mistakes That Derail IT Ops (and How to Avoid Them)

Automation has become more than just a "nice to have" choice. It's an essential part of the modern business landscape, promising increased efficiency, reduced costs, and improved accuracy. However, despite its potential benefits, many organizations struggle when trying to implement automation. In this article, we'll explore some of the most common implementation mistakes we've encountered and how to navigate them effectively.

Properly securing OpenClaw with authentication

OpenClaw (née MoltBot, née ClawdBot) is taking over the world. Everyone is spinning their own, either on a VPS, or their own Mac mini. But here's the problem: OpenClaw is brand new, and its security posture is mostly unknown. Security researchers have already found thousands of publicly available instances exposing everything from credentials to private messages.

What is DevOps? Definition, Lifecycle, Best Practices, & Tools

We’ve seen a huge explosion of interest in DevOps over the last few years. But for people who are new to these ideas, it’s not always obvious what DevOps entails and what the benefits are, particularly in larger environments. So, what is DevOps all about? And what do you need to know to succeed? In this blog, you’ll get a breakdown of how DevOps works, its benefits, and the best practices and tools that help teams build and deploy software with speed and confidence.

Every CIO is asking the same question: Am I Next?

Every CIO is asking the same question: Am I next? We’ve seen it across cloud providers, carriers, and global platforms—organizations with enormous scale and investment still experience public, business-impacting outages. The risk isn’t lack of effort. It’s the growing gap between AI-driven complexity and the ability to see, understand, and resolve issues fast enough to protect availability commitments.

Disaster Recovery Testing by Gremlin

Do you know how your system will respond when major outages strike? Disaster Recovery Testing safely simulates real catastrophic failures across your entire system. You can centrally and easily run zone, region, and datacenter-scale reliability tests across your entire organization simultaneously for disaster recovery, business continuity, compliance verification, and more. With Disaster Recovery Testing, tests that used to take engineering-months and dozens of experts can be done safely and securely in hours by a single person.

Andy Wojnarek Appointed Chief Technology Officer

ATS Group and Galileo are pleased to announce the appointment of Andy Wojnarek as Chief Technology Officer. Andy’s appointment reflects the evolution of a technical leadership role he has developed over more than 16 years with the company, grounded in hands-on expertise, cross-functional influence, and a sustained focus on solving complex infrastructure and observability challenges for clients.

Your servers shouldn't need to know ACME

Certbot assumes every server that needs a certificate should also know how to request one, validate domain ownership, handle renewals, and manage failures. This makes sense with a handful of servers. One server, one cert, done. But infrastructures grow. Now you’ve got web farms sharing wildcards, load balancers, mail servers, VPN appliances. The “every server for itself” model doesn’t scale and isn’t sustainable. Even the Let’s Encrypt community knows it.

Komodor AI SRE vs. OSS AI Agent: A Technical Comparison of Agentic AI for Kubernetes Troubleshooting

Gartner predicts that AI agents will be implemented in 60% of all IT operations tools by 2028, up from fewer than 5% at the end of 2024. This acceleration has sparked an explosion of AI SRE solutions, from enterprise platforms to open-source alternatives, all promising faster root cause analysis and reduced MTTR.

3 Tips for a Smoother Software Deployment Process

Just because software releases have become more frequent doesn't necessarily mean they're always smooth. Many teams can push changes on schedule and still lose time to noisy pipelines, brittle handoffs, and production checks that start only after users complain. The result is work that feels fast until it gets slowed down by issues.

Why test data management is becoming increasingly important to senior IT leaders

We recently sat down with James Phillips, Senior IT Leader, to talk about test data management (TDM) and the growing attention it’s getting from senior IT leaders. It’s been prompted by the recognition that provisioning test and development environments with realistic, production-like data improves the quality of the code being developed, reduces errors, and delivers new features to customers faster.

Automating Infrastructure as Code changes with an AI agent

The infrastructure management landscape is undergoing a fundamental transformation. Infrastructure as Code has already revolutionized how we provision and manage cloud resources by treating infrastructure as software. The next evolutionary step involves intelligent automation that can understand, adapt, and optimize these configurations independently.

Everything you need to know about ITIL 5, AI and incident management

ITIL 5 launched in January 2026, and for the first time in the framework's 40-year history, AI governance is front and center. If you're running incident management, on-call rotations, or building operational tooling, this matters: the gap between AI adoption and AI governance is about to become a compliance and operational risk issue. I’m not usually a big ITIL fan, but this guidance has some genuinely useful framing and questions.