
Mastering AI Prompts: How to Get the Best Out of SQL Prompt AI | The Tony and Tonie show Ep41

Learn how to get the most value from SQL Prompt AI in day-to-day work, whether you're writing new queries or improving existing code. A little prompt-writing knowledge goes a long way with SQL Prompt AI. Tony and Tonie discuss how to build reusable prompts that give the tool the context it needs to return useful results the first time.

Fear, Identity & Flaky Tests: AI in Reliability w/ Dana Lawson (CTO, Netlify)

The self-healing systems that SREs have dreamed about for a decade aren't a distant promise anymore — they're already being built, and the biggest barrier left is cultural. Dana Lawson, CTO at Netlify, has spent over 25 years in the trenches of developer infrastructure, from sysadmin roots to running the platform that powers 5% of the internet.

Kubecost Vs. OpenCost: What's The Difference? (Updated 2026)

Kubernetes (K8s) adoption has exploded over the past few years, but it hasn’t been easy to monitor, manage, and optimize K8s costs. To provide greater cost visibility into Kubernetes clusters and environments, Kubecost launched in 2019 and was acquired by IBM in 2024, while OpenCost debuted in 2022. OpenCost has several founding contributors, but Kubecost developed the cost allocation engine that the OpenCost implementation uses.

How Developers Build a Meaningful Career in the Age of AI

What does a meaningful developer career look like in the age of AI? We brought together four experts to answer exactly that. In this GitKon panel, GitKraken CMO Kate Adams moderates a conversation with Leon Noel (Managing Director of Engineering, Resilient Coders), Danny Thompson (Director of Technology and host of The Programming Podcast), Maggie Hunter (Recruitment Lead, GitKraken), and Dimitry Fonarev (CEO, Testkube) to explore how software engineers can future-proof their careers, grow their skills, and navigate an industry that is changing fast.

Developer guide for migrating to reproducible environments without rewriting

The primary obstacle to adopting reproducible environments is often the assumption that environment parity requires containerizing legacy monoliths from scratch or abandoning stable CI/CD pipelines. In reality, reproducibility is about capturing application intent through configuration rather than rebuilding the application itself. This guide outlines a non-disruptive, incremental path to migrating your workflow to production-identical environments without touching your core codebase.

KubeCon Europe 2026: AI Is Shipping Code Faster Than Orgs Can Govern It

KubeCon + CloudNativeCon Europe 2026 recently brought the cloud native community to Amsterdam. We were there all week bouncing between the booth, a Braintrust event with engineering leaders from across the community, and more hallway conversations than we can count. One talking point dominated the week: AI is shipping code faster than most engineering orgs can govern it. It also became clear that we weren't the only ones talking about this challenge.

Is OpenTelemetry overkill? There's a lazier (and better) way. #speedscale #sre #ebpf #kubernetes

If you "aspire to be lazy" like we do, you know that building staging environments and mocking complex back-ends (like MySQL, AI models, and 3rd party APIs) is a massive time sink. In this demo, we show you how to use Internet Magic (aka eBPF) to: Stay tuned for Part 2, where we take these recordings and spin up a staging environment automatically.

Harness Ships Five Capabilities to Power Confident Releases at AI Speed | Harness Blog

The pace of AI-assisted development has outgrown how most teams actually ship. Harness is closing that gap. Engineering teams are generating more shippable code than ever before — and today, Harness is shipping five new capabilities designed to help teams release confidently. AI coding assistants lowered the barrier to writing software, and the volume of changes moving through delivery pipelines has grown accordingly. But the release process itself hasn't kept pace.

Customer-led, independently proven: Redgate Monitor's G2 Spring 2026 awards

Every quarter, G2 publishes seasonal reports and awards badges based on authenticated customer reviews, not paid rankings, to highlight top-performing B2B software products. For Spring 2026, Redgate Monitor had a standout showing, earning 14 new badges, including several “Best / Most” awards that go to a single product in each category.

How to manage Ubuntu fleets using on-premises Active Directory and ADSys

The “hybrid fleet” is today’s reality: organizations diversify operating systems while Microsoft Active Directory (AD) remains the dominant identity “source of truth.” IT administrators must ensure Linux machines, like Ubuntu desktops and servers, behave as first-class citizens in this environment.

I Fixed a $30K/Year Anomaly in the Time It Takes to Make Coffee

If you work in FinOps, you know the feeling. You open your recommendations queue on a Monday morning. There are 47 items. You worked through 12 of them last week. You’re back up to 47 again. All represent real money leaving the building, but not all are “bad money” – of those 47, a significant share will be “this is ok, expected, we got value”. That’s what really kills FinOps enthusiasm (and is why the engineer is skeptical towards the FinOps person).

New Release of Delphi Data Access Components Adds Support for Latest IDEs, Databases, and Windows Arm64EC Target Platform

We are thrilled to announce a new release of our Delphi Data Access Components product line. This update delivers broader platform compatibility, enhanced security, and extended support for modern database technologies across multiple providers.

AI Coding Agents Break What Works

Your AI coding agent just made every test pass. Ship it, right? Not so fast. A growing class of AI-generated bugs doesn’t come from writing bad code. It comes from the AI changing working code to accommodate its own mistakes. This isn’t a theoretical risk. It’s happening now, in production codebases, and it’s harder to catch than any bug the AI might introduce from scratch.

Cloud Sovereignty: Location, Access, and Jurisdiction

Cloud residency has moved from a technical preference to a board-level control question, as organisations are being asked to evidence who can access data, under which jurisdictions, and what happens when something goes wrong across borders. A Gartner survey of CIOs and IT leaders in Western Europe found that 61% expect geopolitical factors to increase their reliance on local or regional cloud providers, and Gartner also predicts that by 2030, more than 75% of enterprises outside the US will have a digital sovereignty strategy.

We Made Claude Narrate an AI Model Race Like a Sports Commentator | Loop Lab

What if you didn't have to stare at logs while your AI agent worked? In this Loop Lab experiment, Ryan Hamilton built Claude Livecaster, a tool that gives Claude a live voice to narrate long-running agentic processes like a sports commentator. The demo: six AI models (GPT, Gemini, and Claude variants) race through a CI/CD benchmark, and Claude calls the whole thing play-by-play. Rate limit hits, comeback stories, photo finishes, all of it, out loud.

DevOps Workflow Strategy for Startups: 7-Step Guide (2026)

Reliability is the foundation of successful startups. Your product could have the most innovative features, but if it's plagued by downtime or performance issues, customers will eventually jump ship. Fortunately, creating an effective DevOps workflow strategy doesn't have to be complicated. This guide breaks down the essential components and implementation steps that startup DevOps and SRE teams need to focus on.

The SaaS Paradox: Why Companies Must Spend More On AI To Survive

At SaaS Metrics Palooza 2025, CloudZero CEO Phil Pergola delivered a keynote on the software industry’s most pressing question: can SaaS survive the AI revolution, or will AI rewrite the SaaS playbook outright? Phil’s answer wasn’t doom and gloom, but he didn’t sugarcoat the challenges. “Churn rates are up,” he told moderator Ray Rike of Benchmarkit on Oct. 9, 2025. “The payback from a customer acquisition cost perspective is taking longer.

Let's Encrypt simulated revoking 3 million certificates. Most ACME clients didn't notice.

On March 19th, Richard Hicks, one of our customers, emailed us about a certificate that had renewed after only a week. It was a 90-day certificate and he had not initiated the renewal. That’s the kind of thing that sends you straight to the logs. We found the answer right away. The certificate’s ARI renewal window had been shortened dramatically.
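ARI (ACME Renewal Information) lets a CA publish a suggested renewal window for each certificate, and an ARI-aware client schedules its renewal inside that window. The Python sketch below illustrates only that scheduling step, assuming the client has already fetched the ARI response and parsed its `suggestedWindow` object (`start` and `end` timestamps). The function name and the choice to clamp the random pick to "now or later" are illustrative assumptions, not taken from any particular ACME client.

```python
import random
from datetime import datetime, timedelta, timezone


def _parse(ts: str) -> datetime:
    # ARI timestamps are RFC 3339; normalise a trailing "Z" for older Pythons.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_renewal_time(suggested_window: dict, now: datetime, rng: random.Random) -> datetime:
    """Choose when to renew, given an ARI-style suggestedWindow {start, end}."""
    start = _parse(suggested_window["start"])
    end = _parse(suggested_window["end"])
    if end <= now:
        # The whole window is already in the past: renew immediately.
        return now
    # Pick a uniformly random instant in the window, never earlier than now,
    # so that many clients do not all renew at the same moment.
    lower = max(start, now)
    offset = rng.uniform(0, (end - lower).total_seconds())
    return lower + timedelta(seconds=offset)
```

If a CA shortens a certificate's window so that it ends in the past, the sketch returns `now` and the client renews straight away, which matches the early renewal described above; a client that never checks ARI would simply keep its own schedule.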

Simplify bare metal operations for sovereign clouds

The way enterprises think about their infrastructure has changed. Digital sovereignty of all kinds (data sovereignty, operational sovereignty, and software sovereignty) has begun to dominate the infrastructure discussion. Today, these abstract terms have become practical concerns for platform teams.

How to Harden Ubuntu SSH: From static keys to cloud identity

Thirty years after its introduction, Secure Shell (SSH) remains the ubiquitous gateway for administration, making it a primary target for brute-force attacks and lateral movement within enterprise environments. For system administrators and security architects operating under regulatory frameworks like SOC 2, HIPAA, and PCI DSS, default SSH configurations are an “open door” that represents an unacceptable risk.

How to Scale Sandbox Environments with an Internal Developer Portal | Harness Blog

Here's a scenario that probably sounds familiar: a developer needs a sandbox environment to test something. They file a ticket. Then they wait. And wait. Maybe a day goes by, maybe three. Meanwhile, your platform team is buried in provisioning requests, and somewhere, someone has already spun up an unsanctioned workaround that bypasses every governance policy you've put in place. It's a lose-lose. Developers lose velocity, platform teams lose their sanity, and security gaps quietly multiply.

It's Time to Rethink Untrusted Code in Your Pipeline | Harness Blog

The catastrophic TeamPCP exploit in March 2026 demonstrated that "open execution" models, in which third-party code runs with full privileges, have made CI/CD pipelines a primary target for global credential harvesting. There are better architectures. On March 19th, the risks of running open execution pipelines — where what code runs in your CI/CD environment is largely uncontrolled — went from theoretical to catastrophic.

3 Biggest Myths of Chaos Engineering

Are myths about chaos engineering preventing your team from building more resilient systems? In this video, Matt Schillerstrom, Director of Product Management at Harness and founding engineer of the chaos engineering program at Target.com, breaks down the three most common misconceptions about chaos engineering. Drawing from his experience building large-scale programs, Matt explains how to move past these myths to build confidence in your infrastructure.

Claude Livecaster Is Now Open Source, Plus a Two-Voice Broadcast Mode | CircleCI Loop Lab

Claude Livecaster is now public on CircleCI Research. In this update, Ryan Hamilton walks through the newly open-sourced repo, seven built-in simulation scenarios, and a new two-voice broadcast format featuring an anchor and a field correspondent narrating the action together. The demo scenario: Pipeline Wars, six CI pipelines racing across three providers, with Claude providing live color commentary on every Docker build failure, OOM kill, and production rollout.

Building an Alert Routing setup that never misses a critical incident

Critical incidents have a direct impact on your business revenue and the trust your customers place in you. The longer a critical incident goes unnoticed, the higher the stakes. A reliable alert routing setup automatically catches these incidents the moment they trigger and gets them to the right person without delay. This guide walks you through how to build that reliable routing setup.

How to handle midnight incidents without waking everyone up

When a midnight incident triggers, the goal is not to wake your entire team. It’s to reach the one person who can act on it. Everyone else should sleep through it undisturbed. The difference between a team that handles midnight incidents well and one that doesn’t usually comes down to a few decisions made ahead of time. Which incidents actually need a midnight response? Who should get the call? And what should happen to everything else? This guide walks through those decisions.

Routing incidents the way their severity and priority demand

Severity and priority are two labels that describe different things about an incident. Severity covers the blast radius: how much of your system or how many customers are affected. Priority covers the urgency: how quickly someone needs to act. Routing rules then use these labels to load the right escalation policy for each incident. This guide covers how to define your severity and priority levels and map them to escalation policies.
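A minimal Python sketch of the mapping step, assuming hypothetical `SEV`/`P` labels and escalation policy names: the routing table is keyed on the (severity, priority) pair, so the two dimensions stay independent, and anything without an explicit entry falls back to a low-urgency default.

```python
from dataclasses import dataclass

# Hypothetical escalation policy names; substitute the policies your team defines.
ESCALATION_POLICIES = {
    ("SEV1", "P1"): "page-primary-and-secondary",
    ("SEV1", "P2"): "page-primary",
    ("SEV2", "P1"): "page-primary",
    ("SEV2", "P2"): "notify-team-channel",
}
DEFAULT_POLICY = "create-ticket"


@dataclass
class Incident:
    title: str
    severity: str  # blast radius, e.g. "SEV1" = many customers affected
    priority: str  # urgency, e.g. "P1" = someone must act now


def select_policy(incident: Incident) -> str:
    """Look up the escalation policy for an incident's (severity, priority) pair."""
    return ESCALATION_POLICIES.get((incident.severity, incident.priority), DEFAULT_POLICY)
```

Keeping the table as data rather than nested `if` statements makes it easy to review which combinations actually page someone.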

Mastering Microsoft Azure Certification Preparation with Reliable Study Resources

The demand for cloud computing professionals has surged dramatically in recent years, and Microsoft Azure stands out as one of the leading cloud platforms globally. Whether you are a beginner stepping into the IT world or an experienced professional aiming to validate your skills, Azure certifications like AZ-900 and AZ-500 play a crucial role in enhancing your career prospects. Preparing for these exams requires not only dedication but also access to high-quality study materials and reliable practice resources. This is where platforms like Exam-Labs.com become valuable for candidates seeking structured and effective preparation strategies.

AZ-500 and DP-203 Certification Path for Microsoft Azure Security and Data Engineering Careers

Microsoft Azure has become one of the leading cloud platforms in the world, powering businesses of all sizes. As organizations continue to migrate to cloud infrastructure, the demand for certified Azure professionals is increasing rapidly. Among the most valuable certifications in this ecosystem are AZ-500 (Azure Security Engineer Associate) and DP-203 (Azure Data Engineer Associate).

Self-service infrastructure promises speed, but without control, it creates chaos.

In this video, we break down what self-service infrastructure with guardrails actually means and why modern platform teams are adopting it to scale safely. Learn how developers can move faster without waiting on approvals, while organizations maintain control through governance, automation, and policy-based guardrails. We cover: This approach is redefining how infrastructure is delivered across DevOps, platform engineering, and cloud environments.

The 4 Golden Signals of Monitoring Explained

As a team, we have spent many years troubleshooting performance problems in production systems. Applications have become so complex that you need a standard methodology to understand performance. Our approach to this problem is called the Golden Signals. By measuring these signals and paying very close attention to these four key metrics, providers can simplify even the most complex systems into an understandable corpus of services and systems.
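The four golden signals are usually listed as latency, traffic, errors, and saturation. As a rough illustration, the Python sketch below reduces one time window of request records to those four numbers. The `Request` record, treating HTTP 5xx responses as errors, and measuring saturation as busy workers over total workers are all assumptions; real systems often choose different definitions for each signal.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took
    status: int  # HTTP status code returned


def golden_signals(requests: list[Request], window_s: float,
                   busy_workers: int, total_workers: int) -> dict:
    """Reduce one time window of requests to the four golden signals."""
    durations = sorted(r.duration_ms for r in requests)
    # Latency: nearest-rank 95th percentile of request duration.
    if durations:
        rank = max(1, math.ceil(0.95 * len(durations)))
        latency_p95 = durations[rank - 1]
    else:
        latency_p95 = 0.0
    # Errors: assume any 5xx response counts as an error.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": latency_p95,
        "traffic_rps": len(requests) / window_s,  # Traffic: requests per second
        "error_rate": errors / len(requests) if requests else 0.0,
        "saturation": busy_workers / total_workers,  # Saturation: share of capacity in use
    }
```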

AWS Proton End of Life: What Teams Need to Know and Do Before October 2026

AWS Proton is reaching end of life. If you're reading this, you probably just found out — either from the AWS console banner, your account manager, or a panicked Slack message from someone on your platform team. Here's what you need to know: your infrastructure is safe, but the tool you use to manage it is going away. You have until October 7, 2026 to find a replacement. That sounds like plenty of time. It isn't.

Migrating from MySQL to PostgreSQL: Performance and Replication Best Practices

Summary: Today, many teams are moving from MySQL to PostgreSQL as they update their database systems and plan for future growth. However, extra work often remains after the migration: for example, checking that tables and constraints were copied correctly, tuning performance, and confirming that replication works properly. Devart’s PostgreSQL tools help DBAs with these tasks through features like Schema Compare, Data Compare, and other tools that help review and manage PostgreSQL databases.

High-Performance Range Queries in PostgreSQL: Overcoming Bottlenecks in AWS Aurora

Short Summary: PostgreSQL can slow down when range queries and frequent data updates rely on the same indexes. This guide shows how to spot the problem and use Devart tools to reduce B-Tree index conflicts, improve query plans, and manage bi-weekly data updates in AWS Aurora.

How Much Does It Cost To Keep Up With The AI Joneses?

I’ve been an engineering leader for over a decade, and I’ve spent most of those years in private Slack groups with other engineering leaders, comparing strategies and kvetching about Kubernetes. Of the hundreds of threads I’ve taken part in, the one that got the most engagement the fastest was a recent one around AI adoption. “Where are you on this continuum?”, it read. “A. You don’t really care how people use AI; B. You push people to use AI; or C.

Beyond the spreadsheet: Using GitOps to generate DORA-compliant audit trails.

In the 2026 regulatory landscape, manual audits are a liability. This guide explores using GitOps to generate DORA-compliant audit trails through IaC, drift detection, and automated segregation of duties. Discover how the Qovery management layer turns compliance into an architectural output, reducing manual overhead for CTOs and Senior Engineers.
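Drift detection at its simplest compares the desired state declared in Git with the state observed in the environment and records every difference. The Python sketch below shows that comparison plus an append-only JSON audit line. The flat key/value state, the `commit_sha` field, and the record layout are assumptions for illustration, not Qovery's format.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def detect_drift(desired: dict, actual: dict) -> list[dict]:
    """Return one finding for every key whose observed value differs from Git."""
    findings = []
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            findings.append({"key": key, "desired": want, "actual": have})
    return findings


def audit_entry(commit_sha: str, findings: list[dict],
                checked_at: Optional[datetime] = None) -> str:
    """Serialise one drift check as a JSON line for an append-only audit log."""
    checked_at = checked_at or datetime.now(timezone.utc)
    return json.dumps(
        {
            "commit": commit_sha,  # hypothetical: the Git commit the desired state came from
            "checked_at": checked_at.isoformat(),
            "in_sync": not findings,
            "findings": findings,
        },
        sort_keys=True,
    )
```

Because each entry names the commit it was checked against, an auditor can trace any finding back to the reviewed change in Git rather than to a spreadsheet.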

Winning in the AI Era: How Top Teams are Driving Their Velocity Gains with Alloy & Chime

While most teams struggle with the complexity of AI-generated code, Alloy and Chime have built internal cultures and processes that enable them to scale their development while maintaining quality. Join CircleCI’s CTO, Rob Zuber, in conversation with Maciej Makowski, Senior Software Developer at Chime, and Sunny Singh, Senior Software Engineer at Alloy, as they explore the dynamics that set their teams apart. They'll talk through the culture and delivery practices that actually moved the needle.

Product Portfolio Management for New Paradigms - DevOps, AI, and Beyond - Job Task Analysis | Harness Blog

Looking back over the last ten years of enterprise technology, it is clear that paradigm shifts are occurring more frequently. DevOps/Platform Engineering and Cloud Native infrastructure, for example, have both matured. Depending on where you are in adoption, the new frontier is AI. As you progress along the adoption and maturity curve, operationalizing these paradigms becomes important.

AI Cost Management: How To Track, Allocate And Optimize AI Spend

AI cost management is the practice of tracking, allocating, and optimizing the cloud infrastructure costs tied to building, running, and scaling AI workloads. It differs from traditional cloud cost optimization because AI infrastructure behaves differently at every layer of the stack. The biggest problem isn’t overspending. It’s that most organizations can’t see where their AI spending is going.
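Allocation usually starts with tags on billing line items. The Python sketch below sums AI-related line items by an owner tag and reports how much spend has no owner at all, which is the visibility gap described above. The line-item shape and the `team` tag key are assumptions; real billing exports have their own schemas.

```python
from collections import defaultdict


def allocate_costs(line_items: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum cost per owner tag; items without the tag land in 'unallocated'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unallocated")
        totals[owner] += item["cost"]
    return dict(totals)


def unallocated_share(totals: dict[str, float]) -> float:
    """Fraction of total spend with no owner: a rough visibility metric."""
    grand_total = sum(totals.values())
    return totals.get("unallocated", 0.0) / grand_total if grand_total else 0.0
```

Tracking the unallocated share over time is a simple way to tell whether tagging discipline is improving before trying to optimize anything.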

A Tour of Cortex

Get a guided tour of Cortex, the Engineering Operations Platform built to help teams improve operational maturity and reduce developer friction. This video covers the core features of Cortex: the Catalog, Scorecards, Initiatives, engineering metrics, and Workflows. Each one maps to the three things any great EngOps platform needs to do: provide clarity, drive improvement, and remove friction. Ready to see it in action? Visit our website: cortex.io Book a custom demo: cortex.io/demo.

Real-Time Visibility, Orchestrated Deployments, and More

The latest VirtualMetric DataStream release brings a significant step forward in platform observability and deployment flexibility. Version 1.9.0 gives security and infrastructure teams direct visibility into what’s happening across their pipelines in real time while expanding support for cloud-native environments and broadening connectivity options. Here’s what’s new.

Load Testing: An Essential Guide for 2026 | Harness Blog

This comprehensive guide covers the fundamentals of load testing, key differences from stress and performance testing, step-by-step execution methods, popular tools, and best practices to help teams build resilient systems with confidence. In today's always-on digital economy, a single slow page or unexpected crash during peak traffic can cost businesses thousands or even millions of dollars in lost revenue, damaged reputation, and frustrated customers.
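To make the basic idea concrete, here is a minimal Python sketch that fires a fixed number of calls at a target with bounded concurrency and reports failures and latency. The `call` argument is a stand-in for whatever issues one request against your system (for example an HTTP client call); it is an assumption of this sketch, and a dedicated load-testing tool would add ramp-up profiles, think time, and richer reporting.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(call: Callable[[], object], total_requests: int, concurrency: int) -> dict:
    """Issue total_requests calls with at most `concurrency` in flight; summarise the results."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(_: int) -> tuple[float, bool]:
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            ok = False  # count any exception as a failed request
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(duration for duration, _ in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for _, ok in results if not ok),
        "p50_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }
```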

The "scanner report has to be green" trap

In the modern DevSecOps world, CISOs are constantly looking for signals in the noise, and the outputs of security scanners often carry a lot of weight. A security scan that returns a “zero CVE” report often unlocks promotion to production; a single red flag can block a release. This binary view of security has birthed two diametrically opposed philosophies. On one side, we have the long-term support (LTS) approach: stay on a battle-tested version and backport specific security fixes.

What is Disaster Recovery Testing? Explained in 60 seconds | Resilience Testing | Harness

What happens when things suddenly break in your system? In this short video, we explain disaster recovery testing in simple terms. Learn why it matters, how it helps you stay prepared, and how you can make sure your system gets back up quickly when something goes wrong. Watch to understand the basics in under a minute.

Lowering PUE: Building Envelope Efficiency in Edge Computing Units

Edge computing is changing how we handle data across the globe. Smaller units closer to the user need smart cooling to stay efficient. Compact systems handle big tasks in small spaces without needing giant server rooms. Power Usage Effectiveness (PUE) is the ratio of total facility energy to the energy used by IT equipment, so it shows how much power goes to supporting systems such as cooling rather than to computing. Improving the outer shell of units helps keep costs low. High efficiency is a goal for every tech site, and it saves money.
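PUE is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so 1.0 is the ideal and everything above it is overhead such as cooling. The Python sketch below computes it; the kWh figures in the usage note are made up for illustration and are not measurements from any edge unit.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Energy spent on non-IT load such as cooling, power conversion and lighting."""
    return total_facility_kwh - it_equipment_kwh
```

With hypothetical figures of 120 kWh total and 80 kWh IT load, PUE is 1.5 and overhead is 40 kWh; if better envelope insulation removed 20 kWh of cooling load, PUE would fall to 100 / 80 = 1.25.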

The Observability Gap: Why Monitoring Data Should Drive Tests

Most teams already know a lot about production. They have dashboards. They have traces. They have alerts. They have enough telemetry to explain what happened after an incident and enough graphs to argue about it for the rest of the week. Then they go to test a change and start from scratch. The integration tests hit a hand-written mock that returns {"status": "ok"}. The load tests replay a CSV somebody exported months ago. Staging is close enough to production right up until it matters.

The Secret to 10x Faster API Testing #speedscale #apitesting #api #automation #production

Stop living in the past. See how to use real production traffic to automate your API testing with zero code changes. Replay real-world patterns in your CI/CD and catch regressions before your users do. Learn more: speedscale.com.

QA, AI, and the return of the adversarial mindset

The best QA engineers are always asking themselves (and others around them) what might break. When engineering teams shifted to agile delivery, that mindset largely moved out of dedicated roles and into the background. Automated testing took over the repetitive work, developers owned quality end-to-end, and velocity improved. What didn't carry over was the habit of looking at a feature and asking how a real user, an edge case, or unexpected load might expose it.

Arazzo vs Traditional Chatbots: What Actually Works?

What happens when you give an AI agent hundreds of API endpoints and hope it figures out the right workflow? Spoiler: it nearly gets it right... but never reliably. In this talk, Frank Kilcommins (Head of Enterprise Architecture at Jentic and co-author of the Arazzo Specification) breaks down why API documentation quality is the core knowledge problem holding agentic systems back (and how Arazzo solves it).

#054 - From Shiny Objects to FinOps: Taming Cloud Costs in the AI Era with Josh Schlanger (CloudX...

In this episode of the Kubernetes for Humans podcast, we are joined by infrastructure and FinOps expert Josh Schlanger. Drawing on over 15 years of experience across Martech, e-commerce, and health tech, Josh shares why solving core business problems should always take priority over chasing new, "shiny object" technologies.

Women's Day Panel: Navigating the Future of Engineering in the Age of AI

How is AI reshaping engineering—and what does it mean for the future of work? At our first GTA Boston Hub event of the year, we brought together engineering leaders from Boston Consulting Group and Athenahealth to dive into one of the most pressing topics today: the rise of generative AI. In this panel, we explore: Key takeaway: This isn’t “human vs AI”—it’s human augmented by AI. The real advantage lies in how we adapt, collaborate, and lead in this new era.

AWS VPC Peering Vs. Transit Gateway: Which To Choose And Why [2026]

AWS VPC peering connects two VPCs directly with no hourly fee, which makes it simple and cost-effective in smaller setups, but it creates an unmanageable mesh as your VPC count grows. For growing multi-account platforms, Transit Gateway can offer a predictable structure and centralized governance.
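The mesh problem is easy to quantify. VPC peering is non-transitive, so full connectivity between n VPCs needs one peering connection per pair, n(n-1)/2 in total, whereas a Transit Gateway hub needs one attachment per VPC. A small Python sketch of that arithmetic, ignoring cross-Region and multi-account details:

```python
def peering_connections(vpc_count: int) -> int:
    """Full mesh: peering is non-transitive, so every pair of VPCs needs its own connection."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count: int) -> int:
    """Hub and spoke: each VPC attaches once to a shared Transit Gateway."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count


for n in (3, 5, 10, 20):
    print(f"{n:>2} VPCs: {peering_connections(n):>3} peering connections "
          f"vs {tgw_attachments(n):>2} TGW attachments")
```

At 5 VPCs that is 10 peering connections versus 5 attachments; at 20 VPCs it is 190 versus 20.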

Build Numbers That Actually Make Sense: Branch-Scoped Sequence IDs in Harness CI | Harness Blog

You're tagging Docker images with build numbers. One build is your latest production release on main. A developer pushes a hotfix to release-v2.1, and that run takes the next build number. Another developer merges to develop, and that run takes the next one after that. A week later someone asks: "What build number are we on for production?" You check the registry and see a handful of non-consecutive build numbers on main. The numbers in between? Scattered across feature branches that may never ship. Your build numbers have stopped telling a useful story.
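One way to picture branch-scoped sequence IDs is a separate counter per branch, so main's numbers stay consecutive no matter how many feature or hotfix builds run elsewhere. The Python sketch below is a hypothetical in-memory illustration, not how Harness CI implements the feature; a real CI system would persist the counters and increment them atomically. The `<branch>-<n>` tag format and the character sanitising are also assumptions.

```python
import re
from collections import defaultdict


class BranchSequence:
    """Hypothetical in-memory per-branch build counter (illustration only)."""

    def __init__(self) -> None:
        self._counters: dict[str, int] = defaultdict(int)

    def next_tag(self, branch: str) -> str:
        """Advance this branch's counter and return an image tag like 'main-2'."""
        self._counters[branch] += 1
        # Docker tags roughly allow letters, digits, '_', '.' and '-', so replace anything else (e.g. '/').
        safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
        return f"{safe_branch}-{self._counters[branch]}"
```

In this sketch main's tags stay consecutive (`main-1`, `main-2`) even when release and develop builds run in between, which is the property the scenario above is missing.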

Jensen Huang's warning: lead the AI transition - or finance it

The wrong people got the most attention from Jensen Huang’s comments last week. Huang told the All-In Podcast that he’d be “deeply alarmed” if a $500,000 engineer consumed less than $250,000 in AI tokens annually. Within 48 hours, the discourse collapsed into a compensation debate.

£10M Investment in UK AI Infrastructure | Pulsant CEO Talks to Data Centre Solutions

Join Pulsant CEO Rob Coupland in an exclusive interview with Phil Alsop, Editor at Data Centre Solutions (@datacentres), as they explore Pulsant’s £10 million investment in the Milton Keynes data centre. This upgrade delivers high-density, sovereign computing capacity, helping businesses accelerate AI and tech projects while keeping data secure and local. Rob also shares plans to expand this high-density model across the UK, supporting enterprise AI at scale and boosting local economies.

AI Deployment in Production: Orchestrate LLMs, RAG, Agents | Harness Blog

For the past few years, the narrative around Artificial Intelligence has been dominated by what I like to call the "magic box" illusion. We assumed that deploying AI simply meant passing a user’s question through an API key to a Large Language Model (LLM) and waiting for a brilliant answer.

Groq vs. GPUs: The future of AI inference in 2026

Back in 2016, Jonathan Ross founded Groq, the AI chip startup, which went on to enter a non-exclusive licensing agreement with NVIDIA for Groq’s inference technology (as part of a $20 billion deal). The name ‘Groq’ is commonly confused with Grok, the generative AI chatbot from xAI that launched on X (formerly Twitter) in 2023. As demand for real-time AI continues to grow, inference has become one of the most important and expensive parts of the machine learning lifecycle.

LiteLLM Compromise: Securing AI Pipelines from PyPI Supply Chain Attacks | Harness Blog

On March 24, 2026, the AI open-source ecosystem was impacted by a critical supply chain attack involving the widely used Python package LiteLLM. Attackers compromised the LiteLLM PyPI distribution pipeline and published malicious versions (notably in the 1.82.7-1.82.8 range), embedding a multi-stage payload designed to steal credentials and execute remote code.
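A first triage step is simply checking which LiteLLM version an environment has installed. The Python sketch below uses the standard library's `importlib.metadata` to read the installed version and compares it with the versions named above. Treat `COMPROMISED_VERSIONS` as an assumption and confirm it against the official LiteLLM advisory before relying on it.

```python
from importlib import metadata
from typing import Optional

# Versions named in the advisory text above; verify against the official LiteLLM advisory.
COMPROMISED_VERSIONS = {"1.82.7", "1.82.8"}


def installed_version(package: str = "litellm") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_compromised(version: Optional[str]) -> bool:
    """True if the given version string is on the known-bad list."""
    return version is not None and version in COMPROMISED_VERSIONS


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("litellm is not installed in this environment")
    elif is_compromised(found):
        print(f"litellm {found} is a known-compromised release: isolate the host and rotate credentials")
    else:
        print(f"litellm {found} is not on the known-bad list")
```

Pinning exact versions with hashes (pip's `--require-hashes` mode) is one way to stop a tampered release from being pulled in silently by a later install.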

dotConnect for Zoho CRM | Connect Zoho CRM to .NET Apps Easily

Bring Zoho CRM data into your .NET applications with dotConnect for Zoho CRM, a fast, reliable ADO.NET provider built for secure and efficient connectivity. Set up in minutes using a Windows installer or NuGet, then explore and manage your CRM data from Visual Studio. This video walks you through connecting via Server Explorer, authenticating with Interactive OAuth, running queries through standard ADO.NET workflows, and integrating with popular ORMs like EF Core and Dapper.

Edging closer: the tech trends shaping digital ambitions now

Ahead of his participation in techUK’s Digital Transformation from the Edge to the Cloud event, we sit down with Pulsant CTO Mike Hoy to ask him how distributed cloud and edge are reshaping the digital ambitions of UK businesses. Q: So Mike, what are the main issues firms face in designing/redesigning their digital infrastructure in 2026?

How to route incidents based on what their payload says

Every incident arrives with a payload, and that payload usually tells you far more than whether something broke. It points to which service is affected and how serious the issue looks. It also carries context about which customers are on the receiving end of that failure. The service name, severity, customer context — all of it can feed directly into routing decisions. This guide explores how to read those parts of the payload and use them to route incidents automatically.
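As an illustration, the Python sketch below routes on three payload fields: service name, severity, and customer tier. The field names, rule order, and destination names are hypothetical; the point is that rules are evaluated in order, the first match wins, and anything unmatched goes to a default queue.

```python
from typing import Callable

Payload = dict
Rule = tuple[Callable[[Payload], bool], str]

# Hypothetical rules, evaluated top to bottom; the first match wins.
ROUTING_RULES: list[Rule] = [
    (lambda p: p.get("severity") == "critical" and p.get("customer_tier") == "enterprise",
     "enterprise-oncall"),
    (lambda p: str(p.get("service") or "").startswith("payments"), "payments-team"),
    (lambda p: p.get("severity") == "critical", "primary-oncall"),
]
DEFAULT_DESTINATION = "triage-queue"


def route(payload: Payload) -> str:
    """Return the destination of the first rule whose condition matches the payload."""
    for matches, destination in ROUTING_RULES:
        if matches(payload):
            return destination
    return DEFAULT_DESTINATION
```

Ordering matters: putting the most specific rule (critical and enterprise) first keeps it from being swallowed by the broader critical-only rule below it.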

Automating Employee Offboarding: Simplicity Just One Click Away

Employee offboarding looks simple from the outside. Someone gives notice, HR processes the paperwork, and IT handles the rest. In practice, "the rest" is where things get complicated. A single departure can touch dozens of systems, and the handoffs between them (between HR and IT, between tools, between teams) are exactly where access stays on longer than it should, steps get missed, and audit findings show up months later.

Announcing the Next Chapter for Bitbucket Pipelines Runners

In December 2025, we announced our intention to introduce pricing for self-hosted runners so we could provide stronger support and keep investing in new features and ongoing improvements. You’ve told us that having a free option is important. As a result, we’re introducing a new operating model that lets you continue using self‑hosted runners for free with the option to upgrade to a paid premium runners tier as your needs grow.

In a world built by code, design lives between the lines

Design is the art of solving problems; open source makes that visible. In this video, Open Source Designer Eriol Fox dives into the pragmatic world of design and usability within the FOSS ecosystem. We discuss how product designers and user researchers are driving long-term software sustainability through accessibility and smarter design.

Feature Friday: How to Track GitHub Copilot Adoption with Cortex Scorecards

Are you getting the most out of your GitHub Copilot investment? In this week's Feature Friday, Cortex Engineer Aaron Warrick demonstrates how to turn "AI adoption" from a buzzword into a measurable metric. Using the CQL (Cortex Query Language) Query Builder, you can now pull real-time GitHub Copilot data into your service maturity scorecards. In this video, we cover: How to use the new AI Tools Analysis in the CQL Query Builder.

Rob Zuber on quality, metrics, and what it means to move in the right direction at CircleCI

In this episode of Braintrust, Cortex co-founder and CTO Ganesh Datta sits down with Rob Zuber, CTO at CircleCI. Rob shares how the industry's move away from dedicated QA has cost teams more than they realize, and explains how AI is changing what good software quality actually looks like.

NVIDIA DGX vs. NVIDIA HGX: What is the difference?

While GPUs remain among NVIDIA's flagship products, the company also offers a range of compute products beyond the dedicated graphics cards for which it is known. If you are unfamiliar with the terms DGX or HGX, this blog is for you. Throughout this blog, we will cover what these terms mean in practice and when you should use them.

Application Portfolio Assessment vs AWS Transform

Organizations starting AWS Migration Acceleration Program (MAP) Phase 1 face a tooling choice: which assessment platform should they use? Most start with AWS Transform. It is built by AWS, it produces migration cost estimates and instance recommendations, and it provides agentic execution for VMware, .NET, mainframe, and other workloads. But that is only half the picture.

What's under development in Redgate Flyway

In my recent posts, I looked back at the major features we shipped in 2025, highlighted all the exciting things going on in Flyway for Oracle databases, and shared the recent improvements to tracking dependencies for both SQL Server and Oracle databases. But that’s only part of the story. With four development teams working on Flyway, there’s a lot happening. Here’s a look at what’s recently been released, what’s in public preview, and what’s coming next.

Redgate Flyway's Product Updates - March 2026

This is a guest post from Maxime Drobot. This month we’re bringing you official GitHub Actions for Redgate Flyway, usability improvements in Flyway Desktop, and a look at what’s new and what’s in preview. Plus: earlier visibility of code‑review results, helping teams keep quality high and reviews flowing smoothly as AI increases the volume of changes.

How Harness AI Helps Scale Platform-Wide Support | Harness Blog

Key takeaway: Harness AI helped deflect 95% of the platform support tickets for a major financial institution. These days, success is often measured by what doesn’t happen: when things go right, the software delivery platform is invisible. But what happens when an organization’s delivery velocity increases multifold? Can the platform still stay out of the way?

How to Plan a Successful CI/CD Migration Without Disrupting Developers | Harness Blog

Modern engineering teams run on CI/CD. It’s where pull requests get validated, artifacts get produced, and releases get promoted to production. That also makes CI/CD migration very risky because you're not just moving a "tool"; you're moving the workflow that developers use dozens or hundreds of times a day. The good news: disruption is optional.

A new Host Map for modern infrastructure

A host map is a visual representation of your infrastructure that displays hosts and related resources such as clusters, pods, and containers in a single, interactive view. We introduced the Datadog Host Map more than a decade ago to help you “know thy infrastructure” and answer critical questions: Does everything look healthy? Has anything changed? Does the shape of my environment match what I expect?

CertKit Keystore: Private keys that never leave your infrastructure

When you use CertKit, your private keys live in CertKit’s database, encrypted at rest. We’ve written about why the actual risk is smaller than it sounds. But some organizations have policies that prohibit storing private keys with any third party, regardless of how they’re protected. That policy isn’t going away. The Local Keystore enables those organizations to use CertKit and still keep their keys local.

CloudZero Brings Cloud Cost Intelligence to 13 AI Coding Tools - Cursor, Copilot, and More

Earlier this month, we announced the CloudZero Claude Code Plugin and the CloudZero AI Hub, the first step toward putting your cloud cost data directly inside the AI tools your team already uses. Customer feedback was clear: engineers and FinOps teams wanted more tools and more ways to get answers from CloudZero without switching context. Today, we’re delivering more.

What Are AI Inference Costs? [And How To Manage Them]

If you’re building or running AI-powered features in production, you need a clear understanding of inference costs. Get it right, and you can turn your AI investments into profitable growth. As Larry Advey, Director of Cloud Platform and FinOps at CloudZero and a member of the FinOps Foundation Technical Advisory Council, puts it: “AI investments will only continue to grow.

Resolve's Agents of IT podcast - Ep. 15 - Nora Osman, CEO of Norvana

What separates average IT support from truly exceptional service? In this episode of Agents of IT, Ari Stowe sits down with Nora Osman, CEO of Norvana, to unpack how the best organizations are transforming service delivery by combining AI, automation, and human empathy. Nora shares real-world lessons from leading large-scale service transformations, including how a simple shift in perspective turned a struggling service desk into a high-performing customer experience engine. Her approach is clear: technology alone is not enough. You need context, empathy, and purpose.

How to Automate Your Entire Cloud Deployment Lifecycle with IaC

In today's digital world, businesses depend on cloud infrastructure to run applications, manage data, and deliver services smoothly. However, managing cloud environments manually can quickly become complex and time-consuming. Teams often deal with repeated tasks, inconsistent setups, and unexpected errors.

Production Data Access for Developers: RBAC and DLP

If you run a software engineering tools team, you have almost certainly had this conversation: a developer asks for production data access to debug a real incident, and someone in the room says no. Not because the request is unreasonable (it isn’t), but because nobody wants to be the person who said yes when something goes wrong. That instinct is understandable. Production environments carry real risk. But the reflex to lock everything down has a cost that rarely gets accounted for.

AI in DevOps: How MCP and Puppet Are Changing Infrastructure Automation

AI adoption in DevOps is accelerating, but trust, accuracy, and real-world usability still matter. In this conversation, Jason St-Cyr sits down with Jessica Gao, Product Manager at Puppet, to unpack how AI is actually being used in infrastructure and operations teams today, and what’s changed over the last 12–18 months. They dive into why enterprises are moving past generic code generation tools and toward domain-specific, MCP-powered AI that integrates directly into existing workflows.

Rolling Deployments Explained: Seamless Software Delivery

In this video, Eric Minick from Harness explains the fundamentals of rolling deployments and how they help maintain a seamless user experience during software updates. Whether you are looking for simple implementation or consistent application uptime, rolling deployments offer a powerful strategy for modern software delivery. Learn more about Rolling Deployments and Harness Continuous Delivery.
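A rolling deployment replaces instances a few at a time and checks health before moving on, so a bad release can be stopped early. The sketch below assumes a hypothetical in-memory fleet: `Instance`, `rolling_update`, and the `is_healthy` callback are illustrative names, not a Harness API, and assigning `version` stands in for actually redeploying a server.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str     # hypothetical unique instance name
    version: str  # version currently running on the instance


def rolling_update(
    instances: List[Instance],
    new_version: str,
    batch_size: int,
    is_healthy: Callable[[Instance], bool],
) -> bool:
    """Update instances batch by batch; roll everything back if a batch is unhealthy."""
    original = {inst.name: inst.version for inst in instances}
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for inst in batch:
            inst.version = new_version  # stands in for redeploying this instance
        if not all(is_healthy(inst) for inst in batch):
            # Health check failed: restore every instance to its original version.
            for inst in instances:
                inst.version = original[inst.name]
            return False
    return True


fleet = [Instance(f"web-{i}", "v1") for i in range(5)]
rolling_update(fleet, "v2", batch_size=2, is_healthy=lambda inst: True)
```

In this sketch a failed health check reverts the whole fleet; real systems often pause the rollout instead and leave the rollback decision to an operator.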

Bank cloud migration without a feature freeze

How financial institutions can escape the "Big Bang" migration trap and keep shipping features the entire time. Every bank executive knows the math. Legacy core systems cost more each year, slow product launches, and widen the gap between what customers expect and what the institution can deliver. Over 50% of banking executives say their current systems can't support long-term digital strategy. The case for modernization is airtight. So why do most hesitate?

DORA exit strategy for financial services: portable cloud architecture with Upsun

Financial institutions are required to prove they can operate safely in the cloud without becoming dependent on a single technology provider. What happens if your cloud provider fails, or you are required to move? The question used to be theoretical. However, since January 2025, it has become a compliance requirement.

Why Fintechs are moving to automated compliance

Manual compliance work is a hidden drag on delivery speed for fintechs and regulated institutions. There is a faster path. Companies handling payment data know the cycle: every new feature requires security audits, evidence collection, and control verification before release. The traditional approach to building a compliant stack means taking on every layer yourself.

How the Data Center is Evolving in 2026

From static facilities to distributed platforms, we take a practical look at the data center trends shaping 2026 so far. In 2026, data centers are crossing a tipping point. What were once emerging trends (software-defined infrastructure, AI-driven operations, sustainability constraints, and edge expansion) are now a widespread reality, shaping real-world designs and buying decisions.

Announcing HAProxy Unified Gateway 1.0

Today at KubeCon Amsterdam, we are announcing the 1.0 release of HAProxy Unified Gateway, incorporating valuable community feedback from our beta users. HAProxy Unified Gateway delivers unified, high-performance, cloud-native application routing backed by an open-source community with 25+ years of experience.

What Is IT Automation & Orchestration (and How Do I Get Started)?

So, you've been tasked with automating one or more of your tedious, time-consuming IT processes… but what exactly does that mean? And perhaps more importantly, where on earth do you start? IT automation and orchestration can cover a broad spectrum of potential use-cases, ranging from the Service Desk to the NOC, to Infrastructure, and well beyond.

The Complexity Rebound

Redgate’s annual “State of the Database Landscape” survey has been published. Like every other year, it paints a really interesting picture. Personally, I love looking through this in order to better understand where people are experiencing pain in the management of their data. If you know where people are experiencing pain, as a technical person, you know where to focus your own skill development.

CI/CD best practices | Harness Blog

Modern software teams are under constant pressure to ship faster without breaking production. That’s why CI/CD best practices have become essential for high-performing DevOps organizations. Continuous integration and continuous delivery (CI/CD) help automate builds, testing, and deployments — but simply installing a pipeline tool isn’t enough. Without the right practices, pipelines become slow, flaky, and difficult to govern.

Flaky Tests: The Quiet Killer of Productivity in Your CI Pipeline | Harness Blog

Flaky tests are automated tests that pass or fail inconsistently without any change to the code. In this guide, you’ll learn why flaky tests happen, how to detect them automatically in CI pipelines, and how modern platforms prevent them from slowing teams down. Your test passed three times yesterday. It failed this morning. You ran it again without changing anything, and now it passes. Congratulations, you've just met a flaky test, and now someone's day is going to be ruined.
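One common way to detect flakiness automatically is to look at CI history: if the same test both passed and failed on the same commit, the code did not change, so the test itself is unreliable. A minimal sketch, assuming results are available as hypothetical `(test_name, commit_sha, passed)` tuples; `find_flaky_tests` is an illustrative name, not a Harness feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_flaky_tests(history: Iterable[Tuple[str, str, bool]]) -> Set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test_name, commit_sha, passed in history:
        outcomes[(test_name, commit_sha)].add(passed)
    # Seeing both True and False for one (test, commit) pair means flaky.
    return {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}


history = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),     # same commit, different result
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True),   # different commit: a real fix, not flakiness
    ("test_search", "abc123", True),
]
print(find_flaky_tests(history))  # {'test_login'}
```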

Multi-Agent AI SRE Has Landed and It's Built for Your Most Complex Stacks

Once upon a time, a monolith running on a handful of servers meant that incident management, even at 2:17 AM, was something a single generalist could handle. One person with enough context across the stack could reasonably diagnose whether the database was choking, a config had changed, or a server was running hot. They’d fix it and go back to sleep.

Deployment strategies: Types, trade-offs, and how to choose

A deployment strategy is the method a team uses to move new code into a production environment. It determines how traffic shifts between versions, how much risk each release represents, and how quickly the team can roll back when something breaks. The choice isn’t academic: a mismatch between strategy and system can mean downtime, failed rollouts, or hours of manual recovery.
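As one concrete example of controlling how traffic shifts between versions, a canary release sends a small, stable slice of users to the new version. This sketch assumes a hypothetical router that hashes each user ID into 100 buckets; `route_version` and the `"canary"`/`"stable"` labels are illustrative, and rolling back amounts to setting `canary_percent` to 0.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary, consistently per user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Roughly 10% of users land on the canary; the same user always gets the same answer.
users = [f"user-{n}" for n in range(10_000)]
share = sum(route_version(u, 10) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")
```

Hashing rather than picking randomly per request keeps each user on one version, which avoids users flipping between old and new behavior mid-session.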

10 Best Snowflake Monitoring Tools (Updated 2026)

Snowflake is a cloud data platform designed for large-scale analytics, data warehousing, and data processing. It allows teams across an organization to run multiple data workloads on a single platform without managing infrastructure. Snowflake’s architecture is also unique. Compute and storage are completely independent and highly elastic. However, Snowflake’s per-second pricing and elastic compute model make costs highly sensitive to usage.

Kubernetes multi-cluster: the Day-2 enterprise strategy

A multi-cluster Kubernetes architecture distributes application workloads across geographically separated clusters rather than a single environment. This strategy strictly isolates failure domains, ensures regional data compliance, and guarantees global high availability, but demands centralized Day-2 control to prevent exponential cloud costs and operational sprawl.

Hyperview Data Center Asset Auto-Discovery: Real-Time Visibility Starts Here

Get a closer look at how Hyperview’s Asset Auto-Discovery simplifies data center infrastructure management by automatically identifying connected assets across your environment. This tour shows how you can save time, improve data accuracy, and gain the visibility needed to manage capacity, power, and change with confidence.

CMMC Requirements for 2026: How to Stay CMMC 2.0 Compliant & Prove Maturity at Any Level

CMMC requirements have been shifting recently, with a new version of the Cybersecurity Maturity Model Certification (CMMC 2.0) and distinct levels requiring distinct controls. Mandatory for practically any organization doing business with the US Department of Defense (DoD), CMMC is unavoidable all along the DoD’s supply chain.

Stop Vibe Coding Everything: The Case for Spec-Driven Dev

Spec-driven development with AI coding agents could change how you build software. In this GitKon 2025 talk, Erik Hanchett, Senior Developer Advocate at AWS, breaks down why AI coding assistants perform dramatically better when they start with structured specifications instead of raw prompts. If you've been vibe coding your way through complex features and wondering why your AI keeps going off the rails, this is the video for you.

Is Website Hosting Worth It for New Businesses? Security, Risks & Performance

For many new businesses, building an online presence is no longer optional: it is essential. Whether you are offering products, services, or information, a website helps establish credibility and reach a wider audience. In competitive markets like Singapore, having a reliable website can make a significant difference in how customers perceive your brand. However, one common question among startups is whether investing in website hosting is truly necessary. Concerns about cost, security, and technical complexity often lead businesses to delay or overlook this decision.

FastAPI Testing: Mock LLM APIs for Free

Testing a FastAPI app that calls OpenAI, Anthropic, or Gemini gets expensive fast. The problem is not just the API bill in production. It is all the repeated traffic in development: prompt tweaks, CI runs, regression checks, and the load tests you keep putting off because every run burns tokens. Hand-written mocks do not help much once the app is doing multi-step LLM work.
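One widely used approach is to put the LLM behind a small interface and inject a deterministic fake in tests. The sketch below uses only the standard library; `LLMClient`, `FakeLLMClient`, and `summarize_ticket` are hypothetical names, not part of the OpenAI, Anthropic, or Gemini SDKs.

```python
from typing import Dict, List, Protocol


class LLMClient(Protocol):
    """The only surface the app relies on (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class FakeLLMClient:
    """Deterministic stand-in for a paid LLM API: canned replies, recorded prompts."""

    def __init__(self, canned: Dict[str, str], default: str = "stub reply") -> None:
        self.canned = canned
        self.default = default
        self.calls: List[str] = []

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self.canned.get(prompt, self.default)


def summarize_ticket(client: LLMClient, ticket: str) -> str:
    """Hypothetical multi-step app logic: classify first, then summarize."""
    category = client.complete(f"Classify: {ticket}")
    summary = client.complete(f"Summarize: {ticket}")
    return f"[{category}] {summary}"


fake = FakeLLMClient({
    "Classify: disk full on db-1": "ops",
    "Summarize: disk full on db-1": "Database host ran out of disk.",
})
print(summarize_ticket(fake, "disk full on db-1"))  # [ops] Database host ran out of disk.
```

In a FastAPI app, the real client would typically come from a dependency, and a test could swap in the fake with `app.dependency_overrides[get_llm_client] = lambda: fake`, where `get_llm_client` is a hypothetical dependency function. Recording `calls` also lets a test check the prompts of a multi-step flow without spending tokens.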

What Is a DevOps Pipeline? Stages, Benefits, and CI/CD Explained | Harness Blog

A DevOps pipeline is a critical part of modern software delivery. It is a series of automated steps that move code from commit to production quickly, reliably, and consistently. At its core, a DevOps pipeline is a system that helps teams build, test, and release apps more easily. It cuts down on manual work and mistakes, which helps teams ship updates more often, build better software, and react quickly when business needs change.
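The core behavior described above, ordered stages where a failure stops everything downstream, can be sketched in a few lines. This is an illustration, not any CI product's engine: stages are hypothetical `(name, callable)` pairs that return `True` on success.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]  # (stage name, step that returns True on success)


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order and skip everything after the first failure."""
    report: List[Tuple[str, str]] = []
    for index, (name, step) in enumerate(stages):
        if step():
            report.append((name, "passed"))
            continue
        report.append((name, "failed"))
        report.extend((later, "skipped") for later, _ in stages[index + 1:])
        break
    return report


pipeline = [
    ("build", lambda: True),
    ("test", lambda: False),   # simulate a failing test suite
    ("deploy", lambda: True),  # never runs because "test" failed
]
print(run_pipeline(pipeline))
# [('build', 'passed'), ('test', 'failed'), ('deploy', 'skipped')]
```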

Birol Yildiz on Autonomous Incident Response and the Future of AI SRE | Harness Blog

At SREday NYC 2026, the ShipTalk podcast welcomed Birol Yildiz, Co-founder and CEO of ilert, for a conversation about the next evolution of incident response. In the episode, ShipTalk host Dewan Ahmed, Principal Developer Advocate at Harness, spoke with Birol about how artificial intelligence is transforming reliability engineering—from simply assisting engineers during incidents to autonomously diagnosing and resolving outages.

Code Coverage: Measure, Improve, and Scale Quality in CI | Harness Blog

Most engineering teams know the difference between “we have tests” and “we know we’re well-tested.” Your CI builds may be green, but without code coverage, it’s hard to prove how much of your code is actually exercised by automated tests. Code coverage measures what percentage of your code runs during tests (lines, branches, and functions), and when you wire it into CI gates, it becomes an enforceable quality signal and not a vanity metric.
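Wiring coverage into a CI gate usually means comparing measured percentages against minimums and failing the build when any metric falls short. A minimal sketch, assuming the coverage tool's report has already been parsed into hypothetical `(covered, total)` counts per metric; `coverage_gate` is an illustrative name and the thresholds are example values, not recommendations.

```python
from typing import Dict, List, Tuple


def coverage_gate(
    metrics: Dict[str, Tuple[int, int]],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return a message for every metric below its minimum; an empty list means pass."""
    failures: List[str] = []
    for kind, minimum in thresholds.items():
        covered, total = metrics[kind]
        percent = 100.0 if total == 0 else 100.0 * covered / total
        if percent < minimum:
            failures.append(f"{kind} coverage {percent:.1f}% is below {minimum:.1f}%")
    return failures


report = {"lines": (850, 1000), "branches": (120, 200), "functions": (95, 100)}
minimums = {"lines": 80.0, "branches": 70.0, "functions": 90.0}
print(coverage_gate(report, minimums))
# ['branches coverage 60.0% is below 70.0%']
```

In a pipeline step, a non-empty result would end the job with `sys.exit(1)` so the build goes red.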

How to Drive Internal Platform Adoption Developers Love | Harness Blog

Internal platform adoption usually doesn’t fail because developers “hate standards.” It fails because the platform doesn’t make their day easier. If your portal still means waiting (on an environment, on an approval, on the platform team), it becomes one more tab that people stop opening. But if the platform lets engineers get the common stuff done quickly (with guardrails that keep things consistent), they’ll come back on their own.

Drastic RAMifications: how UK businesses can weather the global memory shortage

Tech headlines are being dominated by the perfect storm that has led to a global shortage of Random Access Memory (RAM). As the short-term, temporary memory that handles data for processing and applications, RAM – and specifically Dynamic Random Access Memory (DRAM) – is a foundational business technology. The primary driver of this shortage is an industry-wide shift to High-Bandwidth Memory (HBM). This is the specialised memory required for artificial intelligence (AI).

How Vibe Coding A Self-Help App Made Me An AI Believer

For longer than I’m proud of, I was an AI skeptic. Then, over the holidays, I vibe coded an app whose sole purpose was to make me a better person. The app is a motivator. It’s programmed to send me timely reminders along certain themes, like reading every day, making healthy eating choices, and giving myself plenty of time to plan for anniversaries and birthdays.

NVIDIA's Jensen Huang just described your next big cost problem

On March 18, Jensen Huang took the stage at NVIDIA’s GTC conference in San Jose for a keynote that ran well over two hours — covering everything from CUDA’s 20-year history to humanoid robots that may one day wander Disneyland. But buried inside the spectacle was a remarkably clear-eyed articulation of the economic forces now bearing down on every enterprise that builds on cloud infrastructure.

Certificate distribution is the last mile nobody solved

Certbot is good software in the classic Linux tradition: it does one thing simply and expects you to chain it together with everything else. One server, one certificate, done. The trouble is that most environments are not simple. And the moment yours isn’t, you discover that renewing a certificate and getting it deployed are two different problems, and deployment is your problem.

Why AI Driven Automation Can't Wait

Operators today are navigating unprecedented complexity—rising costs, accelerating customer expectations, and increasingly dynamic networks. In this recent video interview, my colleague Kevin Wade and I explore why AI‑driven automation has shifted from a “nice‑to‑have” technology to a core business requirement for telecom operators and beyond.

Hot code burns: the supply chain case for letting your containers cool before you ship

In September 2025, dozens of popular JavaScript packages, like chalk and debug, were compromised on the npm registry. These packages are so ubiquitous they end up in everything: front-end apps, back-end microservices, and CI tooling. Developers didn’t do anything wrong; they just ran the same command they always do: npm install chalk. But then the malware arrived silently. This wasn’t a bug in an operating system. It wasn’t a virus on someone’s laptop.

What is Virtana Application Observability and how is it different?

Application observability, built for hybrid reality: modern applications don’t live in one place, and a single transaction might span several environments. Traditional APM shows you the trace, but hybrid reality doesn’t stop at the service layer. True application observability ties transactions to the infrastructure that actually delivered them across cloud, on-prem, and everything in between, because in hybrid environments the root cause rarely lives in just one tier.

Introducing hosted control planes on Konstruct

For seven years, I've watched the same pattern. An organization decides it needs a platform and assigns two of its best engineers. They estimate it will take three months, but eighteen months later, they're still integrating ArgoCD with their secrets manager, still debugging Crossplane providers, and still arguing about how to structure the GitOps repo. What’s happened is they’ve built something that works for one team and can't be repeated for a second.

New Redgate Flyway GitHub Actions: Faster setup, safer deployments

This is a guest post from Stephanie Herr. If your team uses GitHub Actions to ship application code, you've probably wished your database changes could move through the same pipeline just as smoothly. Today, that's easier than ever. We've launched verified Redgate Flyway GitHub Actions on the GitHub Marketplace, giving you a simple, reliable way to integrate database deployments into your existing GitHub workflows.

Best VPS Providers in 2026: Ranking by Price, Performance, and Support Quality

Choosing a VPS provider directly impacts application performance, service availability, and overall infrastructure costs. While dozens of companies offer virtual servers, the real differences in performance, server geography, and support quality are substantial.

Beyond Traditional Banking: Buying a Dedicated Server with Crypto

As digital projects scale, the standard virtual private server often becomes insufficient to handle the increased traffic, complex database queries, and intensive computational workloads. When a business reaches this critical growth phase, migrating to a dedicated server is the logical next step. However, for those who value corporate privacy and wish to operate outside the traditional, heavily monitored banking system, the ability to buy a dedicated server with crypto is an absolute game-changer.

How to set up Incident Alert Routing rules effectively

When an incident triggers, the question is not just what broke but also how urgent it is and who on your team needs to respond. Alert Routing rules answer those questions automatically: you define the conditions once, and the right response follows every time an incident triggers. Three conditions drive all of it: incident payload, time of occurrence, and frequency.
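The three conditions can be combined into ordered rules where the first full match decides the destination. This is a minimal sketch, not the product's rule engine: `Incident`, `RoutingRule`, the destination names, and the `recent_count` field (standing in for frequency within some time window) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List


@dataclass
class Incident:
    severity: str           # from the incident payload
    occurred_at: datetime   # time of occurrence
    recent_count: int       # hypothetical: similar incidents in the current window


@dataclass
class RoutingRule:
    matches: Callable[[Incident], bool]  # payload condition
    start: time                          # active window; start == end means all day
    end: time
    min_count: int                       # frequency condition
    destination: str


def in_window(t: time, start: time, end: time) -> bool:
    if start == end:
        return True
    if start < end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight, e.g. 22:00-06:00


def route(incident: Incident, rules: List[RoutingRule], default: str = "triage-queue") -> str:
    """First rule whose payload, time, and frequency conditions all match wins."""
    for rule in rules:
        if (
            rule.matches(incident)
            and in_window(incident.occurred_at.time(), rule.start, rule.end)
            and incident.recent_count >= rule.min_count
        ):
            return rule.destination
    return default


rules = [
    RoutingRule(lambda i: i.severity == "critical", time(22, 0), time(6, 0), 1, "page-on-call"),
    RoutingRule(lambda i: i.severity == "critical", time(0, 0), time(0, 0), 1, "team-channel"),
    RoutingRule(lambda i: i.severity == "warning", time(0, 0), time(0, 0), 3, "team-channel"),
]
print(route(Incident("critical", datetime(2026, 3, 20, 23, 30), 1), rules))  # page-on-call
```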

The Hidden AI Bill: Why Non-Prod LLM Costs Spiral

Most teams know they are spending money on AI in production. Far fewer realize how much they are spending outside production. It’s easy to get lost evaluating which model has the best responses, is fast enough, and is cheap enough to run in production. Part of the problem is that the AI bill usually shows up as one giant blob: it is easy to see the total.

Harness AI for Argo CD

Managing GitOps at scale shouldn’t feel like an endless game of "Whac-A-Mole." In this 3-minute demo, we show how Harness AI moves beyond simple syncs to provide agentic troubleshooting and automated orchestration for your entire GitOps estate. Watch as we use the Harness DevOps Agent to identify common failure patterns: instead of clicking through individual clusters, we ask the AI to analyze 4 out-of-sync applications simultaneously.

FinOps Leaders Who Will Win The AI Era Are Already Experimenting

Engineering teams are shipping faster than ever. AI coding tools like Claude Code and OpenAI’s Codex have quietly removed some of the biggest friction points in the development cycle — and the result is that FinOps teams are being asked to keep up with a pace most practitioners haven’t fully reckoned with yet. That acceleration has a cost consequence. More shipping means more services, more experiments, more infrastructure spun up without review cycles.

Resolve's Zero Ticket Minute - Ep. 15

Agentic AI can act, decide, and resolve on its own. Powerful, but it comes with real risk. Without the right guardrails, autonomy can lead to data exposure, compliance issues, and unintended actions at scale. This episode shows how to keep control with governed, deterministic automation.

Feature Friday: Using Entity Relationship Types to Map the Perfect March Madness Bracket

Can an EngOps Platform help you win your family March Madness pool? It’s that time of year again, and the stakes are high. Despite watching hundreds of hours of basketball, we're tired of losing bets to a mom who barely follows the sport. This year, we’re leveling the playing field using Cortex. In this video, we’re moving beyond microservices to show you the power of Entity Relationship Types. Watch as we take a chaotic tournament bracket and turn it into a structured, navigable engineering ecosystem.

Day 2 operations: an executive guide to Kubernetes operations and scale

Kubernetes success is determined by Day 2 execution, not Day 1 deployment. While migration is a bounded project, maintenance is an infinite loop that often consumes 40% of senior engineering capacity. To protect margins and velocity, enterprises must transition from manual toil to agentic automation that handles scaling, security, and cost.

How to Set Up AWS Direct Connect with Megaport

This step-by-step guide shows how easy it is to connect to AWS using Megaport. In this video, we cover provisioning an AWS Direct Connect port in your chosen data center, selecting your AWS region, finding your ASN and AWS Account ID in the AWS Management Console, specifying your VLANs, and attaching the virtual gateway for your VPC.

Intelligent Caching for CI/CD Build Optimization | Harness Blog

We've all been there. You push a PR, grab coffee, check Slack, maybe start a side conversation, and your build is still running. Multiply that across a team of 50 engineers, and you're looking at hours of lost focus every single day. Slow CI/CD builds don't just waste time. They generate a steady stream of "CI is slow" tickets that eat into your platform team's roadmap. Intelligent caching is one of the fastest ways to break that cycle.
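Most CI caches key artifacts on a hash of the inputs that produced them, such as a dependency lockfile, so unchanged inputs restore the cached artifact instead of rebuilding it. The sketch below assumes an in-memory store; `cache_key` and `BuildCache` are hypothetical names, not a Harness API, and a real pipeline would keep artifacts in remote storage.

```python
import hashlib
from typing import Callable, Dict, Iterable


def cache_key(prefix: str, inputs: Iterable[bytes]) -> str:
    """Derive a key from the content of the inputs (e.g. lockfiles), not their timestamps."""
    combined = hashlib.sha256()
    for chunk in inputs:
        # Hash each input separately so [b"ab", b"c"] and [b"a", b"bc"] give different keys.
        combined.update(hashlib.sha256(chunk).digest())
    return f"{prefix}-{combined.hexdigest()[:16]}"


class BuildCache:
    """In-memory stand-in for a remote CI cache."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_build(self, key: str, build: Callable[[], bytes]) -> bytes:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        artifact = build()  # the slow step, e.g. installing dependencies
        self.store[key] = artifact
        return artifact


cache = BuildCache()
key = cache_key("deps", [b"requests==2.31.0\n"])
cache.get_or_build(key, lambda: b"installed-deps")  # miss: builds
cache.get_or_build(key, lambda: b"installed-deps")  # hit: reuses the artifact
print(cache.hits, cache.misses)  # 1 1
```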

Parallel Execution in Modern CI: Best Practices & Results | Harness Blog

Definition: Parallel execution in CI is the practice of running independent build, test, or deployment tasks concurrently to reduce feedback time, improve resource utilization, and control infrastructure costs. Developers often spend almost half their time waiting for builds that could be faster. Simply adding more resources is not enough. Real improvements come from planned parallelism, using concurrency together with test intelligence, caching, and strong governance.
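Planned parallelism often starts with splitting a test suite into shards of roughly equal duration so that separate CI jobs finish at about the same time. This sketch uses the greedy longest-processing-time heuristic with a heap; the per-test durations are assumed to come from previous runs, and `split_into_shards` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple


def split_into_shards(durations: Dict[str, float], shards: int) -> List[List[str]]:
    """Greedy LPT: give the next-longest test to the currently least-loaded shard."""
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(shards)]  # (load, shard index)
    assignment: List[List[str]] = [[] for _ in range(shards)]
    for test, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        assignment[index].append(test)
        heapq.heappush(heap, (load + seconds, index))
    return assignment


timings = {"test_a": 8.0, "test_b": 7.0, "test_c": 6.0, "test_d": 5.0, "test_e": 4.0}
shards = split_into_shards(timings, 2)
print(shards)  # [['test_a', 'test_d', 'test_e'], ['test_b', 'test_c']]
print(max(sum(timings[t] for t in s) for s in shards))  # 17.0
```

Here the longest shard takes 17 seconds instead of 30 run serially. The greedy split is not always optimal (a 15/15 split exists for these timings), but it is fast and usually close.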

VirtualMetric DataStream + Splunk: Pre-Ingest CIM Normalization Without the TA Tax

Splunk is built around a deceptively simple premise: get your data in, search it, and act on it. In practice, the gap between “get your data in” and “data that actually works in Splunk ES” is where most of the engineering effort goes. CIM normalization is non-trivial. Technology Add-on development is slow. Volume-based licensing penalizes growth. And the combination means that as environments expand, Splunk becomes harder to operate efficiently.

Back to fundamentals: 7 insights from Kelsey Hightower at HAProxyConf

Early in his career, Kelsey Hightower made a bet. The load balancer his team was running was consuming too much memory, and he was convinced he knew the fix. He told his manager: “If it doesn’t work, fire me. But I think I can make it work.” The fix was HAProxy. It was a story he shared publicly for the first time at HAProxyConf 2025, where he delivered a keynote address, “The Fundamentals.”

Margaret Hamilton Coined "Software Engineering" Because Code Deserves the Same Rigor as Bridges

During International Women’s Month, we celebrate women whose technical work changed entire industries. But the lessons from engineers like Margaret Hamilton aren’t seasonal; they’re fundamental to how we should approach software development every single day. Margaret coined the term “software engineering” and built the code that landed humans on the moon. Her approach to rigor is as relevant to your next Git commit as it was to Apollo 11’s descent engine.

How to migrate your paging tool without breaking your team

Most engineering teams don’t migrate their on-call and paging systems unless absolutely necessary. No matter how painful their current solution is, it's one of those changes that people put off for as long as possible because the cost is real: the disruption, the retraining, and the risk of missing a critical page during the transition. It's not something you do on a whim.

Introducing MicroCloud Cluster Manager

Today, we’re excited to introduce the beta release of MicroCloud Cluster Manager, a new way to discover, organize, and operate your MicroCloud environments from a single, unified interface. MicroCloud is an open source cloud platform that makes it simple to create lightweight, resilient clusters anywhere. As teams scale from one cluster to many, visibility and coordination quickly become essential. Cluster Manager is built to solve exactly that.

Real-Time Data: The Engine of Efficient, Sustainable Data Centers

Imagine knowing every detail of your data center as it happens. Real-time data makes this possible. You can monitor systems, track performance, and adjust resources on the fly. This proactive approach leads to smoother operations and reduced downtime. By constantly having up-to-date information, you can maintain peak efficiency in your facility. Such insights allow you to optimize cooling and power use, which are crucial to keeping costs down.

The Future of UK Digital Infrastructure | Pulsant CEO Rob Coupland Interview

In the latest Platform Insight interview, Pulsant CEO Rob Coupland discusses Pulsant’s evolution into the only truly UK‑wide, interconnected data centre platform. The conversation with Nicola Hayes of Platform Markets Group explores the growing momentum behind edge and sovereign infrastructure, and why real‑world enterprise demand is shaping the next phase of AI. If you want to understand where the UK digital infrastructure landscape is heading – and why regional platforms matter more than ever – this is a must‑watch.

CI Pipeline Optimization Guide for Platform Engineering Leaders | Harness Blog

Definition: CI pipeline optimization is the practice of reducing build and test time and the cost per build by running only what matters, reusing unchanged components, and enforcing standardized governance. Platform teams are wasting thousands of hours every year because their pipelines are inefficient. Developers wait 45 minutes for builds. Jenkins consumes 20% of your team's capacity on maintenance.
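Running only what matters often means test impact analysis: map each test to the source files it exercises and run only the tests touched by a change. A minimal sketch, assuming that mapping already exists (for example, built from earlier coverage runs); `select_tests` is a hypothetical name. When a change touches a file the map does not know about, it falls back to the full suite, trading speed for safety.

```python
from typing import Dict, Iterable, List, Set


def select_tests(changed_files: Iterable[str], test_deps: Dict[str, Set[str]]) -> List[str]:
    """Pick tests whose known source dependencies overlap the changed files."""
    changed = set(changed_files)
    known_files = set().union(*test_deps.values())
    if changed - known_files:
        # A file no test is mapped to changed: run the full suite to stay safe.
        return sorted(test_deps)
    return sorted(t for t, deps in test_deps.items() if deps & changed)


deps = {
    "test_auth": {"src/auth.py", "src/db.py"},
    "test_cart": {"src/cart.py", "src/db.py"},
    "test_ui": {"src/ui.py"},
}
print(select_tests(["src/cart.py"], deps))  # ['test_cart']
```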

Architecting MCP for AI Agents: Lessons from Our Redesign | Harness Blog

Key takeaway: The Harness MCP server is an MCP-compatible interface that lets AI agents discover, query, and act on Harness resources across CI/CD, GitOps, Feature Flags, Cloud Cost Management, Security Testing, Resilience Testing, Internal Developer Portal, and more. The first wave of MCP servers followed a natural pattern: take every API endpoint, wrap it in a tool definition, and expose it to the LLM.

How to Manage Icinga with Ansible Webinar

Managing monitoring environments shouldn’t be a manual chore. In this hands-on webinar, we show you how to fully automate your Icinga infrastructure using the Ansible Collection for Icinga. We take you step by step through everything from installing Icinga 2 to configuring master instances, setting up monitoring agents, building core objects, and integrating common components like Icinga Web, all driven by Ansible.

What's New in Turbo360 - AI agents for Azure cost optimization, Azure cost pulse summary report...

Turbo360 brings a suite of enhancements to elevate your Azure management experience. Hit play to hear what's in store for this month.
00:00:00 - Intro
00:00:13 - Cost Pulse Summary Report
00:00:49 - Configuring Cost Pulse Summary
00:01:17 - New AI Agents (4 New Agents)
00:01:54 - Accessing AI Agents
00:02:18 - Related Resources Feature
00:02:40 - Budget Planner
00:02:59 - Setting Up Budget Planner Permissions
00:03:11 - Multi-Subscription Onboarding
00:03:43 - AI Agents Role-Based Access
00:04:10 - New RA-GRS Optimization Recommendation
00:04:30 - Summary & Call to Action

Our key takeaways from NVIDIA GTC 2026

Every year, NVIDIA GTC offers a glimpse into the future of computing. But this year felt different. The conversations from the past few days point to something bigger than faster GPUs or larger models. The industry is shifting its mindset entirely. GTC 2026 made it clear that the goalposts for AI haven't just moved, they’ve been uprooted. We’re past the point of talking about "faster chips." Everything points to a total shift in the industry's DNA.

What to Expect When Attending Your First Network Operator Group (NOG)

Your first NOG meeting doesn't have to be daunting. Here's what to expect, how to prepare, and how to make the most of every session and conversation. Co-authored by Rob Parker and Gavin Tweedie. If you're trying to optimize your organization's peering and Internet Exchange (IX) traffic engineering as your network grows, you may be searching for more ways to improve your network or customer experience.

Resolve Reels - Ep. 1 - The Agentic API Caller

What if you could go from request to result, instantly? No workflows to build. No APIs to chain. Just describe what you need. Resolve handles the rest. It selects the APIs, orchestrates the steps, and delivers the outcome in real time. This is agentic automation in action. Welcome to the Autonomous Enterprise. Watch now and see it in action.

A Faster Way To Spot What's Slowing Down Your PostgreSQL Database

This is a guest post from Kellyn Gorman. Kellyn Gorman is a Database and AI Advocate and Engineer at Redgate. She was previously Director of Data and AI at Silk and the Oracle SME in Azure at Microsoft. With a robust background in cloud technology and a passion for promoting its merits and potential, she is thrilled to spearhead conversations and actions that help shape the future of this industry.

Agentic AI at Scale: Building the Kubex Agentic AI Platform

In the modern cloud infrastructure landscape, we don’t have a data problem; we have an actionable interpretation gap. Engineering teams are often drowning in metrics that describe a crisis without providing a clear path to remediation. Traditional FinOps, SRE, and DevOps work has become a reactive loop of dashboard-watching and manual firefighting.

How to Catch AI Code Mistakes Before They Reach Production

AI can write code fast, but it makes mistakes humans often don't. In this session, Ole Lensmar, CTO of Testkube, breaks down the real quality risks of AI-generated code and how engineering teams can build guardrails before those bugs hit production. Topics include the common mistakes LLMs make, and which of them are unique to AI. Whether you're a developer leaning on AI to ship faster or a QA lead trying to keep up with the pace of AI-generated code, this talk gives you a practical framework for staying ahead of quality issues.

FinOps In Action Playbook For Engineering Personas

In 2025, many teams built strong FinOps foundations that created visibility and control. Now it's time to elevate. FinOps in Action is a three-part series focused on applying that foundation in real engineering scenarios. Each post highlights a different persona and shows how to move from visibility to operational discipline. Today, we focus on Engineering. Engineering teams influence cost through architecture decisions, scaling policies, and workload design.

Amazon Lex Pricing in 2026 Explained (And Practical Cost Saving Tips to Use Immediately)

If your SaaS product handles 1 million chatbot interactions per month, Amazon Lex alone could cost between $4,000 and $7,500. That range assumes current Amazon Lex V2 pricing of about $0.00075 per text request and $0.004 per speech request. Multiply the requests by the rate, and you’re done. Or are you? Conversational AI services rarely behave that neatly in production — and that includes AWS Lex. Amazon Lex is AWS’s conversational AI service for building chatbots.
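The "multiply the requests by the rate" step is easy to check. At the quoted rates, 1 million requests cost $750 as text and $4,000 as speech, so the top of the $4,000 to $7,500 range only appears if each interaction spans several requests. The sketch below assumes a requests-per-interaction figure (10 text turns is a hypothetical value chosen because it reproduces $7,500) and ignores the free tier and any other AWS charges; verify current Amazon Lex V2 pricing before relying on these rates.

```python
from decimal import Decimal

# Per-request rates quoted in the paragraph above (USD); check current AWS pricing.
TEXT_RATE = Decimal("0.00075")
SPEECH_RATE = Decimal("0.004")


def lex_monthly_cost(interactions: int, requests_per_interaction: int, rate: Decimal) -> Decimal:
    """Requests x rate, where one interaction may span several Lex requests (an assumption)."""
    return interactions * requests_per_interaction * rate


scenarios = [
    ("text, 1 request per interaction", 1, TEXT_RATE),
    ("speech, 1 request per interaction", 1, SPEECH_RATE),
    ("text, 10 requests per interaction", 10, TEXT_RATE),
]
for label, reqs, rate in scenarios:
    cost = lex_monthly_cost(1_000_000, reqs, rate)
    print(f"{label}: ${cost.quantize(Decimal('0.01'))}")
# text, 1 request per interaction: $750.00
# speech, 1 request per interaction: $4000.00
# text, 10 requests per interaction: $7500.00
```

`Decimal` is used instead of `float` so sub-cent rates multiply exactly.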

Webinar Recap: Building The Finance Function For The Future

Women leaders from CloudZero, Campfire, and Preql AI sat down to talk about what it actually takes to modernize finance in 2026 — AI spend, smarter tooling, and the skills that matter now for finance practitioners and executives looking to manage cloud and AI spend in a rapidly changing and unpredictable financial environment. On March 19, 2026, CloudZero and Campfire co-hosted a virtual panel in honor of International Women’s Month, called Building the Finance Function for the Future.

The Great Cloud Repatriation: Why UK Businesses Are Bringing Data Home

More UK organisations are treating cloud location as a governance risk decision, because incidents and audits expose questions around jurisdiction, access and evidence. Recent research found that 87% of respondents plan to partially or fully move workloads away from the public cloud over the next two years, with 54% considering private cloud, 38% exploring greater reliance on their own data centres, and 36% assessing colocation.

What Are Blue-Green Deployments? | Understanding the Trade-offs

In this video, Eric Minick from Harness explains the fundamentals of blue-green deployments and how they help maintain a seamless user experience. Whether you are looking for fast rollbacks or safer production testing, blue-green deployments offer a powerful strategy for modern software delivery.
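The fast-rollback property comes from keeping two full environments and flipping which one receives traffic. The toy, in-memory model below shows that idea: deploy to the idle environment, cut over only if a health check passes, and roll back by flipping the pointer again. `BlueGreenRouter` is a hypothetical class; in practice the "switch" is a load balancer, DNS, or service-selector change.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlueGreenRouter:
    """Two environments, one live; a cutover just flips the live pointer."""
    versions: dict[str, str] = field(default_factory=lambda: {"blue": "v1", "green": "v1"})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # Users keep hitting the live environment while the idle one is updated.
        self.versions[self.idle] = version

    def cut_over(self, health_check: Callable[[str], bool]) -> bool:
        """Switch traffic to the idle environment only if it passes the check."""
        if not health_check(self.versions[self.idle]):
            return False
        self.live = self.idle
        return True

    def rollback(self) -> None:
        # The previous environment is still running, so rollback is one flip.
        self.live = self.idle

    def serve(self) -> str:
        return self.versions[self.live]


router = BlueGreenRouter()
router.deploy_to_idle("v2")
router.cut_over(lambda version: True)
print(router.serve())  # v2
router.rollback()
print(router.serve())  # v1
```

The trade-off the video's title alludes to is visible here too: both environments must exist at once, which roughly doubles the capacity you pay for during a release.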

What is operational excellence?

Engineering teams are great at innovating and delivering products, but the work that's required to maintain them over time and keep them running well tends to get deprioritized. Planning processes are designed to move features forward, not to catch whether those features are generating too many alerts, degrading in performance, or creating compliance exposure over time. As a result, that class of work accumulates quietly.

How Does DCIM Software Support Edge Computing, IT Closets, and Distributed IT Environments?

DCIM software supports edge computing, IDF closets, and distributed IT environments by providing centralized asset management, real-time power and environmental monitoring, 3D digital twin visualization, capacity planning, and physical security management across every site from core data centers to remote sites and IDF closets.

Network Documentation: Excel vs. DCIM Software

Spreadsheets and Visio diagrams may work in small, static environments, but they cannot maintain accurate, real-time records at the port level, track relationships between assets, or support the pace of change in modern operations. DCIM software is purpose-built for those demands. In this blog post, we'll cover what network documentation actually requires, where Excel and Visio fall short, and how DCIM software addresses those gaps.

The Art of Prompting in AI Test Automation | Harness Blog

E2E testing has a new bottleneck, and it's not the code. End-to-end (E2E) testing has always been the hardest part of a QA strategy. You're simulating real users, navigating real flows, validating real outcomes across browsers, environments, and data states that never hold still. Traditional test automation tackled this with scripts: rigid, deterministic sequences tied to element selectors and hard-coded values. They worked until the UI changed. Or the data changed.

Resilience Testing Is Non-Negotiable in the Enterprise SDLC | Harness Blog

Outages in distributed systems are inevitable, making resilience testing essential in the SDLC. It must be continuous, covering failures, load, and disasters. Delayed validation creates “resilience debt,” increasing risk. A holistic approach—combining chaos, load, and DR testing—plus cross-team collaboration and AI-driven insights improves reliability and reduces impact. Modern software delivery has dramatically accelerated.

What are test hooks in AI-native development?

Summary: A test hook connects a test or lint command to an event in your AI coding agent’s workflow. When the event fires, the agent runs the command automatically. If it fails, the agent’s action is blocked. You can wire your existing test commands into your agent’s lifecycle hooks to get deterministic local validation before code ever reaches CI. AI coding agents write code at a pace where stopping to manually run tests breaks your flow.
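A minimal sketch of the hook mechanism described above: commands are registered against a lifecycle event, run when the event fires, and any non-zero exit blocks the agent's action. The event name `post_edit`, the `HOOKS` mapping, and `run_hooks` are hypothetical; each coding agent defines its own hook configuration format, and this does not reproduce any of them.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical registry: event name -> commands to run when it fires.
# Stand-in commands use the current interpreter so the sketch runs anywhere.
HOOKS: dict[str, list[Sequence[str]]] = {
    "post_edit": [[sys.executable, "-c", "print('lint ok')"]],
}


def run_hooks(event: str, hooks: dict[str, list[Sequence[str]]]) -> tuple[bool, list[str]]:
    """Run every command for `event`; return (allowed, failure messages)."""
    failures = []
    for cmd in hooks.get(event, []):
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"{' '.join(cmd)} exited {result.returncode}: {result.stderr.strip()}")
    # Any failure blocks the action; events with no hooks are allowed.
    return (not failures, failures)


allowed, problems = run_hooks("post_edit", HOOKS)
print("allowed" if allowed else f"blocked: {problems}")
```

Because the check runs a real command with a real exit code, the result is deterministic in a way that asking the agent "did the tests pass?" is not.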

The silent infrastructure tax: why AI agents will break your legacy cloud

For the first time in a decade, humans are the minority on the open web. In 2025, automated traffic officially crossed the Rubicon to account for 51% of all web activity, while generative AI-driven referrals to retail sites surged by a staggering 693% year-over-year. As we move through 2026, these are no longer just "bot" statistics to be handled by a WAF. They represent a fundamental shift in user behavior. The fastest-growing segment of your audience is now agentic.

How Catalog changes the game for long-term maintenance

Every incident platform needs to know who owns what. Which team owns which service. Which backlog to send follow-ups to. Which escalation path to page when something breaks. The problem is that most platforms encode this ownership logic separately in every configuration: alert routing, workflows, ITSM ticket syncing, and more. Each one maintains its own copy of the same information, in its own format.
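The alternative to copying ownership into every configuration is a single record that each consumer reads. The sketch below illustrates that general idea with a hypothetical `CATALOG` mapping: alert routing and follow-up ticketing both look up the same entry, so a reorg is one edit. It is not how the Catalog product in the title is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    team: str
    backlog: str
    escalation_path: str


# Single source of truth: service -> owner (hypothetical data).
CATALOG: dict[str, Owner] = {
    "checkout-api": Owner(team="payments", backlog="PAY", escalation_path="payments-oncall"),
    "search": Owner(team="discovery", backlog="DISC", escalation_path="discovery-oncall"),
}


def route_alert(service: str) -> str:
    """Alert routing reads the catalog instead of keeping its own mapping."""
    return CATALOG[service].escalation_path


def follow_up_backlog(service: str) -> str:
    """Post-incident follow-ups read the same record."""
    return CATALOG[service].backlog


# Ownership changes once, and every consumer sees it.
CATALOG["search"] = Owner(team="platform", backlog="PLAT", escalation_path="platform-oncall")
print(route_alert("search"), follow_up_backlog("search"))  # platform-oncall PLAT
```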

Komodor Introduces Extensible, Autonomous Multi-Agent Architecture for AI-Driven Site Reliability Engineering

Out-of-the-box and bring-your-own AI agents that encode operational knowledge boost troubleshooting speed and accuracy across cloud native infrastructure. TEL AVIV and SAN FRANCISCO, March 18, 2026 – Komodor, the autonomous AI SRE company for cloud-native infrastructure, today announced a new extensibility framework that transforms its Klaudia AI technology into a universal multi-agent platform for troubleshooting and optimizing performance of complex cloud native infrastructures and applications.

How A Finance Director Found $30K/Month In AI Savings In 10 Minutes

A real workflow showing how Claude + CloudZero MCP turns plain-English questions into actionable cost intelligence, with no dashboards, no tickets, and no waiting. As Director of Finance and Accounting at a software company, my job can be described simply: Understand what we're spending, who's responsible, and whether we can get more efficient. But as anyone who's had to wrangle AI costs knows, doing so for AI is anything but simple.

Introducing FlexCore: Private cloud, zero complexity

Join us for a practical introduction to FlexCore, a fully managed private cloud appliance from Civo that delivers the simplicity and experience of public cloud, directly on your own premises. FlexCore brings managed Kubernetes, compute, storage, networking, databases, and GPU acceleration together in a single, self-contained platform, operated end-to-end by Civo. You provide the space and power; Civo handles everything else.

Let's Tune Our AWS Aurora PostgreSQL Database

In case you don't know the back story, in order for me to play with radios and label it "work," I've created a PostgreSQL database running on AWS Aurora. The db is fed from API calls to aprs.fi through Lambda functions on AWS. Some of the DDL & code is mine. Some is from Claude. Neither of us paid much attention to indexing when we were putting things together.

Who Broke Staging? How to Eliminate Staging Contention Forever

It's 4:17 PM on a Friday. You've been working on this feature for three days. The code is clean, the tests pass locally, and you're ready to ship. You push to staging and... nothing works. The API returns 500s. The database schema doesn't match. The feature flags are in some impossible state. You open Slack. Someone has already posted: "Hey, who deployed to staging? My tests are all failing now." Three other developers respond.

Code Optimization: The Cloud Always Collects Its $2,000 Tuition Fee

We hear a lot of war stories from the teams we work with. Horror stories about cloud bills, surprise overages, and the infrastructure decisions that seemed perfectly reasonable at the time. This one comes from Erik Dasque, CTO at Allure Security. It involves a junior developer, a Kubernetes CronJob, and a recurring bill that, had it not been caught, would have hit every year.

5 AI And Cloud Cost Problems That Are Now Everyone's Problem

Not long ago, cloud cost was an engineering problem. FinOps teams owned it, finance leaned in occasionally, and everyone else stayed out of it. Now, that’s changed. AI changed who has skin in the game. CFOs get asked about it in board meetings. CEOs field questions on earnings calls. The audience for cloud cost management has exploded — and that means the conversation CloudZero is built to enable isn’t only a technical one, it’s a business one.

Nine Ways to Connect to Cloud Using Private Connectivity

Struggling with cloud complexity? Compare dedicated, partner, and IPsec connections to find the right private connectivity solution. Multicloud environments bring complexity, and how you connect to your CSPs can make or break performance, cost, and reliability. Here’s how dedicated, partner, and IPsec connections compare — and which might be right for your business. There are three main methods of connecting to the cloud with private connectivity.

The hidden reliability risks in your agentic AI workflows

Artificial intelligence recently took a major leap from “saying” to “doing.” Instead of simple back-and-forth chats, we’re now allowing automated AI processes to take action on our behalf—from responding to emails to building and deploying complete applications. This shift from “assistant” to “actor” can make applications more capable, but it also creates additional failure modes.

Building a dry-run mode for the OpenTelemetry Collector

Teams continuously deploy programmable telemetry pipelines to production without having access to a dry-run mode. At the same time, most organizations lack staging environments that resemble production, especially with regard to observability and other platform-level services.
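What a dry-run mode buys you is the ability to see a pipeline's effect on realistic data before it ships. The sketch below models that concept in plain Python: a pipeline is a list of processor functions, and `dry_run` applies them to copies of sample records and returns before/after pairs without exporting anything. The real Collector is written in Go and configured in YAML; `drop_debug`, `redact_email`, and `dry_run` are hypothetical stand-ins, not Collector processors or features.

```python
import copy
from typing import Any, Callable, Optional

Record = dict[str, Any]
Processor = Callable[[Record], Optional[Record]]  # returning None drops the record


def drop_debug(record: Record) -> Optional[Record]:
    return None if record.get("severity") == "DEBUG" else record


def redact_email(record: Record) -> Optional[Record]:
    attrs = dict(record.get("attributes", {}))
    if "user.email" in attrs:
        attrs["user.email"] = "REDACTED"
    return {**record, "attributes": attrs}


def dry_run(pipeline: list[Processor], sample: list[Record]) -> list[tuple[Record, Optional[Record]]]:
    """Apply the pipeline to copies of sample records; nothing is exported."""
    results = []
    for original in sample:
        current: Optional[Record] = copy.deepcopy(original)
        for step in pipeline:
            if current is None:  # an earlier processor dropped it
                break
            current = step(current)
        results.append((original, current))
    return results


sample = [
    {"severity": "DEBUG", "body": "cache miss"},
    {"severity": "INFO", "body": "login", "attributes": {"user.email": "a@example.com"}},
]
for before, after in dry_run([drop_debug, redact_email], sample):
    print(before, "->", after if after is not None else "DROPPED")
```

Diffing the before/after pairs is what reveals a mistake such as a filter that drops far more than intended, which is exactly the change that would otherwise surface only in production.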

The next wave of AI: Balancing innovation with sovereignty

This blog is based on the webinar "AI panel: The next wave of AI technology". The pace of AI innovation is reshaping research, business, and everyday life. However, as breakthroughs in Large Language Models (LLMs) and high-performance computing accelerate, they bring new technical challenges around scale, efficiency, and reliability.

Re-Inventing Network Operations: Are AI Extensions the Right Path?

For decades, telecom network operations have depended on traditional OSS tools – complex, services-heavy platforms that take years to modernize and even longer to deliver measurable business impact. This year at MWC, the leading OSS vendors showcased a variety of new AI extensions for their portfolios and marketed them as the fastest path to autonomous network operations. They are not.

Event Intelligence for Agentic IT Operations

Modern IT teams are experimenting with AI agents. But individual agents working in isolation are not enough. To truly achieve Agentic IT Operations, organisations need a platform: one that coordinates, governs, and contextualises AI-driven actions across the entire IT landscape. That's where Interlink Software comes in.

12 DevOps Tools You Should Be Using in 2026 (SREs Included)

When everything on the internet comes with an “AI-powered” tag attached and AI fatigue is in full gear, we come to the rescue with a list of tools and services for DevOps and SREs. No AI included. Twelve tools across infrastructure, security, observability, and incident management. Mostly open source. All of them solving specific problems without a chatbot in sight.

Smarter, Greener Data Centers Start Here: Why Spring Is the Best Time to Upgrade with Hyperview

Spring is the perfect season to rethink how you manage your data center. Many operators struggle with outdated tools that slow down capacity planning and energy use optimization. Hyperview’s AI-powered, cloud-based data center infrastructure management (DCIM) offers clear real-time insights and agentless asset discovery to cut costs and carbon footprints.

Redgate Monitor is now available as a fully managed SaaS edition

This is a guest post from Phil James. Database teams are already juggling a lot. Monitoring the performance of complex, multi-platform estates takes expertise and focus — and that's before you factor in installing, maintaining, and updating the monitoring tooling itself. That's the tension we've been hearing from database teams for a while. The monitoring solution is supposed to reduce operational burden, yet the infrastructure that runs it adds more.

The Now, New and Next in Data Center Infrastructure Management

Bill has built something truly special. For nearly a decade, this has been the place where data center leaders move beyond the basics and tackle real challenges: driving adoption, demonstrating ROI, navigating organizational change, and turning infrastructure data into strategic advantage. Following in his footsteps is both an honor and a responsibility I don't take lightly. This isn't a product demo.

5 Database Monitoring Tips Every DBA Should Use to Reduce Firefighting

This is a guest post from udara.ratnakumara. In a recent webinar I hosted with my colleague Chris Hawkins, Inside a DBA’s Day: What Really Happens and How to Stay Ahead, we talked through the realities of a typical DBA day and the practical ways teams can stay ahead of issues rather than constantly reacting. For many DBAs, the day doesn’t start with coffee. It starts with an alert. A report is suddenly slow. An application query is timing out.

AI Merge Conflict Resolution + Commit Messages in GitKraken Desktop

AI-assisted merge conflict resolution is changing how developers handle Git workflows. Watch GitKraken Ambassador Kevin Bost demonstrate AI-powered features that eliminate merge conflict dread, clean up messy commit history, and generate contextual commit messages in seconds.

Knowledge Graphs: The Backbone of AI-First Software Delivery | Harness Blog

AI can generate code in seconds. It still can't ship software safely. That gap isn't about model quality or prompt engineering. It's about context, and most software organizations don't have a system that accurately reflects how pipelines, services, environments, policies, and teams actually relate to each other. Without that context, AI doesn't automate delivery. It amplifies risk.
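A toy illustration of why the relationships matter: the sketch stores pipelines, services, environments, policies, and teams as typed edges and collects everything within a few hops of a node, which is the kind of context an agent would need before acting on a service. The node names and relations are hypothetical, and this is not Harness's data model.

```python
from collections import deque
from typing import Iterator

# Hypothetical edges: (source, relation, target).
EDGES = [
    ("team:payments", "owns", "service:checkout"),
    ("pipeline:checkout-cd", "deploys", "service:checkout"),
    ("pipeline:checkout-cd", "targets", "env:prod"),
    ("policy:prod-approval", "governs", "env:prod"),
    ("service:checkout", "depends_on", "service:ledger"),
]


def neighbors(node: str, edges: list[tuple[str, str, str]]) -> Iterator[tuple[str, str]]:
    """Treat edges as undirected when gathering context."""
    for src, rel, dst in edges:
        if src == node:
            yield rel, dst
        elif dst == node:
            yield rel, src


def context_for(node: str, edges: list[tuple[str, str, str]], max_hops: int = 2) -> set[str]:
    """Breadth-first collection of every node within max_hops of `node`."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _relation, nxt in neighbors(current, edges):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(node)
    return seen


print(sorted(context_for("service:checkout", EDGES, max_hops=2)))
# ['env:prod', 'pipeline:checkout-cd', 'service:ledger', 'team:payments']
```

Note that the approval policy on `env:prod` sits three hops from the service; an agent that only looked two hops out would miss the governance rule that applies to its deployment.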

The Incident You Never Had: Deterministic Simulations w/ Will Wilson (Antithesis CEO)

Most reliability engineering happens after something breaks. Will Wilson thinks that's the wrong place to be. As co-founder and CEO of Antithesis, the autonomous testing platform that just raised $105M in a Series A led by Jane Street, Will has spent years building the infrastructure to catch failure modes before they ever reach production. His starting point is uncomfortable: the testing practices most teams rely on are structurally incapable of finding the bugs that cause real incidents.

Securing AI and Securing With AI: AI Security from Code to Runtime With Harness | Harness Blog

AI is changing both what you build and how you build it - at the same time. Today, Harness is announcing two new products to secure both: AI Security, a new product to discover, test, and protect AI running in your applications, and Secure AI Coding, a new capability of Harness SAST that secures the code your AI tools are writing.

Top 10 Container Orchestration Tools & Platforms Worth Checking Out in 2026

Sources: G2 reviews, vendor documentation, 2026 market data. Docker's release in 2013 made Linux namespaces and cgroups accessible without deep kernel expertise, and container adoption took off fast. The value was clear: one portable unit with everything the process needs, running consistently across any host. Teams that were previously shipping VMs with bundled OS, runtime, and application code finally had a better option, and they took it.

Top 10 Platform Engineering Platforms for 2026 (March Edition)

Platform engineering is rapidly evolving as businesses look for more efficient ways to manage infrastructure, automate workflows, and improve developer productivity. In this edition, we’ll explore the top 10 platform engineering platforms for March 2026, optimized for scalability, automation, and ease of use. These platforms empower developers to focus on building code while platform engineers handle infrastructure with reduced complexity.

Announcing HAProxy Fusion 2.0

Today, we announce the release of HAProxy Fusion 2.0. This release marks a generational leap for the authoritative control plane that orchestrates HAProxy Enterprise’s high-performance application delivery and security. With a combination of new headliner features, structural changes, and improvements to the performance of the underlying API, HAProxy Fusion has jumped from version 1.3 to version 2.0.

AWS App Runner: How It Works, Pricing, And Best Practices For Cost Optimization Today

Back in May of 2021, containers had already won. Kubernetes adoption was surging. ECS and EKS were powerful. But for many teams, deploying a simple containerized web service still meant stitching together clusters, networking rules, scaling policies, load balancers, IAM roles, and CI/CD pipelines. It felt heavier than it should. Developers no longer wanted more orchestration power. They wanted less operational drag.

New HCP Terraform (Terraform Cloud) Migration Wizard from env zero

Migrating from Terraform Cloud (TFC), now known as HCP Terraform, or Terraform Enterprise (TFE) can quickly become complex. Workspaces, variables, remote state, and project structure all need to be recreated carefully to avoid breaking infrastructure workflows. env zero’s new migration wizard dramatically simplifies that process, enabling teams to migrate from TFC to env zero within a matter of hours.

The Agent-Native Repo: Why AGENTS.MD is the New Standard | Harness Blog

This is part 1 of a five-part series on building production-grade AI engineering systems. Most teams experimenting with AI coding agents focus on prompts. That is the wrong starting point. Before you optimize how an agent thinks, you must standardize what it sees. AI agents do not primarily fail because of reasoning limits. They fail because of environmental ambiguity.

Becoming an Azure Expert MSP

Recently, Wortell achieved the Microsoft Azure Expert MSP designation, a milestone that places them among a select group of managed service providers recognized for their Azure expertise and operational maturity. In this webinar, Alex Tilgenkamp (Azure Cloud Architect at Wortell) shares insights into what it takes to achieve this designation and what it means for organizations building and scaling their Azure managed services practice.

Liquibase MongoDB Extension Tutorial | Install & Use the Harness Community Extension

Discover how to manage MongoDB schema changes using Liquibase with the Harness Community Liquibase MongoDB Extension. In this step-by-step tutorial, you will learn how to install, configure, and run MongoDB database migrations using Liquibase Community Edition. This extension enables DevOps teams to bring database version control and CI/CD practices to MongoDB, making schema changes easier to track, automate, and deploy.

Always Audit Ready: Streamlining SOC 2 & ISO Compliance with Cortex

Stop dreading your next audit. If you’ve ever survived a SOC 2, ISO, or regulatory audit, you know the drill: weeks of manual data pulling, hunting for owners, and frantic spreadsheet updates. In this video, we show you how Cortex transforms compliance from a painful annual event into a continuous, automated process. What we cover: The "Audit Scramble" vs. "Always Ready": Why manual snapshots fail and how a living service landscape keeps you compliant 365 days a year.

Is Your Org Ready for Peak Season? How to Automate Production Readiness with Cortex

Stop relying on hope as a production readiness strategy. Whether it’s Black Friday, tax season, or a major product launch, engineering leaders need to know—with 100% certainty—if their systems can handle the load. In this video, Becka demonstrates how Cortex replaces manual, time-consuming readiness reviews with a continuous, automated framework. What we cover: The Readiness Gap: Why manual self-reporting and static spreadsheets fail during high-traffic periods.

Bridging the gap between mobile networks and the cloud

When it comes to IoT connectivity, it’s no longer enough for Mobile Network Operators (MNOs) and Mobile Service Providers (MSPs) to provide coverage, capacity and SIM cards. As enterprises accelerate their digital transformation, the focus has shifted from basic connectivity to seamless, secure and scalable device-to-cloud integration.

ACME Renewal Information (ARI) solves mass certificate revocation

In July 2024, DigiCert discovered they’d been issuing certificates with improper domain validation for five years. They gave customers 24 hours to replace 83,000 certificates. CISA issued an emergency alert. Critical infrastructure operators couldn’t meet the deadline. Some customers sued. That’s what mass revocation looks like in practice. The CA finds a compliance problem, the clock starts, and everyone scrambles. ACME Renewal Information (ARI) is the fix.
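With ARI, the client asks the CA for a suggested renewal window per certificate, and the CA can pull that window earlier when it needs certificates replaced, so renewals happen automatically instead of in a scramble. The sketch below covers the two client-side pieces that are easy to show offline: building the certificate identifier (base64url of the Authority Key Identifier's keyIdentifier, a dot, then base64url of the serial number's DER content bytes) and picking a random time inside the suggested window so clients do not all renew at once. Fetching the `renewalInfo` resource and parsing certificates are omitted; function names are hypothetical.

```python
import base64
import random
from datetime import datetime, timedelta, timezone


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ari_cert_id(aki_key_identifier: bytes, serial: int) -> str:
    """base64url(AKI keyIdentifier) + '.' + base64url(DER serial content bytes)."""
    # Minimal two's-complement encoding: keeps a leading 0x00 when the high bit is set.
    serial_bytes = serial.to_bytes(serial.bit_length() // 8 + 1, "big")
    return f"{_b64url(aki_key_identifier)}.{_b64url(serial_bytes)}"


def pick_renewal_time(window_start: datetime, window_end: datetime, rng: random.Random) -> datetime:
    """Choose a uniformly random moment in the CA's suggested window."""
    span = (window_end - window_start).total_seconds()
    return window_start + timedelta(seconds=rng.uniform(0, span))


cert_id = ari_cert_id(bytes.fromhex("69885B6B87464041E1B37B847BA0AE2CDE01C8D4"), 0x87654321)
print(cert_id)  # aYhba4dGQEHhs3uEe6CuLN4ByNQ.AIdlQyE

start = datetime(2026, 3, 1, tzinfo=timezone.utc)
end = start + timedelta(days=2)
print(pick_renewal_time(start, end, random.Random(7)))
```

The randomization matters precisely in the mass-revocation case: if a CA shrinks thousands of windows at once, spreading renewals across the window avoids a thundering herd against its issuance endpoints.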

FinOps in the Age of Kubernetes: When Everyone Owns the Bill

A FinOps analyst walks into a Monday morning meeting with a detailed spreadsheet showing $2.3M in potential Kubernetes cost savings. The recommendations look straightforward: reduce memory limits by 40%, scale down replicas during off-peak hours, consolidate workloads onto fewer nodes. The numbers are compelling, the methodology is sound, and the savings would make a material impact on quarterly cloud spend. The SRE team immediately objects.

New dotConnect and Entity Developer Release: EF Core 10, AI Vector Types, and Expanded Database Compatibility

We are thrilled to announce a set of product updates across our dotConnect data providers and Entity Developer. The new release adds support for Entity Framework Core 10, introduces AI-focused vector data types across major databases, and extends compatibility with the newest platform capabilities such as SQL Server 2025, Oracle 26ai, and Microsoft Entra authentication.

Prompt, Deploy, Pray Is Dead: Validating AI Code with Proxymock

Recent outages tied to AI-assisted code changes have pushed companies into a corner. After several incidents with massive “blast radius” impacts, organizations like Amazon introduced stricter controls—mandating that senior engineers manually review all AI-generated code before it hits production. That response makes sense on paper, but it exposes a fatal flaw in the modern development pipeline.

Test Data Management and SOC 2 Compliance

Using live data outside production is one of the fastest ways to create compliance risk, because it quickly becomes harder to control who can access it, how it is handled, and how long it is kept. A Test Data Management (TDM) approach provides exactly the kind of controls SOC 2 auditors look for in this situation: an automated, traceable end-to-end process for protecting, provisioning, and removing customer data so it can be used safely in non-production environments.
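The "protecting" step of that process often means masking sensitive fields before data leaves production. The sketch below shows one common technique, keyed-hash (HMAC) pseudonymization: the same input always yields the same token, so joins across tables still work, but the original value cannot be read back without brute-forcing the key. The field names and key are hypothetical, and provisioning, retention, removal, and the audit trail SOC 2 evidence needs are out of scope here.

```python
import hashlib
import hmac

# Hypothetical key; in practice, load it from a secrets manager and rotate it.
PSEUDONYM_KEY = b"example-key-rotate-me"

# Hypothetical set of columns treated as customer data.
SENSITIVE_FIELDS = {"email", "full_name"}


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Deterministic keyed hash, truncated to 16 hex chars for readability."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_row(row: dict[str, str]) -> dict[str, str]:
    """Replace sensitive fields with tokens; leave everything else intact."""
    return {k: (pseudonymize(v) if k in SENSITIVE_FIELDS else v) for k, v in row.items()}


prod_row = {"id": "42", "email": "ada@example.com", "plan": "pro"}
print(mask_row(prod_row))
```

Truncating the digest keeps tokens short but raises the collision risk on very large tables; keep the full digest if uniqueness matters.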

Why mid-market IT teams lose control as dev velocity increases

At a certain point, faster delivery stops feeling like progress and starts feeling like risk. When engineering teams scale from 10 to 50+ developers, the volume of changes to infrastructure, database schemas, environment variables, and networking rules no longer grows linearly. It scales exponentially. This is the scaling inflection point where manual governance breaks.

Test your AI model training reliability, too

Training is at the heart of every LLM, but it's still an application running on infrastructure, which means it can fail. Our GPU test helps you test your training GPUs so you don't lose that valuable work. TRANSCRIPT: One of the things we built recently was the GPU Gremlin. So if you are training a bunch of models and you're doing a bunch of GPU testing, you know, we want to give you the tools to be able to go test that, to understand how training the model could fail.

ABI recognises Civo as a global leader in NeoCloud innovation

Civo has been identified by ABI Research as one of the world’s leading NeoCloud providers. The report underscores our focus on cloud and AI infrastructure that blends high performance, technical innovation, and strong sovereignty. Being included in the ABI NeoCloud report validates the work we have done to support modern AI workloads while giving organisations control over where and how their data is handled.

API Failure: 7 Causes and How to Fix Them | Harness Blog

APIs have revolutionized how web and web app developers interact with data, whether for personal use or business. One of our most profound responsibilities as API developers is to protect our endpoints from being hacked. Even with essential safeguards in place, our websites can be vulnerable. This post discusses seven causes of API failures and how to fix them.

How CloudZero Measures Cost Per Customer (Step By Step)

Like most SaaS companies, CloudZero uses its own product. When we released cost per customer reporting, we tested it on ourselves first. And today, we use cost per customer reports regularly. Why? Because they help leadership answer board and renewal questions, including customer-level margins. Cost per customer is valuable and hard to get right. Multi-tenant systems and Kubernetes can hide the link between shared infrastructure (like EC2) and the customers using it.
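One common way to bridge shared infrastructure and customers is proportional allocation: pick a per-tenant usage metric (requests, CPU-seconds, storage) and split the shared bill by each customer's share of it. The sketch below assumes such a metric is already available per customer; the metric choice and customer names are assumptions, and this is not a description of CloudZero's documented method.

```python
from decimal import ROUND_HALF_UP, Decimal


def allocate_shared_cost(total: Decimal, usage: dict[str, Decimal]) -> dict[str, Decimal]:
    """Split a shared cost (e.g. a month of EC2) in proportion to a usage metric."""
    total_usage = sum(usage.values(), Decimal(0))
    if total_usage == 0:
        raise ValueError("no usage to allocate against")
    return {
        customer: (total * amount / total_usage).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for customer, amount in usage.items()
    }


# Hypothetical month: $1,000 of shared compute, usage measured in requests.
shares = allocate_shared_cost(
    Decimal("1000"),
    {"acme": Decimal("600"), "globex": Decimal("300"), "initech": Decimal("100")},
)
for customer, share in shares.items():
    print(f"{customer}: ${share}")
```

Rounding each share to cents can leave a few cents unassigned or over-assigned when the split is not exact, so production reports usually push the remainder onto the largest share.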

The Best FP&A Software For 2026: 21 Tools To Know

Modern FP&A tools help SaaS businesses analyze financial performance in real time, forecast accurately, and align spend with business priorities — especially cloud spend, which can quickly spiral without visibility. Choosing the right FP&A tool for your business means understanding what you actually need it to do — whether that’s cloud cost visibility, more accurate forecasting, or tighter alignment between finance and engineering.

GreenOps in Practice: What It Means and How to Get Ready for 2026

In this informational webinar, Freddie Booth, FinOps Consultant at Capgemini Invent, explains what GreenOps means in the context of modern cloud operations. You’ll learn why GreenOps is gaining attention across organizations using Azure, how it connects cost management with sustainability, and what steps teams can take today to start preparing for 2026. The session focuses on practical ways to improve cloud efficiency, reduce unnecessary Azure usage, and align FinOps practices with emerging sustainability goals.

Streamlining your NIS2 and DORA compliance solution with HAProxy

With NIS2 and DORA now in effect, EU organizations face a fundamental shift in how they approach security. Compliance is a standard built into every layer of your environment, from your hardware and OS to your software configuration. HAProxy alone doesn't make an organization compliant, yet it serves as a critical technical component of a strong security strategy.

The platform engineering playbook for velocity, quality, and AI readiness at SIXT

Cortex co-founder and CTO Ganesh Datta sits down with Boyan Dimitrov, CTO at SIXT. Boyan shares how SIXT went from releasing software once or twice a month to nearly 10,000 deployments per month, and explains the platform engineering philosophy that made it possible.

Westminster is waking up to technology sovereignty. The UK must be a maker, not a taker.

Westminster is starting to recognise the importance of technology sovereignty. The recent Westminster Hall debate on technology sovereignty was encouraging to see. For those of us working in the UK technology sector, it felt like an important moment. Conversations about cloud infrastructure, data control and platform dependency have been happening inside the industry for years.

How to Optimize Your CI/CD Pipeline with AI (CircleCI Chunk Tutorial)

As AI-assisted coding tools increase the amount of code, commits, and builds, optimizing your CI pipeline becomes more important than ever. In this tutorial, we walk through how to use Chunk, CircleCI’s autonomous agent that validates your code at AI speed, to analyze your pipeline history, identify performance bottlenecks, and suggest optimizations to your CI/CD configuration. Chunk leverages critical CI/CD context like build history, test results, and execution data to keep pipelines healthy and moving at AI speed.

Building a secure golden path: Cloudsmith x Octopus Deploy webinar

What does it take to build a "Golden Path" that developers actually want to use? In this expert-led webinar, Cloudsmith and Octopus Deploy team up to explore the missing link in your software supply chain: turning artifact creation and management into an automated, trust-backed journey from source to ship.

Beyond the build: How DataHub uses Cloudsmith to power worldwide software distribution

You’ve built a world-class platform – now how do you get it into the hands of your users without "download friction"? In this video, we look at how DataHub, the leading open source metadata platform, uses Cloudsmith as its cloud-native distribution engine to deliver high-performance software artifacts to a global audience with zero downtime and zero maintenance.

Redgate Test Data Manager Updates - March 2026

This is a guest post from James Hemson. Redgate Test Data Manager's latest release adds Entra ID authentication, multi-target anonymization, and direct treatment code editing, along with workflow improvements that make pipeline management faster and more flexible. Entra ID authentication: you can now connect to SQL Server using token-based authentication via Microsoft Entra ID, for both anonymization and subsetting.

Seven early warning signs you're heading toward a governance crisis

Governance failures rarely start with a major outage or a failed audit. They start with small, localized signals that teams treat as isolated annoyances. By the time a crisis becomes visible, the structural breakdown is already expensive to fix. If you are in IT leadership or platform engineering, you have likely seen these signs. The risk is ignoring them until they consolidate into a systemic failure.

Turning team knowledge into Alert Routing rules

Over time, on-call teams build up a quiet layer of knowledge about their systems. Someone learns that a specific error code always means phone calls are failing. Someone else figures out that a particular background job fires a warning every night and has never once needed attention. That knowledge shapes how your team responds to incidents every day. But when it only lives in people’s heads, your response depends entirely on the right person being available at the right time.

Why 200k Developers Ditched Big Tech AI

Is architectural purity dead? The big labs are racing for enterprise control, but developers are flocking to OpenClaw for one reason: ergonomics. It treats AI like a human, not a restricted tool. Are you sticking with the corporate harnesses or going unfiltered? Let’s talk in the comments. Learn more: speedscale.com.

Why SSIS will never die - with Tim Mitchell

Steve is joined by business intelligence architect and author Tim Mitchell. They discuss why SSIS will never die, the general pros and cons of Integration Services, the evolution from XML to JSON, how AI can help with coding, and taekwondo, among other topics. Recorded on-site at PASS Data Community Summit 2025.

The Analyst View on Data Sovereignty with TechMarketView

Perspectives from the Edge, Episode 2. Data sovereignty isn't a solo effort. It's a symphony. Data sovereignty is moving fast up the agenda. But who's orchestrating it? The second episode of Perspectives from the Edge explores the subject through an analyst’s laser lens, in conversation with Kate Hanaghan, Chief Research Officer at TechMarketView. Find out how AI and platform consolidation make data harder to control, how to bake sovereignty into your business from the start, and why the organisations getting it right treat ecosystems as a strategy, not just a procurement exercise.

Cortex and Syntasso join forces to bridge the gap between automation and visibility

I've spent a lot of time talking to platform teams who feel like they're running in circles. They build incredible automation to speed up service delivery, but even when it's running perfectly, nobody actually knows what's happening across the organization. It's hard to see who owns which service or if those services even meet basic company standards. Automation's a great start, but it usually hits a wall when you try to scale it.

Code Compare 5.5 R1 Adds Integration Support for Visual Studio 2026

We’re excited to share Code Compare 5.5 R1, the latest update to our code comparison and merge tool. This release adds integration support for Visual Studio 2026, so teams can compare changes and resolve merge conflicts directly within the IDE workflow they already use. With Code Compare 5.5 R1, developers can review differences, apply merges, and handle conflicts in Visual Studio 2026 using the same comparison experience they rely on across projects and repositories.

Your Flaky Tests Are a Data Problem, Not a Test Problem

Your tests are not flaky. Your test data is. That 401 Unauthorized that fails every Monday morning? The OAuth token in your test fixture expired 72 hours ago. The order_id that works in staging but not in CI? It was hardcoded six months ago and the format changed from integer to UUID in January. The timestamp assertion that passes at 2pm and fails at midnight? You are comparing a hardcoded 2026-01-15T14:30:00Z against Date.now(). These are not test infrastructure problems. Retrying them will not help.
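
A common remedy for the expired-token and midnight-timestamp failures is to inject a fixed clock and build fixture data relative to it, rather than hardcoding absolute values. The article's example compares against JavaScript's Date.now(); this sketch uses Python for consistency, and the function names and the 24-hour token lifetime are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_token_expired(issued_at: datetime, ttl: timedelta, now: Callable[[], datetime]) -> bool:
    """The clock is a parameter, so tests control 'now' instead of reading wall time."""
    return now() >= issued_at + ttl


# One fixed instant for the whole test run (hypothetical value).
FIXED_NOW = datetime(2026, 1, 15, 14, 30, tzinfo=timezone.utc)


def fixed_clock() -> datetime:
    return FIXED_NOW


# Fixture timestamps are derived from the fixed clock, never hardcoded on their own.
fresh_token_issued = FIXED_NOW - timedelta(hours=1)
stale_token_issued = FIXED_NOW - timedelta(hours=72)
TOKEN_TTL = timedelta(hours=24)  # assumed token lifetime
```

Because both the fixture and the assertion read the same clock, the result no longer depends on the day of the week or the time the CI job happens to run.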

Runtime Validation vs Static Analysis: Why You Need Both

Runtime validation does not replace static analysis. They solve different problems. Static analysis catches structural defects in code before it runs. Runtime validation catches behavioral failures by testing code against real production traffic. Enterprise teams adopting AI coding tools need both layers because AI-generated code introduces a new class of defects that neither layer catches alone. According to CodeRabbit's State of AI vs Human Code Generation report, AI-generated pull requests contain roughly 1.7x more issues than human-written ones. Many of those issues pass static checks cleanly.

AI Coding Agents Have a UX Problem Nobody Wants to Talk About

The pitch was simple: let AI write your code so you can focus on the hard problems. Three years into the AI coding revolution, and developers are focused on hard problems alright, just not the ones anyone expected. Instead of designing systems and solving business logic, engineers in 2026 spend a startling amount of their day managing the AI itself. Should you use Fast Mode or Deep Thinking? Haiku or Opus? Cursor or Claude Code or Windsurf? Should you write a SKILL.md file or a custom system prompt?

How to choose a secure private cloud provider for your enterprise

Enterprise private cloud procurement tends to generate impressive security documentation. SOC 2 reports, penetration test summaries, ISO 27001 certificates, detailed descriptions of network segmentation and encryption standards. What it doesn't always generate is clarity on the question that actually matters: does this infrastructure make it possible to operate securely at the level your organization requires, given your specific workloads, your regulatory context, and your threat model?

Developer workflow fragmentation and what's really happening behind the scenes

In the current landscape of enterprise software delivery, a profound paradox has emerged: as the variety of specialized development tools and cloud services increases, the actual velocity of innovation frequently stagnates. For IT leaders, this phenomenon is known as developer workflow fragmentation. It’s a state where parallel, unstandardized processes create a pervasive "operational drag" that consumes the very agility these tools were intended to provide.

How to set up Alert Routing rules effectively

Different incidents need different levels of attention. Some need a phone call at 3 AM and others can wait until morning. Alert Routing rules are what let you act on that understanding without doing it manually every time. An effective routing setup does three things: Getting all three of these working is what makes a routing setup useful.
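
To make the idea concrete, here is a minimal first-match-wins rule list that encodes the kind of team knowledge described earlier: a specific error code means calls are failing, and a nightly job's warning can be ignored. The alert fields, error code, source names and action names are all hypothetical and are not any vendor's configuration format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    source: str
    error_code: str
    severity: str  # e.g. "critical" or "warning"


@dataclass
class RoutingRule:
    name: str
    matches: Callable[[Alert], bool]
    action: str  # hypothetical actions: "page_phone", "notify_chat", "suppress"


RULES = [
    # Known noise: this nightly job's warning has never needed attention.
    RoutingRule("nightly job noise",
                lambda a: a.source == "nightly-report-job" and a.severity == "warning",
                "suppress"),
    # This (hypothetical) error code always means phone calls are failing.
    RoutingRule("calls failing", lambda a: a.error_code == "E503-VOICE", "page_phone"),
    RoutingRule("other criticals", lambda a: a.severity == "critical", "page_phone"),
]
DEFAULT_ACTION = "notify_chat"  # anything unmatched can wait until morning


def route(alert: Alert, rules=RULES) -> str:
    """Return the action of the first matching rule, so rule order encodes priority."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action
    return DEFAULT_ACTION
```

Putting narrow rules before broad ones matters: the suppression rule has to come before any catch-all rule that would otherwise page someone.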

When Faster Code Starts to Break the Delivery System | Harness Blog

Speed is exposing the cracks. Our research shows that 69% of heavy AI users now face frequent deployment issues. To capture the ROI of AI, leaders must shift focus from code generation to delivery modernization: standardizing foundations and automating the "manual middle" that leads to developer burnout. Over the last few years, something fundamental has changed in software development.

Why Are DevOps and SRE Teams Replacing 3-4 Monitoring Tools with Atatus?

Your on-call engineer gets paged. A critical service is down. Error rates are spiking. They open Sentry for errors. Flip to Grafana for metrics. Pivot to Kibana to search logs. Then jump to Lumigo, but that only covers the Lambda functions, not the Node.js backend throwing the actual errors. Three tabs become five. Five become eight. Half the incident is gone and your team is still piecing together what happened instead of fixing it. Sound familiar?

The fallacy of complacent distroless containers

Join us on our deep dive into Chisel: the tool that brings enterprise-grade traceability to ultra-minimal container images. In this video, we explain why Chisel was created, and how it helps address security challenges in modern container images. We cover why container images often include unnecessary software and dependencies, why building minimal distroless containers can be difficult, and how missing metadata can lead to false confidence in vulnerability scans.

Update Management, Content Hub Expansion, and KQL Support

The latest VirtualMetric DataStream release introduces several important capabilities across platform security, data management, and operational workflows. This update strengthens access protection, simplifies infrastructure management, and expands the ways security teams can work with live telemetry. It also extends platform connectivity and improves the user experience across many areas of the interface. Let’s take a closer look.

The bare metal problem in AI Factories

As AI platforms grow in scale, many of the limiting factors are no longer related to model design or algorithmic performance, but to the operation of the underlying infrastructure. GPU accelerators are key components and are responsible for a large part of the total system cost, which makes their continuous availability and stable operation critical to the output and efficiency of the entire AI platform.

Resolve's Agents of IT podcast - S2Ep5 - Ari's Hot Takes

In this episode of Agents of IT, Ari Stowe and Ian Coppock unpack the recent Claude outage and what it reveals about our growing dependence on AI at work. From developers suddenly returning to Stack Overflow to the infrastructure challenges behind AI scaling, the conversation explores what happens when AI becomes critical enterprise infrastructure. They also discuss how organizations should prepare for AI outages, why “stampede adoption” is the new reality of AI releases, and what resilient, multi-agent architectures could look like going forward.

MCP vs. CLI for AI-native development

Summary: The CLI vs. MCP question is really a question about where you are in the development loop. CLIs fit the inner loop: fast, local, zero overhead. MCP servers fit the outer loop: external systems, shared infrastructure, structured access. Most teams need both. AI has put a new kind of scrutiny on developer tooling. When a developer works alongside an AI coding assistant, the tools that assistant can reach, and how it reaches them, directly affect the quality and speed of the work.

Infrastructure Under Scrutiny: Turning Visibility into Cost Control

A practical discussion with infrastructure leaders on how visibility is shaping cost control, renewal planning, and financial accountability across hybrid environments (runtime: 41:32). The conversation around infrastructure has shifted. IT teams are no longer measured only on uptime or performance.

Beyond Mirroring: 5 Reasons Your DevOps Strategy Depends on Repository Federation

For today’s leading enterprise computing environments, the concept of “centralized headquarters” is a relic. Today, R&D happens on different continents, spanning cloud, on-prem and hybrid environments, while stretching across multiple regulatory jurisdictions. But here is the hard truth: Most global organizations are still managing their binaries using legacy mirroring or “blind” infrastructure-level syncing. They treat artifact delivery like a basic file-transfer mechanism.

The governance playbook for mid-market IT teams

The contemporary IT landscape for mid-market organizations is defined by a paradoxical pressure: the mandate to accelerate digital transformation and AI integration while operating under the most stringent cost discipline observed in decades. For firms positioned between the nimble agility of startups and the vast resources of global enterprises, the "complexity of data lineage" and "legacy modernization paralysis" have emerged as primary barriers to progress.

Dependency Firewall for Harness Artifact Registry

Harness Artifact Registry’s Dependency Firewall protects your software supply chain by enforcing security policies at the moment dependencies enter your environment. Instead of discovering risky packages later in the pipeline, Dependency Firewall evaluates every dependency at ingest using policy-as-code and blocks packages that violate security rules.
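
The underlying idea, evaluating each dependency against policy at the moment of ingest and only caching it if it passes, can be sketched generically. This is not Harness's API or policy language: the Package and Policy shapes, the rule set, and the CVE identifier below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    version: str
    license: str
    known_cves: list = field(default_factory=list)


@dataclass
class Policy:
    blocked_packages: set = field(default_factory=set)
    blocked_licenses: set = field(default_factory=set)
    max_known_cves: int = 0


def evaluate(pkg: Package, policy: Policy) -> list:
    """Return policy violations for a package; an empty list means it may enter."""
    violations = []
    if pkg.name in policy.blocked_packages:
        violations.append(f"{pkg.name} is on the blocklist")
    if pkg.license in policy.blocked_licenses:
        violations.append(f"license {pkg.license} is not allowed")
    if len(pkg.known_cves) > policy.max_known_cves:
        violations.append(
            f"{len(pkg.known_cves)} known CVE(s) exceeds limit of {policy.max_known_cves}")
    return violations


def ingest(pkg: Package, policy: Policy, registry: dict) -> list:
    """Gate at ingest: only packages with no violations are stored in the registry."""
    violations = evaluate(pkg, policy)
    if not violations:
        registry[(pkg.name, pkg.version)] = pkg
    return violations
```

Because a rejected package never lands in the registry, later pipeline stages cannot pull it, which is the difference from scanning after the fact.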

How to stop guessing where developer friction lives

Most platform teams know friction is a problem. They also struggle to figure out exactly where that friction lives. Developers lose time in ways that rarely show up on a roadmap. In many organizations, creating a new service can require multiple approvals and several Slack threads. Spinning up infrastructure can mean filing a ticket and waiting days. Onboarding to a new codebase involves a scavenger hunt through stale Confluence pages. None of these feel like emergencies in isolation.

How Techdome accelerates AI-led product delivery with Civo Kubernetes

Accessing cloud infrastructure shouldn’t slow down product innovation. Yet for many engineering teams building AI-driven platforms, traditional hyperscalers often introduce unnecessary complexity, high costs, and slow provisioning cycles. At Civo, we’ve seen a different approach emerge. Our cloud platform enables teams to move faster with Kubernetes, compute, and networking designed for simplicity and speed.

The data context gap: an evaluation guide for agent-ready infrastructure

Why do AI agents that look brilliant in a sandbox fail the moment they hit production? For platform leaders, the answer is a lack of environmental parity: the ability to interact with the exact data state and service topology where the actual bugs live. When an agent attempts to modify a schema, optimize a query, or reproduce a bug without access to the real-world data state, it hits the Data Context Gap.

Release software with confidence using Datadog Feature Flags

In this technical product demo, see how Datadog Feature Flags helps teams release software with confidence by connecting every feature flag to real-time observability data. Configure progressive, multi-step rollouts with automated guardrails tied to APM, RUM, and Product Analytics so you can pause or roll back instantly if latency, errors, or key business metrics degrade.
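
Progressive rollouts generally rest on two pieces: a deterministic bucketing of users so cohorts stay stable as the percentage grows, and a guardrail check before each step. The sketch below is not the Datadog Feature Flags SDK; the step schedule and the error-rate and p95 latency thresholds are assumptions.

```python
import hashlib

ROLLOUT_STEPS = [1, 5, 25, 50, 100]  # percent of users; assumed schedule


def bucket(user_id: str, flag_key: str) -> int:
    """Stable bucket in 0..9999, so a user keeps the same position as the rollout grows."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(user_id: str, flag_key: str, rollout_percent: float) -> bool:
    return bucket(user_id, flag_key) < rollout_percent * 100


def next_step(current_percent: int, error_rate: float, p95_latency_ms: float,
              max_error_rate: float = 0.01, max_p95_ms: float = 500.0) -> int:
    """Advance one rollout step if guardrails hold, otherwise roll back to 0%."""
    if error_rate > max_error_rate or p95_latency_ms > max_p95_ms:
        return 0
    later = [step for step in ROLLOUT_STEPS if step > current_percent]
    return later[0] if later else current_percent
```

Since a user enabled at 5% has a bucket below 500, they stay enabled at 25%, so nobody flips back and forth between old and new behavior as the rollout advances.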

Smarter Postgres Monitoring: Compare Queries, Spot Unused Indexes, and Diagnose Waits

This is a guest post from Adrian Tan. Over recent months, we’ve been steadily improving PostgreSQL monitoring in Redgate Monitor, with a singular focus: to help Postgres users diagnose performance problems faster, with less manual investigation. The latest updates and new features tackle this problem in a few different ways.

Database Governance with OPA in Harness DB DevOps | Harness Blog

Harness Database DevOps integrates Open Policy Agent (OPA) to enforce database governance through policy as code. By embedding compliance rules directly into CI/CD pipelines, teams can automatically prevent risky database changes, maintain audit trails, and meet regulatory requirements without slowing down development. Database systems store some of the most sensitive data of an organization such as PII, financial records, and intellectual property, making strong database governance non-negotiable.
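
OPA policies themselves are written in Rego. The Python stand-in below only illustrates the deny-rule idea applied to a migration changeset before it reaches the database. It is not Rego and not the Harness integration, and the three rules are hypothetical examples.

```python
import re

# Hypothetical deny rules: (pattern, reason). Each is checked per statement.
DENY_RULES = [
    (re.compile(r'^\s*DROP\s+TABLE\b', re.IGNORECASE), "DROP TABLE needs manual approval"),
    (re.compile(r'^\s*TRUNCATE\b', re.IGNORECASE), "TRUNCATE is blocked in automated pipelines"),
    (re.compile(r'^\s*DELETE\s+FROM\s+[\w."]+\s*$', re.IGNORECASE), "DELETE without a WHERE clause"),
]


def check_changeset(sql: str) -> list:
    """Return one violation message per offending statement; empty means the change may proceed.

    Splitting on ';' is a simplification: it would mis-split semicolons inside string literals.
    """
    violations = []
    for statement in filter(None, (s.strip() for s in sql.split(";"))):
        for pattern, reason in DENY_RULES:
            if pattern.search(statement):
                violations.append(f"{reason}: {statement}")
    return violations
```

In a pipeline, a non-empty result would fail the step and leave a record of why, which is the audit-trail property the paragraph describes.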

Making the Case for Vendor-Backed Puppet Core

Thousands of organizations rely on open source community builds for infrastructure automation. But if you're tasked with certifying, maintaining, and patching those builds yourself, you know the burden firsthand. The reality is that managing open source internally consumes time, introduces risk, and diverts resources from higher-value initiatives. When critical vulnerabilities emerge, your team scrambles to assess, test, and deploy fixes, all while keeping production environments stable.

GitKraken Explains: How AI is Changing Your Commit History

AI commit message generation is fast, accurate, and consistent. It's also missing the most important thing: the why. AI-assisted Git workflows can summarize a diff in seconds, but they optimize for description, not decision-making. In this video, we break down what AI commit messages do well, where they fall short, and how to use them without quietly erasing the context future teammates (and future you) actually need.

6 Underused Git Commands That Save Time

Git is full of underused powers that most developers never discover. In this GitKon 2025 session, GitKraken Senior Product Manager Jonathan Silva reveals 6 lesser-known Git commands that solve real workflow pain points, from recovering lost commits to managing stashes strategically. Learn how to undo commits without losing work, recover deleted branches with git reflog, cherry-pick without immediate commits, target specific stashes, see contributor breakdowns, and more. Jonathan also demonstrates how GitKraken Desktop makes these workflows visual and intuitive.

Why API Documentation Is a Core Engineering Discipline, Not an Afterthought

Developers rarely cite documentation as the most exciting part of building an API. Yet it is frequently the factor that determines whether an integration succeeds in days or drags on for weeks. Poor documentation creates friction at every stage of the API lifecycle. Consumers misunderstand endpoints, send malformed requests and file support tickets that a well-structured reference would have made unnecessary.

Counting Consecutive Repeating Segments in URL Strings via Trino SQL

Short Summary: Sometimes URL paths repeat by mistake because of tracking problems or redirect loops. In this guide, you’ll learn how to find and count those repeats using Trino SQL. By using simple SQL tools like CTEs, arrays, and window functions, you can break a long link into smaller pieces and clean up the data without using complex regex.
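
The core logic is to split the path into segments and compare each segment with the one immediately before it, which is what a LAG() window over the unnested segments does in SQL. The sketch below shows that logic in plain Python rather than the guide's Trino query, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list:
    """Non-empty path segments, e.g. '/a/b/' -> ['a', 'b']."""
    return [s for s in urlsplit(url).path.split("/") if s]


def count_consecutive_repeats(url: str) -> int:
    """Number of segments identical to the segment immediately before them."""
    segs = path_segments(url)
    return sum(1 for prev, cur in zip(segs, segs[1:]) if prev == cur)


def collapse_repeats(url: str) -> str:
    """Path with each run of identical consecutive segments reduced to one."""
    segs = path_segments(url)
    kept = [cur for i, cur in enumerate(segs) if i == 0 or cur != segs[i - 1]]
    return "/" + "/".join(kept)
```

For example, `https://example.com/shop/shop/shop/cart` has two consecutive repeats and collapses to `/shop/cart`, while `/a/b/a` has none, because only adjacent duplicates count.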

WireMock vs MockServer vs Proxymock: Java Mocking in 2026

Your WireMock stubs are lying to you. They were accurate when someone wrote them six months ago, but the payment API added a metadata field in January, the inventory service switched from REST to gRPC in February, and nobody updated the stubs because the tests still pass. Meanwhile, production is breaking in ways your mocks will never catch. This is not a WireMock problem. It is a hand-written mock problem.

Sensor-Level Access Control: A Game-Changer for Colocation Providers

Enter Hyperview’s sensor-level access control, an approach that changes how colocation providers share monitoring data with their clients. By enabling granular access to individual sensors within shared environments, Hyperview empowers colocation providers to deliver the visibility their clients need while maintaining strict security and operational simplicity.
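
Granular, per-sensor access can be pictured as an allow-list of sensor IDs per tenant, with deny-by-default for everything else. This is a generic sketch, not Hyperview's data model; the tenant names, sensor IDs and grant structure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    metric: str
    value: float


# Hypothetical grants: each tenant sees only the individual sensors listed for it.
GRANTS = {
    "tenant-a": {"rack12-temp-top", "rack12-temp-bottom", "rack12-pdu-a"},
    "tenant-b": {"rack40-temp-top"},
}


def visible_readings(tenant: str, readings: list, grants: dict = GRANTS) -> list:
    """Filter shared-facility readings down to the sensors granted to this tenant.

    Unknown tenants get an empty grant set, so access is denied by default.
    """
    allowed = grants.get(tenant, set())
    return [r for r in readings if r.sensor_id in allowed]
```

Granting at the sensor level rather than the room or cage level is what lets two tenants share a space without seeing each other's rack data.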

Migrate Your On-premises to the Cloud: A Step-by-Step Guide

Is it time to migrate your data center applications to the cloud? Get key considerations and a step-by-step migration path. Migrating your on-premises data center to the cloud can seem like a daunting task, from deciding on the right cloud deployment model to ensuring that your network connectivity is secure and scalable. For many companies, the question isn’t whether to migrate, but how to do it efficiently and with minimal disruption.

Step 2 to Web App Deployment: Back-End Deployment

Front-end deployment can embarrass you. Back-end deployment can wake you up at 2:13am. This is where web app deployment stops being about assets and starts being about state, uptime, traffic, and data that absolutely refuses to forget what happened five minutes ago. Back-end deployment is where complexity compounds. Quietly. Patiently. And then all at once. Let's talk about what's actually happening when you "just deploy the API."

You Bought the AI Licenses. Why Is Only One Developer Getting 10x Results?

Here's something nobody talks about at the AI strategy meetings. Your organization just spent six figures on Cursor licenses, Claude seats, and Copilot subscriptions. Ninety percent of your engineers have access. By most internal measures, the rollout was a success. But somewhere on your team, one developer is running circles around everyone else.

3 Simple EC2 Cost Optimization Strategies That Actually Work

Amazon Web Services (AWS) has been the leader in cloud computing for more than 10 years. Despite a decade of innovation, no AWS service encapsulates cloud computing principles better than Elastic Compute Cloud (EC2). Through EC2, AWS can offer flexible and scalable virtual infrastructure that can be ‘rented’ to run applications and workloads.

What is an Internal Developer Platform (IDP)?

Over the past year, the term Internal Developer Platform has appeared everywhere in engineering discussions. At first glance, it might sound like another buzzword for a fancy dashboard. But the growing interest reflects a real shift in how organizations manage developer productivity and infrastructure. In this post, we will unpack Internal Developer Platforms (IDP), why they exist, what problem they solve, and whether it is worth considering adopting one.

How to verify certificate renewal actually worked

On May 21, 2019, LinkedIn’s URL shortener went down. The certificate had expired. Millions of people cried out in terror when they couldn’t click on AI link bait. The interesting part: LinkedIn had renewed the certificate ten days earlier. The renewal succeeded. The certificate just never made it to the server. The renewed cert existed somewhere, but the server still served the old one. Most certificate automation is built to prevent the “I forgot to renew” problem.
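
The check that would have caught this compares what the server actually presents with the certificate you just renewed, instead of trusting that renewal succeeded. A minimal sketch using Python's standard `ssl` module follows; the file path in the usage note is hypothetical, and the renewed file is assumed to hold a single leaf certificate rather than a full chain.

```python
import hashlib
import ssl


def fingerprint(pem: str) -> str:
    """SHA-256 over the DER bytes of a single PEM certificate."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def served_certificate(host: str, port: int = 443) -> str:
    """Fetch the certificate the server is presenting right now.

    No CA bundle is passed, so verification is skipped and even an expired
    certificate is returned, which is what we want for this comparison.
    """
    return ssl.get_server_certificate((host, port))


def renewal_is_live(host: str, renewed_pem_path: str, port: int = 443) -> bool:
    """True only if the live endpoint serves the renewed certificate."""
    with open(renewed_pem_path, encoding="ascii") as f:
        renewed = f.read()
    return fingerprint(served_certificate(host, port)) == fingerprint(renewed)
```

Usage might look like `renewal_is_live("example.com", "/etc/ssl/example.com/cert.pem")`, run after every renewal and against every endpoint behind a load balancer, since each one can still be holding the old certificate.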

Continuous Integration (CI) Testing Best Practices in 2026

Continuous Integration (CI) is an essential practice in modern software development that enables developers to continuously integrate their code changes into a shared repository. Through automated building and testing processes, CI ensures that every code change is tested thoroughly, preventing integration issues and improving software quality. In this article, we will explore the best practices for effective Continuous Integration testing.

Your Cloud Is Transparent - Just Not to You

Many organizations that have moved to the cloud enjoy the flexibility and scalability it provides. But as cloud environments grow, a new challenge emerges: how to maintain real visibility and control over the environment that runs the business. In a new article published on People & Computers, we discuss one of the most common blind spots in modern cloud operations — situations where the same external provider both operates the environment and reports on its performance.

A compass for setting up your escalation policy

Setting up an escalation policy for the first time can feel like standing at a crossroads with no clear sign pointing the way. You could escalate based on severity, by team, or by who’s available, and all of them are valid. Knowing which one fits your situation is the hard part. Think of this guide as your compass for that decision.

Sovereign clouds: enhanced data security with confidential computing

Increasingly, enterprises are interested in improving their level of control over their data, achieving digital sovereignty, and even building their own sovereign cloud. However, this means moving beyond thinking about just where your data is stored to thinking about the entire data lifecycle. In this blog, we cover the differences between data residency and data sovereignty, how confidential computing works to enhance the security of your data, and how it can support you in achieving digital sovereignty.

Webinar recap: FinOps In The AI Era - A Critical Recalibration

In March 2026, CloudZero’s Ben Austin, Director of Product Marketing, sat down with Ray Rike, Founder and CEO of Benchmarkit, to walk through findings from FinOps in the AI Era: A Critical Recalibration, a joint survey of nearly 500 organizational leaders on how they’re managing or, rather, struggling to manage AI costs.

AI at Superhuman (before it was cool) feat. Loïc Houssier

What does it actually look like to build an AI-native product and lead an engineering team through the AI era when you've been doing it longer than most? Rob Zuber sits down with Loïc Houssier, CTO at Superhuman, to talk about what it meant to be an AI company before AI was everywhere, and how that early foundation shapes the way they build, ship, and think today.

Why the AI market is shifting

The AI revolution is getting expensive. Ben Norris (AI Engineer at Civo) breaks down a staggering statistic: AI token usage has jumped from 9.8 trillion to 1.3 quadrillion in just under two years, roughly a 130x increase. As businesses scale, the "closed source" premium is becoming a bottleneck. Watch as Ben explains why enterprises are turning toward democratized, open-source AI and smaller vendors like relaxAI to maintain power at a fraction of the cost.
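
As a quick check on the quoted growth factor, using only the two figures given:

$$\frac{1.3 \times 10^{15}}{9.8 \times 10^{12}} = \frac{1300}{9.8} \approx 132.7$$

This is consistent with the rounded figure of about 130x.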

Harness AI + MCP server: A Single Prompt to Accelerate the Software Development Lifecycle

Pipeline Creation: Using a single prompt in the IDE, a CI/CD pipeline is created and triggered via the agent connected to the Harness MCP server. Failure Diagnosis and Fix: When the pipeline fails, the agent is used to diagnose the issue (a failed dependency) and propose a fix, which is then committed, pushed, and the pipeline re-triggered to succeed. Deployment: After a successful build, the artifact is deployed into a Kubernetes cluster. Incident Response.

GPU Fragmentation Is Killing AI Economics

By 2026, the GPU shortage isn’t a supply-chain hiccup anymore. It’s baked into the system. Even after pouring billions into CapEx, most enterprises still want 40% more GPU capacity than they actually have. And it’s not because they’re chasing moonshots. Technology companies are training foundation models while serving inference for millions of users on the same clusters. AI labs are juggling fine-tuning, evaluation, and real-time experimentation side by side.

Service Status Update: March 5, 2026

On March 2, 2026 at 23:30:24 UTC, we experienced an issue where the Zoom AI scribe was unable to join calls, rendering Zoom meeting transcription unavailable for all users. The issue persisted from approximately February 28 through March 5, 2026.

Escalation policy for critical incidents

When a critical incident triggers, there’s no time to figure out who to call. That decision needs to be made well before the incident arrives. A dedicated escalation policy for critical incidents gives your team a clear path to follow the moment things go wrong, rather than leaving it to whoever happens to be around. This guide covers the key decisions involved in building that policy.
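
A critical-incident escalation policy is often expressed as ordered tiers, each with a wait time for acknowledgement before the next tier is notified. The sketch below models that idea; the tier targets and wait times are hypothetical, and the policy simply stops at the last tier rather than repeating.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    targets: list
    wait_minutes: int  # how long to wait for an acknowledgement before escalating


# Hypothetical critical-incident policy, decided well before any incident.
CRITICAL_POLICY = [
    Tier(["primary-oncall"], 5),
    Tier(["secondary-oncall"], 10),
    Tier(["engineering-manager", "incident-commander"], 15),
]


def who_to_notify(policy: list, minutes_unacknowledged: int) -> list:
    """Everyone notified so far, given how long the incident has gone unacknowledged."""
    notified = []
    elapsed = 0
    for tier in policy:
        if minutes_unacknowledged < elapsed:
            break
        notified.extend(tier.targets)
        elapsed += tier.wait_minutes
    return notified
```

With this policy, the secondary is pulled in at minute 5 and leadership at minute 15; writing the timings down in advance is what removes the "who do we call?" question during the incident itself.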

Optimizing PostgreSQL Performance for Large Tables with Boolean Filters

Short Summary: In this guide, we address performance problems that occur when PostgreSQL queries very large tables containing low-cardinality boolean fields. We also demonstrate how composite indexes, created and tested with Devart tools, allow PostgreSQL to locate the required rows directly instead of scanning the entire table.
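
The principle can be seen with a composite index that leads with the boolean column and follows with the range or sort column. This sketch uses SQLite from Python's standard library as a stand-in, so the plan text differs from PostgreSQL's; the table, column and index names are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, is_archived INTEGER NOT NULL, "
              "created_at TEXT NOT NULL, total REAL)")
# Skewed boolean: about 98% archived, 2% active (hypothetical data).
conn.executemany(
    "INSERT INTO orders (is_archived, created_at, total) VALUES (?, ?, ?)",
    [(int(i % 50 != 0), f"2026-01-{(i % 28) + 1:02d}", i * 1.5) for i in range(10_000)],
)

SQL = ("SELECT id, total FROM orders WHERE is_archived = 0 AND created_at >= ? "
       "ORDER BY created_at")

before = query_plan(conn, SQL, ("2026-01-15",))  # full table scan
conn.execute("CREATE INDEX idx_orders_archived_created ON orders (is_archived, created_at)")
after = query_plan(conn, SQL, ("2026-01-15",))   # index search on both columns
```

Putting the equality column first lets the engine jump straight to the active rows and then read them already ordered by `created_at`. In PostgreSQL the equivalent check is to compare `EXPLAIN` (or `EXPLAIN ANALYZE`) output before and after creating the index.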

Scaling Your CLI Query Tool: Prioritizing Database Support for Maximum Impact

Short Summary: This article explains how to expand your CLI query tool so it can handle multiple databases. First, we focus on key relational systems, then move to data warehouses. We’ll use dbForge Edge as a reference point, showing how different databases can be managed in one place while keeping automation simple.

Choose the Right SQLite ODBC Driver: Practical Comparison for 2026

Picking an SQLite ODBC driver is like choosing a key; you need one that opens every door cleanly, without snapping or jamming. The right driver must connect your data effortlessly—no errors, no limitations. It also has to withstand real challenges: complex queries, cross-platform environments, and large datasets. In this guide, we’ll show you which 2026 drivers stand up to these demands, so you can pick the one that won’t let you down.

Practice Vs. Performance: Two Reports On The State Of FinOps

This month, two FinOps research reports landed in close proximity. One from the FinOps Foundation — their 6th annual State of FinOps, drawing on a broad global practitioner community. One from CloudZero: FinOps in the AI Era: A Critical Recalibration, built on responses from 475 senior leaders at cloud-mature, AI-active organizations, with a focused lens on how AI is reshaping cloud cost management. Read each alone, and you get a useful snapshot for your business.

Invisible Lifelines: DCIM Empowers Healthcare Teams

DCIM (Data Center Infrastructure Management) software plays a critical role in keeping healthcare infrastructure running. By providing real-time monitoring, streamlined incident response, automated compliance tracking, and predictive maintenance capabilities, DCIM platforms offer healthcare teams the tools needed to safeguard patient safety and meet complex regulatory requirements.

JFrog Earns Microsoft Solutions Partner with Certified Software Designation for Azure

We’re excited to announce that JFrog has officially earned the Microsoft Solutions Partner with certified software designation for Azure. This status is granted to partners who complete a technical review audit for interoperability with Microsoft products and demonstrate a consistent track record of customer success.

Regression Testing: What it is, why it matters, and how to automate it with CI/CD

Regression testing is the practice of re-running existing tests after a code change to confirm that previously working functionality hasn’t broken. It answers a single question: did this change break something that used to work? In CI/CD pipelines, regression tests run automatically on every commit, giving teams immediate feedback before code reaches production.
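
As a rough illustration, the sketch below uses Python's standard unittest module. The function `apply_discount` is hypothetical; the tests pin behaviour that already works (known-good values, rounding, input validation) so that a later change that breaks it fails immediately. In a CI pipeline, a step such as `python -m unittest` would run this suite on every commit.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business function: price in cents after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids float drift; adding 50 rounds half up to the nearest cent.
    return (price_cents * (100 - percent) + 50) // 100


class ApplyDiscountRegressionTests(unittest.TestCase):
    """Locks in behaviour that already works so a future change cannot silently break it."""

    def test_known_good_values(self):
        self.assertEqual(apply_discount(1000, 10), 900)
        self.assertEqual(apply_discount(999, 0), 999)
        self.assertEqual(apply_discount(999, 100), 0)

    def test_rounding_stays_half_up(self):
        # 333 cents at 50% off is 166.5 cents, which must keep rounding to 167.
        self.assertEqual(apply_discount(333, 50), 167)

    def test_invalid_percent_still_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 150)


if __name__ == "__main__":
    unittest.main()
```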

Measuring Developer Productivity: Prove Impact | Harness Blog

The best engineering teams rely on data-driven frameworks such as DORA metrics and SPACE to measure developer productivity and demonstrate business impact. This guide explores proven measurement approaches that move beyond vanity metrics to capture real engineering value and team performance. Your developer productivity initiative didn't collapse because the data was wrong. It stalled because it couldn't answer the business question. Leadership asked, "So what?"

Making Security Invisible for Game Developers

Security that developers never have to think about. That's the goal Audrey Long, Senior Gaming Cloud Security Architect at Microsoft Gaming Security, set out to achieve, and then actually built. In this GitKon session, Audrey walks through how Microsoft Gaming tackled a massive identity security challenge across double-digit Entra ID tenants spanning independent game studios. With no existing tooling that fit the pace of game development, her team built the Entra ID Tenant Security Scanner from scratch using the Maester Framework, custom PowerShell, and GitHub Actions.

Harness Artifact Registry: Your Unified OCI-Compliant Gateway for Secure Artifact Management | Harness Blog

If you've worked with builds and deployments, then you already know how central Docker images, dependencies, and containers are to modern software delivery. The introduction of Docker revolutionised how we package and run software, while the Open Container Initiative (OCI) brought much-needed standardisation to container formats and distribution. Docker made containers mainstream; OCI made them universal.

What is engineering operations? A guide to the discipline transforming software teams

Engineering teams are writing more code than ever. AI coding tools have made individual developers dramatically more productive, yet most organizations report moving only about 20% faster than before. The real constraint has always been the operational fabric surrounding the act of writing code: the processes, standards, visibility, and coordination that determine whether hundreds of engineers and thousands of services ship reliable software at speed.

What's new with Console Connect Private Label?

Last year, we launched a powerful new way for partners to transform how they resell and manage connectivity globally. With Console Connect Private Label, partners can deliver on-demand NaaS capabilities to their customers under their own brand, without the cost, risk or complexity of building and maintaining a SaaS platform.

AI-ready sovereignty playbook 2026: how to run gen-AI workloads (ethically) in the EU

Sovereignty is a concept with nuanced meanings in the way states and industry currently use it to describe services. The term "strategic autonomy" has also been used to describe the need for governments to keep a hand on the full value chain (or at least know the gaps and accept the risks) and to apply their own rules to whatever sits within their jurisdiction (autonomy derives from the Greek autos, "self", and nomos, "rule").

The post-mortem problem

Post-mortems are one of the most consistently underperforming rituals in software engineering. Most teams do them. Most teams know theirs aren't working. And most teams reach for the same diagnosis: the templates are too long, nobody has time, and nobody reads them anyway. These aren't wrong observations. But they're symptoms, not causes. The actual problem is that somewhere along the way, the post-mortem stopped being a piece of communication and became a compliance artifact.

How to Build AI-Native Security Resilience (And Finally Get Developers And Security On The Same Team) | Harness Blog

Developers and security professionals have struggled to get on the same page for what seems like forever, and AI is only widening that divide, according to results from our State of AI-Native Application Security 2025 research report.

Hot Takes: What the AI Hype Gets Wrong About Software Engineering Excellence | Harness Blog

Ahead of the DevOps Modernization Summit, Matthew Skelton, CEO & CTO of Conflux and a featured speaker at this year's summit, shares his hot takes on output-driven AI, why DORA metrics aren't enough, and why governance and compliance must be built into the platform. He also weighs in on the key to successful automation.

Burnout Doesn't Ask Permission: Recognizing, Recovering, and Rebuilding w/ Stephen Townsend

Burnout doesn't announce itself. For Stephen Townsend, SRE team lead and host of the Slight Reliability podcast, it crept in over months of mounting pressure on a massive transformation program, then surfaced overnight as an inability to sleep. In this episode, Stephen shares his personal burnout story with rare honesty: the physical symptoms he dismissed, the org structure that left him without autonomy, and the full year it took to recover.

Inside Pandora's Box: How CloudZero AI Hub Cracks Cloud Cost Intelligence

Years in the FinOps trenches taught me one thing: The data has never been the problem. The data exists. It’s out there, scattered across provider invoices, buried in tagging gaps, locked behind dashboards that maybe three people in your org actually know how to navigate. The real problem? Nobody can get to it when they need it. Engineers ship features without understanding what they cost the business, let alone whether they improved margin.

Inference Economics: What It Is And Why It Matters Now

Somewhere between a model’s first demo and its first production workload, the cost conversation changes completely. Training is a big number, but it’s a finite one. Inference isn’t. Every user interaction, every query, every API call triggers compute behind the scenes — and unlike training, inference never stops billing. That shift from one-time expense to ongoing operational cost is where inference economics begins.

GitKraken Desktop 11.10: From Top Requests to Today's Release

Seven developer-requested features. Tighter control over branches, history, and large repos. No CLI detours required. If you have been using GitKraken Desktop in a complex repo, you already know what it feels like when the commit graph turns into a wall of branches. When rebasing requires more ceremony than it should. When you just need one file back from three commits ago but have to orchestrate a whole checkout to get it. GitKraken Desktop 11.10 is built for those moments.

OpenTelemetry traces for Bitbucket Pipelines via webhooks

Continuous delivery is only as good as your ability to understand what’s happening inside your pipelines. When a build is slow, flaky, or burning through capacity, you need more than a green/red status and a wall of logs — you need traces. Bitbucket Pipelines now exposes pipeline execution as OpenTelemetry (OTel) traces via webhook events. This lets you stream detailed pipeline spans into your own observability stack and correlate them with the rest of your system. This post walks through.

How to Lower Your Egress Fees in 2026

Egress fees can quietly drive up cloud costs. Learn practical ways to reduce your cloud egress fees in 2026 without redesigning everything. Cloud egress fees can sneak up on you. One month your cloud bill can look reasonable, and the next it's clear that data movement is causing your cloud spend to fluctuate. For many network teams, egress is still treated as a fixed cost or something you only revisit during a major architecture change, but that approach doesn't hold up in 2026.

Database Schema Evolution: Designing for Continuous Change | Harness Blog

Modern database design is no longer a one-time activity but an ongoing process that evolves as business needs, scale, and system behavior change. Instead of large redesigns, teams rely on incremental and backward-compatible schema changes, such as adding columns, indexes, or new tables, to safely adapt the database without disrupting production.
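
A minimal sketch of what "backward-compatible" means in practice, using Python's built-in sqlite3 module as a stand-in for a production database. The table `users`, the column `display_name`, and the two insert functions are hypothetical. The point is that an additive change (a nullable column plus a new index) lets code written before the change keep working unmodified, while new code starts using the new column.

```python
import sqlite3


def legacy_insert(conn: sqlite3.Connection, email: str) -> None:
    # Hypothetical "old" application code, written before the schema change.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def new_insert(conn: sqlite3.Connection, email: str, display_name: str) -> None:
    # Hypothetical "new" application code that knows about the added column.
    conn.execute(
        "INSERT INTO users (email, display_name) VALUES (?, ?)", (email, display_name)
    )


def evolve_schema_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    legacy_insert(conn, "a@example.com")

    # Additive, backward-compatible changes: a nullable column and a new index.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute("CREATE INDEX idx_users_email ON users (email)")

    legacy_insert(conn, "b@example.com")      # old code keeps working
    new_insert(conn, "c@example.com", "Cee")  # new code uses the new column

    rows = conn.execute("SELECT email, display_name FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in evolve_schema_demo():
        print(row)
```

Non-additive changes such as renaming or dropping a column are commonly split into phases (add the new shape, migrate readers and writers, then remove the old shape) so that each individual deployment stays compatible with the code running alongside it.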

AI SRE in Practice: Enabling Non-Experts to Troubleshoot Kubernetes

Kubernetes troubleshooting traditionally requires deep platform expertise. Understanding pod lifecycle, decoding error messages, correlating events across resources, and identifying root cause all demand experience that takes years to build. This expertise gap creates a bottleneck where only senior engineers can handle production issues, limiting how quickly teams can resolve incidents.

How Gremlin makes disaster recovery testing easier and faster

There’s a common saying: “A backup isn’t a backup until you’ve tested it.” The same is true whether it’s a simple database failover or an entire data center/cloud provider failover. You simply won’t know if it works if you don’t test it. When it comes to disaster recovery testing, that can be an expensive, painful, and arduous process. But it’s required by companies for a reason. And not just for disasters like hurricanes, flooding, or earthquakes.

Beyond "Reactive" Accessibility: Meeting the 2026 ADA Title II Mandate in Higher Ed

For decades, digital accessibility in state-funded higher education has largely been a "reactive" game. If a student with a visual impairment reported an issue with a tuition portal, the university would scramble to provide an accommodation. As long as the institution could show "meaningful progress" toward compliance, it was generally shielded from significant legal repercussions. That era is officially ending. The U.S.
Sponsored Post

The art of software engineering management

Like any leadership role, leading an engineering team in a mature, compact company like Raygun comes with both honor and responsibility. Leading a major development project is a bit like conducting a symphony orchestra, where every individual plays a crucial role and has a great impact on the work they release to customers and end-users.

Spring Boot API Testing: A Practical Guide for Enterprise Teams

Enterprise Spring Boot APIs should be tested at three levels: unit tests for business logic, integration tests for external service behavior, and traffic replay for production edge cases. Most teams only do the first. This guide shows all three using a real Spring Boot application that calls external APIs (SpaceX, US Treasury) with JWT authentication. The kind of service that looks simple in development and breaks in production.

Debugging Encrypted Microservice Traffic with Speedscale's eBPF Collector

Production bugs that only reproduce in actual traffic can be some of the most frustrating bugs in software development. You can stare at your logs, add traces to your code, add instrumentation – and still not be able to see the actual requests that went over the wire. And that gets even harder when the requests are encrypted and the system is a black box. You can use tools like Wireshark or Kubeshark to capture the requests.

The Ultimate Black Friday Technical Checklist: Prepare your infrastructure for Black Friday

Updated March 03, 2026. One of the things that keeps online shop owners awake at night is whether their website will withstand Black Friday traffic. Because this is one of the most important days of the year, even a few minutes of downtime can translate into thousands of dollars in losses. That's why we've put together a hands-on article covering the most common Black Friday problems eCommerce websites face and how to avoid them.

Agent vs. Agentless: What is better for Infrastructure Management?

The “agent vs agentless” debate usually comes up when teams are trying to choose an infrastructure automation approach that will not compromise security, compliance, or impact day-to-day operations. You need to manage a hybrid estate, avoid creating more work for already stretched teams, and ideally do it without stitching together multiple tools. That pressure often turns the conversation into a binary choice: which approach is better?

Beyond the horizon: How Ana Cidre turned a non-linear path into leadership

Welcome to Beyond the horizon, a monthly series celebrating the people who shape Upsun’s culture, innovation, and heart. In this month’s edition, we’re featuring Ana Cidre, Director of Developer Advocacy at Upsun and one of the key voices shaping how developers experience our platform today and in the future.

How to Achieve Data Sovereignty in Europe

European organizations are facing growing data sovereignty demands. Discover options, benefits, and best practices for keeping your data local. Across Europe, organizations are rethinking how they manage their data. The reasons for this push are multi-layered. The U.S. Government’s approach to data privacy often contrasts with Europe’s generally tighter regulations – but recent geopolitical and regulatory shifts in the U.S.

Introducing Cortex as the Engineering Operations Platform

Software engineering is once again being forced to evolve. We are entering the era of infinite code, where the cost of writing code tends toward zero. The data tells us that companies are only moving 20% faster than when humans wrote code by hand. We're writing orders of magnitude more code than ever, yet our processes are barely keeping up with what we had before. This new way of working as developers is only amplifying the chaos and complexity.

CloudZero Launches Claude Code Plugin To Bring Cost Intelligence Into Engineering Workflows

Today we’re announcing the CloudZero Claude Code Plugin, a new capability that puts CloudZero’s full cost intelligence model directly inside Claude Code, where engineers and technical FinOps practitioners already work. The plugin connects a Model Context Protocol (MCP) server and nine pre-packaged investigation skills to CloudZero’s cost data, covering cloud and AI spend across AWS, GCP, Azure, Snowflake, MongoDB, OpenAI, Anthropic, and more.

GitKraken Desktop 11.10 Release: Pin Branch, Sparse Checkout, Word Wrap and MORE!

GitKraken Desktop 11.10 is live! This release focuses on control over the commit graph, rebases, and more: the kind of control that makes working in complex repositories feel intentional instead of chaotic. These updates bring long-requested structural improvements directly into the GitKraken Desktop UI. If you care about clean history, readable diffs, and managing serious repos without jumping to the CLI, this release is for you.

Data Sovereignty: What Infrastructure Leaders Must Know

Data sovereignty refers to the principle that data is subject to the laws and governance of the country in which it is stored or processed. Put simply: where your data lives determines who has jurisdiction over it. This isn’t a new concept, but the urgency around it has intensified significantly. The geopolitical climate has pushed governments to treat data infrastructure as a strategic national asset. The EU has GDPR with strict cross-border transfer restrictions.

What does investigation look like when data lives in multiple tools?

War rooms don’t fix fragmentation. They expose it. Incident hits. App checks traces. Infra checks hosts. Cloud checks dashboards. Network checks packets. Everyone sees their layer. No one sees the system. So we guess. Rollback. Add capacity. Freeze change. The noise stops. The constraint doesn’t. Modern failures don’t live in tools. They live in dependencies. If your platform can’t follow a transaction across hybrid and AI infrastructure — to the exact constraint — you don’t have observability.

Solving the Cartesian Count Problem: Efficient Multi-Table Aggregation in Complex Databases

Short Summary: Counting related records across multiple dependent tables can produce wrong totals or slow queries when standard joins are used. This article explains a more reliable way to do this by counting each table separately, applying date filters, and checking that the query still runs well on large datasets. Tools such as dbForge Edge can also help if needed.
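
The sketch below reproduces the problem and the "count each table separately" fix with Python's built-in sqlite3 module; the tables `customers`, `orders`, and `tickets` and their rows are hypothetical. Joining both child tables at once multiplies rows (3 matching orders times 2 matching tickets gives 6 joined rows), so both counts come out as 6. Aggregating each table in its own subquery, with the date filter applied inside it, returns the true totals.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders  (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, created_at TEXT NOT NULL);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, created_at TEXT NOT NULL);
"""

# Naive version: both LEFT JOINs fan out per customer, so the counts are inflated.
NAIVE = """
SELECT c.name, COUNT(o.id) AS orders, COUNT(t.id) AS tickets
FROM customers c
LEFT JOIN orders  o ON o.customer_id = c.id AND o.created_at >= :since
LEFT JOIN tickets t ON t.customer_id = c.id AND t.created_at >= :since
GROUP BY c.id
ORDER BY c.id
"""

# Fix: aggregate each child table on its own, then join the per-customer totals.
SEPARATE = """
SELECT c.name, COALESCE(o.n, 0) AS orders, COALESCE(t.n, 0) AS tickets
FROM customers c
LEFT JOIN (SELECT customer_id, COUNT(*) AS n FROM orders
           WHERE created_at >= :since GROUP BY customer_id) o ON o.customer_id = c.id
LEFT JOIN (SELECT customer_id, COUNT(*) AS n FROM tickets
           WHERE created_at >= :since GROUP BY customer_id) t ON t.customer_id = c.id
ORDER BY c.id
"""


def run(sql: str, since: str = "2026-01-01") -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "acme"), (2, "globex")])
    conn.executemany(
        "INSERT INTO orders (customer_id, created_at) VALUES (?, ?)",
        # The last order falls before `since` and must not be counted.
        [(1, "2026-01-05"), (1, "2026-01-06"), (1, "2026-02-01"), (1, "2025-12-31")],
    )
    conn.executemany(
        "INSERT INTO tickets (customer_id, created_at) VALUES (?, ?)",
        [(1, "2026-01-10"), (1, "2026-01-11"), (2, "2026-01-12")],
    )
    rows = conn.execute(sql, {"since": since}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("naive:   ", run(NAIVE))
    print("separate:", run(SEPARATE))
```

`COUNT(DISTINCT ...)` would also correct the totals here, but it still builds the multiplied intermediate row set, which is where the slowdown on large tables comes from; pre-aggregating keeps each join one row per customer.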

Mastering Cross-Database Date Manipulation: Subtracting Days in MySQL and H2

Short Summary: Different databases use different SQL syntax for simple tasks like subtracting days from a date. In this post, we show how these differences appear across databases such as MySQL, PostgreSQL, SQL Server, Oracle, and H2. We also explain how dbForge Edge helps teams work with them in one place while keeping application logic consistent.
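
One way to keep application logic consistent is to isolate the dialect differences in a single helper. The sketch below is hypothetical (the function `days_ago_sql` and its dialect keys are illustrative, not part of any tool mentioned here) and returns a "today minus N days" expression per database. The MySQL, PostgreSQL, SQL Server, and Oracle forms are standard for those systems; the H2 form assumes H2 2.x `DATEADD` syntax and is worth checking against the H2 version you run.

```python
_DAYS_AGO_TEMPLATES = {
    "mysql": "DATE_SUB(CURDATE(), INTERVAL {n} DAY)",
    "postgresql": "CURRENT_DATE - {n}",  # date minus integer yields a date
    "sqlserver": "DATEADD(day, -{n}, CAST(GETDATE() AS date))",
    "oracle": "TRUNC(SYSDATE) - {n}",  # TRUNC drops the time-of-day part
    "h2": "DATEADD(DAY, -{n}, CURRENT_DATE)",  # assumption: H2 2.x syntax
}


def days_ago_sql(dialect: str, days: int) -> str:
    """Return a SQL expression for 'today minus `days` days' in the given dialect."""
    # Only plain non-negative ints are interpolated, so the result cannot carry injected SQL.
    if not isinstance(days, int) or isinstance(days, bool) or days < 0:
        raise ValueError("days must be a non-negative integer")
    try:
        template = _DAYS_AGO_TEMPLATES[dialect.lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect!r}") from None
    return template.format(n=days)


if __name__ == "__main__":
    for name in _DAYS_AGO_TEMPLATES:
        print(f"{name:>10}: SELECT * FROM orders WHERE created_at >= {days_ago_sql(name, 7)}")
```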

What Is Lift And Shift? Is It Right For You?

Cloud migration strategies range from quick rehosting to complete application refactoring. Lift and shift migration (also called rehosting) moves applications from on-premises to the cloud with minimal changes, promising speed and lower initial costs compared to other strategies. But does lift and shift actually deliver long-term savings? This guide examines when lift and shift makes sense, when it creates hidden costs, and what alternatives exist.

Last call on 398-day certificates

The bell rings. Last call for 398-day certificates is March 15. After that, every CA is required to cut you off at 200 days. Some have already stopped serving them early. The rest follow in two weeks. The irony of good certificate management is that when it works, nobody notices. No alerts, no outages, no 2am pages. The only time it gets attention is when something expires. Which means the teams doing it well rarely have the budget or the political capital to fix the process before it breaks.

The Tide of AI - Surfing the Tsunami of Binaries

AI is creating an overwhelming surge of digital artifacts and software components. The key to success is learning how to ride, secure, govern, and manage that wave – rather than being overwhelmed by it. This weekend, I asked my team to watch Chasing Mavericks. Jay Moriarity (not J-Frog, but stay with me) was one of the most driven and determined surfers imaginable. His courage and spirit were extraordinary. But those virtues were shaped and refined by his mentor, Frosty Hesson.

Cloud-native Android infotainment: your CI pipeline shouldn't depend on hardware

Infotainment systems are increasingly developed and delivered like software, yet they are often still tested and validated with hardware-centric processes. This is far from ideal: access to devices is limited, environments are difficult to reproduce, and iteration slows down as soon as multiple teams need to work in parallel. These challenges become even more visible as cockpit systems move toward wide displays and high resolutions.

Step 1 to Web App Deployment: Front-End Deployment

Front-end deployment is usually introduced as the easy part of web app deployment. "Build the assets and push them." Seems simple, right? Yet those six words have likely caused more production issues than most error messages. That's because while front ends look simple, they are actually a delicate stack of assumptions wearing a UI that is misleadingly easy to use.

Stop Managing Infrastructure: How BHS Corrugated Scaled Artifact Management with Cloudsmith

Are you spending more time maintaining your artifact servers than building software? In this video, we explore how BHS Corrugated–a global leader in manufacturing technology with a presence in 20 countries–transformed their developer experience by moving from fragmented, self-hosted GitHub repositories to Cloudsmith: the world’s leading cloud-native artifact management platform.

The modern JFrog alternative: Why ConstructConnect switched to Cloudsmith

Is your artifact management slowing down your development velocity? In this video, we dive into how ConstructConnect migrated from JFrog Cloud to Cloudsmith–the world’s leading cloud-native artifact management platform–to eliminate hidden costs, simplify their CI/CD pipelines, and secure their software supply chain.

When AI Writes the Code, Who Pays the Cloud Bill?

This is part two of a series on the implications of AI-generated code becoming mainstream. We recently wrote about how AI-generated code is overwhelming SRE teams with production complexity they can't manage. Turns out that's only half the problem. The other half shows up on the cloud bill. A prospect reached out to us last month. They'd been using Cursor and Claude Code for six months, shipping features at unprecedented velocity. Product was thrilled.

The Strategic Shift to Managed Optical Fiber Networks (MOFN)

As digital transformation accelerates, the underlying infrastructure supporting your enterprise or service provider network faces unprecedented pressure. The exponential growth of data, driven by cloud computing, 5G, and particularly Artificial Intelligence (AI), demands a fundamental rethink of how you approach connectivity.

The Rise of Technical Virtual Assistants: QA Testing, Documentation, and DevOps Support in 2026

Virtual assistants are no longer just handling emails and scheduling meetings. In 2026, a growing number of tech companies are outsourcing QA testing, technical documentation, project coordination, and even DevOps support tasks to skilled virtual assistants - at 60-70% less than hiring equivalent local talent.