
Sponsored Post

What Do You Use for AI Agent Infrastructure? The Complete Guide to Building Production-Ready Agent Systems

The question "what do you use for AI agent infrastructure?" has become one of the most searched queries in the DevOps and platform engineering space. And for good reason: the global AI agent market is projected to grow from $5.1 billion in 2024 to $47.1 billion by 2030, representing a compound annual growth rate of nearly 45%. With 85% of enterprises expected to implement AI agents by the end of 2025, getting the infrastructure right has never been more critical.
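
The growth rate quoted above can be checked with the standard compound annual growth rate formula. The sketch below is only an illustration: the function name `cagr` is made up here, and it assumes six compounding years between 2024 and 2030.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.45 means 45%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the teaser: $5.1B in 2024 and $47.1B by 2030.
rate = cagr(5.1, 47.1, 2030 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

With these inputs the result lands just under 45%, which is consistent with the "nearly 45%" claim.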

Database Administration Made Easy With dbForge Edge

Simplify database administration across multiple platforms with dbForge Edge. This video shows how dbForge Edge helps you handle performance tuning, backups, security, and migrations across multiple database systems: optimize SQL performance, automate database backups, manage users and permissions, and migrate databases in minutes.

Track OpenAI Spend: Explain Where Your OpenAI Budget Goes

The inevitable happened. A while back, Gartner projected that in 2026, 30–50% of all new SaaS product features would use LLM inference. That meant OpenAI-style costs would become a standard part of SaaS COGS. Today, OpenAI has become one of the most operationally significant line items for SaaS companies. But for many teams, this creates an uncomfortable gap. Engineering sees OpenAI as a fast path to innovation.

Resolve's Zero Ticket Minute - Ep. 8 #itautomation #agenticai #aiautomation

Proof of concept season is over. In Zero Ticket Minute Ep. 8, Ian Coppock explains why AI is now judged by results, not demos. If it’s not reducing downtime, cutting costs, or stopping 2 a.m. pages, it’s not delivering value. The shift is here. From experimentation to execution.

How to Mature Your FinOps Capabilities Without Rushing Optimization

In this episode of the FinOps on Azure Podcast, Nicole Boyd breaks down how the FinOps maturity model works in practice, why many teams struggle in the early stages, and what needs to be in place before optimisation can deliver real value. The conversation covers measurement, data trust, ownership, and why FinOps is an operating model not a one-time project.

SSIS Data Flow Components 4.0: Ready for Visual Studio 2026, SQL Server 2025, and Beyond

We are excited to announce the release of SSIS Data Flow Components version 4.0, an update that delivers expanded compatibility with the latest development tools and database platforms. Version 4.0 introduces full support for Visual Studio 2026 and Visual Studio 2026 Insiders, ensuring developers can seamlessly adopt Microsoft’s newest IDE while continuing to work with familiar workflows.

Refactor Safely with AI: Using MCP and Traffic Replay to Validate Code Changes

So as software engineers using AI coding assistants, we’re quickly learning of a new anti-pattern: Hallucinated Success. You give your agent (e.g. Claude via terminal or various IDE code assistants) the command “refactor the billing controller.” The agent happily complies, churning out nice clean code. The agent even goes so far as to write a new unit test suite that passes at 100%. You integrate it. Your test suites pass. Your production code breaks. Why?

Who should be on-call

There usually isn’t a hard and fast rule about who should be on-call. Teams often look for criteria like seniority, experience, or expertise. While those factors certainly help, they might matter less than you think. It is often more useful to look at whether your processes are ready. When incident responses rely on memory and intuition rather than documentation, even experienced engineers can struggle. They might handle things through internal knowledge that isn’t available to everyone else.

Introducing Megaport On-ramp as a Service

Megaport On-ramp as a Service is the fastest way for service providers to offer secure, private connectivity to mission-critical applications, and it’s now available. Considered the gold standard, private connectivity is the superior way to reach enterprise applications, storage services, or security services. But for most service providers, offering a private on-ramp for customers to connect to isn’t straightforward.

Oracle Cloud Pricing: A Comprehensive Guide To Oracle Cloud Costs

In 2025, Oracle shocked the market. Its cloud growth was so aggressive that Oracle’s stock surged, briefly making founder Larry Ellison the world’s richest person. That didn’t happen by accident. Oracle closed fiscal 2025 with $57.4 billion in revenue, mainly driven by cloud services. Oracle Cloud Infrastructure (OCI) grew roughly 50% year over year, driven by enterprise databases, AI workloads, and network-intensive applications migrating from more expensive platforms.

Webinar Recap: What It Really Takes To Make AI Profitable

Right now, 48% of organizations say they’re being asked to measure or report on AI-related costs. The problem is that they’re still figuring out how to do it. That was a very telling stat from a recent CloudZero webinar on AI and profitability, and it speaks loudly to the reality that many organizations are still struggling to get a grasp on AI spend, which our data shows to be rising sharply as a part of total spend in recent months.

Code Reviews Done Right: The Framework That Stops Bugs Before Production

Learn code review best practices from experienced developer Shashi Lo at GitKon 2025. Discover how to review pull requests effectively, give constructive feedback using the nit vs. non-nit framework, and leverage AI tools like CodeRabbit and GitHub Copilot to catch bugs humans miss. Shashi Lo shares 20+ years of code review philosophy, demonstrating real PR reviews on his Secret Santa app and showing exactly what makes thorough code review essential for shipping production-ready code.

Harness AI January 2026 Updates: Human-Aware SRE and Smarter API and Application Security | Harness Blog

Harness AI is starting 2026 by doubling down on what it does best: applying intelligent automation to the hardest “after code” problems (incidents, security, and test setup) with three new AI-powered capabilities. These updates continue the same theme as December: move faster, keep control, and let AI handle more of the tedious, error-prone work in your delivery and security pipelines.

Telus Communications Migrates from TDM services to VoIP Services

In this video, Curtis Sperle, Director of Voice & Collab Services at TELUS Communications, discusses the company's transition from traditional TDM-based networks to modern VoIP and cloud solutions. Key points include: Modernization Strategy: TELUS is migrating its copper-based services to an IP-based infrastructure to improve scalability and reliability. Strategic Partnership: Ribbon serves as a primary vendor, providing essential components like Session Border Controllers (SBCs) and PSX for call control.

We're Past Human-Scale Operations. Here's Why.

Ever been on a 100-person P1 call where everyone says, “It’s not us”? That’s not a people problem. It’s a broken operating model. More tools. More data. More teams. And somehow… slower resolution. This is what happens when observability is fragmented across silos. Each team has data, but no one has shared truth—and human-scale operations can’t keep up with modern IT complexity. This clip breaks down why the old model no longer works.

Secure Fleet Management with MDM: Lockdown Solution for Truck Drivers

Are you struggling with device theft and misuse in your fleet management? Discover how to take control with AirDroid Business! With cutting-edge features like app management, kiosk mode, customized alerts, and role-based device grouping, you can limit access to work-related content only and quickly respond to unusual device activity.

To change your engineering culture, start by asking your team what sucks

Most engineering leaders have a well-known and very annoying "normal error." It's the log entry or deployment glitch that has been around so long that it is simply accepted as part of the status quo. Jeff Schnitter, a Solution Architect at Cortex, describes this as a form of organizational Stockholm syndrome. This mindset is unsustainable for several reasons.

Resolve's Agents of IT podcast - Ep. 11 - Sean and Ari's Hot Takes #itautomation #agenticai

Agentic AI is moving fast, but expectations are moving faster. In this episode of Agents of IT, Resolve CCO Sean Heuer and Ari Stowe, Resolve COO, cut through the noise around agentic AI, AIOps, and automation in modern IT environments. They react to recent articles from Forbes, TechCrunch, and others to unpack what’s real, what’s hype, and what actually works today.

What it takes to build and scale developer platforms | Canonical x Qualcomm

Ubuntu on @qualcomm platforms simplifies access to Qualcomm’s advanced AI accelerators and capabilities, empowering developers and enterprises to innovate with confidence. Learn about the work we’re doing to bring Ubuntu to Arduino and other Qualcomm-powered boards. The Qualcomm and Canonical partnership provides developers with a reliable, security-focused, high performance operating system. In this video, Qualcomm breaks down how upstreaming, open standards, and Canonical solutions help developers move faster, from first boot to production.

How Cisco Revolutionized Platform Engineering with Komodor's Agentic AI

In the world of cloud-native infrastructure, complexity is the silent killer of innovation. For Cisco Outshift, the company’s incubation engine, managing a sprawling environment of AWS EKS clusters and edge-based MicroK8s workloads created a classic bottleneck: the Platform Engineering team was drowning in toil. Facing SRE burnout and the limits of human scaling, Cisco embarked on an ambitious journey to evolve its internal operations from standard DevOps to Agentic AI.

[Open beta] Introducing Tests in Bitbucket Pipelines

If you’ve ever watched a pull request sit for hours because of a flaky test, you know how quickly test suites can turn from safety net into bottleneck. As teams grow, test suites tend to grow even faster because every new feature, bug fix, and regression adds more tests, while old or redundant tests are rarely cleaned up, so over time you end up running far more tests than are strictly needed for reliable feedback.

Stateful Vs. Stateless Applications: What's The Difference (And Why It Matters)

Think of a stateful application like a conversation with a barista who remembers your order every time you walk in. They know what you had yesterday, how you like it prepared, and what you’ll probably want next. That memory makes the experience smoother, but it also means that if that barista isn’t around, your experience can break down entirely. A stateless application, on the other hand, is similar to ordering from a self-service kiosk.
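
The barista and kiosk analogy can be sketched in a few lines. Everything here is hypothetical and for illustration only: `StatefulBarista` keeps each customer's last order in its own memory, while `stateless_kiosk` expects every request to carry all the information it needs.

```python
from typing import Dict, Optional


class StatefulBarista:
    """Remembers each customer's last order inside this one instance."""

    def __init__(self) -> None:
        self._last_order: Dict[str, str] = {}

    def order(self, customer: str, drink: Optional[str] = None) -> str:
        if drink is None:
            # "The usual": only works if this instance has seen the customer.
            drink = self._last_order[customer]
        self._last_order[customer] = drink
        return drink


def stateless_kiosk(request: Dict[str, str]) -> str:
    """Keeps no memory; the request itself carries the full order."""
    return request["drink"]


regular = StatefulBarista()
regular.order("ana", "flat white")
print(regular.order("ana"))  # the same instance remembers the order

replacement = StatefulBarista()  # a different instance with empty memory
try:
    replacement.order("ana")
except KeyError:
    print("replacement barista has no record of ana")

# Any kiosk can serve the request, because nothing is remembered between calls.
print(stateless_kiosk({"customer": "ana", "drink": "flat white"}))
```

The failure on the replacement instance is the point of the analogy: state held in one process is lost when that process is swapped out, unless it is stored somewhere shared.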

Boost your test coverage with CircleCI Chunk AI agent

Test coverage is one of those metrics everyone agrees matters until it’s time to actually write the tests. Between shipping features, fixing bugs, and handling production issues, writing comprehensive tests for edge cases and error paths often falls to the bottom of the backlog. The result is coverage gaps that accumulate technical debt and leave your codebase vulnerable to regressions. As AI-powered development tools reshape how we write code, the volume and velocity of changes is accelerating.

Fix bugs faster with CircleCI's Chunk AI agent

Bugs hide in plain sight. A date validator that rejects February 29th on leap years. An edge case that slips through code review. A flaky test that passes locally but fails in CI. These issues erode trust in your codebase and waste hours of debugging time. In the era of AI-assisted development, code is being written faster than ever. But speed creates risk.

Designing an automated SDLC control

For anyone shipping software in regulated industries, the word “control” gets thrown around all over. Compliance frameworks demand controls, auditors verify controls are used, engineering teams implement controls, and there are even Control Owners. But what exactly is a control? And more importantly, how do we design controls that actually serve their intended purpose while enabling rather than hindering delivery velocity?

How Qovery uses Qovery to speed up its AI project

Discover how Qovery leverages its own platform to accelerate AI development. Learn how an AI specialist deployed a complex stack, including LLMs, Qdrant, and KEDA, in just one day without needing deep DevOps or Kubernetes expertise. See how the "dogfooding" approach fuels innovation for our DevOps Copilot.

#051 - Surviving the Shift: From Legacy Monoliths to Day 2 Chaos with Hayato Shimizu (Digitalis)

From the early days of "neural nets" and WebSphere to the modern complexities of Kubernetes, Hayato Shimizu has seen the evolution of infrastructure firsthand. In this episode of Kubernetes for Humans, the co-founder of Digitalis joins the show to discuss the harsh realities of enterprise platform engineering and his personal journey from corporate employee to consultancy owner.

The Hidden Cost of 30% AI-Generated Code #speedscale #aicoding #devops #technews #ai

AI now writes 30% of Big Tech’s code, but the resulting surge in defects is crashing platforms like AWS and GitHub. Manual testing can no longer keep up with this velocity; it's time to deploy AI Quality Agents to save our systems. Is AI speed worth the decline in code quality, or are we headed for a breaking point? Let me know if you’ve noticed more bugs in your workflow lately. Video collab with @ScottMooreConsultingLLC.

DataReader vs DataSet: A Guide to Connected and Disconnected Data Access

DataReader and DataSet are two significant data access models that can greatly impact the performance, scalability, and responsiveness of your .NET application. The connected model, powered by DataReader, keeps a live connection open and streams data forward-only to maximize speed and minimize memory usage. The disconnected model, implemented through DataSet, takes the opposite approach. It loads data into memory so you can edit and reuse it without constant database interaction.

Part Two: Turning Event Intelligence into Action - Real-World Value for Financial Enterprises

Event Intelligence Solutions are redefining how organizations manage complexity and risk across digital ecosystems. Their true power lies not only in detecting anomalies or suppressing noise, but in providing actionable, explainable intelligence that connects IT events to business impact.

How to Reduce Service Desk Workload with AI and Automation

For many IT directors, the service desk feels permanently stretched. It’s a math problem that is forever in motion. Every quarter brings new apps, new devices, new access rules, and new ways for small issues to become daily interruptions. Even when tooling improves, the queue still grows because the work expands with the environment. The pressure shows up in familiar places, like rising ticket counts, tighter SLAs, and a large backlog of projects that need help.

Recapping our webinar on the Engineering in the Age of AI: 2026 Benchmark Report

I remember the first time I used an AI coding assistant. I watched the cursor dance across my screen and generate a hundred lines of code in seconds. It felt like I had finally found a cheat code for software engineering. That initial rush of productivity is a dopamine hit that's intoxicating and makes you think you can do anything with just a simple prompt or two.

Optimize your CI/CD pipeline with CircleCI Chunk AI agent

A slow CI/CD pipeline costs more than just time. Developers context-switch while waiting for builds, feedback loops stretch longer, and compute costs add up with every inefficient run. Most teams know their pipelines could be faster, but optimizing configurations requires deep knowledge of caching strategies, parallelism, and resource allocation. The challenge compounds with AI-assisted development. As AI coding assistants help teams ship code faster, pipelines run more frequently.

Refactor your codebase with CircleCI Chunk AI agent

d function there, and before long you’re navigating a codebase full of inconsistent patterns, repeated logic, and code that’s harder to maintain than it should be. Refactoring is essential, but finding the time to clean up code while shipping features is a constant challenge. The rise of AI-assisted development has accelerated this tension. AI coding assistants help teams ship features faster, but they don’t always produce consistent code.

How dbForge Edge Helps With Database Administration

Looking for a way to simplify database administration across multiple database systems and cloud services? In this video, you’ll learn how dbForge Edge helps DBAs, developers, and data architects manage databases more effectively across SQL Server, MySQL, MariaDB, Oracle, PostgreSQL, Amazon Redshift, and other platforms.

How Deployment Pipelines Power Continuous Deployment

Let's be honest: everyone says they want continuous deployment. "Ship all the time! Move fast! Break absolutely nothing!" But the only reason any of that is even remotely possible is because of one unsung hero quietly doing the heavy lifting behind the scenes: Ah, the deployment pipeline - aka your code's obstacle course / gauntlet / walk in the park… depending on how well it's behaving.

Clustered Directors, Pipeline Debugging, and More Integrations

Over the past two months, VirtualMetric DataStream delivered a substantial update cycle focused on resilience, productivity, and platform extensibility. This release strengthens the core architecture, makes pipeline development and troubleshooting significantly easier, and expands integration coverage across schemas, SIEMs, and cloud platforms. Let’s take a closer look.

Actionable Network Device Monitoring with Automated Anomaly Detection and AI Troubleshooting

Network device monitoring is often a mess of polling, graphs, and alerts that don't lead to answers. In this webinar, we'll show how to monitor routers, switches, and firewalls in a way that quickly surfaces what matters: interface health, errors, drops, saturation, latency signals, and performance regressions—without drowning in noise. You'll learn how Netdata turns raw SNMP metrics into high-signal insights using automated anomaly detection and AI-assisted troubleshooting, so your team can move from 'something is wrong' to 'here's the root cause' faster.

Faster, compliant delivery on regulated cloud with Upsun and IBM Cloud for Financial Services

We are continually enhancing our offering to support enterprises looking to modernize without the pain of modernization. We partnered with IBM to bring our highly flexible cloud application platform to the IBM Cloud Marketplace to give financial service organizations a cloud option that meets both workload and organizational requirements.

Build vs Buy IaC: Choosing the Right IaCM Strategy | Harness Blog

Have you ever watched a “temporary” Infrastructure as Code script quietly become mission-critical, undocumented, and owned by someone who left the company two years ago? We can all relate to a similar scenario, even if not an infrastructure-specific one, and this is usually the moment teams realise the build vs buy IaC decision was made by accident, not design.

How to Scale GitOps Without Hitting the Argo Ceiling | Harness Blog

The Argo ceiling is a predictable scaling challenge, not a failure of Argo CD or GitOps. As clusters and teams grow, visibility, governance, and orchestration fragment without a control plane. Script-heavy workflows and manual processes slow delivery and increase risk at scale. A GitOps control plane enables unified visibility, structured workflows, automated guardrails, and secure secret management. GitOps has become the default model for deploying applications on Kubernetes.

Kubernetes Cost Traps: Fixing What Your Scheduler Won't | Harness Blog

Kubernetes cost overruns usually come from small, invisible scheduling decisions—not the platform itself. Over-provisioned requests, poor bin packing, and fragmented node pools quietly waste cloud spend. Cost-aware scheduling, right-sizing, and smarter node selection can deliver major savings without hurting performance. Treat cost as a first-class metric with visibility into why scaling decisions happen—not just when.

Gemini Cost Per API Call in 2026: What You'll Actually Pay (And How to Control It)

On paper, Gemini pricing looks straightforward. You pay per token. Input tokens cost one amount, output tokens cost another, and different models come with different rates. But once Gemini is wired into a production SaaS product, that simplicity disappears. Fast. That’s because token usage compounds across context, retrieval, and output — not across requests. The same “API call” can cost pennies in one feature and dollars in another.
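
A rough sketch of the per-token arithmetic described above. The rates below are hypothetical placeholders, not actual Gemini prices, and `call_cost` is an illustrative helper rather than part of any SDK.

```python
# Hypothetical prices in USD per one million tokens; NOT actual Gemini rates.
INPUT_RATE_PER_MILLION = 1.00
OUTPUT_RATE_PER_MILLION = 4.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call when input and output tokens are priced separately."""
    total = (
        input_tokens * INPUT_RATE_PER_MILLION
        + output_tokens * OUTPUT_RATE_PER_MILLION
    )
    return total / 1_000_000


# One "API call" in two different features: a short prompt versus a call
# that carries a large retrieved context and produces a long answer.
short_prompt = call_cost(input_tokens=500, output_tokens=200)
long_context = call_cost(input_tokens=400_000, output_tokens=50_000)

print(f"short prompt: ${short_prompt:.4f}")
print(f"long context: ${long_context:.4f}")
```

Under these assumed rates the two calls differ in cost by a factor of several hundred, which is why counting requests alone says little about spend.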

Webinar (Jan 15 2026): Take Back Control of Your Infrastructure (feat. nvisia)

Learn how leading teams are reducing complexity, controlling costs, and building resilient environments with modern private cloud patterns. If you’re evaluating private cloud, hybrid infrastructure, or looking to take back control of your infrastructure in 2026, this session provides a clear, actionable starting point. Reach out to our team to learn more today!

Can We Still Trust the Code? #speedscale #qualityassurance #digitaltwin #trust #devops

The "Velocity Gap" is real. AI like Claude and GitHub Copilot are pumping out code faster than ever, but there’s a catch: Engineers don't trust it yet. We’re moving away from the old days of "clicking around" in a test environment, but how do we verify code at the speed of light? Ken breaks down why the future of QA isn't just "testing," it’s simulation. Video collab with @ScottMooreConsultingLLC Learn More: speedscale.com.

The 4 pillars of AI in 2026: Agents, cost, observability & sovereignty

AI is no longer just about "one-shot" prompts. In this session from our "From Idea to Agent" webinar, Ben Norris (AI Engineer at Civo) breaks down the four key priorities dominating the enterprise space in 2026. From the 130x explosion in token usage to the "vibe-coding" revolution, learn why businesses are turning away from US hyperscalers in favor of democratized, secure, and UK-sovereign AI infrastructure. We explore how autonomous agents are solving multi-step problems and why "Chain of Thought" reasoning is unlocking AI for heavily regulated industries like finance and healthcare.

Let's Encrypt is moving to 45-day certificates before everyone else

The CA/Browser Forum set 47-day certificates as the target for 2029. Let’s Encrypt decided to get there a year earlier. In December 2025, Let’s Encrypt announced their roadmap to cut certificate lifetimes from 90 days to 45 days by February 2028, a full year ahead of the industry mandate. It’s exactly what we’d expect from the CA that made automation mandatory from day one.

AI SRE in Practice: Resolving Node Termination Events at Scale

When a node terminates unexpectedly in a Kubernetes cluster, the immediate symptoms are obvious. Workloads restart elsewhere, services experience partial outages, and alerts fire across multiple systems. The harder question is why it happened and how to prevent it from recurring. This scenario walks through a node termination event where the entire node pool was affected, requiring investigation across infrastructure layers to identify root cause and implement lasting remediation.

AI Hosting: The Colocation vs. Cloud Dilemma for Your Next Project

Organisations running AI workloads, like banks training fraud detection models, hospitals testing diagnostic tools, or manufacturers using predictive analytics, all face the same problem: hosting them is costly and resource-intensive. They require dedicated GPUs running non-stop, vast amounts of data moving in and out, and far more power and cooling than a typical IT system.

Stop Flying Blind: Synthetic Monitoring, Host heat-maps, and Process-Level Visibility

January 2026 Release: Here's a dirty secret about observability: most teams find out about outages from their customers. Not from their dashboards. Not from their alerts. From angry tweets and support tickets. The excuse is always the same: "We have metrics! We have dashboards! We even have that AI thing now!" And yet, somehow, your checkout endpoint has been returning 502s for forty-five minutes and you're learning about it from the VP of Sales who just got off a call with your biggest customer.

Console Connect Ecosystem Update January 2026

In this ecosystem update, we share the latest additions to the Console Connect platform, including our expansion into Malaysia with eight new data centre locations, enhancing connectivity across the Asia Pacific region and worldwide. Most new locations are in Cyberjaya, Malaysia’s prime data centre hub near Kuala Lumpur, offering robust dark fibre networks, redundant power and high-speed data transmission for secure, high-performance enterprise connectivity.

How Do I Integrate DCIM With My Existing ITSM System?

In many organizations, ITSM tools and data center infrastructure tools operate in separate silos, leading to incomplete records and limited visibility. CMDB records are often incomplete or out of date because updates rely on manual entry, while incidents, changes, and service requests in ITSM lack full visibility into the physical infrastructure. Integrating DCIM with ITSM closes this gap, ensuring CMDB data matches reality and linking service workflows to accurate, actionable information.

Feature Friday: Personalized Context in One Click. New Cortex MCP My Workspace

Stop digging through tabs and explaining your role to your tools. In this Feature Friday, we’re unveiling 'My Workspace' for the Cortex MCP, designed to give you instant, personalized identity the moment you start your day. Before today, you had to manually fetch data from Jira, GitHub, and internal docs just to figure out your priorities. Now, you can simply ask, "What should I work on this week?" and get a parallel, high-speed pull of your entire ecosystem.

From Trough to Traction: 10 Real-World Lessons in Cloud and AI Efficiency

When CloudZero CTO Erik Peterson joined the SourceForge podcast in January 2026, he didn’t just talk about cloud costs. He reframed them as a launchpad for innovation, survival, and competitive advantage. Whether he was describing the “trough of lost innovation,” the “freemium tax,” or why efficiency is the next frontier of engineering culture, Erik’s expert insights go beyond FinOps hygiene.

Stop wasting time on Postgres migrations. #speedscale #postgresql #postgres #database #programming

If you're spinning up a whole container just for one test, you’re doing it wrong. Old way: full DB container + pg_restore. New way: speedscale + proxymock. It records actual DB traffic and mocks it "on the wire." Test smarter, not harder.

AI Can't Prove Compliance by Itself

AI is moving fast, and it’s tempting to believe it can automate software governance end to end. But compliance and security aren’t probabilistic problems. They don’t accept “close enough.” They don’t accept summaries. They can’t tolerate hallucinations. Governance depends on facts. Irrefutable, provable evidence of how systems actually changed.

Governance Doesn't Stop at Deploy

Most governance models focus on what happens before production. Approvals. Tickets. Change records. But software delivery doesn’t end at deploy. Runtime is where change management is validated. It’s where systems prove whether controls actually work and where risk becomes real. If governance stops at deployment, you’re not managing change. You’re managing intent. In this video, Mike Long (CEO & Co-founder, Kosli) explains why runtime is the true source of control, why approvals alone don’t reduce risk, and how modern teams build governance that reflects reality, not paperwork.

Cloud sovereignty vs. Cloud innovation: Why India doesn't have to choose

As we witness the rise of AI, the need for sovereignty is no longer optional. For organizations deploying larger models with access to sensitive data, it is a requirement. Research has shown concerns around sovereignty ‘hindering innovation’ and having ‘knock-on consequences for innovation’. We don’t see it that way. Sovereignty isn’t a trade-off for innovation; in fact, for India to scale securely, the two must work in tandem.

How Does Website Infrastructure Impact Operational Efficiency in Growing Teams?

Growing teams don't struggle because of big strategic questions first. They stumble on slow dashboards, broken logins, and sites that freeze during peak traffic. Website infrastructure either clears the runway or scatters debris across it. When systems respond fast, teams ship faster, support fewer fires, and argue less about whose tool failed. Poor infrastructure does the opposite. It multiplies tickets, adds delays, and burns morale. The pattern shows up in every scale-up: technology either amplifies discipline or exposes chaos instantly, sometimes in a single intense quarter of growth.

An introduction to GPU time-slicing

GPUs are no longer a niche component. Gamers know them for immersive graphics, workstation users rely on them for balanced performance, and in the age of AI, GPUs have become one of the most in-demand resources in modern infrastructure. They are also expensive. That reality creates two immediate constraints, for individuals and enterprises alike: GPU-backed instances should be provisioned deliberately, and once provisioned, they should be used efficiently.

A cleaner, customizable Bitbucket navigation is here

Last month we shared that a new navigation system is coming to Bitbucket, and we know many of you have been eager to see what it looks like. Today, we’re happy to share that the new navigation is available to all Bitbucket users. This article covers what’s changing in Bitbucket, when it’s happening, and how you can share feedback with us.

AI Anomaly Detection: Catch AI Cost Surprises Before They Kill Margins

Consider this: traditional cloud cost monitoring was like checking your fuel gauge once a month — after the trip was already over. That model worked when infrastructure scaled slowly. You provisioned resources predictably and paid for stable, linear usage. AI breaks that model. Today, AI costs behave like a high-performance engine with a hypersensitive throttle. A small input, like a prompt change or a single power user, can dramatically increase your fuel burn in seconds.

Drift Under Control: Keep Your Infrastructure Consistent with Continuous Detection, Intelligent Analysis, and Safe Remediation

In cloud-native environments, infrastructure is in constant flux. Teams move fast, leveraging Infrastructure-as-Code (IaC), ephemeral resources, and automation to iterate quickly. But speed brings a cost: configuration drift. A single manual change in the cloud console, an untracked automation script, or an out-of-band fix can cause your infrastructure to fall out of sync with code. Over time, this erodes trust, breaks pipelines, and introduces silent risk.

MCP: Why AI Needs Git Intelligence

GitKraken CTO Eric Amodio breaks down the Model Context Protocol (MCP) and explains why Git intelligence is critical for AI agents at GitKon 2025. In this session, Eric covers: what MCP is and why every major AI company adopted it; why AI needs Git history, not just file system access; how GitKraken MCP removes Git pain safely; the future of agentic developer workflows; and how Commit Composer uses AI to organize commits without losing data.

GitKraken Insights | Engineering Intelligence in Minutes

Most software intelligence tools take months to implement, cost a fortune, and end up collecting dust. GitKraken Insights is different. It helps engineering leaders measure what matters: AI impact, code quality, delivery performance, and developer experience, all in one place. It’s the latest evolution of the GitKraken DevEx platform, trusted by over 40 million developers. Insights connects data from across your GitKraken tools to give you a complete picture of engineering health and value. We're talking DORA metrics, pull request metrics, and AI impact.

From idea to agent: Building AI workflows with relaxAI and n8n

Join us for this live online webinar as we explore how to design, build, and deploy practical AI agents using n8n’s workflow automation platform powered by relaxAI’s UK sovereign infrastructure. Our speaker, Ben Norris, AI Engineer at Civo, will guide you through the real-world process of creating intelligent agents that automate tasks across tools and services, all without deep coding expertise.

Enforcing web performance budgets in CI/CD with Sitespeed.io and Slack

Keeping your website fast as new features are introduced is a challenge. Performance regression is a common issue that continues to plague websites, especially those of SaaS companies. In performance regression, newly shipped features introduce bloat, leading to slow page loads and reduced user conversion rates. This is exactly what setting performance budgets helps prevent.

[Webinar] Building Quality-Driven Agentic AI in Noisy Big Data Environments

Watch as Itiel Shwartz, Komodor CTO and Co-Founder, shares hard-won lessons from developing an AI SRE agent that processes millions of K8s events daily to deliver autonomous troubleshooting that reached 95%+ accuracy in benchmarking. This webinar covers building production-ready systems that maintain reliability when 90% of your data is noise, and how Komodor developed that agent.

Why Infrastructure Stability Is Critical for Reliable DevOps Pipelines

Automation in DevOps helps teams move code from a commit to production faster. But it only works when the infrastructure is reliable and consistent. If servers fail, configurations drift, or scaling behaves unexpectedly, even a well-built pipeline can break. Stable infrastructure is what lets teams deploy many times a day with confidence instead of spending hours fixing failed releases. Often, the biggest difference between strong DevOps teams and struggling ones is how dependable their infrastructure is for continuous delivery.

HubSpot and QuickBooks Integration: Best Practices for 2026

HubSpot QuickBooks integration is now a core operational requirement as data silos continue to disrupt workflows across 82% of enterprises. It addresses a critical failure point between CRM and accounting, where disconnected systems fragment customer, invoice, and revenue data. In practice, this separation often leads to duplicated customer data, invoice corrections, and revenue reports that lag behind real activity.

How to Use PostgreSQL AI for Query Writing and Optimization

PostgreSQL AI is gaining attention as SQL complexity increases in production environments. It addresses a common problem: extended queries that accumulate joins, nested logic, and edge cases. Without AI assistance, these queries are often harder to write and review, driving 20–40% of developer time into debugging. In practice, these challenges affect PostgreSQL users in different ways.

How GitKraken's AI-Powered Commit Composer Eliminates Git Cleanup Headaches

As developers, we’ve all been there: a frantic coding session, a few hasty commits, and suddenly our Git history looks like a patchwork quilt of “fix,” “oops,” and “stuff.” While git rebase -i is a powerful tool for cleaning up, it’s also a source of anxiety for many, often leading to more headaches than it solves. What if you could achieve a pristine, meaningful commit history without the fear of breaking things or hours spent squashing and rewriting?

Reliability Resolutions: How to build effective reliability programs that won't fade away

Did you know the third week of January is the most common time for people to fail New Year’s Resolutions? It doesn’t matter whether it’s exercising more, learning a new language, or just trying to drink less coffee, that initial surge of fresh New Year’s energy is fading, and if you want to make a resolution stick, this is the key time to make a lasting change. The same is true with any reliability resolutions you might have made.

Harness AutoStopping - FinOps Automation for Intelligent Cloud Cost Optimization | Harness Blog

Harness AutoStopping helps FinOps teams eliminate up to 70% of idle cloud spend through intelligent, policy-driven automation. By automatically stopping and restarting unused resources without disrupting developers, organizations move from reactive cost reporting to continuous, proactive cloud cost optimization.

2026 insights into the Indian cloud market

India is no longer just a fast-growing cloud market; it is becoming a strategically vital one. What was once a race for cost efficiency and global hyperscaler expansion has evolved. Today, India’s cloud landscape is being reshaped by a new reality: the need for AI infrastructure, true data sovereignty, and the ambition to own its digital future. Following the discussion at Civo Navigate India 2025, one thing is clear: the status quo is shifting.

Reducing Alert Noise with Composite Alerts in Hosted Graphite

Traditional alerts are simple by design: if a metric crosses a threshold, fire an alert. While that simplicity makes alerts easy to configure, it also leads to alert noise, because single metrics rarely tell the full story and often trigger during non-actionable conditions. Hosted Graphite Composite Alerts solve this by allowing you to combine multiple alert conditions using logical expressions like AND (&&) and OR (||).
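As a rough illustration of the idea (not Hosted Graphite's actual API or expression syntax), a composite alert can be sketched in Python as a boolean combination of individual threshold checks. The metric names, thresholds, and helper functions below are hypothetical.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]
Condition = Callable[[Metrics], bool]


def above(metric: str, threshold: float) -> Condition:
    """Build a condition that is true when the metric exceeds the threshold."""
    return lambda m: m[metric] > threshold


def all_of(*conditions: Condition) -> Condition:
    """Logical AND (&&): every condition must hold."""
    return lambda m: all(c(m) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR (||): at least one condition must hold."""
    return lambda m: any(c(m) for c in conditions)


# Hypothetical rule: CPU is high AND (error rate is elevated OR latency is high).
composite_alert = all_of(
    above("cpu_percent", 90),
    any_of(above("error_rate", 0.05), above("p95_latency_ms", 800)),
)

busy_but_healthy = {"cpu_percent": 95, "error_rate": 0.01, "p95_latency_ms": 200}
busy_and_failing = {"cpu_percent": 95, "error_rate": 0.12, "p95_latency_ms": 200}

print(composite_alert(busy_but_healthy))  # False: high CPU alone is not actionable
print(composite_alert(busy_and_failing))  # True: high CPU plus elevated errors
```

A single-metric CPU alert would fire for both samples; the composite rule fires only for the second, which is the noise reduction the paragraph describes.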

Mastering waits and timeouts in Playwright

If you have written any kind of end-to-end tests or UI tests you probably know that the greatest headache to deal with is test flakiness due to browser actions not behaving in the way that you expect them to behave. This flakiness can be a major bottleneck especially in CI/CD pipelines due to constant failures.

Certificate permissions with CertKit Applications

When you’re managing a handful of certificates, one big list works fine. Add a few dozen more and things get messy. Add multiple teams or projects and you’ve got a problem. Who should have access to the production certificates? What about staging? Does the contractor working on the marketing site really need to see your internal infrastructure? CertKit now supports multiple applications from our roadmap to help you sort this out.

Why Cost-Cutting Usually Breaks Your Product (And What to Do Instead)

Reactive cloud cost-cutting leads to “Infrastructure Atrophy,” sacrificing performance and reliability for short-term savings. The 2026 solution is cloud cost optimization, leveraging scale-to-zero and pay-per-use architectures to eliminate idle waste without compromising product health.

ROI of Digital Twin Testing: Cut Testing Costs by 50%

When engineering leaders review their cloud bills, they often focus on production costs—the infrastructure serving real users, processing real transactions, generating real revenue. But there’s a shadow cost lurking in every cloud environment that often goes unnoticed until it becomes painful: non-production infrastructure.
Sponsored Post

Digital Twins Gone Wild: My Unexpected AI Doppelgänger

I recently tried using AI to create a digital twin of myself. I uploaded a photo, expecting a futuristic, slightly improved version of me... and what did I get in return? A picture of Kim Jong Un. Clearly, AI has a sense of humor, or a very different definition of "twin." Forget Arnold Schwarzenegger and Danny DeVito. Digital Twins 2: Now Starring My AI Doppelgänger. From Speedscale's perspective, a digital twin is built from real production traffic, continuously updated, and executable in your test and CI/CD environments.

Moving Our Observability Data Collector from Sidecars to eBPF

For years, the Kubernetes sidecar pattern has been a practical way to capture observability data. Running a collector alongside each application pod gave us deep visibility into traffic, including full request and response payloads across supported protocols. However, as cloud-native environments have grown more complex, the limitations of sidecars—such as resource overhead, operational complexity, and scaling challenges—have become more apparent.

Announcing the Harness Human-Aware Change Agent | Harness Blog

AI that understands human insight and connects it to the changes that drive real incidents. At Harness, our story has always been about change — helping teams ship faster, deploy safer, and control the blast radius of every modification to production. Deployments, feature flags, pipelines, and governance are all expressions of how organizations evolve their software. Today, the pace of change is accelerating.

AI In 2026: Autonomous, Invisible, Expensive

With all we’ve seen from AI in the last several years, it can be easy to forget that it’s still in its very early days. As torrid as its evolution has been thus far, it will only intensify. As SVP of Engineering at a B2B SaaS company, I’ve had a front-row seat for much of this evolution. Here are three ways I see AI heading in 2026.

How AI amplifies your entire engineering culture

Anyone who has ever attempted to learn the guitar knows the lure of buying high-end gear. Surely, an expensive guitar and a best-in-class amplifier will hide the fact that you only know a few chords and maybe the lead line to that one song you keep hearing on the radio. What most players find out, however, is that spending thousands of dollars on gear doesn't change the fact that you're not that good yet.

How is the next wave of AI impacting the Indian cloud scene?

Gartner has predicted that 2026 will see a 10.6% increase in India’s total IT spend from 2025 (2025: USD 159 billion vs 2026: USD 176.3 billion), with data centres, cloud infrastructure, and AI-enabled technologies driving this growth. This isn’t just a budget increase; it’s a fundamental shift in where innovation happens, who owns the infrastructure, and how we translate AI potential into scalable impact.

How CIOs Build the Business Case for IT Automation ROI

CIOs rarely struggle to find automation ideas. What they struggle with is getting those ideas funded, then keeping support once the first few workflows go live. We built this guide for IT leaders who need a credible, repeatable way to present IT automation ROI in language that resonates with the C suite. If you are shaping an automation program across service desk, IT operations, and network operations, our Agentic Automation for CIOs & CTO’s hub lays out the strategic lens.

ServiceNow Without the Ticket Hell

ServiceNow is the system of record for change and approvals in most regulated enterprises. And yet, for many teams, it has become the place where delivery slows to a crawl. Not because ServiceNow is broken. But because the evidence model underneath it is. Developers ship fast through modern CI/CD pipelines, automated tests, and security scans, only to hit a wall when changes reach approval. Tickets bounce back. Evidence is questioned. Screenshots do not tell the full story. CABs hesitate. Releases wait.

Evidence, Not Screenshots. How Teams Stay Always Audit-Ready in ServiceNow

In regulated environments, slow change is often blamed on process. Too many approvals. Too much governance. Too much red tape. But in reality, most delays are not caused by regulation itself. They are caused by missing, fragmented, or untrusted evidence. Screenshots pasted into tickets. Proof assembled weeks later. Approvals stalled because no one can confidently say whether a change actually meets policy. When evidence is an afterthought, compliance turns into chaos.

Event Intelligence Solutions - A New Era for IT Operations

In an era where digital performance defines business success, large enterprises are embracing Event Intelligence Solutions (EIS) to keep services available and resilient, and customer-facing operations protected from disruption. According to Gartner, Event Intelligence Solutions use AI and advanced analytics to enhance and automate how organizations respond to signals generated by digital services.

Evidence, Not Screenshots

In regulated environments, slow change is often blamed on process. In reality, it’s caused by missing, fragmented, or untrusted proof. Screenshots. Tickets. Manual approvals. Evidence assembled after the fact. In this video, we show what changes when compliance policies are embedded directly into release workflows — and when immutable, machine-readable evidence is captured automatically across CI/CD.

ServiceNow Without the Ticket Hell

ServiceNow is the system of record for change and approvals in most regulated enterprises. But when evidence lives elsewhere — scattered across CI tools, scanners, tickets, and screenshots — approvals slow down and audits become painful. Developers waste hours chasing proof. CABs approve changes without confidence. Auditors reconstruct history months later. In this video, Matt Bailey shows what changes when evidence is produced continuously, directly from the delivery pipeline, and linked into ServiceNow workflows.

Resolve's Zero Ticket Video Series - Software Installation Request #itautomation #aiautomation

Software installs and access requests do not need tickets. In this Zero Ticket Video Series demo, see how RITA, Resolve’s AI-powered IT agent, automates a Jira access request end to end. RITA understands the request, validates role and policy, provisions temporary access, routes approvals automatically, and delivers full access once approved. No ticket queues. No manual handoffs. Just real resolution through agentic automation.

Delegated DNS validation: proving domain ownership without exposing credentials

It seems like every service wants proof you control your domain. Certificate authorities need it to issue certificates. Email platforms need it to authorize sending. Analytics needs it to gather data. Just add this magic TXT record to your DNS, wait for propagation, click verify. It works fine when it’s a one-time setup, but certificate lifetimes are dropping to 47 days, and you won’t be able to keep up on that schedule.

High Cardinality Metrics: How Prometheus and ClickHouse Handle Scale

TL;DR: Prometheus pays cardinality costs at write time (memory, index). ClickHouse pays at query time (aggregation memory). Neither is "better": they fail differently. Design your pipeline knowing which failure mode you're accepting. Every month, someone posts "just use ClickHouse for metrics" or "Prometheus can't handle scale." Both statements contain a kernel of truth wrapped in dangerous oversimplification.
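To make the cardinality concern concrete, here is a minimal sketch of the usual back-of-the-envelope estimate: the worst-case number of time series for one metric is the product of the number of distinct values of each label. The label names and counts are invented for illustration and do not come from either system.

```python
from math import prod
from typing import Dict


def max_series(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on series count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label set for a single HTTP request metric.
labels = {"service": 50, "endpoint": 40, "status_code": 5}
print(max_series(labels))  # 10000

# Adding one unbounded label multiplies the bound.
labels["user_id"] = 100_000
print(max_series(labels))  # 1000000000
```

Whether that growth is paid for at write time or at query time is the trade-off the TL;DR describes.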

The next wave of AI: Open source, robotics & the future of India's tech powerhouse

As we kick off 2026, the tech landscape is being reshaped by the very breakthroughs discussed at Civo Navigate India 2025. This panel, featuring Josh Mesout, Murthy Chitlur, Chirotpal Das and Anjali Batra, laid the groundwork for the AI-driven world we are operating in today. From the rise of agentic AI and small language models to the massive shift toward open-source parity, these experts didn't just discuss trends; they provided the blueprint for building resilient, sovereign, and scalable AI infrastructure in India.

VirtualMetric's Hybrid Security Data Collection Architecture: Performance and Scale Without Compromise

Modern security operations face a growing architectural challenge: collect telemetry from everywhere, process it in real time, and route it to multiple platforms while maintaining data sovereignty, avoiding agent sprawl, and keeping costs under control. Single-model collection strategies force security teams to make compromises. Agent-only models create operational overhead and maintenance risk. Agentless-only approaches simplify operations but limit depth and flexibility.

5 things to do before you go on-call for the first time

Going on-call for the first time can feel a bit overwhelming, but a little prep work makes it smooth and stress-free. This guide covers five things to set up before you start your first on-call shift. They help you stay on top of your schedule, get on-call notifications, and have a backup in place. By the end, you’ll be ready to handle your first on-call shift with confidence.

Why data sovereignty has become a strategic imperative for India

Data is the backbone of the modern economy, but if you don’t have control of the infrastructure, you don’t control the data. Historically, data sovereignty was a compliance checkbox, something for the legal team to handle. Today, it’s a strategic national priority. At Civo Navigate, we sat down with industry experts to unpack why India is now placing sovereignty at the centre of its digital strategies.

Megaport's Full Solution Portfolio Is Coming to India

From cloud on-ramps to virtual edge services, Megaport’s expansion into India from March 2026 will support modern network design in the market. India’s digital economy is growing at incredible speed. Cloud and AI adoption is accelerating, data volumes are scaling, and more businesses are designing architectures that extend well beyond a single data center or cloud region.

Showcasing open design in action: Loughborough University design students explore open source projects

Last year, we collaborated with two design student teams from Loughborough University in the UK. These students were challenged to work on open source project briefs. Team 1 focused on non-code contributions, while Team 2’s brief was to create a unified documentation experience, giving them a chance to apply their design skills to real-world problems within the open source ecosystem.

AI SRE in Practice: Diagnosing Configuration Drift in Deployment Failures

Deployments fail for dozens of reasons. Most of them are obvious from the error messages or pod events. But when a deployment rolls out successfully according to Kubernetes but your application starts experiencing latency spikes and error rate increases, the investigation becomes significantly harder. This scenario walks through a configuration drift incident where the deployment appeared healthy but available replicas were constantly flapping, creating cascading reliability issues.

Mock vs Stub: Essential Differences

When discussing the process of testing an API, one of the most common sets of terms you might encounter is “mocks” and “stubs.” These terms are quite ubiquitous, but understanding exactly how they differ from one another - and when each is the correct method for software testing - is critical to building an appropriate test and validation framework. In this blog, we’re going to talk about the differences and similarities between mocks and stubs.

India's path to digital independence: AI, Cloud, and Sovereignty

Digital sovereignty has moved from theory to necessity as organizations grapple with data control and independence. At Civo Navigate India 2025, Rahul Poruri, Toshal Khawale, Deepthi Anantharam, and Kunal Kushwaha examined how nations are balancing innovation with the need for multi-jurisdictional compliance.

Harness Sweeps Three Major Categories in DevOps Dozen Awards | Harness Blog

Harness has been recognized by TechStrong Group for its comprehensive, AI-native platform vision, winning Best End-to-End DevOps Platform, Best Platform Engineering Solution, and DevOps Industry Leader of the Year. At Harness, our mission has always been simple but ambitious: to enable every software engineering team in the world to deliver code reliably, efficiently, and quickly to their users, just like the world’s leading tech companies.

Introducing Code Optimizer (beta) - Better and Safer Infrastructure Code, Right Inside Your Git

Infrastructure code rarely stays clean on its own. Teams move fast, and reviews aren’t always deep or consistent. Over time, misconfigurations build up and increase the risk of outages, security gaps, or unpredictable behavior. Static scanning tools can help, but they often require setup, expertise, and don’t always reflect how infrastructure code is actually used across environments. Code Optimizer, now in beta, helps teams catch those issues earlier.

Easy Guide for Connecting VictoriaMetrics to a Grafana Data Source

VictoriaMetrics is a fast, cost-efficient, and highly scalable time-series database designed as a drop-in replacement for Prometheus storage. It is widely used for collecting, storing, and querying metrics at scale, while remaining lightweight enough to run as a single binary or container. Because it is fully Prometheus-compatible, VictoriaMetrics supports standard PromQL queries and integrates seamlessly with Grafana.

Observability Pricing Models: How to Evaluate Cost, Value, and Predictability

Observability pricing often seems reasonable at the outset, but many organizations discover its real complexity only as environments scale and usage patterns change. As environments grow more complex and hybrid by default, many organizations struggle with rising costs, fragmented tools, and pricing models that complicate cost predictability and long-term planning.

The CES Hangover: 3 Expensive Hardware Fails That Were Actually Software Problems

The dust has settled on Las Vegas. We saw transparent TVs, cars that drive sideways, and enough “AI-powered” toothbrushes to confuse a dentist. CES is incredible at selling the dream of hardware. The demos are slick, the lighting is perfect, and everything works on the showroom floor. But as engineers, we know the dirty secret of CES: The hardware is the easy part.

Universal Mesh in action: how PayPal solved multi-cloud complexity with HAProxy

The hardest part of modern infrastructure isn’t choosing your deployment environments — it’s bridging communication between them. Large enterprises are constantly facing the challenge of keeping everything connected, secure, and fast when their infrastructures are spread across different clouds and on-premises systems.

Why container security only works when the platform owns it

Container security has finally gone mainstream. When Docker announced hardened container images in late 2025, complete with minimal attack surfaces, non-root defaults, continuous CVE scanning, and automated updates, the response was enthusiastic. For teams managing their own infrastructure, this was a real step forward. Secure-by-default containers are no longer niche or expensive. They are expected.

Service Desk Automation Playbook To Improve KPIs and Agent Morale

Service desk leaders are being asked to do more with less. Ticket volumes keep climbing. SLAs keep tightening. Headcount rarely follows. Dashboards fill up fast, and before long, every conversation seems to start with a metric that’s in the red. Automation is pitched as the answer. But when it’s introduced only as a way to move faster or cut costs, it can backfire.

Fleet Management and Terraform: Use cases and best practices for managing collectors in Grafana Cloud

Earlier this year we launched Grafana Cloud Fleet Management to address the pain that comes with managing scores of telemetry collectors across departments and environments. We've been excited to see how organizations are using it to manage collectors at scale, but we've also heard from users who aren't sure how Fleet Management fits with their existing infrastructure-as-code tooling. The good news is Fleet Management is designed specifically to complement—not replace—tools like Terraform.

Getting started with on-call

Setting up on-call is simpler than it seems. It comes down to a few clear decisions about your team and what your service actually needs. This guide walks you through those decisions. You’ll learn who to add in your rotation, how long shifts should last, when to hand off, and what coverage makes sense for your service. By the end, you’ll know exactly how to set up your first schedule and move from ad-hoc firefighting to organized incident response.

GitKraken Desktop 11.8: Visibility Where It Matters, Undo When It Doesn't

Some releases break new ground. Others clear the path. GitKraken Desktop 11.8 does both. You know that moment when you’re three commits deep into an interactive rebase and realize you’ve made a terrible mistake? Or when you’re trying to explain what changed on a feature branch, but it means manually selecting 47 commits? Or when you just want to preview a README without opening another app?

"You Had One Job": Why Twenty Years of DevOps Has Failed to Do it

Let’s start with a question. What is DevOps all about? I’ll tell you my answer. In retrospect, I think the entire DevOps movement was a mighty, twenty year battle to achieve one thing: a single feedback loop connecting devs with prod. On those grounds, it failed. Not because software engineers weren’t good at their jobs, or didn’t care enough. It failed because the technology wasn’t good enough.

Canonical Ubuntu and Ubuntu Pro now available on AWS European Sovereign Cloud

January 15, 2026 – Canonical, the publisher of Ubuntu and provider of open source security, support, and services, announced today that it is a launch partner for the AWS European Sovereign Cloud, a new independent cloud for Europe, with Ubuntu and Ubuntu Pro now available. Canonical’s Ubuntu Pro delivers a securely designed, stable, and enterprise-ready foundation for open source innovation while providing customers with the same security, availability, and performance they expect from AWS.

Agentless First, Agents When Needed: A Hybrid Approach to Security Telemetry

Security data collection has become a first-class architectural concern for modern SOCs. Once collection is treated as a dedicated layer, separate from analytics and detection, the next question becomes practical: how should telemetry be collected in a way that aligns with this architecture? In the previous article, we examined why this shift occurred. Here, we focus on how different collection models (agent-based, agentless, and hybrid) fit into modern security data collection architectures.

Democratizing Reliability: Giving Non-Engineers Real Operational Power with Dileshni Jayasinghe

Many companies don’t invest in incident management until something goes wrong. commonsku took a different path. In this episode of Humans of Reliability, Sylvain sits down with Dileshni Jayasinghe, VP of Technology at commonsku, to talk about what it really takes to introduce incident management in a mature, profitable SaaS that had never formalized it. From rolling out observability and incident tooling to practicing internal status updates before going public, Dileshni shares how her team built the right muscles before they were forced to.

Hidden Cloud Costs: The Cost Behind Every Cloud Click

In the cloud, every click has a cost, even if it doesn’t feel like it at the moment. In this conversation, we unpack how small, everyday cloud decisions quietly add up to significant spend, why teams often miss the true cost behind “simple” actions, and how FinOps and cloud leaders can reframe cost conversations around value, impact, and accountability. If you manage cloud costs, build in Azure, or care about FinOps, this episode will change how you think about cloud decisions.

How to build DORA-ready infrastructure with verifiable provenance and reliable support

The Digital Operational Resilience Act (DORA) came into force across the EU on January 17, 2025, fundamentally changing how financial institutions must approach infrastructure and technology assets resilience. Its requirements around ICT risk management, operational resilience, and third-party oversight signal a broader shift that will ripple across regulated industries worldwide.

AWS Vs. OCI: Which Cloud Services Provider Is Best?

Choosing between AWS and OCI is a common decision for organizations moving workloads to the cloud. Both Amazon Web Services and Oracle Cloud Infrastructure offer global infrastructure, robust security, and broad service portfolios. On paper, the platforms can look interchangeable. They are not. AWS and Oracle Cloud differ in pricing, compute models, storage options, networking, and managed services. These differences affect scalability, reliability, and day-to-day operations.

Scaling Autonomous Operations with Agentic AI demo with Resolve

What does autonomous IT actually look like? This clip shows it in action. In this moment from our Scaling Autonomous Operations with Agentic AI webinar, RITA meets users where they work. Inside Slack. No portals. No tickets. Just answers. Watch RITA pull personalized knowledge in real time, synced directly from systems like SharePoint. Updates publish once and are instantly available everywhere. Then the real power kicks in.

Agents of IT podcast - Ep. 10 - Building Automation that Actually Scales with Yaju Suneja

Automation promises efficiency. Scaling it across a global enterprise is where most teams struggle. In this episode of Agents of IT, we sit down with Yaju Suneja, Global Head of Automation at Stefanini Group, to unpack what it really takes to build automation programs that scale across regions, teams, and technologies. Yaju shares real-world lessons from leading automation at global scale, including.

GitKraken Desktop 11.8 Release: ARM Support, Undo Rebase, More Shallow Clone Support

Happy New Year! This release combines user requests from 11.7 with smart defaults and perf improvements in 11.8. We've got full ARM support, expanded shallow clone functionality, and a mightier UNDO button plus more. What’s New.

Start the year strong: make SQL Server development faster, more reliable, and more consistent

As the new year begins, development teams are looking to build momentum, set clear goals, and establish reliable, scalable processes that will help them deliver value consistently throughout 2026. That’s why many teams are turning to SQL Toolbelt Essentials: a powerful, easy-to-adopt toolkit that helps teams speed up database development, reduce risk, and standardize workflows.

Resolve's Agents of IT podcast - Ep. 10 - Building Automation that Actually Scales with Yaju Suneja

A new Agents of IT episode is live! This week we’re featuring Yaju Suneja, Global Head of Automation at Stefanini Group. Watch to see what he has to say on how enterprises scale automation and move toward autonomous operations.

PagerDuty Appoints Chris Ferro as Chief Legal Officer

PagerDuty, Inc. announces that Chris Ferro has joined the company as Chief Legal Officer. Ferro will oversee all legal functions at PagerDuty, including corporate, compliance, employment and product matters, with a focus on advancing business objectives while mitigating legal and regulatory risk.

Accelerating Automotive Infotainment with Anbox Cloud

In this video, discover how Canonical and Rightware are revolutionizing the automotive software development lifecycle. This demo showcases an automated workflow that deploys Anbox Cloud, spins up an Android Automotive image, and launches a Rightware Kanzi-powered UI that can be streamed at a stunning 8K resolution. By shifting infotainment testing to a cloud-native environment, automotive teams can accelerate iteration, enhance collaboration, and ensure high-performance user interfaces before they ever hit the physical hardware.

Your Cloud Economics Pulse For January 2026

Welcome to January’s Cloud Economics Pulse, CloudZero’s monthly look at cloud spend as AI moves from vibe to prod. And this related news flash: AI spend keeps hitting new highs. In last month’s Pulse, we explored the compounding effect of AI becoming part of everyday cloud operations. This month, we see that pattern harden into year-end results.

The API Metrics Every SaaS Team Must Track In 2026

API metrics have long been a core part of building and operating reliable SaaS products. Teams track the likes of request volume, latency, and uptime to ensure APIs perform as expected under load. But today, the API metrics that matter most go beyond performance. First: API cost intelligence metrics measure how API usage translates into cloud, AI, and third-party spend, and attribute that cost to customers, features, workflows, and teams so SaaS businesses can protect margins as usage scales.

Announcing HAProxy Kubernetes Ingress Controller 3.2

We’re excited to announce the simultaneous releases of HAProxy Kubernetes Ingress Controller 3.2 and HAProxy Enterprise Kubernetes Ingress Controller 3.2! All new features described here apply to both products. These releases introduce user-defined annotations, a new frontend CRD, and other minor improvements, and we’ll cover these in detail below. Visit our documentation to view the full release notes.

Deploy your Spring Boot application to production

In a previous article, we covered how easy it is to create Spring Boot containers with Rockcraft. So the next logical step is to deploy and operate your application in a production environment. The Juju ecosystem is the key to making this process straightforward. In this article we walk through the steps required to deploy a Spring Boot application to production using Juju and Kubernetes.

4 foundations you need to scale AI in engineering

As a baseline, engineering leaders need their teams to adopt AI tools to speed up velocity and ship faster. Most organizations have already rolled out AI coding assistants or are evaluating them, but there's a really big difference between buying a tool and successfully scaling it across an engineering organization. If you layer AI on top of a chaotic codebase or a disorganized service catalog, you accelerate the creation of legacy code.

How to Automate Tier 1 IT Tickets Without Breaking ITSM Processes

Tier 1 ticket automation is one of the most tempting (and, to be brutally honest, most mishandled) initiatives in IT service management. On paper, it seems simple: automate the high-volume requests, reduce handle time, and give your service desk some breathing room. In practice, though, many teams end up with brittle scripts and automations that quietly drift outside ITSM guardrails.

Redgate Flyway 2025 year in review

First, I’d like to say how happy and lucky I feel to be working at Redgate on Flyway. I’m one of two Senior Product Managers in Flyway and we’ve been looking for a third to join us. We work alongside the Flyway Group Leadership Team (Group Product Manager, Architect, Lead Designer, and Development Manager) and four amazing engineering teams with embedded designers. We also work with the Product Support, Marketing, Sales, and Customer Success teams.

Got Drift? Redgate Flyway now helps you resolve it quicker

Teams work on databases across multiple environments (e.g., Development, Test, QA/UAT, Production) and differences can arise between these databases over time. A hot fix applied directly to Production or a quick change applied in Test while troubleshooting are examples of how the schema can diverge from what’s expected. These differences are known as drift and can cause problems with deployments, making them unpredictable and harder to troubleshoot.

Easy Guide for Connecting Redis to a Grafana Data Source

Redis is a widely used in-memory data store, commonly deployed as a cache, session store, message broker, or fast key-value database. Because Redis often sits on the critical path of an application, having visibility into its behavior (memory usage, client connections, command throughput, cache efficiency) is essential for troubleshooting and performance tuning.

Why Aging Networks Put Critical Infrastructure at Risk-and What It Means for Us

Everywhere around us, technology is evolving at lightning speed, yet the networks which underpin these capabilities often lag behind. This gap creates vulnerabilities that can impact everything from energy grids to emergency services. Forbes recently explored this urgent issue in an article featuring insights from our CEO Bruce McClelland, who shared an informed perspective on why modernization is essential, not optional. I encourage you to take a few minutes to read the full article.

How To Calculate Your OpenAI Cost Per API Call (And Why It Matters Now)

OpenAI doesn’t bill per feature, per customer, or per transaction. It bills per token, across multiple models, with usage patterns that can change by the hour. As a result, two API calls that support the same feature can have very different costs. Without a clear way to translate token-level pricing into something product, engineering, and finance teams can reason about, AI spend becomes difficult to forecast and harder to control.
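As a rough illustration of how token-level pricing turns into a per-call figure, here is a minimal Python sketch. The model names and per-million-token prices are placeholders invented for the example, not real OpenAI list prices, and `cost_per_call` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD per one million tokens (placeholder values, not real list prices)."""
    input_per_million: float
    output_per_million: float


# Hypothetical model names and prices, for illustration only.
PRICES = {
    "small-model": TokenPrice(input_per_million=0.50, output_per_million=1.50),
    "large-model": TokenPrice(input_per_million=5.00, output_per_million=15.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call, given its token counts."""
    price = PRICES[model]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Two calls behind the same feature, same token counts, different models.
small = cost_per_call("small-model", input_tokens=1200, output_tokens=300)
large = cost_per_call("large-model", input_tokens=1200, output_tokens=300)
print(f"small-model: ${small:.5f}  large-model: ${large:.5f}")
```

With these placeholder prices, the same request costs ten times more on the larger model, which is the kind of spread that makes per-feature forecasting hard.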

Six FinOps Certifications And Courses To Set You Up For Success in 2026

FinOps is evolving fast, and 2026 is shaping up to be a big year for specialization. While these certifications are ranked from beginner to advanced to help you build skills in the right order, one course stands out as the hottest recommendation right now: FinOps for AI. AI spend is accelerating, ownership is getting murky, and teams are scrambling to keep up. That urgency is exactly why FinOps for AI is generating so much interest heading into 2026.

Should you still pay for SSL certificates?

There’s a particular flavor of skepticism that shows up whenever someone suggests using Let’s Encrypt. The security team crosses their arms. “Free certificates? For production? We’re a serious organization. We use Sectigo.” I get it. You’ve been buying certificates from the same vendors for twenty years. They send you invoices, you pay them, certificates appear. It feels responsible, and free feels like a trap. But is it?

Supercharge your LLM Using Production Data Context

Are your LLM coding agents (like Cursor or Claude Code) hallucinating fixes because they don't know what's actually happening in production? In this video, Matt from Speedscale shows you how to bridge the gap between your local IDE and live production traffic using the Model Context Protocol (MCP). Most observability tools just give you telemetry. Speedscale’s MCP server gives your agent the "inner workings" of actual API calls and payloads, so it can check its assumptions against reality. No more "vibe-coding" and hoping it works; let your agent find the 500 errors and rate limits for you.

Production readiness review checklist & best practices

Modern software systems are more distributed, complex, and business-critical than they've ever been. A single misconfigured service can take down an entire platform. Teams are aiming for production readiness, which is the state where your services are secure, reliable, observable, and owned. Production Readiness Reviews (PRRs) are one of the key mechanisms to get there.

Heroku Monitoring Add-ons 2026 and Hosted Graphite

Monitoring performance of Heroku applications helps improve user experience. This blog post covers Heroku monitoring add-ons and explores why Hosted Graphite is the best choice in 2026. We'll discuss the benefits and setup process of the Hosted Graphite add-on. We'll also discuss future trends in Heroku monitoring.

Applying Feature Flag Context To Your OpenTelemetry Spans | Harness Blog

Integrating feature flag context into OpenTelemetry traces enhances observability by recording flag states as span attributes, making it easier to analyze how specific flags influence application behavior. When you toggle a feature flag, you're changing the behavior of your application, sometimes in subtle ways that are hard to detect through logs or metrics alone. By adding feature flag attributes directly to spans, you can make these changes observable at the trace level.
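A minimal Python sketch of the idea, under stated assumptions: `FakeSpan` is a stand-in so the example runs without any tracing library (a real OpenTelemetry span also exposes `set_attribute`), `record_flag` is a hypothetical helper, and the attribute names are only loosely modelled on the OpenTelemetry feature flag semantic conventions, so check the current spec before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Stand-in for a tracing span so the sketch runs without a tracing SDK."""
    name: str
    attributes: dict = field(default_factory=dict)

    def set_attribute(self, key: str, value) -> None:
        self.attributes[key] = value


def record_flag(span, flag_key: str, variant: str, provider: str) -> None:
    """Attach one evaluated flag to a span as attributes (names are assumed)."""
    span.set_attribute("feature_flag.key", flag_key)
    span.set_attribute("feature_flag.variant", variant)
    span.set_attribute("feature_flag.provider_name", provider)


# "new-checkout" and "example-provider" are made-up values for illustration.
span = FakeSpan(name="checkout")
record_flag(span, flag_key="new-checkout", variant="on", provider="example-provider")
print(span.attributes)
```

Because the flag state travels with the span, a trace query can filter or group by variant instead of correlating deploy logs by hand.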

Let Your LLM Debug Using Production Recordings

Modern LLM coding agents are great at reading code, but they still make assumptions. When something breaks in production, those assumptions can slow you down—especially when the real issue lives in live traffic, API responses, or database behavior. In this post, I’ll walk through how to connect an MCP server to your LLM coding assistant so it can pull real production data on demand, validate its assumptions, and help you debug faster.

AI SRE in Practice: Resolving GPU Hardware Failures in Seconds

When a pod fails during a TensorFlow training job, the investigation usually starts with the obvious questions. The answers rarely come quickly, especially when the failure involves GPU hardware that most engineers don’t troubleshoot regularly. This scenario walks through an actual GPU hardware failure and shows how AI-augmented investigation changes both the time to resolution and the expertise required to handle it.

Cloud Strategy for 2026: the Year of Repatriation, Resilience, and Regional Rebalancing

This year is set to be a pivotal year for cloud strategy, with repatriation gaining momentum due to shifting legislative, geopolitical, and technological pressures. This trend has accelerated, with a growing focus on data sovereignty. These challenges have set the stage for 2026 to be the year of repatriation, resilience, and regional rebalancing. Here, Rob Coupland, Chief Executive Officer at Pulsant, offers his insights.

Speedscale vs. LocalStack for Realistic Mocks

API mocking plays a crucial role in modern software development, allowing developers to simulate external API endpoints. It’s an effective way to isolate your application for testing and ensure that code changes don’t inadvertently break critical dependencies. Essentially, API mocking helps you create robust, reliable software by allowing you to test how your application interacts with external services.

How to Do Full-Text Search Across All Application Traffic with Speedscale

Modern DevOps observability tools are excellent for monitoring system health, tracking distributed traces, and aggregating metrics. However, they lack the fidelity needed for full-text search across application traffic. While observability platforms excel at showing what happened and when, they often fall short when you need to find where a specific piece of data (like an email address, user ID, or transaction token) appears as it flows through your entire application stack.

Budget Variance In The Cloud Era: Here's How To Turn Surprises Into Business Value

In the traditional finance world, budget variance was a static comparison between actual and budgeted spend. But in the cloud era, where costs scale with usage, experimentation, and engineering decisions, variance tells a much richer story. Done right, budget variance helps you distinguish between healthy growth and margin erosion. It can signal strong feature adoption, rising customer demand, or successful launches. It can also reveal waste, inefficiencies, and weak cost controls.
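To make the distinction concrete, here is a small Python sketch. The figures are invented, and pairing the variance with a cost-per-customer check is one assumed way to separate growth from erosion, not a method taken from the article.

```python
def budget_variance(actual: float, budgeted: float) -> tuple[float, float]:
    """Return (absolute variance, variance as a percentage of budget)."""
    variance = actual - budgeted
    return variance, variance * 100 / budgeted


def unit_cost(spend: float, customers: int) -> float:
    """Spend per active customer, a simple unit-economics check."""
    return spend / customers


# Invented figures: spend ran 20% over budget while customers grew 25%.
variance, pct = budget_variance(actual=120_000, budgeted=100_000)
planned_unit = unit_cost(100_000, 1_000)
actual_unit = unit_cost(120_000, 1_250)

print(f"variance: {variance} ({pct}%)")
print(f"cost per customer: planned {planned_unit}, actual {actual_unit}")
```

In this made-up case the overspend comes with a lower cost per customer, which reads as healthy growth; the same variance with flat customer numbers would point to margin erosion.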

Is Kubernetes actually HARD? #speedscale #kubernetes #k8s #devops #cloudnative

Thinking about learning Kubernetes in 2026? You’ll need GitOps, kubectl, and CI/CD pipelines... OR you can just use Speedscale. See how a single operator replaces a million dependencies and gives you the traffic insights you actually need to survive production.

Kubernetes is Hard. Here is the "Easy Mode" for 2026

Is Kubernetes actually hard, or are we just using the wrong tools? In 2026, the Kubernetes ecosystem has become a "dependency jungle." Between GitOps, YAML configuration, kubectl mastery, and complex CI/CD pipelines, developers are spending more time managing infrastructure than writing code. In this video, Ken breaks down the "hard parts" of K8s and introduces a more efficient workflow using Speedscale. Learn how to gain instant visibility into your cluster, pull logs without the headache, and turn real-world traffic into actionable load tests.

Recommended Experiments for Production Resilience in Harness Chaos Engineering | Harness Blog

This guide covers battle-tested chaos experiments for Kubernetes, AWS, Azure, and GCP to help you validate production resilience before real failures happen. Start with low blast radius experiments (pod-level) and gradually progress to higher impact scenarios (node/zone failures), always defining clear hypotheses and using probes to measure results. Building reliable distributed systems isn't just about writing good code. It's about understanding how your systems behave when things go wrong.

Guide to Sending Custom Metrics From Your Heroku Application

Heroku makes it easy to deploy and operate applications without managing servers, but understanding how your application behaves internally still requires instrumentation. Platform metrics like CPU usage, memory consumption, and router request/status counts are useful, but they don’t tell you how long your code takes to run, when your app throws errors, or whether users are interacting with key features.

Top 7 Kubernetes Add-ons

The open-source Kubernetes platform is designed to help simplify application deployment through Linux containers. It supports tasks like deploying workloads in the form of pods, clustering nodes, managing container runtimes, and tracking resources. The Kubernetes microservices system has risen in popularity over the last several years as an easy way to support, scale, and manage applications.

How Standardizing Dev Workflows Boosts Velocity, Quality & Joy - with Jason Gates

What if your dev team loved their workflows? Jason Gates from Sandia National Labs joins GitKraken’s VP of Developer Research, Jeremy Castile, to unpack the real-world challenges and powerful benefits of developer workflow standardization. In this candid conversation, Jason shares lessons from helping dozens of teams improve their software delivery — from reducing friction and boosting velocity, to creating joyful, productive developer experiences. They dive into.

A buyer's guide to engineering intelligence platforms in 2026

You're in a planning meeting when someone asks a simple question. How long does it actually take your team to ship a feature? You've got spreadsheets, Git logs, and Jira exports scattered across three tabs, and you still can't give a confident answer. It's a question you should be able to answer instantly, but the data lives in too many places to stitch together on the fly.

AI coding assistants are only as good as the context you give them

AI coding assistants have quickly become part of everyday development. Teams now rely on them to explain unfamiliar code, suggest configuration files, debug errors, and accelerate delivery across the stack. But as these tools move from experimentation into real production workflows, a consistent pattern is emerging: AI breaks down at the platform boundary.

Harness | Docker Artifact Registry | How to Push and Pull Images

This video provides a clear and practical walkthrough of the Harness Artifact Registry, demonstrating how to work with Docker images in a secure and reliable manner. You will see the complete flow of pushing images into the registry and pulling them back for builds, deployments, and platform workflows. The goal is to help developers and platform engineers understand how the registry fits into everyday delivery pipelines.

Optimizing DCI for AI Growth: All Roads Lead to Managed Optical Fiber Networks

The accelerating demand for artificial intelligence and cloud-based applications is fundamentally altering how organizations approach physical infrastructure. As data center construction shifts toward rural geographies in search of affordable power and real estate, the connectivity binding these facilities together has become a critical bottleneck. Network architects and CIOs are currently facing a complex decision matrix regarding Data Center Interconnect (DCI) deployment.

IT Observability in 2026: Lessons From the Past Year

As IT organizations enter 2026, many of the assumptions around monitoring and observability have already been tested. Throughout 2025, infrastructure teams made it clear that visibility alone is not enough. Alerts without context, short data retention, and fragmented tools limited teams’ ability to explain behavior, validate changes, and plan with confidence. This article looks at what emerged from those experiences and how observability expectations continue to shift.

3AM Pager: When You Know the Data but Can't Search It

Ever tried searching your entire production stack for one user? Getting paged at 3 AM is bad enough. It’s worse when you only have a single username and zero visibility into what’s actually happening across your microservices. With Speedscale, you can perform full-text searches across every API call and database interaction in real-time. Stop guessing and start debugging with total context.

Why gRPC is a Debugging Nightmare #speedscale #observability #grpc #testing #devops

gRPC is fast and efficient - until it breaks at 2:00 AM. Traditional observability tools are built for HTTP/1.1 and JSON. When you switch to gRPC, you’re dealing with binary Protobuf payloads and HTTP/2 multiplexing that most logs and traces simply weren't designed to handle. Speedscale flips the switch by decoding Protobuf directly into human-readable JSON in real-time. Get the speed of gRPC with the visibility of REST.

Running Ansible Playbooks with Puppet Playbook Runner

Are you struggling to manage infrastructure scale with standalone Ansible playbooks? Discover how to consolidate your automation tools without losing your existing investment. In this demo, we explore how Puppet Edge empowers infrastructure teams to seamlessly execute Ansible Playbooks within Puppet workflows.

Sending Custom Application Metrics to MetricFire's Hosted Graphite

In this article, we’ll show how easy it is to send custom application metrics directly to MetricFire's public carbon endpoint. We’ll build a small Flask application, emit a handful of practical metrics, and generate local traffic to demonstrate how quickly meaningful data can flow from your code to your dashboards.
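For orientation, a minimal Python sketch of what sending a metric to a carbon endpoint can look like, using Graphite's plaintext protocol (one `path value timestamp` line per metric). The `prefix` argument stands in for an account-specific key, which is an assumption here; the host is left to the caller because the article's endpoint is not given in this summary, and port 2003 is the usual carbon plaintext default. Both helper names are hypothetical.

```python
import socket
import time


def format_metric(path: str, value, timestamp=None, prefix: str = "") -> str:
    """Build one Graphite plaintext line: '<path> <value> <unix_ts>' plus newline."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    full_path = f"{prefix}.{path}" if prefix else path
    return f"{full_path} {value} {ts}\n"


def send_lines(host: str, lines: list[str], port: int = 2003) -> None:
    """Send preformatted lines over TCP to a carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("ascii"))


# Formatting only; nothing is sent until send_lines is called with a real host.
line = format_metric("app.requests.count", 1, timestamp=1700000000)
print(line, end="")
```

Keeping formatting separate from transport makes the line builder easy to unit test and lets the same lines go out over a different transport if the endpoint requires one.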

Five Ways to Simplify Data Masking | The Tony and Tonie Show Ep 38

5 signs your data masking is fast, secure, and low-maintenance. Can you protect PII, still deliver realistic test data, and design a data masking solution that’s easy to automate and maintain? Tony and Tonie discuss five key traits of a tool that does just that. Read the full article.

Deploy a serverless Python API to Scaleway Functions using CircleCI

Serverless platforms have revolutionized the way developers build and deploy APIs, eliminating the need to manage servers or underlying infrastructure. With serverless, you can focus entirely on your application logic and let the platform handle scaling, availability, and maintenance. Scaleway Serverless Functions is a flexible serverless platform that makes it easy to deploy lightweight APIs and background jobs in the cloud.

When is it ok or not ok to trust AI SRE with your production reliability?

There’s a moment every engineer knows. An AI suggests a fix, it looks reasonable, maybe even obvious, but production is on the line and you hesitate before clicking execute. There’s a big difference between an AI that can recommend an action and one you’re willing to let take that action. All it takes is one bad call, one kubectl command that makes things worse, and suddenly every automated suggestion is a potential liability instead of a help.

Infrastructure Guardrails: Why Your IaC Stack Needs Them | Harness Blog

Have you ever asked yourself, what is the fastest way to turn a harmless Infrastructure as Code change into a production incident and an awkward postmortem? We did, and found that usually, it's from letting it through without any guardrails. Infrastructure guardrails in Infrastructure as Code (IaC) were once a nice-to-have. Today, they’re essential. Without clear boundaries and safety mechanisms, even well-designed IaC workflows can turn small mistakes into fast-moving, high-impact problems.

Two Small Steps to Measurable Flyway Value | The Tony and Tonie show Ep40

Two small steps with Flyway. Fast, measurable value without disruption, even on fragile legacy databases. Tony and Tonie explore how two simple Flyway integration steps deliver fast, measurable value: more visible change, more reliable migrations, and fewer code issues, all without disrupting your existing development workflow.

Zero Ticket Video Series - Automatically Diagnose & Fix Outlook Email Issues

Outlook email issues are some of the most common and disruptive IT problems. In this demo, watch RITA, Resolve’s agentic IT assistant, automatically diagnose and fix an Outlook email sync issue: end to end, with no ticket and no technician required. See how RITA: understands the issue through chat in Microsoft Teams; checks service health and enterprise-wide outages; diagnoses the root cause on the endpoint; executes the fix automatically; and confirms resolution with the user.

VirtualMetric DataStream + Amazon Security Lake: OCSF-Ready Security Data Without Custom Pipelines

Security teams are increasingly turning to Amazon Security Lake to consolidate security telemetry across cloud, network, and on-prem environments. Security Lake provides a unified, OCSF-based data repository that powers analytics, threat hunting, and machine learning across AWS services and third-party tools. But to take advantage of Security Lake’s capabilities, organizations must deliver clean, normalized, OCSF-compliant data, and this is where challenges arise.

Navigating the human challenges of IDP adoption

Pragya Jazwal, Platform Engineering Lead at Paxos, compared standing up an internal developer portal to buying a gym membership during her talk at IDPCON 2025. Purchasing the software is one thing, but convincing a team of busy engineers to change their daily habits is a much bigger monster to tame. Pragya says the platform team at Paxos learned this lesson the hard way.

Context Engineering: How Dev Teams 10x Productivity with AI

Context engineering isn't just an AI buzzword. It's how high-performing dev teams are transforming productivity at scale. Chris Geoghegan, VP of Product at Zapier, breaks down why individual AI gains don't compound and what your team needs to do instead. In this GitKon session, learn how to.

AWS API Gateway Pricing Simplified: A 2026 Guide For Cost Savings

Why does AWS API Gateway spend rise even when backend infrastructure stays the same? For most teams, the answer isn’t compute. API Gateway pricing is driven by how APIs are used — request volume, retry behavior, traffic patterns, and growth over time — not by provisioned resources. Because AWS reports these costs as aggregated usage totals, it’s often unclear which APIs, environments, or behaviors are responsible for increases.

Inside Qovery's security architecture: how we secure your cloud & Kubernetes infrastructure

Discover how Qovery bridges the gap between developers and infrastructure with a "security by design" approach. From federated identities and unique encryption keys to real-time audit logs and SOC2 Type 2 certification - see how we protect your data while eliminating vendor lock-in.

Technical Debt for Middle Mile Broadband: Why Access-Agnostic Intelligent Middle Mile Matters

Technical debt refers to the future costs and limitations incurred when organizations opt for short-term solutions over robust, long-term scalable architectures. For the middle mile, technical debt often manifests as equipment or network designs that restrict long-term flexibility, scalability, or interoperability.

Why 2025 Changed Everything for DevOps and What Puppet Built to Meet It

In 2025, DevOps teams faced a pivotal moment. The era of treating security as an afterthought was over. Practically overnight, airtight protection became a non-negotiable requirement across every layer of the technology stack, whether on prem, in the cloud, or at the network’s edge. For many teams, this wasn’t just a technical hurdle; it was a daily source of stress.

DLP: The Key to Secure K8s Testing #speedscale #dlp #kubernetes #devops #testing

Testing with production traffic doesn't have to be a security risk. Engineers often avoid production data because of sensitive info like passwords, tokens, and PII. But legacy test data management is too static for modern, fast-changing payloads. Enter the Speedscale Streaming DLP Engine. It automatically detects and redacts sensitive data in real time as it's captured from your environment. You get the realism of production traffic without the risk of a data breach.

Easiest Way to Connect InfluxDB to a Grafana Data Source

InfluxDB is a widely used time-series database designed for storing and querying metrics, events, and telemetry data. It’s commonly used for infrastructure monitoring, application instrumentation, and IoT-style workloads where time-based data is central. In many environments, InfluxDB already exists as part of the monitoring or data collection pipeline, and the primary need is simply to visualize that data effectively.

Finetuning Gemma 3 on private data with Unsloth and CircleCI

Fine-tuning Large Language Models (LLMs) on private, domain-specific data can unlock significant value for your specific use case. When done correctly, you can create AI apps that understand your organization’s unique context. These apps can speak your brand’s voice and deliver remarkably accurate results that general models cannot match. However, finetuning is not always the right solution. Many teams rush into this complex technique without exploring simpler alternatives first.

Building Operational Resilience for the Year Ahead with Teneo's Digital Employee Experience (DEX)

As we step into a new year, one truth stands firm in financial services: resilience isn’t optional – it’s expected. Markets fluctuate, regulations evolve, and technology accelerates. Amid this complexity, IT leaders carry the responsibility of ensuring that operations don’t just survive disruption, they thrive through it.

How companies are using Civo GPUs to accelerate AI innovation without runaway costs

Accessing high-performance GPUs shouldn’t feel like a bottleneck. Yet, as AI adoption accelerates, many teams are discovering that hyperscaler offerings often come with a hidden price: long wait times, opaque billing, and layers of unnecessary complexity. At Civo, we’ve seen a different way. Our GPUs enable companies to move faster while keeping infrastructure overhead and costs firmly under control.

A Bright Outlook: Building Operational Resilience for the Year Ahead

As we step into a new year, one truth stands firm in financial services: resilience isn’t optional – it’s expected. Markets fluctuate, regulations evolve, and technology accelerates. Amid this complexity, IT leaders carry the responsibility of ensuring that operations don’t just survive disruption, they thrive through it.

Layers of Trust: How to Protect Financial Data from the Inside Out

Prior to working for a software company, I spent most of my career working for financial organizations. I have lots of friends who still do. Talking with one the other day, the question came up, what keeps you up at night? Her one word response was a little surprising: Fraud. Understand, she’s in charge of managing data at a bank. You’d expect maybe uptime, performance, high availability, any of the standard data management worries. Instead, it’s fraud.

Beyond the Hype: Building a Future-Proof Foundation for the AI-Native Enterprise

We are witnessing a fundamental transformation in how software is built. The industry has moved beyond the experimental phase of Machine Learning Operations and entered a complex new reality: the era of the AI Software Supply Chain. The adoption metrics confirm this shift is irreversible. Google reports that 90% of tech workers are now using AI as part of their daily work. Similarly, McKinsey data reveals that 88% of organizations use AI in at least one business function.

How Kubernetes Node Affinity Works (And Why It Matters for K8s Cost Control)

Think about how airlines assign seats on a plane. Some have extra legroom. Some sit near exits. Some are cheaper, while others cost a premium. Certain passengers also have strict requirements, like families traveling together or travelers who paid for a specific class. Now imagine boarding everyone randomly. A passenger who paid for extra legroom (perhaps for health reasons) ends up squeezed into a middle seat. Families scatter across the cabin. Premium seats sit half empty while the back rows overflow.

Chaos Engineering Training: Zonal, Regional Failures and SSL/TLS Certificates Expiration

Learn how to test your system's resilience against critical infrastructure failures. This tutorial demonstrates how to simulate zonal and regional outages to validate your high availability setup, plus how to test SSL/TLS certificate expiration scenarios. Essential for ensuring your applications can handle real-world failure conditions and maintain uptime during certificate-related issues.

Chaos Engineering Training: Chaos Hub, Experiment Templates, Import as Local Copy and Reference

Learn how to leverage Chaos Hub in Harness Chaos Engineering to accelerate your resilience testing. This tutorial covers browsing the Chaos Hub for pre-built experiments, understanding experiment templates, and two key workflows: importing experiments as local copies for customization or referencing them directly from the hub. Perfect for teams looking to quickly implement chaos experiments without building from scratch.

Multi-environment DNS automation on Cloudflare using CircleCI and Terraform

Manually configuring DNS records for staging and production environments is a common pain point for developers and DevOps teams. As your organization grows and you manage more applications across different services, keeping DNS records up-to-date and error-free becomes increasingly challenging and time-consuming. Mistakes in DNS setup can lead to downtime, broken environments, or confusing deployments, especially when juggling multiple teams or microservices.

The business case for internal developer portals in 2026

Throughout 2025, we watched AI transform from a novelty into a non-negotiable requirement for engineering teams. Leaders moved quickly to roll out coding assistants, driven by the promise of unprecedented velocity. But as we settle into this new reality, it’s becoming clear that there is a massive difference between buying a tool and successfully scaling it. You can't just drop AI into a complex organization and expect it to work without a solid foundation.

How to achieve cloud agility without compromising control or cost

As organizations increasingly embrace digital transformation, cloud agility has become a critical priority. Yet, the promise of cloud-native speed and flexibility often comes with trade-offs: loss of control, unpredictable costs, and operational complexity. Many companies find themselves stuck between the desire for agility and the reality of legacy infrastructure or regulatory constraints. At Civo, we don't think you have to choose. We’ve spent years helping teams navigate this tension.

Three key tech trends shaping connectivity in 2026

As enterprise IT continues to evolve, networking is becoming a strategic lever – one that directly impacts agility, resilience and the ability to compete. In a recent podcast episode, our CTO Paul Gampe shares the technologies he thinks will matter most for the foreseeable future. Three themes stood out: the rise of Network-as-a-Service (NaaS), the shift toward transmission on-demand at layer one and the rapid evolution of Model Context Protocol (MCP).

If Your Service Desk Automation Solution Needs More Humans, It's Not a Solution

Whenever service management has struggled, the conventional response was to add processes, tools, and people. That worked for a little while; at the very least, it worked well enough to keep the lights on. However, today’s environments don’t fail quietly or occasionally. They fail noisily and across tightly coupled systems. The end result is an operational model where human effort is consumed reconstructing context instead of resolving problems. More processes don’t fix that.

C# Equivalent of the TINYINT Data Type in SQL

TINYINT is one of the simplest numeric data types you can ever work with in SQL databases. It stores small numeric values, saves space, and is commonly used for flags, statuses, and boolean-like fields. But the moment you bring C# into the picture, things get intriguing. There is no TINYINT keyword in C#, and no one-to-one mapping you can use. Instead, you are left asking an important question: What is the correct C# equivalent of the TINYINT data type in SQL?

A Recap of 2025

In the past, our yearly recaps were mostly about numbers. What we shipped, how much Spike grew, and a long list of stats. See past recaps: 2023, 2024. But 2025 felt different to me. It had many moments that shaped how Spike, both the product and the company, looks today. Some of them were exciting. Some were uncomfortable, and all of them changed how I think about building Spike. We’re still bootstrapped and operating lean, with a team of fewer than ten people.

The Context Engineering Framework: 3 Shifts for AI-Powered Dev Teams

You’ve probably used AI earlier today. Maybe you asked it to debug a function, generate a test case, or explain a legacy codebase you just inherited. But here’s the thing: you didn’t just type a question and get an answer. You explained your problem, shared background context, pasted code snippets, clarified what you meant, then refined the output until it was actually useful. In other words, you were context engineering.

Top Cloud Cost News From December 2025

Happy New Year, everyone! 2025 was another exciting year filled with impressive AI advancements. As you might expect, some significant cost changes accompanied these new developments. Because reflecting on the past is one of the best ways to prepare for an even stronger future, here’s your end-of-2025 headline round-up, complete with what you can expect going forward into 2026. Get caught up on the details below.

Cloud Cost Optimization Strategies For 2026 And Beyond

Modern SaaS companies aren’t reporting weaker margins because they forgot to rightsize instances or buy reservations. It’s more because cloud spend now moves at the speed of AI experiments, overnight shifts in customer usage, and automated systems that scale in seconds. That’s why the next generation of cloud cost optimization strategies looks fundamentally different from what worked even two years ago.

Simplify Feature Flag Management with Harness FME and OpenFeature

Harness FME continues its investment in OpenFeature, building on our early support and adoption of the CNCF standard since 2022. Evaluate flags consistently across languages and environments, and integrate them seamlessly into your applications without modifying your code. Feature flags are table stakes for modern software development. They allow teams to ship features safely, test new functionality, and iterate quickly, all without re-deploying their applications.

Automating LLM application deployment with BentoML and CircleCI

Shipping application code, especially for LLM-based applications, can be a stressful and complex task. These applications demand intricate model management, careful resource allocation, and manual handling of dependency conflicts. Traditionally, preparing such applications for deployment involves integration tests, containerization, and updating image registries: all time-consuming manual steps. This is where an automated CI pipeline becomes invaluable.

DNS-PERSIST-01 validates a domain once to get certificates forever

With the ACME protocol, to issue a certificate you have to prove you control the domain. The CA gives you a challenge, you complete it, and they issue your cert. The trouble is that every validation method has tradeoffs. And as certificate lifetimes get shorter, those tradeoffs will get more painful. DNS-PERSIST-01 is a new approach coming in 2026 that trades proof-of-freshness for easier operations.

Troubleshoot faster with the GitLab Source Code integration in Datadog

Developers and SREs who rely on GitLab to develop their services often face significant friction when troubleshooting errors or fixing issues that degrade code quality. To understand the context of a problem, they resort to tab-hopping between observability tools and GitLab, connecting stack traces, spans, and profiles back to the right files and commits.

2026 - the year of repatriation, resilience, and regional rebalancing

2025 was a tough year for businesses, with slow growth, high costs, cyber risks and geopolitical uncertainties all contributing to a challenging climate. More than ever, businesses must innovate to survive and grow, and digital infrastructure will play a key role in 2026. Last year I predicted a pivotal year for cloud strategy, with repatriation gaining momentum due to shifting legislative, geopolitical, and technological pressures. This trend has accelerated, with a growing focus on data sovereignty.

From Promise to Practice: What Real AI SRE Can Actually Do When Production Breaks

We’ve written before about the advantages of training an AI SRE on real telemetry data rather than generic Kubernetes documentation. We’ve explained why RAG augmentation based on actual high-scale workload patterns produces better results than LLMs trained on generic scenarios or forum threads. The theory makes sense, the architecture is sound, and the approach is defensible.

Top Odoo Hosting Platforms for Your Business in 2026

Selecting the right Odoo hosting platform isn't just about finding the cheapest option. Performance, reliability, and control determine whether your Odoo implementation succeeds or becomes a constant source of frustration. This analysis examines three different hosting platforms, showing you what each delivers and which businesses they actually serve well.

Podman vs Docker 2026: Security, Performance & Which to Choose

When it comes to containerization technologies, Podman and Docker are the two giants that often come up in conversation. Both have revolutionized how we build, deploy, and manage containers, but what sets them apart? In this blog, we'll dive deep into a side-by-side comparison of Podman and Docker. We'll cover everything from architecture to security, performance, and compatibility.

Datadog Pricing 2026: Full Cost Breakdown + How to Save 40-90%

When it comes to monitoring and observability tools, Datadog is often one of the first names that comes to mind. But while Datadog’s features are widely discussed, its pricing often remains a topic of confusion. How much does Datadog cost, and what factors influence your bill? This guide breaks down Datadog pricing to help you better understand its structure, hidden nuances, and whether it’s the right fit for your needs.

Mature Companies Don't Care About Cloud Costs

“Cut spending!” “Slash costs!” “Stick to the budget!” Poke your head into almost any finance meeting in a SaaS company and you’re likely to hear one or more of the above phrases played on repeat. At first glance it makes sense: Costs are increasing, so we should reduce them. I’d like to challenge that narrative. Mature companies don’t care about cloud costs.

Peeking Under the Hood of Claude Code

Everyone is talking about Claude Code, but few people understand the machinery running in the background. Today, we’re opening up the terminal to see how Anthropic’s coding agent manages state, runs tests, and fixes its own bugs. From the Model Context Protocol (MCP) to its unique React-based terminal UI, find out what makes Claude Code the most "senior" feeling AI assistant on the market.

Is Claude Code Spying for OpenAI? #speedscale #anthropic #openai #claude #codingagent

While analyzing network traffic, we found huge amounts of telemetry, including chat snippets, being sent to statsig.anthropic.com. The irony? Statsig was recently acquired by OpenAI. In this video, we use proxymock to intercept the traffic and show you exactly what’s being sent from your terminal to Anthropic (and technically, OpenAI’s infrastructure).

EP #3: Cloud, Kubernetes, and the Evolution of DevOps - The Open Source Observability Podcast

Kris Buytaert is the Co-founder of Inuits, O11y, and ‘DevOps Days,’ an internationally attended series of DevOps events. He is a passionate advocate of Free and Open Source Software, and is credited by the community as a founding instigator of the DevOps movement. In this episode we trace the history of the DevOps movement from its intersection with open source and Agile, through the evolution of Cloud technologies and tools such as Docker and Kubernetes, to present-day best practices for CI/CD, monitoring, and observability.

HubSpot and Slack Integration: Best Practices for 2026

HubSpot Slack integration has become essential for the 238,000 companies on HubSpot and the 42 million users active on Slack each day. However, the native connector offers only limited functionality. Notifications often get buried, CRM updates remain locked inside HubSpot, and reporting still relies on manual exports. Worse, it lacks workflow customization and bidirectional updates, leaving teams without the visibility and flexibility they need. This guide shows how to move past those limits.