
Extending the Application Edge with F5 BIG-IP VE and Megaport Virtual Edge

Learn how F5 BIG-IP VE simplifies multicloud application delivery, security, and traffic management with MVE. As enterprise applications continue spreading across multiple clouds, the application edge is changing. A few years ago, application delivery was usually tied to a physical appliance sitting in a data center; today, applications are everywhere.

DevOps with Kubernetes: How to Reduce Cluster Toil and Complexity

Has Kubernetes made your DevOps team faster, or just busier? Most teams adopt it for speed and portability, and they get both. What arrives with it is a quieter cost: the operational weight of running the cluster day to day. That weight shows up in the manual work the platform was supposed to eliminate. A resource limit set incorrectly can waste infrastructure for months.
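To make the cost of a mis-sized resource setting concrete, here is a rough calculation. The scheduler reserves node capacity according to each container's CPU *request*, so that value is what this sketch uses. Every input (requested and observed cores, replica count, per-core hourly price) is a hypothetical placeholder. Substitute figures from your own cluster and billing data.

```python
# Approximate hours in a month (8,760 hours / 12).
HOURS_PER_MONTH = 730


def monthly_idle_cost(requested_cores: float, used_cores: float,
                      replicas: int, price_per_core_hour: float) -> float:
    """Estimate the monthly cost of CPU that is reserved but not used.

    The scheduler reserves node capacity based on each container's CPU
    request, so cores requested beyond real usage are paid for but sit idle.
    """
    idle_per_replica = max(requested_cores - used_cores, 0.0)
    return idle_per_replica * replicas * price_per_core_hour * HOURS_PER_MONTH


# Hypothetical workload: 2 cores requested, ~0.5 used, 10 replicas, $0.04 per core-hour.
print(f"${monthly_idle_cost(2.0, 0.5, 10, 0.04):,.2f} per month")  # prints $438.00 per month
```

A small per-pod gap multiplied across replicas and months is exactly the kind of waste that goes unnoticed.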

Trace without traces

A customer emailed on a Tuesday: checkout hung for ten seconds. I opened our tracing tool, punched in the time window, and got nothing. The trace was sampled out. We keep 1% of traces, like most shops with real traffic do. The one request that actually mattered was in the 99% we threw away. I spent twenty minutes admiring our observability stack before admitting it couldn’t answer a first-grader’s question: what happened to this person? Here’s what I know now.
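The trace vanished because of head-based sampling: the keep-or-drop decision is made when a trace starts, before anyone knows the request will take ten seconds. The sketch below illustrates this with a deterministic sampler that hashes the trace ID. The 1% rate mirrors the story. The hashing scheme is an assumption for illustration, not how any particular tracing tool implements it.

```python
import hashlib


def head_sample(trace_id: str, rate: float = 0.01) -> bool:
    """Decide at trace start whether to keep a trace.

    Hashing the trace ID makes the decision deterministic, so every service
    that sees the same ID makes the same choice. Latency and errors are not
    known yet, so they cannot influence the decision.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Take the top 53 bits so the division yields an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


# Simulate 100,000 requests: roughly 1% are kept, no matter how slow any of them was.
kept = sum(head_sample(f"trace-{i}") for i in range(100_000))
print(kept)
```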

Prepare for the EU AI Act with Harness AI Security | Harness Blog

Harness AI Security provides a unified control plane for AI discovery, risk visibility, and runtime protection, helping organizations operationalize key requirements of the EU AI Act. Instead of relying on manual audits or fragmented tooling, teams get continuous insight into how AI systems are built, exposed, and used, along with the evidence needed to demonstrate compliance.

You Can't Detect What You Never Collect: Telemetry Coverage in the Agentic SOC

Every detection rule, every threat hunt, every AI agent you deploy rests on one silent assumption: that the data describing an attack actually reached your tools. When it doesn’t, nothing above it can save you, and no one gets an alert that the data was missing. Security teams invest heavily in the sharp end of the stack: detection content, threat intelligence, response playbooks, and increasingly, AI agents to triage and investigate at machine speed.
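A missing feed raises no alert on its own, so one common mitigation is to alert on silence itself. Here is a minimal sketch. It assumes you can query the last time each expected source sent an event. The source names and the 15-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set


def silent_sources(last_seen: Dict[str, datetime], expected: Set[str],
                   now: datetime, max_gap: timedelta) -> List[str]:
    """Return expected telemetry sources that have gone quiet.

    A source counts as silent if it has never reported, or if its most
    recent event is older than max_gap.
    """
    silent = []
    for source in sorted(expected):
        seen = last_seen.get(source)
        if seen is None or now - seen > max_gap:
            silent.append(source)
    return silent


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "edr": now - timedelta(minutes=2),
    "firewall": now - timedelta(hours=3),  # stopped sending hours ago
}
# "dns" is expected but has never reported at all.
print(silent_sources(last_seen, {"edr", "firewall", "dns"}, now, timedelta(minutes=15)))
# prints ['dns', 'firewall']
```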

Why compliance audits keep slowing your engineering team down

If you've shipped software in fintech, healthcare, or government, you probably know the specific dread of an upcoming compliance audit. Not because the software isn't secure, but because proving it is requires reconstructing a paper trail for decisions that were made in Jira tickets, Slack threads, and pull request comments over the last six months. The software is fine. The documentation of the software is the problem.

Designing the New Workloads Dashboard for Rancher

To meet community demand, we have restored the global workload overview in Rancher Manager. After previously removing the feature due to performance constraints, we prioritized user feedback and rebuilt it from the ground up. Powered by a new, optimized API, the updated UI is both highly scalable and resilient.

How Liftoff cut costs by 87% and latency by 75% with HAProxy

Liftoff, a mobile advertising company, processes 1.5 trillion bid requests every month. Their platform touches 275 million unique devices daily across 150 geographies. At that scale, the proxy layer is a core part of the business. For years, Liftoff relied on a managed enterprise proxy vendor. It worked, until it didn’t.

The golden path: security that works because it's the easy path

A golden path for dependency management isn't a policy document – it's a preconfigured private registry with upstream proxies covering every ecosystem your teams use, set as the default. Developers don't opt into security; they get it automatically by using the standard toolchain. The alternative is teams configuring their own controls, producing inconsistent postures and compounding risk across the org. If the secure path requires extra steps, developers will route around it. Make it the easiest option and the policy enforces itself.
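One way to confirm the golden path really is the default is to check that each ecosystem's client configuration points at the private registry. Here is a minimal sketch. It reads npm's `registry=` key from `.npmrc` and the `index-url` setting in pip's `pip.conf`. The registry host `registry.internal.example` is hypothetical, and real setups may also configure scoped registries or environment variables that this check ignores.

```python
import configparser
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical host of the organization's private registry.
PRIVATE_HOST = "registry.internal.example"


def npm_registry(npmrc_text: str) -> Optional[str]:
    """Return the default registry URL from .npmrc text (key=value lines)."""
    for line in npmrc_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "registry":
            return value.strip()
    return None


def pip_index(pip_conf_text: str) -> Optional[str]:
    """Return index-url from the [global] section of pip.conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pip_conf_text)
    return parser.get("global", "index-url", fallback=None)


def off_golden_path(configs: Dict[str, Optional[str]]) -> List[str]:
    """List ecosystems whose default source is missing or not the private registry."""
    return [eco for eco, url in sorted(configs.items())
            if url is None or urlparse(url).hostname != PRIVATE_HOST]


configs = {
    "npm": npm_registry("registry=https://registry.internal.example/npm/\n"),
    "pypi": pip_index("[global]\nindex-url = https://pypi.org/simple\n"),
}
print(off_golden_path(configs))  # prints ['pypi']
```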

Walkthrough: Puppet System Hardening Assessment

Is your infrastructure as secure as you think it is? In this walkthrough, see how a System Hardening Assessment helps organizations identify security gaps, uncover configuration risks, and prioritize remediation efforts across critical systems. You'll learn how teams can evaluate their environment against security best practices, gain visibility into potential vulnerabilities, and take actionable steps to strengthen their overall security posture.

ACP vs MCP: What's the difference for agentic coding?

An AI coding agent holds many conversations at once. Besides taking prompts from the user, the agent talks to the IDE, showing diffs and asking before it touches a file. At the same time, it talks to tools, pulling a failing build or querying a database. Two open protocols standardize those conversations. This guide compares ACP and MCP in practical terms: what each protocol does and when each applies. ACP (Agent Client Protocol) connects a code editor to an AI coding agent.
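Both protocols carry JSON-RPC 2.0 messages. The practical difference is who talks to whom: the editor talks to the agent over ACP, and the agent talks to tools over MCP. The sketch below builds one request of each kind. `tools/call` with `name` and `arguments` is MCP's tool-invocation method. The ACP method and field names (`session/prompt`, `sessionId`, `prompt`) reflect our reading of that spec and should be checked against its current version. The session ID, tool name, and arguments are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs must be unique per connection


def jsonrpc_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request; ACP and MCP both use this envelope."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids),
                       "method": method, "params": params})


# ACP: the editor (client) sends the user's prompt to the coding agent.
acp_msg = jsonrpc_request("session/prompt", {
    "sessionId": "sess-123",  # hypothetical session ID
    "prompt": [{"type": "text", "text": "Fix the failing build"}],
})

# MCP: the agent, acting as an MCP client, asks a tool server to run a tool.
mcp_msg = jsonrpc_request("tools/call", {
    "name": "get_build_log",            # hypothetical tool name
    "arguments": {"build_id": "4711"},  # hypothetical arguments
})

print(acp_msg)
print(mcp_msg)
```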

To learn and improve, we cannot be afraid to fail

“Deployment stress doesn’t just come from high-profile public outages. It often starts much earlier, when a fear of failure seeps into team culture.” (Rob Richardson, Software Craftsman) Rob certainly knows the stress and embarrassment of public deployment failures. “But overall,” he reflects, “I’ve had more stress in my career from internal failures.”

Cloud repatriation strategies: From public dependency to hybrid flexibility

The phrase "cloud first" dominated IT strategy for the better part of a decade. It was gospel, practically unchallengeable, and for a lot of organizations, it was the right call. But something shifted between 2024 and 2026, and it shifted fast. Bills stopped being defensible. Vendor pricing imploded. Sovereignty stopped being a compliance checkbox and became a procurement requirement.

OpenAI API cost calculator: estimate your GPT spend before it estimates you

This OpenAI API cost calculator estimates your monthly OpenAI API bill from three inputs: model, request volume, and average tokens per request, including the thinking tokens that reasoning models such as o3 and o4-mini generate. Toggle between standard, batch, and cached pricing to get your number in seconds. It also shows what the same workload would cost on Claude and Gemini. For the full per-model rate card, see CloudZero's OpenAI API pricing guide.
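The arithmetic behind an estimate like this is straightforward: per-request cost is input tokens times the input rate plus output tokens times the output rate, multiplied by monthly volume. Here is a minimal sketch. It is not CloudZero's implementation. The rates are placeholders, so take real per-model prices from the current rate card. Batch and cached pricing are modeled as a flat discount and a cached-input fraction. Reasoning ("thinking") tokens are billed as output tokens, so they belong in `output_tokens`.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Prices in USD per 1M tokens. The values used below are placeholders."""
    input: float
    cached_input: float
    output: float


def monthly_cost(rates: Rates, requests_per_month: int, input_tokens: int,
                 output_tokens: int, cached_fraction: float = 0.0,
                 batch_discount: float = 0.0) -> float:
    """Estimate monthly spend from volume and average tokens per request."""
    fresh = input_tokens * (1 - cached_fraction)   # billed at the full input rate
    cached = input_tokens * cached_fraction        # billed at the cached-input rate
    per_request = (fresh * rates.input + cached * rates.cached_input
                   + output_tokens * rates.output) / 1_000_000
    return per_request * requests_per_month * (1 - batch_discount)


rates = Rates(input=2.0, cached_input=0.5, output=8.0)  # placeholder prices
print(f"standard: ${monthly_cost(rates, 1_000_000, 1000, 500):,.2f}")
print(f"batch:    ${monthly_cost(rates, 1_000_000, 1000, 500, batch_discount=0.5):,.2f}")
```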

Shipped: The Fastly spend that was hiding in plain sight

CDN and edge spend is easy to lose track of. Fastly bills on its own, off to the side of your cloud invoice – real money, often significant, sitting where none of your cost tooling reaches. So it stays its own island: a lump sum with no easy way to tie it back to the teams, products, and customers driving the traffic.
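Tying a lump sum back to the teams driving it usually comes down to proportional allocation against a usage measure you can attribute. Here is a minimal sketch. The team names, invoice total, and per-team usage figures are hypothetical. If the invoice itemizes different charge types (for example bandwidth and requests), allocating each line item by its own driver is more accurate than a single split.

```python
from typing import Dict


def allocate(invoice_total: float, usage_by_team: Dict[str, float]) -> Dict[str, float]:
    """Split a lump-sum CDN invoice across teams in proportion to usage.

    usage_by_team holds any consistent usage measure (e.g. bytes delivered
    or requests served) that can be attributed to each team.
    """
    total_usage = sum(usage_by_team.values())
    if total_usage == 0:
        raise ValueError("no usage to allocate against")
    return {team: round(invoice_total * usage / total_usage, 2)
            for team, usage in usage_by_team.items()}


# Hypothetical month: $12,000 invoice, usage in TB delivered per team.
print(allocate(12_000, {"checkout": 30, "catalog": 50, "media": 120}))
# prints {'checkout': 1800.0, 'catalog': 3000.0, 'media': 7200.0}
```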

AI Summary Agent in Turbo360

Handed over an Azure integration environment you've never seen before? Turbo360's AI Resource Summary agent gives any support operator or engineer an instant plain-English overview of what a resource is, how it behaves, and what to watch out for, without needing to ask the developers. Great for: IT operations teams, MSP NOCs, cloud support engineers, and anyone responsible for running integration workloads they didn't build.

Why Modern Data Centers Require Better Insulation Strategies for Continuous Uptime

From streaming platforms to banking networks, today's digital services rest on facilities that operate without pause. With companies relying heavily on constant connectivity, stability within these environments matters more than ever. A short disruption can cause not only financial losses but also slower workflows and weakened user confidence. Although computing hardware, climate controls, and backup power typically dominate the discussion, insulation quietly contributes just as much to uninterrupted operation.

How ITOps Can Automate Data Discovery for Rapid Privacy Request Fulfilment

For most organisations, managing data privacy compliance is traditionally viewed as a legal or governance function. However, when a Data Subject Access Request (DSAR) or Freedom of Information (FOI) application is submitted, the actual labour of retrieving that data falls squarely on IT operations. With the passing of the Privacy and Other Legislation Amendment Act 2024, Australian businesses are facing some of the most comprehensive federal privacy reforms in over a decade.
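At its core, automated discovery for a DSAR means searching every system for records tied to one person's identifier. Here is a minimal sketch. The in-memory "stores" are hypothetical stand-ins for real systems such as a CRM, ticketing tool, or billing database, and matching is a simple case-insensitive exact match on an email address. Real discovery also has to cover unstructured text and other identifiers.

```python
from typing import Dict, List

Record = Dict[str, str]


def find_subject_records(stores: Dict[str, List[Record]], email: str) -> Dict[str, List[Record]]:
    """Return, per system, every record with a field matching the email (case-insensitive)."""
    needle = email.strip().lower()
    hits: Dict[str, List[Record]] = {}
    for system, records in stores.items():
        matches = [r for r in records
                   if any(v.strip().lower() == needle for v in r.values())]
        if matches:
            hits[system] = matches
    return hits


# Hypothetical stand-ins for real systems of record.
stores = {
    "crm": [{"name": "A. Jones", "email": "a.jones@example.com"}],
    "tickets": [{"reporter": "A.Jones@Example.com", "subject": "Refund"},
                {"reporter": "b.smith@example.com", "subject": "Login"}],
    "billing": [{"email": "c.lee@example.com", "plan": "pro"}],
}
for system, records in find_subject_records(stores, "a.jones@example.com").items():
    print(system, len(records))
```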

AI Tool Sprawl Is Killing Enterprise ROI | Why Orchestration Matters More Than AI Features

Enterprise AI adoption is accelerating, but are organizations actually solving business problems or just adding more tools? In this episode of Agents of IT, Fran Fernandez (Chief Product Officer at Resolve) and Zach Austin (Director of Product Marketing) explore one of the biggest challenges facing enterprise IT in 2026: AI tool sprawl. They discuss why many organizations struggle to demonstrate ROI from AI investments, how disconnected AI assistants create operational complexity, and why orchestration, automation, and context have become the real differentiators for enterprise AI success.

The most dangerous window is before threat intel knows about it

When a malicious package is first published, threat intelligence sources haven't flagged it yet – and every team pulling from a public registry is exposed during that entire window. The fix isn't faster scanning; it's a policy that holds new packages for a defined cooldown period before they're eligible to pull. By the time the window closes, the threat intelligence has caught up. Teams pulling direct from npm or PyPI have no equivalent enforcement layer – which is exactly how attacks like Shai-Hulud got in.
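A cooldown policy reduces to one check: is this version older than the window? Here is a minimal sketch. The version numbers, publish times, and 7-day cooldown are hypothetical. A real registry proxy would read publish timestamps from upstream metadata.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def eligible_versions(published: Dict[str, datetime], now: datetime,
                      cooldown: timedelta) -> List[str]:
    """Return versions old enough to pull under a cooldown policy.

    Versions published within the cooldown window are held back, giving
    threat intelligence time to flag malicious releases.
    """
    return [version for version, ts in published.items() if now - ts >= cooldown]


now = datetime(2025, 9, 20, tzinfo=timezone.utc)
published = {
    "4.1.0": now - timedelta(days=30),
    "4.1.1": now - timedelta(days=6),   # still inside the window
    "4.1.2": now - timedelta(hours=5),  # brand-new: held back
}
print(eligible_versions(published, now, timedelta(days=7)))  # prints ['4.1.0']
```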

Harness Agents

Today, we're launching Autonomous Worker Agents, AI agents that run as governed pipeline steps inside Harness. They inherit OPA policies, RBAC, audit trails, and scoped credentials from the first run. And because they live inside your Harness pipelines, they reason using the Harness Knowledge Graph: your services, deployments, incidents, and policies.

AI Agents Write Broken Code 49% of the Time

AI agents write broken code nearly 50% of the time. By adding a traffic-based deterministic evaluation, Speedscale boosted unsupervised bug-fixing quality from 51% to 77% in just 5 minutes. This helped slash token costs and eliminate rework without human intervention. Learn more: speedscale.com.

Fix flaky tests with AI, and track future test work in Jira

In January we launched Tests in Bitbucket Pipelines – a single place to track, organize, and understand your test health over time. In April we added automatic flaky test detection so unreliable tests get flagged before they slow your team down. But spotting a problem is only half the battle. Day to day, your team still needs to act on a test – track it as work, clean it up, or route it to the right person.
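One common heuristic for spotting a flaky test is a test that both passes and fails on the same commit. Since the code did not change, the differing outcome points at the test or its environment. Here is a minimal sketch of that heuristic. It is not Bitbucket's actual detection logic, and the test names and commit SHAs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def flaky_tests(runs: List[Tuple[str, str, bool]]) -> Set[str]:
    """Flag tests that both passed and failed on the same commit.

    Each run is (test_name, commit_sha, passed).
    """
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}


runs = [
    ("test_login", "a1b2c3", True),
    ("test_login", "a1b2c3", False),  # same commit, different result
    ("test_cart", "a1b2c3", False),
    ("test_cart", "d4e5f6", True),    # different commits: could be a real fix
]
print(flaky_tests(runs))  # prints {'test_login'}
```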

How Norsk Tipping uses feature flags to govern their deployments

Norsk Tipping is Norway’s state-owned gaming operator, running 2,500 to 3,000 production releases a year across iOS, Android, web and backend systems. Like every regulated organisation at scale, the platform team has to hold two things in tension: maintain strict deployment controls that stand up to audit, and keep the path to production open so that 100 engineers can ship safely.

Managing DHCP Across Distributed Networks

Managing DHCP across distributed networks gets messy fast. Lease activity changes constantly. Naming conventions drift. Infrastructure changes happen independently across locations. Before long, your team no longer has a complete view of what’s happening across the network. What started as a straightforward service becomes a records problem with real operational consequences.
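Naming drift and inconsistent records across sites are easy to surface once lease data is in one place. Here is a minimal sketch. It assumes lease records with `site`, `hostname`, and `mac` fields and a hypothetical `<site>-<role>-<number>` naming convention. It reports hostnames that break the convention and MAC addresses holding leases at more than one site, which can indicate a moved device or stale records.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical convention: <site>-<role>-<number>, e.g. "syd-pos-014".
NAME_PATTERN = re.compile(r"^[a-z]{3}-[a-z]+-\d{3}$")


def audit_leases(leases: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Report non-conforming hostnames and MACs leased at more than one site."""
    bad_names = sorted(lease["hostname"] for lease in leases
                       if not NAME_PATTERN.match(lease["hostname"]))
    sites_by_mac: Dict[str, Set[str]] = defaultdict(set)
    for lease in leases:
        sites_by_mac[lease["mac"]].add(lease["site"])
    multi_site = sorted(mac for mac, sites in sites_by_mac.items() if len(sites) > 1)
    return {"bad_names": bad_names, "multi_site_macs": multi_site}


leases = [
    {"site": "syd", "hostname": "syd-pos-014", "mac": "aa:bb:cc:00:00:01"},
    {"site": "mel", "hostname": "Laptop-JSmith", "mac": "aa:bb:cc:00:00:02"},
    {"site": "syd", "hostname": "syd-kiosk-002", "mac": "aa:bb:cc:00:00:02"},
]
print(audit_leases(leases))
```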

LogicMonitor and Edwin AI: Autonomous IT for Hybrid IT Environments

Autonomous IT starts now with LogicMonitor and Edwin AI, built to help IT teams monitor complex hybrid IT environments, discover root cause faster, reduce downtime, and prevent incidents before they impact revenue or brand reputation. See how LogicMonitor brings AI-powered IT operations, observability, and incident prevention together for modern infrastructure teams.

Sovereign cloud for financial services: Meeting FCA and PRA requirements with UK infrastructure

Financial services in the UK operates under one of the most demanding regulatory frameworks in the world. The FCA and PRA between them set expectations for operational resilience, outsourcing, data governance, and concentration risk that shape every infrastructure decision a regulated firm makes. Cloud adoption in the sector has happened, but it's happened under regulatory scrutiny that's grown steadily more pointed over the last several years.

Monitor DigitalOcean in Grafana with MetricFire

Monitoring your DigitalOcean infrastructure just got easier. MetricFire now integrates natively with DigitalOcean, so you can connect your account and start streaming metrics from Droplets, Load Balancers, Managed Databases, and more directly into Grafana. No agents. No setup overhead. No dashboard stitching. Get full visibility into your DigitalOcean infrastructure from one dashboard, live in minutes.

More Resilience, Less Overhead: How to Modernize Disaster Recovery Testing

Disaster recovery planning is essential for ensuring digital services remain online in the face of catastrophic failures or outages. When a major digital infrastructure outage occurs, systems need to be set up to automatically respond and restore functionality as quickly as possible. But no matter how in-depth your disaster recovery plan is, it’s still only theoretical until it’s thoroughly tested under realistic failure conditions, which is why testing is often mandated by leadership and regulators.

Rundeck/RBA 6.0: Modernizing the foundation your automation runs on

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey toward autonomous operations. Read on to learn how PagerDuty’s Rundeck/RBA 6.0, recently announced as generally available (GA), builds toward this vision.

Automating SonicWall Certificate Deployment with the SonicOS API

How do we keep our SonicWall certificates up to date as certificate lifetimes get shorter? We’re already at 200-day certificates, with 100-day and then 47-day certificates coming soon. A certificate you used to touch once a year now needs replacing up to twelve times a year. Doing this by hand is out of the question; no one has the time. Even if they did, frequent manual updates are just asking for mistakes. Luckily, this can be automated using the SonicOS API.
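Before any API call, automation needs a rule for *when* to renew. Here is a minimal sketch of that scheduling decision. The renew-at-two-thirds-of-lifetime policy is an assumption, not a SonicWall recommendation. Installing the new certificate goes through the SonicOS API, whose endpoints are not reproduced here.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy: renew once two-thirds of the certificate's lifetime has elapsed.
RENEW_AT_FRACTION = 2 / 3


def renewal_due(not_before: datetime, not_after: datetime, now: datetime,
                fraction: float = RENEW_AT_FRACTION) -> bool:
    """Return True once `fraction` of the certificate's validity period has passed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


def renewals_per_year(lifetime_days: float, fraction: float = RENEW_AT_FRACTION) -> float:
    """Approximate how many times a year the certificate gets replaced."""
    return 365 / (lifetime_days * fraction)


for days in (200, 100, 47):
    print(f"{days}-day certificates: ~{renewals_per_year(days):.1f} renewals per year")

issued = datetime(2026, 3, 15, tzinfo=timezone.utc)
expires = issued + timedelta(days=47)
print(renewal_due(issued, expires, issued + timedelta(days=32)))  # prints True
```

Under this assumed policy, a 47-day certificate is replaced roughly 11.6 times a year. Scaling the threshold to the lifetime, rather than using a fixed number of days before expiry, keeps the same relative safety margin as lifetimes shrink.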

Don't 'control' your AI spend. Understand it and be intentional.

There’s a good interview making the rounds. BizTech sat down with IBM’s James Stevenson to talk about how financial institutions can get a handle on cloud and AI costs. The advice is solid: get visibility, kill idle resources, tighten governance, tag everything. And pull finance and engineering into the same room. I don’t disagree with it. But I read the whole piece and noticed where the gravity pulls: control costs, reduce waste, bring down spend. The headline says it (‘Q&A.

Shipped: Turn your Bifrost gateway into an AI spend meter

If you route model traffic through Bifrost, you already have the hard part: one place every AI call passes through, where the model, the tokens, and the cost are visible on the way past. It’s the cheapest spot in your stack to measure AI spend. What’s missing is everything downstream – today that usage only becomes “spend” weeks later, when the provider invoice lands as a lump sum you can’t break apart.
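Turning gateway traffic into attributable spend amounts to pricing each call and summing by owner. Here is a minimal sketch. The record fields (`team`, `model`, `input_tokens`, `output_tokens`), the model names, and the per-token prices are hypothetical, not Bifrost's actual log schema or any provider's real rates. If your gateway logs already include a computed cost per call, sum that instead.

```python
from collections import defaultdict
from typing import Dict, List

# Placeholder USD prices per 1M tokens: (input, output). Not real rates.
PRICES = {"model-a": (1.0, 4.0), "model-b": (3.0, 15.0)}


def spend_by_team(calls: List[dict]) -> Dict[str, float]:
    """Sum per-call cost by team from gateway log records (hypothetical fields)."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        in_price, out_price = PRICES[call["model"]]
        cost = (call["input_tokens"] * in_price
                + call["output_tokens"] * out_price) / 1_000_000
        totals[call["team"]] += cost
    return dict(totals)


calls = [
    {"team": "search", "model": "model-a", "input_tokens": 200_000, "output_tokens": 50_000},
    {"team": "support", "model": "model-b", "input_tokens": 100_000, "output_tokens": 20_000},
    {"team": "search", "model": "model-b", "input_tokens": 0, "output_tokens": 100_000},
]
for team, usd in sorted(spend_by_team(calls).items()):
    print(f"{team}: ${usd:.2f}")
```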