The latest News and Information on DevOps, CI/CD, Automation and related technologies.

A/B Testing Tools: The CTO's Guide to Safe and Measurable Change | Harness Blog

Picture this: It's 2 a.m. Your phone is buzzing. A new feature just went out to your entire user base, and conversion rates are tanking. Your on-call engineer is digging through logs, your Slack channels are on fire, and you're left wondering: why didn't we just test this first? Every CTO has a version of this story, and most of them have quietly vowed never to repeat it.

How Engineers Get Leadership Buy-In for Technical Initiatives

Getting leadership to greenlight your technical work isn't about having the right answer; it's about speaking the right language. CircleCI CTO Rob Zuber shares the frameworks he's developed over 12 years for translating engineering priorities into business impact, navigating organizational dynamics, and building the relationships that make buy-in happen before you ever enter the room.

Preparing Web and Mobile Cloud Infrastructure for Massive Advertising Traffic Spikes

When a digital marketing team launches an aggressive display network campaign, they measure success in clicks, impressions, and conversions. For IT operations and DevOps teams, however, that same success manifests as a massive, often unpredictable surge in server requests. A sudden influx of users can be a triumph for brand visibility, but it quickly becomes a nightmare if the underlying web and mobile cloud infrastructure is not equipped to handle the load. Bridging the gap between marketing ambition and technical reality requires robust planning, dynamic resource provisioning, and intelligent system monitoring. Without these elements, a successful ad campaign can effectively launch a self-inflicted denial-of-service attack on a company's own platforms. Modern businesses cannot afford the disconnect that often exists between the departments generating traffic and the teams responsible for keeping the lights on. Aligning these two functions ensures that the digital infrastructure is primed and ready long before the first advertisement goes live.

Cloud Cost Visibility at Scale: Why It Fails & How to Fix It | Harness Blog

Why does your cloud cost visibility break down the moment someone spins up a Kubernetes cluster in a new region without telling anyone? You get the alert three weeks later when the bill arrives, and by then nobody remembers which experiment justified the spend or which team should own it. This scenario repeats constantly across platform teams managing multi-cloud environments at scale. Cloud cost visibility works fine when you have five services and one AWS account.

Smarter Alert Management: Test on Historical Data, Review Transitions, and Preview Silencing Schedules

Alert fatigue usually isn’t caused by one thing. It’s the accumulation of thresholds that are slightly too sensitive, alerts that fire during known maintenance windows, and historical patterns that nobody has the tools to review easily. Fixing it requires better visibility into how alerts actually behave over time, and a way to test changes before they hit production. We’ve shipped three improvements to alerting in Netdata that address different parts of this problem.

Why post-mortem action items die

You can run the best debrief of your life. Honest timeline, blameless tone, real insights. People leave the room nodding. And then nothing happens. This is the last-mile problem of post-mortems, and it's an easy trap to fall into. When you've just been through a stressful incident, getting the service back up is the priority. Once it's over, the post-mortem itself can feel like the finish line. You've documented what happened, been honest about it, and identified what went wrong. It feels like the work is done.

An introduction to Konstruct: Production-ready IDP in minutes

What if you could own your platform and deploy it anywhere, without months of GitOps setup or vendor lock-in? Konstruct is an Internal Developer Platform that gives you a production-grade platform-as-a-service, deployed in minutes. It delivers a GitOps-powered experience that is fully owned and operated by you, distributing consistent, self-service control planes to development teams so they can ship without friction.