Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Why am I getting R14/R15 errors in NodeJS? | MetricFire

How to Detect, Alert, and Resolve Memory Issues Before They Cause Downtime. When applications scale on Heroku, memory-related issues are among the most common (and most frustrating) sources of instability. Two of the most notorious culprits are the R14 (Memory Quota Exceeded) and R15 (Memory Quota Hard Limit) errors.

Winning Variations Explained: How to Identify True A/B Test Success With Statistical Confidence

A winning variation isn’t just the version that “looks better”; it’s the version that truly and measurably outperforms the control. In this video, we break down what a winning variation is, how to determine it, and why statistical significance is essential for making confident, data-driven product decisions.

How Roblox uses HAProxy Enterprise to power gaming for 100 million daily users

One of the most anticipated presentations at HAProxyConf 2025 came from gaming and user-generated content (UGC) innovators Roblox. Software Engineer Chris Jones and Senior Site Reliability Engineer Ben Meidel gave an enthusiastic and enjoyable presentation, detailing their journey from legacy hardware to a sophisticated, automated, and secure application delivery platform, with seamless, API-powered dynamic configuration and upgrades, supported by the HAProxy Enterprise Dynamic Update Module.

New Feature Friday: Cortex & AWS

Most teams treat AWS like a black box. Cortex turns the lights on. We now automatically ingest all your AWS resources—from Lambda to RDS—and map them to the services and teams that actually own them. Daily. Automatically. No spreadsheets. No guesswork. Scorecards help you enforce real standards (think: runtime upgrades, tagging hygiene, EOL migrations). Workflows help your engineers self-serve AWS resources without needing to be AWS experts.

Welcome to the Next Frontier: AI on Kubernetes

Last week’s KubeCon Atlanta made one thing abundantly clear: Kubernetes is quickly becoming the de facto platform for AI workloads. The event lineup was chock full of talks, workshops, and even co-located events dedicated to AI, machine learning, and running data on Kubernetes natively, with approximately 50 (!) sessions in total focused on AI, ML, LLM, and GenAI topics. What was until now mostly PoCs and aspiration is now truly delivering in production.

Resolve's Zero Ticket Minute - Ep. 2 #itautomation #aiautomation #servicemanagement

Last month, Azure + AWS outages spiked global incidents by 250%. Help desks lit up fast. Zero Ticket IT keeps teams steady with proactive updates and instant deflection of those “is it down?” floods. Don’t miss your 60-second IT news hit.

VirtualMetric DataStream + Elasticsearch: A Smarter Way to Send Logs to Elastic

Elasticsearch has long been the backbone of security analytics for organizations that need fast search, flexible dashboards, and scalable visibility across massive datasets. It powers everything from threat hunting to compliance reporting and real-time investigation. But anyone who has operated Elasticsearch at scale also knows a quiet truth: Elasticsearch is only as strong as the data you feed it. And getting clean, consistent, usable telemetry into Elastic is often the hardest part.

Incident Postmortem: How to Learn From Failures and Build Reliable Systems

When the issue settles, and systems are back, one question always remains: What actually happened, and how do we stop it from happening again? That’s where incident postmortems come in. Not just as documentation, but as a structured way to learn, improve reliability, and replace guessing with clarity. A good postmortem isn’t about blame, heroics, or perfect narratives. It’s about truth, learning, and building systems that get stronger with every failure.