The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Escalation policies for critical incidents

When a critical incident triggers, there’s no time to figure out who to call. That decision needs to be made well before the incident arrives. A dedicated escalation policy for critical incidents gives your team a clear path to follow the moment things go wrong, rather than leaving it to whoever happens to be around. This guide covers the key decisions involved in building that policy.

Smarter Custom Metrics for Redgate Monitor: Additional Alert Text Query

This is a guest post from Nick Coombe. Redgate Monitor's built-in metrics cover the most common database pressure points out of the box. However, every estate has a few KPIs and metrics that are specific to the business, and users can create custom metrics to track those signals and receive an alert when they cross a threshold.

Cisco and Megaport: Redefining the Edge of Modern Networking

Explore four Cisco and Megaport Virtual Edge solutions that bring secure, high-performance networking closer to users and clouds. Enterprises have quickly outgrown the constraints of legacy WAN architectures. As cloud and SaaS dominate and users grow more distributed, rigid infrastructure models—complete with backhaul-heavy designs and unpredictable internet performance—no longer work.

Escalation policies for low-priority incidents

Teams put a lot of thought into how critical incidents are handled. Low-priority incidents usually don’t get the same attention. Without a proper escalation policy, they just land in a shared channel, waiting for someone to acknowledge them. Setting up a clear policy for them is worth doing, not because they need the same urgency as a critical incident, but because having a defined path for every incident makes the whole system more reliable.

Managing AI Models and Datasets with Harness Artifact Registry | AI/ML Artifact Management

Building AI applications often means juggling multiple models, scattered datasets, and version chaos across local systems. But what if you could bring it all together, securely and efficiently, in one place? In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, demonstrates how Harness Artifact Registry makes it easy to manage and govern your AI/ML assets (models, datasets, prompts, and agents) with built-in support for Hugging Face and generic registry types.

Unmasking the Resolute Raccoon

You’ve almost certainly seen them… In the forest, rummaging through a dumpster, in poorly aging millennial memes. Raccoons are ubiquitous and endlessly entertaining creatures. YouTube and TikTok are full of videos documenting their clever antics and escapades. One such intrepid raccoon gained fame for making their way to the most unlikely places, from liquor stores to karate studios.

Inside the architecture: How Upsun delivers 99.99% uptime for AI

For a CTO, "four nines" represents a commitment to keeping production revenue live with less than 0.01% downtime per year. As AI workloads move from pilot projects into core production services, the reliability requirements for infrastructure have shifted. AI agents, RAG pipelines, and automated LLM workflows depend on a consistent platform state.
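
To make the "four nines" figure concrete, the downtime budget can be worked out with simple arithmetic. The sketch below is illustrative only: the function name `downtime_budget_minutes` and the 365-day year are assumptions made for this example, not anything taken from Upsun's platform or SLA terms.

```python
# Assumption: a 365-day year; real SLAs may measure per month or per quarter.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime (in minutes) allowed within a period
    for a given availability target expressed as a percentage."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return period_minutes * unavailable_fraction


if __name__ == "__main__":
    # Compare common availability targets side by side.
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

Under these assumptions, 99.99% uptime leaves roughly 52.56 minutes of downtime per year, which is why the target is demanding for always-on production services.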