The latest News and Information on DevOps, CI/CD, Automation and related technologies.

AWS Reserved Instances 101: The Complete Guide

With 240 distinct services, ranging from compute to storage to networking and content delivery, and each offered at different price points, choosing the right AWS service requires careful consideration. By default, AWS services are available on demand, and you pay a monthly bill for the services you use. However, on-demand pricing can get expensive if you use many services or run a large fleet of instances.

Incident Response for DevOps, SREs, and IT Teams

That 3 AM alert is never fun. Your heart races as you try to figure out what broke this time and how fast you can fix it. With an incident response plan in place, that panic turns into a calm, step-by-step fix. A plan helps you handle everything from a server crash to a security breach in an organized way. In this guide, I'll walk you through what an incident response plan is, why you need one, its key components, and how to build one.

What is Database Change Management (DCM)?

Database change management is the foundation for building a stable, secure, and high-performing application. In today's fast-paced technological landscape, where agile and DevOps are the go-to approaches for developing database applications, rapid releases and continuous iteration are the norm. But frequent deployments bring the risk of untracked database changes.

The Complete SaaS Unit Economics Guide (2025 Edition)

Measuring and monitoring unit economics can help your SaaS company make informed business and engineering decisions. But how do you get that data, and what exactly are SaaS unit economics? We'll cover what SaaS unit economics are, which metrics you should monitor, how to calculate your unit economics, and the tools you can use to succeed.

Self-Service Query UI for Logs in Azure Data Explorer (ADX)

This video shows how to create a self-service user interface (UI) for querying logs using Azure Data Explorer (ADX) and the Business Activity Monitoring (BAM) module. It is aimed at developers and business users who want to gain actionable operational insights from log data through simple visualizations and monitoring.

IT Alerting: Everything You Need to Know

Behind every reliable service is a team of people watching for problems. But they don't stare at screens all day. They rely on IT alerting systems. An IT alerting system tells you when something is wrong. It finds problems fast, so your team can fix them before your business or customers are affected. This article explains everything you need to know about IT alerting: what it is, why you need it, how to set it up, and which tools work best.

A complete security view for every Ubuntu LTS VM on Azure

Azure's Update Manager now shows missing Ubuntu Pro updates for all Ubuntu Long-Term Support (LTS) releases: 18.04, 20.04, 22.04 and 24.04. The feature was first introduced only for 18.04, when that release moved to Expanded Security Maintenance. With this addition, Azure highlights where Ubuntu LTS instances would benefit from Expanded Security Maintenance updates if the administrator attaches an Ubuntu Pro license, even for instances running more recent Ubuntu releases.

Top AI Prompts for Engineering Leaders using the Cortex MCP

AI assistants have transformed how developers work. Now, coupled with the Cortex MCP, which connects AI assistants directly to live service data, ownership records, and organizational standards, developers can get accurate, context-rich answers about their services and standards right in their IDE. But what about engineering leaders? Your opportunities with AI assistants extend far beyond code generation.

Fix issues faster with Recommended Remediations

You've successfully run a Fault Injection test and uncovered a new failure mode before it impacted customers. Had it happened in production, the failure could have taken down your whole system. Now what? Since this is a potential P1 outage, you absolutely need to address the issue, but that will take time as you dig through the service to track down the problem. Unfortunately, this tension between urgency and investigation time is common.