
AI SRE in Practice: Resolving Node Termination Events at Scale

When a node terminates unexpectedly in a Kubernetes cluster, the immediate symptoms are obvious. Workloads restart elsewhere, services experience partial outages, and alerts fire across multiple systems. The harder question is why it happened and how to prevent it from recurring. This scenario walks through a node termination event where the entire node pool was affected, requiring investigation across infrastructure layers to identify root cause and implement lasting remediation.

API Observability: Why Outside-In Signals Are Still Essential

API observability has become a priority for modern engineering teams. As architectures shift to microservices and APIs become the backbone of products, teams need a reliable way to understand what’s happening across services before issues turn into incidents. That’s where observability comes in: collect the right signals, connect the dots, and debug faster.

SQL performance improvements: automatic detection & regression testing (part 3)

This is the final part of our 3-part series on SQL performance improvements. In part 1, we covered how to identify slow queries. In part 2, we explored how to fix them with indexes. In this post, we'll share how we prevent those performance issues from ever reaching production again. A few weeks ago, we massively improved the performance of the dashboard & website by optimizing our SQL queries.

The Rise of 24/7 Digital Front Desks: Why Law Firms Can't Rely on Voicemail Anymore

Why can't law firms rely on voicemail anymore in a 24/7 digital world? Because modern legal clients expect immediate answers, continuous availability, and clear next steps, and voicemail systems fail to deliver speed, trust, and engagement at the moment it matters most.

Monitor groups are now supported in the API

We recently launched monitor groups, making it easier to organize monitors on your boards and status pages. Now that same functionality is available in the StatusGator API, so you can manage monitor groups programmatically. The API now supports listing, creating, updating, and deleting monitor groups on a board. You can also assign or remove monitors from groups when creating or updating a monitor.

Best DNS Monitoring Tools in 2026

DNS monitoring is the practice of continuously checking that your domain names resolve correctly (right records, right answers) and that DNS lookups are fast and reliable from multiple locations. Depending on the tool, it can also watch for unexpected DNS record changes (A/AAAA/CNAME/MX/NS/TXT, etc.), validate DNSSEC, and pinpoint where resolution breaks in the chain.
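The core check described above (right records, fast answers, no unexpected changes) can be sketched in a few lines of Python. This is a minimal illustration and not how any particular monitoring tool works: `check_records`, the `Resolver` type, and the `max_seconds` threshold are all hypothetical names chosen for this example, and the resolver is passed in as a plain function so the sketch makes no claim about a specific DNS library.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical resolver interface: given (name, record_type), return the
# set of answers currently visible. In practice this would wrap a real
# DNS client; here it is an assumption made for illustration.
Resolver = Callable[[str, str], Set[str]]


def check_records(
    expected: Dict[Tuple[str, str], Set[str]],
    resolve: Resolver,
    max_seconds: float = 0.5,
) -> List[str]:
    """Compare observed DNS answers with expected ones and flag slow lookups."""
    findings: List[str] = []
    for (name, rtype), want in expected.items():
        start = time.monotonic()
        try:
            got = set(resolve(name, rtype))
        except Exception as exc:  # resolution failed outright
            findings.append(f"{name} {rtype}: lookup failed ({exc})")
            continue
        elapsed = time.monotonic() - start

        # An unexpected record change shows up as a set mismatch.
        if got != set(want):
            findings.append(
                f"{name} {rtype}: expected {sorted(want)}, got {sorted(got)}"
            )
        # Correct but slow answers are reported separately.
        if elapsed > max_seconds:
            findings.append(f"{name} {rtype}: slow lookup ({elapsed:.3f}s)")
    return findings
```

Running the same check with resolvers located in different regions would approximate the "from multiple locations" part; DNSSEC validation and tracing the resolution chain are outside the scope of this sketch.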

AI Hosting: The Colocation vs. Cloud Dilemma for Your Next Project

Organisations running AI workloads, like banks training fraud detection models, hospitals testing diagnostic tools, or manufacturers using predictive analytics, all face the same problem: hosting these workloads is costly and resource-intensive. The workloads require dedicated GPUs running non-stop, vast amounts of data moving in and out, and far more power and cooling than a typical IT system.

What is IT Alerting?

IT alerting is the process of notifying responsible and on-call employees about disruptions and anomalies in IT systems and infrastructure. These notifications can come directly from the systems themselves or from monitoring tools. The goal is to reduce downtime, service limitations, security breaches, and data loss by responding quickly. In many cases, the stakes are high: data loss, reputational damage with customers, or even disruption of critical business processes.

Agentic AI Essentials: Adoption Pitfalls and How to Avoid Them

In the last article in this series, we explored how IT professionals and leaders can cut through the hype surrounding agentic AI and gain a deeper understanding of what the technology actually offers. Now, we turn to the practical side: how to integrate it effectively. Let’s explore the challenges and outline strategies that organizations of all sizes can use to adopt agentic AI with confidence.