Operations | Monitoring | ITSM | DevOps | Cloud

Latest posts

Spend less time on repetitive tasks with the new automation feature in Grafana Assistant

The ability to schedule regular tasks, such as cron jobs, has been around for decades. So why are we still running the same AI prompts by hand every day? As you use Grafana Assistant, our AI-powered observability agent, to stay on top of the state of your system, you likely find yourself asking the same questions. Maybe you want to know what changed overnight, or whether yesterday's deployment hurt latency, or which dashboards or skills are drifting out of date.

The inside scoop on alerting changes in Kubernetes Monitoring

Kubernetes Monitoring in Grafana Cloud comes out of the box with preconfigured alert rules that notify you about issues like CPU throttling, crash-looping pods, and nodes going offline. These rules are installed automatically when you set up the app, and they start evaluating immediately. But if you've recently reinstalled the Kubernetes Monitoring app and your alert notifications stopped arriving, or started looking different, you're not alone.

Data sovereignty is an opportunity for regional growth

Data sovereignty wasn’t a major topic just a few years ago, but it is now becoming a significant economic opportunity for regions across the UK. In this clip from Perspectives from the Edge, Katie Gallagher OBE from Manchester Digital discusses why the conversation around data sovereignty has shifted, and how the rise of AI is accelerating demand for trusted regional digital infrastructure. As organisations rethink where data is stored, processed and governed, regions like Manchester are increasingly well placed to benefit through investment, innovation and digital skills growth.

SIGNL4 Update: Centralize alerts. Automate response. Easier than ever.

Get ready for the new SIGNL4 update. The completely redesigned API makes it easier than ever to connect your systems and tools and consolidate alerts from every source – so nothing gets missed. With the new Automation menu, you can now manage automated alert routing and filtering from one central place, ensuring the right alerts reach the right person at the right time.

How BigPanda and ServiceNow are redefining agentic IT operations for enterprise IT

Enterprise ITOps leaders are realizing that legacy incident management processes are collapsing under the weight of today’s sprawling, hybrid-cloud enterprise environments. Monitoring and observability tools generate a relentless flood of alerts across cloud platforms, infrastructure, applications, and services. The signals are there, but the volume of noise makes it harder than ever to identify what’s urgent.

Security and reliability review: 7 delivery model weak points to check first

Security audits that focus only on application code often miss the delivery layer entirely. That is where the most common and most avoidable failures live. Most teams treat security as a layer added on top of a working system. The problem is that the delivery model itself introduces risk before a single line of application code runs. When deployments are manual, environments are inconsistent, or configuration drifts across stages, the system behaves unpredictably.

Real-World Service Desk Automation: Use Cases That Prove a Platform is Enterprise-Ready

Most conversations about service desk automation stay at the strategy level for too long. Capability checklists and evaluation frameworks matter, but they won’t show you what the platform does when something breaks at 2 AM, or what happens when a single incident crosses four team boundaries before it can close. These scenarios show where simpler platforms start to give way. Teams usually automate the clean, single-system work first.

IPL: How to use the ipl-web TermInput

Most form fields ask users for a single value like a name, an email, or a date. But some need a list of values. A plain text input with comma-separated values can technically do the job, but it gives no feedback while typing, no suggestions, and one invalid entry rejects the whole field. The ipl-web TermInput solves this problem. Each value becomes a separate term with its own validation; terms can be enriched, and the input even supports suggestions.
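The difference between validating a whole comma-separated field and validating each term on its own can be sketched in a few lines. This is only an illustration of the idea, not the ipl-web API: the `Term` class, the `parse_terms` function, and the simplified email pattern are all hypothetical names and assumptions introduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately simplified pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Term:
    """One entry of a multi-value field, carrying its own validation result."""
    value: str
    error: Optional[str] = None

    @property
    def is_valid(self) -> bool:
        return self.error is None


def parse_terms(raw: str) -> list[Term]:
    """Split a comma-separated string and validate every term separately."""
    terms = []
    for part in raw.split(","):
        value = part.strip()
        if not value:
            continue  # skip empty entries such as a trailing comma
        error = None if EMAIL_PATTERN.match(value) else "not a valid email address"
        terms.append(Term(value, error))
    return terms


if __name__ == "__main__":
    for term in parse_terms("alice@example.com, not-an-email, bob@example.com"):
        status = "ok" if term.is_valid else f"invalid ({term.error})"
        print(f"{term.value}: {status}")
```

Because each term keeps its own error, one bad entry can be flagged without rejecting the valid entries around it.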

AI SRE Agent: How Autonomous Incident Investigation Is Eliminating Manual Root Cause Analysis

A critical production alert wakes you up: p99 latency just hit 4 seconds. You drag yourself to a terminal, open five dashboards, start correlating log timestamps with trace IDs, dig through 47,000 log lines across eight services, and 90 minutes later, you finally find the culprit: an N+1 database query introduced in a deployment that shipped four minutes before the spike started. An Atatus AI SRE Agent would have identified that root cause and drafted a remediation plan in 28 seconds. Not an approximation.

Episode 31: Who really governs artificial intelligence? ft. Luqman Kondeth

In Episode 31 of Server Room, we sit down with Luqman Kondeth, AI Governance & Cybersecurity Strategist and Director at NYU, for a conversation that goes far beyond technology. From personal growth and global experiences to AI governance, cybersecurity, and leadership, this episode explores how mindset shapes the way we build careers, communities, and the future of technology itself. In this episode, we discuss.