Monthly Archive

Sponsored Post

Data-Led Growth: How FinTechs Win with App Event Analytics

May 31, 2026 By David Bunting In ChaosSearch

In the rapidly shifting world of financial technology (FinTech), acquiring and retaining new customers to achieve long-term business growth requires a proactive approach to user experience and application performance optimization. As FinTech companies compete against rivals to grow a user base and revolutionize how consumers manage their finances, they increasingly depend on data-driven insights to optimize their mobile applications and deliver exceptional user experiences. This is where application event analytics comes into play.

Read Post

ChaosSearch

Why Modern Enterprises Still Get Blindsided And How Business Process Observability Changes That

May 31, 2026 By Ammar Ravat In Digitate

Traditional observability misses business failures. Modern monitoring tools can show that systems are technically healthy while critical business outcomes are quietly failing. Business Process Observability (BPO) closes this gap by tracking entire business transactions, like orders, payments, and shipments, instead of just infrastructure and application metrics.

Read Post

Digitate

There Is No Good Spring Boot Alternative (Unless You're Doing One of These Three Things)

May 31, 2026 By Rollbar In Rollbar

Every few months a new "we migrated off Spring Boot" post washes across r/java or DZone. The numbers are always impressive. 60% memory reduction, 85% faster startup, cloud bill cut in half. The comments are always full of developers convinced they should be doing the same thing. Is Spring Boot really that bad now? I decided to do my own research. I read every credible public migration case study I could find. I ran benchmarks. I built the internal business case for switching two of our services.

Read Post

Rollbar

High-cardinality metrics at scale: why the standard playbook is wrong

May 30, 2026 By Shyam Sreevalsan In netdata

The “high cardinality is expensive” sentence has become observability’s version of “in this economy” — said so often that nobody questions whether it’s true. Every vendor pricing page invokes it. Every glossary article repeats it. Every architecture diagram shows aggregation buffers placed before the storage layer.

Read Post

netdata

Your telemetry, your apps. Inside apps on the Cribl Platform

May 30, 2026 By Cribl In Cribl

You already use Cribl to tame your telemetry data. Now you can turn that data into apps your teams actually want to use. In this video, we walk through how to create apps in the Cribl Platform and show how real apps solve real problems: guided troubleshooting for noisy incidents, opinionated security views, and exec-friendly ROI dashboards. You’ll see how apps sit on top of Cribl Stream, Edge, Search, and Lake, so you reuse the data and logic you already have instead of building custom tools from scratch.

View Video

Cribl

498 Fake FIFA World Cup Domains and How Phishing Sentinel Catches Them

May 30, 2026 By DNS Spy In DNS Spy

The FBI published a warning last week. Threat actors have registered more than 498 fake domains tied to the 2026 FIFA World Cup. Fake ticket sites. Fake job listings. Fake merchandise stores. All live in DNS right now. Every one of those domains is catchable. Not after victims report fraud. Before anyone gets hurt. That is what DNS Spy’s Phishing Sentinel is built to do.

Read Post

DNS Spy

StatusGator now integrates with Datadog

May 29, 2026 By Valeria Kurolapova In StatusGator

Your Datadog dashboards tell you what’s happening inside your infrastructure. StatusGator helps you understand what’s happening with the vendors and services your business depends on. Now you can bring both together.

Read Post

StatusGator

Phone numbers now supported in status page contact field

May 29, 2026 By Valeria Kurolapova In StatusGator

We’ve rolled out a small but useful improvement to the Status Page → General settings. Previously, the Support contact field in the footer only accepted: Based on feedback from our users, the field now also supports phone numbers. Status pages in StatusGator already offer a variety of customization options – including custom branding, layouts, monitor visibility, subscriber settings, and privacy controls.

Read Post

StatusGator

Why Shared Context Matters in Hybrid Cloud Operations

May 29, 2026 By Dallon Robinette In Selector

The first post in this series explored why traditional observability breaks down in hybrid cloud environments. As infrastructure, applications, and dependencies stretch across on-premises networks and cloud services, isolated monitoring views leave teams with an incomplete understanding of what is happening and why. That challenge raises the next question: what kind of operational model actually works in a hybrid environment?

Read Post

Selector

k8s-monitoring-helm Chart Office Hours (May 2026)

May 29, 2026 By Grafana In Grafana

In the May edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 4.1 release, the upcoming 4.2 feature release, and the upcoming deprecation of the 1.x and 2.0 versions.

View Video

Grafana

Monitor LLM routing with the Kubernetes Inference Extension

May 29, 2026 By David Lentz In Datadog

If you serve LLMs on Kubernetes without inference-aware routing, your load balancer is likely wasting inference capacity. Generic HTTP traffic management blindly routes requests, assuming the backends in your cluster are interchangeable. But your model-serving backends are stateful and unevenly prepared to handle any given request. As a result, requests are often routed to a backend that is not the one best suited to respond.

Read Post

Datadog

How a unified data model improves feature flag rollout decisions

May 29, 2026 By Bridgitte Kwong In Datadog

Consolidation is reshaping the experimentation and feature management landscape. Tools are merging, and partnerships are being repackaged as platforms. But marketing a unified experience is not the same as building one. Right now, engineering leaders and product managers are reassessing whether the tools they depend on are built for the long term. It’s irrelevant which vendor has the most products.

Read Post

Datadog

AI in SRE: Where and how Google is deploying agentic AI to improve operations

May 29, 2026 By Stevan Malesevic In Google Operations

With SRE AI, Google plans to fully adopt AI and agentic technologies, leveraging AI as a force multiplier while also maintaining control.

Read Post

Google Operations

7 Observability Platforms With Built-In SIEM (2026 Comparison)

May 29, 2026 By Aiswarya S In Atatus

Your SIEM flags a threat. Then someone loses ten minutes pivoting to a second tool just to find the trace, host, or deployment behind it. That gap, where security and observability live in separate products, is exactly what the 7 platforms below are built to close. This list is scoped deliberately to platforms that run real SIEM detection on the same data plane as your APM, logs, and infrastructure telemetry, not standalone security-only tools like QRadar or Wazuh.

Read Post

Atatus

Patch Management vs Vulnerability Management: What are Key Differences?

May 29, 2026 In Motadata

What keeps systems secure in real IT environments, applying fixes quickly or knowing what needs attention first? Most IT teams do not struggle because they lack tools or processes. They struggle because two critical functions are often mixed together. Patch management and vulnerability management. This creates a gap between what is being fixed and what actually needs to be fixed. The challenge is that teams deal with constant alerts, regular updates, and growing security risks.

Read Post

Motadata

Biggest MCP update yet? Breaking changes - explained in 60 seconds

May 29, 2026 By Coralogix In Coralogix

The new MCP (Model Context Protocol) release makes it stateless. Here's what you need to know if you are building AI agents and remote MCP servers.

View Video

Coralogix

Overview of TCP Port and UDP Checks

May 29, 2026 By Uptime Website Monitoring In uptime

Welcome to Uptime.com! In this video, we'll guide you through setting up TCP Port and UDP checks. Learn how to monitor server responsiveness using TCP Port checks and how to configure UDP checks for applications requiring less packet accuracy. We'll cover the necessary steps and required information, then test your configuration to ensure it's correct.

View Video

uptime

Sensor Monitoring Tools for Modern Facilities

May 29, 2026 By OpsMatters In OpsMatters

Modern facilities depend on real-time visibility. Buildings now need to monitor air quality, occupancy, water leaks, energy use, equipment vibration, access activity, temperature, humidity, and safety risks across multiple zones. Sensor monitoring tools help facility managers detect problems earlier, reduce downtime, protect assets, and improve occupant comfort. They also support better maintenance planning because teams can respond to data instead of waiting for complaints or failures.

Read Post

OpsMatters

Top Tips: How to stop doing everything yourself and delegate to AI before you burn out

May 28, 2026 By Alsherin In ManageEngine

Top Tips is a weekly column where we highlight what’s trending in the tech world today and list ways to explore these trends. This week, we’re looking at which tasks you can delegate to AI. We've all struggled to delegate tasks. Whether you're a junior struggling to prioritize your tasks on a daily basis or a manager unsure of assigning responsibilities, you know how messy task delegation can get. Some people just improvise while others have a method to this madness.

Read Post

ManageEngine

Sponsored Post

Clouds Without the Fog: Unified Control for Hybrid SAP

May 28, 2026 By Brenton O'Callaghan In Avantra

SAP customers with complex SAP environments know well the challenges of managing multiple landscapes. While classic tools like SAP Landscape Management (LaMa) and Focused Run served us well for years, they were built for a static, on-premises world. Now, with the 2027 end-of-support deadline for legacy solutions looming, the "fog" of hybrid management is getting thicker.

Read Post

Avantra

8,000+ Services and counting: One place to monitor what matters

May 28, 2026 By Colin Bartlett In StatusGator

StatusGator now monitors more than 8,000 services! From cloud platforms and AI tools to communication apps, payment providers, developer infrastructure, and business software, we continue expanding our monitoring coverage every day so teams can track everything that matters in one place.

Read Post

StatusGator

Our First Take on Citrix Platform Flex

May 28, 2026 By GripMatix In GripMatix

On May 12, 2026, Citrix officially announced Platform Flex, a new consumption-based model for the Citrix Platform. At first glance, it looks like yet another licensing change, something Citrix customers have become fairly used to over the years. But after digging through the documentation, Platform Flex appears to be more than just another packaging exercise. The real change is that Citrix is moving toward a credit-based desktop consumption model.

Read Post

GripMatix

Unified observability for Alibaba Cloud with Datadog

May 28, 2026 By Ellie Cohen In Datadog

Alibaba Cloud is a major cloud provider in APAC, offering industry-leading foundational AI models in addition to compute, managed databases, object storage, and Kubernetes through its Container Service for Kubernetes (ACK). Teams choose Alibaba Cloud for its infrastructure availability across Asia Pacific and its managed services. For SREs and platform engineers, that often means running Alibaba Cloud alongside AWS, Google Cloud, or Microsoft Azure.

Read Post

Datadog

How to Install and Configure an OpenTelemetry Collector

May 28, 2026 By Deepa Ramachandra In ObservIQ

Originally published June 2024. Updated May 2026. A lot has changed since the first version of this guide. In May 2026, OpenTelemetry officially graduated within the CNCF, the highest maturity level a project can achieve. All three core signals (metrics, logs, and traces) are now stable across every major language SDK. Collector adoption has never been higher, and the ecosystem around it, particularly OpAMP for remote management, has matured significantly. This update walks through three things.

Read Post

ObservIQ

Best Cloud Monitoring Tools in 2026 [20+ Analyzed, Top 6 Picks]

May 28, 2026 By Leo Baecker In Hyperping

The best cloud monitoring tools are Hyperping (uptime, server monitoring, status pages, and on-call at a flat rate), Datadog (full-stack observability with the broadest integration catalog), New Relic (usage-based observability with the most generous free tier), Dynatrace (AI-driven automatic root-cause analysis for large enterprises), Better Stack (monitoring paired with logs and incident response), and Prometheus + Grafana (the open-source standard for cloud-native metrics).

Read Post

Hyperping

Cybersecurity Tips for Small Businesses

May 28, 2026 By Frank Cotto In WhatsUp Gold

Small businesses are now among the most frequently targeted organizations in the world. Attackers focus on them not because they have the most to steal, but because they tend to have fewer defenses, smaller teams, and less time to spend on security. The good news is that the majority of attacks rely on a small set of well-understood techniques, and most of them can be prevented or contained with practical, affordable controls.

Read Post

WhatsUp Gold

Deploy Datadog Kubernetes Autoscaling at scale

May 28, 2026 By Danny Driscoll In Datadog

Every Kubernetes environment accumulates waste over time. Teams overprovision CPU and memory requests to avoid performance risk, run idle replicas to preserve headroom, and leave Horizontal Pod Autoscalers (HPAs) untouched long after workload behavior has changed. Some of this waste can be addressed at the node level, where Datadog Cluster Autoscaling helps teams rightsize capacity.

Read Post

Datadog

Monitor Azure Managed Redis with Datadog

May 28, 2026 By Michael Cronk In Datadog

Azure Managed Redis is Microsoft’s fully managed, enterprise-tier in-memory data store. It is designed for the low-latency caching, session storage, and real-time data needs of modern applications, including AI workloads that depend on fast vector and embedding lookups. Because user-facing applications often query Redis directly, even small regressions in latency, hit rate, or memory pressure can degrade the user experience.

Read Post

Datadog

Monitor JavaScript framework routing with Datadog RUM

May 28, 2026 By Datadog In Datadog

Modern web applications rely on frameworks like Next.js, Vue, and Angular to handle routing and rendering. In these architectures, navigation happens within the application rather than through full page loads, which makes it difficult for traditional browser instrumentation to capture what users actually experience. As a result, teams often see misleading view names, missing navigations, and errors that are either misattributed or not captured at all, especially during hydration or lazy loading.

Read Post

Datadog

Top 9 Network Performance Metrics You Should Measure in 2026

May 28, 2026 By Arpit Sharma In Motadata

How do you know if your network is actually healthy right now? For most IT teams, answering that question means jumping between multiple tools, dashboards, and alerts, only to end up with more uncertainty than clarity. The problem is not missing data. It is knowing which signals matter, what normal really looks like, and when performance issues start affecting users and business operations. Modern networks generate thousands of metrics every minute, but not every spike or alert deserves attention.

Read Post

Motadata

Federated Search | From Silos to Insight | Splunk Cloud with Apache Iceberg REST and AWS S3

May 28, 2026 By Splunk In Splunk

This walk-through shows how Splunk Cloud can search AWS S3 data through an Apache Iceberg REST catalog backed by Nessie. Learn how Iceberg table metadata, S3 storage, and Splunk Federated Search work together so analysts can query historical security data where it lives without reingesting it into Splunk.

View Video

Splunk

How to Monitor RAM / Memory Utilization for Network Devices

May 28, 2026 By Andrii Kernitskyi In Obkio

Most network teams monitor uptime. A lot of them track CPU. Almost none of them consistently monitor memory utilization on routers, firewalls, and switches, and that gap causes more unexplained outages and intermittent issues than most people realize.

Read Post

Obkio

Instrument LangGraph agents with Datadog: a practical guide

May 28, 2026 By Datadog In Datadog

AI agents tend to function as black boxes, and it can be difficult to trace and understand agent workflows end-to-end in order to characterize performance. Particularly, you need visibility into the following: By tracing full agent runs with LLM Observability, Datadog AI Agent Monitoring enables you to visualize workflows with flame graphs and quickly spot sources of failures and latency.

Read Post

Datadog

How we cut build times by two-thirds by deleting our CMS

May 28, 2026 By Eli Lennox In Sentry

At Sentry, we’re obsessed with things not breaking. It’s kind of our whole deal. But for a while, our own marketing site was testing that obsession. Much of what you see on sentry.io (the marketing site, blog, open source microsite, etc.) was running on a fleet of legacy Gatsby sites powered by a traditional headless CMS. On paper, it worked.

Read Post

Sentry

From Insight to Action: Operationalizing LogicMonitor + Catchpoint for Unified Observability

May 28, 2026 By LogicMonitor In LogicMonitor

Visibility without control is just expensive awareness. Most IT teams can see when something’s wrong, but can’t easily tell who’s affected, why, or what to do next. In other words, they lack real control. In this in‑depth session, LogicMonitor’s Callum Brown and Brandon Delap showed how to move past that.

Read Post

LogicMonitor

Root Cause Analysis: How Engineering Teams Fix Production Issues Faster?

May 28, 2026 By Mohana Ayeswariya J In Atatus

When a production incident strikes (a sudden latency spike, a cascading API failure, a service returning 500s at scale), every minute of downtime has a cost. Root cause analysis (RCA) is the process that turns that chaos into a clear answer: what actually broke, and why. Not the symptom that triggered the alert. The underlying cause.

Read Post

Atatus

InfluxDB 3 MCP Server v1.3.0: AI Access to Time Series Data

May 28, 2026 By Jason Stirnaman In InfluxData

A more reliable agent that learns your schema, queries your data, and investigates alerts - that’s what’s possible with this release of our MCP server v1.3.0.

Read Post

InfluxData

Why Autonomous IT Is Becoming Essential for the Modern Industry

May 28, 2026 By Varun Sharma In Digitate

Autonomous IT shifts enterprises from reactive to proactive operations. By combining AIOps, agentic AI, predictive analytics, and self-healing automation, Autonomous IT helps organizations detect issues early, automate remediation, and prevent downtime before it impacts customers or revenue.

Read Post

Digitate

Underminr Proved Your DNS Filter Has a Blind Spot. Here's the Other Layer You Should Be Watching.

May 28, 2026 By DNS Spy In DNS Spy

A new attack technique called Underminr was disclosed this week. It slips past protective DNS by abusing shared CDN edge IPs. The DNS query looks clean. The connection lands on malware. This post walks through what Underminr is, why protective DNS misses it, what actually stops it, and the OTHER DNS layer most teams forget to watch.

Read Post

DNS Spy

Faster monitor metrics and new response time stats

May 27, 2026 By Valeria Kurolapova In StatusGator

We’ve rolled out an update to the Monitor metrics view to improve both performance and visibility into response time trends for website and ping monitors.

Read Post

StatusGator

FinOps KPIs for IT Infrastructure: A Practical Field Guide for Cost Visibility

May 27, 2026 By Kristy Slimmer In Galileo

Infrastructure cost visibility has become a critical part of IT decision-making. Performance still matters, but for many infrastructure leaders, that’s no longer the full conversation. Leadership teams increasingly want clarity around cost movement, upgrade exposure, underutilized resources, and whether infrastructure decisions are financially defensible. That creates a different requirement for operations teams: visibility that connects technical behavior to business impact.

Read Post

Galileo

Microsoft 365 backup best practices: A practical guide for IT teams

May 27, 2026 By Raxxelyn L In ManageEngine

Microsoft 365 plays a critical role in modern business communication and collaboration with services such as Exchange Online, SharePoint Online, and OneDrive for Business. However, many organizations overestimate Microsoft 365’s native protection and recoverability. In reality, Microsoft 365 operates under a shared responsibility model. While Microsoft ensures infrastructure availability and uptime, organizations are responsible for protecting and recovering their data.

Read Post

ManageEngine

See Every AI Agent Conversation in Sentry (Beta)

May 27, 2026 By Sentry In Sentry

Your AI agent just did something unexpected. Was it a hallucination? A bad tool call? Or did it actually handle things correctly? Sentry Conversations lets you replay the full exchange and find out.

View Video

Sentry

Bridging Bedrock Skills with AI: A Conversation with Jeremy Bradberry

May 27, 2026 By Selector In Selector

What happens when decades of operational experience meet modern AI-driven networking? In the latest episode of Next-Gen Network Heroes, Bob Slevin sits down with Jeremy Bradberry, Senior Network Engineer at Delaware North, to explore how network engineers can modernize infrastructure without losing sight of the operational realities behind the technology. Jeremy shares lessons learned from working on legacy manufacturing systems, how AI is helping engineers analyze data and automate workflows faster than ever before, and why strong standards still matter in today’s AI era.

View Video

Selector

Install and Activate WhatsUp Gold on an Offline Server

May 27, 2026 By Progress WhatsUp Gold In WhatsUp Gold

In this video, you learn how to install WhatsUp Gold and activate your license when your server does not have access to the internet.

View Video

WhatsUp Gold

Investigate funnel drop-offs with Product Analytics

May 27, 2026 By Datadog In Datadog

For most product teams, funnels are a staple of the analytics toolkit despite a frustrating limitation. You can see which step users are dropping off at, but understanding why requires hours of manual slicing across segments, separate comparison views, and a lot of trial and error before you land on a useful hypothesis. And even when you find something meaningful, taking action typically means jumping to another tool, building a new segment, or filing a request with a data team.

Read Post

Datadog

What's Next for WhatsUp Gold: Unified Network Visibility and Security

May 27, 2026 By Progress WhatsUp Gold In WhatsUp Gold

In this session, we’ll walk through the Progress WhatsUp Gold roadmap - linking recent releases and what’s next - to show how the platform is growing toward greater visibility, stronger security, and more consistent operational workflows for hybrid, multi‑site, and security‑focused environments.

View Video

WhatsUp Gold

Hybrid Cloud Monitoring Explained: On-Prem + Cloud + Kubernetes in One View

May 27, 2026 By Motadata In Motadata

Understand what hybrid cloud monitoring is and why it’s critical for managing modern distributed IT environments. Hybrid cloud monitoring helps organizations unify visibility across on-prem infrastructure, public cloud platforms, virtual machines, containers, and Kubernetes clusters in a single monitoring platform. In this video, learn how fragmented monitoring tools create operational blind spots and slow down incident response across hybrid environments.

View Video

Motadata

Game On: What Retro Gaming Teaches Us About Modern Networks with Jeremy Bradberry

May 27, 2026 By Selector In Selector

What can decades of hands-on operational experience teach us about the future of AI-driven networking? In this episode of Next-Gen Network Heroes, host Bob Slevin sits down with Jeremy Bradberry, Senior Network Engineer at Delaware North, for a conversation that spans everything from legacy manufacturing systems and mainframes to modern AI-assisted network operations. Jeremy shares how his early career working in industrial environments shaped the way he approaches networking today, giving him what he calls an “X-ray vision” into how technology connects directly to business operations.

View Video

Selector

What is AI-Powered Observability? A Complete Guide for IT Teams in 2026

May 27, 2026 By Jagdish Sajnani In Motadata

Is your monitoring stack really giving you clarity, or just more alerts? Your monitoring stack is probably working exactly as designed. That is the problem. As systems grow, most IT and platform teams start to see the same patterns: At this point, traditional monitoring starts to feel limited. This is where teams begin exploring AI in observability. In this guide, we will explain what AI-powered observability actually means, how it works, and when it is useful.

Read Post

Motadata

Download massive logs in CSV without crashing your browser

May 27, 2026 By VictoriaMetrics In VictoriaMetrics

Now you can download massive logs in CSV without crashing your browser in VictoriaLogs.

View Video

VictoriaMetrics

AI SRE Agent: How Autonomous Incident Investigation Is Eliminating Manual Root Cause Analysis

May 27, 2026 By Mohana Ayeswariya J In Atatus

A critical production alert wakes you up: p99 latency just hit 4 seconds. You drag yourself to a terminal, open five dashboards, start correlating log timestamps with trace IDs, dig through 47,000 log lines across eight services, and 90 minutes later, you finally find the culprit: an N+1 database query introduced in a deployment that shipped four minutes before the spike started. An Atatus AI SRE Agent would have identified that root cause and drafted a remediation plan in 28 seconds. Not approximation.

Read Post

Atatus

IPL: How to use the ipl-web TermInput

May 27, 2026 By Bastian Lederer In Icinga

Most form fields ask users for a single value like a name, an email, or a date. But some need a list of values. A plain text input with comma-separated values can technically do the job, but it gives no feedback while typing, no suggestions, and one invalid entry rejects the whole field. The ipl-web TermInput solves this problem. Each value becomes a separate term with its own validation; terms can be enriched, and the input even supports suggestions.

Read Post

Icinga

The inside scoop on alerting changes in Kubernetes Monitoring

May 27, 2026 By Beverly Buchanan In Grafana

Kubernetes Monitoring in Grafana Cloud comes out of the box with preconfigured alert rules that notify you about issues like CPU throttling, crash-looping pods, and nodes going offline. These rules are installed automatically when you set up the app, and they start evaluating immediately. But if you've recently reinstalled the Kubernetes Monitoring app and your alert notifications stopped arriving, or started looking different, you're not alone.

Read Post

Grafana

Spend less time on repetitive tasks with the new automation feature in Grafana Assistant

May 27, 2026 By Kevin Minutti In Grafana

The ability to schedule regular tasks, such as cron jobs, has been around for decades. So why are we still running the same AI prompts by hand every day? As you use Grafana Assistant, our AI-powered observability agent, to stay on top of the state of your system, you likely find yourself asking the same questions. Maybe you want to know what changed overnight, or whether yesterday's deployment hurt latency, or which dashboards or skills are drifting out of date.

Read Post

Grafana

Read more about Spend less time on repetitive tasks with the new automation feature in Grafana Assistant

Best Cron Job Monitoring Tools in 2026 [25 Analyzed, Top 5 Picks]

May 27, 2026 By Leo Baecker In Hyperping

The best cron job monitoring tools are Hyperping (cron monitoring, uptime, on-call, and status pages at a flat rate), Healthchecks.io (free open-source heartbeat monitoring), Cronitor (schedule-aware cron analytics), Better Stack (monitoring with integrated logs and incidents), and UptimeRobot (budget-friendly uptime with basic heartbeat checks).

Read Post

Hyperping

Read more about Best Cron Job Monitoring Tools in 2026 [25 Analyzed, Top 5 Picks]

You don't need to pick one: how Sentry and OpenTelemetry work together

May 27, 2026 By Lazar Nikolov In Sentry

You already instrumented the backend with OpenTelemetry. Your services emit spans. Your teams know the OTel APIs. Maybe you already run a Collector. So when you start evaluating Sentry, the obvious question is: Do you need to replace your OpenTelemetry setup with the Sentry SDK? No. The practical answer is usually: keep OpenTelemetry where it already works, add the Sentry SDK where it gives you more application context, and send OpenTelemetry Protocol (OTLP) events to Sentry.

Read Post

Sentry

Read more about You don't need to pick one: how Sentry and OpenTelemetry work together

Builder in the loop: Eric Lake on making AURA smarter after every incident

May 27, 2026 By Mezmo In Mezmo

Builder in the Loop is a Mezmo interview series focused on the engineers, product leaders, and operators shaping AURA, an open-source, MCP-native agent harness for production operations. The goal is to get past the polished product layer and talk through the decisions that matter when AI starts interacting with real systems. Key questions include: What should agents be allowed to do? How do they get better over time? Where should humans stay in the loop?

Read Post

Mezmo

Read more about Builder in the loop: Eric Lake on making AURA smarter after every incident

Everything We Talked About at O11yCon 2026

May 27, 2026 By Ken Rimple In Honeycomb

We just wrapped O11yCon 2026, and this year's conversations hit differently. Agent-based software development is here, now. It's no longer optional, and everybody is struggling to understand what their agents are doing and how to make them cost less and perform better. Over the course of fifteen talks, we saw clearly that the old assumptions about how, and who (or what), writes our software have been upended. Here are some highlights. We'll have videos available in the near future.

Read Post

Honeycomb

Read more about Everything We Talked About at O11yCon 2026

Datadog GPU Monitoring: Get More AI Work from Every GPU Dollar

May 27, 2026 By Datadog In Datadog

In this video, you'll learn how Datadog GPU Monitoring gives ML and platform teams a single view of their GPU fleet, so they can see what's slowing down their AI workloads, fix issues faster, and use the GPUs they already have more efficiently.

View Video

Datadog

Read more about Datadog GPU Monitoring: Get More AI Work from Every GPU Dollar

Web Accessibility Monitoring: an Ops Team Guide

May 27, 2026 By OpsMatters In OpsMatters

Web accessibility monitoring is the automated, scheduled scanning of a website for accessibility failures. Unlike a point-in-time audit, monitoring runs continuously. Code changes, content updates, and third-party scripts all introduce regressions. Monitoring catches them before they become complaints. This guide covers how it works, and where it fits in an ops stack.

Read Post

OpsMatters

Read more about Web Accessibility Monitoring: an Ops Team Guide

Why Clean Dashboards Improve Reporting and Decision-Making

May 27, 2026 By OpsMatters In OpsMatters

Reporting affects how leaders judge performance, catch strain points, and set priorities. Yet many teams still work from crowded views, disconnected files, and stale exports. That arrangement slows review, invites doubt, and weakens confidence in every figure shown on screen. Clean dashboards correct that problem by presenting important measures in a clear order, limiting visual clutter, and making changes easier to spot. Better reporting, in turn, supports steadier choices across finance, sales, operations, and service.

Read Post

OpsMatters

Read more about Why Clean Dashboards Improve Reporting and Decision-Making

Introducing Microsoft DHCP management in OpUtils: From monitoring to full control

May 26, 2026 By Aiswarya Giridharan In ManageEngine

If you manage enterprise networks, this scenario probably sounds familiar: An IP conflict surfaces, connectivity drops for a group of users, and the confusion begins. You check your DHCP server, dig through scope utilization, and try to piece together what went wrong, often after the disruption has already occurred. For years, network administrators have needed a single console for visibility and control into DHCP.

Read Post

ManageEngine

Read more about Introducing Microsoft DHCP management in OpUtils: From monitoring to full control

Explore for Spans: One View with Infinite Depth

May 26, 2026 By Jonny Steiner In Coralogix

It’s 20 minutes into a P0 incident, and you have already switched between four different tools, re-authenticated twice, and translated queries across three incompatible query syntaxes. The root cause you are searching for? Well, that is still out there somewhere. The reality of investigative latency is that most engineering teams face navigation problems, not data problems. During high-pressure incidents, teams lose cognitive momentum due to context switching between disconnected telemetry silos.

Read Post

Coralogix

Read more about Explore for Spans: One View with Infinite Depth

What Is Hybrid Cloud Monitoring (And How To Actually Do It Well)

May 26, 2026 By Bhavyadeep Sinh Rathod In Motadata

Most IT teams running a real hybrid setup are not short on data. They are short on a place where the data agrees with itself. By the end, you will know what to ask a vendor for, where teams usually trip, and how to scope a proof of concept that does not burn a quarter. Hybrid cloud monitoring is the ongoing collection of telemetry across your on-prem kit and one or more public clouds, treated as one environment instead of two or three. The goal is not just visibility.

Read Post

Motadata

Read more about What Is Hybrid Cloud Monitoring (And How To Actually Do It Well)

Operator now has Long-Term Support (LTS) version

May 26, 2026 By Vadim Rutkovsky In VictoriaMetrics

VictoriaMetrics Operator has been developing at a breakneck pace, bringing numerous improvements, features, and fixes to our community. We usually make at least one release every two weeks. While this rapid iteration cycle is great for delivering fixes and improvements quickly, it can be challenging for administrators managing critical production environments.

Read Post

VictoriaMetrics

Read more about Operator now has Long-Term Support (LTS) version

Getting Started with gcx: A CLI for AI Agents and Grafana Telemetry | Demo

May 26, 2026 By Grafana In Grafana

AI agents are only as useful as the context they can access. With gcx, your coding agents can connect to Grafana and query real-time production telemetry from your Cloud, Enterprise, or OSS environment. The best part: it avoids the upfront context bloat that can come with loading tools before you even send a prompt. gcx uses a CLI approach, so there’s zero token cost until your agent actually needs to run a query.

View Video

Grafana

Read more about Getting Started with gcx: A CLI for AI Agents and Grafana Telemetry | Demo

Lessons From a CI/CD Supply Chain Attack at Grafana Labs

May 26, 2026 By Grafana In Grafana

When a compromised GitHub Actions workflow targets your CI/CD pipeline, how do you respond — and what do you change so it never happens again? Nick and David from Grafana Security walk through a real supply chain incident triggered by a pull_request_target misconfiguration, showing exactly what broke, what tools caught it, and what the team rebuilt afterward.

View Video

Grafana

Read more about Lessons From a CI/CD Supply Chain Attack at Grafana Labs

Measure the real impact of AI coding tools on software delivery with Datadog AI Impact

May 26, 2026 By Datadog In Datadog

Engineering teams have rapidly adopted AI coding tools, but organizations still struggle to understand their impact. Existing dashboards focus on activity, such as daily active users, acceptance rates, or lines of generated code, but these metrics don’t answer a more important question: Are teams actually shipping more, faster, and with fewer issues?

Read Post

Datadog

Read more about Measure the real impact of AI coding tools on software delivery with Datadog AI Impact

Building a Defensible AI Compliance Framework

May 26, 2026 By ScienceLogic In ScienceLogic

Organizations have moved past theoretical conversations about AI adoption. Models, agents, and autonomous workflows are entering production environments. Business leaders are optimistic about potential gains in efficiency, decision support, and operational scale. Yet beneath this momentum, compliance and risk teams feel a different pressure.

Read Post

ScienceLogic

Read more about Building a Defensible AI Compliance Framework

Search Azure Blob data in-place with BYOS for Cribl Lake

May 26, 2026 By Cribl In Cribl

See how Bring Your Own Storage (BYOS) in Cribl Lake allows teams to connect directly to Azure Blob Storage and instantly search data in place — without moving, duplicating, or rehydrating telemetry. In this demo, Cribl Product Manager Risk Salsa walks through setup, dataset creation, and how to run fast investigations across your Azure-hosted data using Cribl Search.

View Video

Cribl

Read more about Search Azure Blob data in-place with BYOS for Cribl Lake

How to Reduce Help Desk Demand (Hint: It's Not a Help Desk Issue)

May 26, 2026 By Megan Brake In Nexthink

Most IT organizations are trying to reduce help desk demand the same way they have for years: by making the help desk itself more efficient. They improve routing, tighten SLAs, expand self-service, and add AI into the support flow. These changes can make the queue move faster, but they do not stop the work from arriving in the first place. The same problems keep finding their way back to IT. Employees lose time to slow devices, unreliable apps, failed updates, access issues, or confusion after a rollout.

Read Post

Nexthink

Read more about How to Reduce Help Desk Demand (Hint: It's Not a Help Desk Issue)

What Is Internet Congestion and How to Fix It

May 26, 2026 By Andrii Kernitskyi In Obkio

Your VoIP calls are choppy. File uploads are crawling. Your team is complaining that the CRM is sluggish, and remote desktop sessions keep freezing. You check your firewall, your switches look clean, and there are no alerts on your LAN. The problem isn't inside your network. It's upstream, and it's happening quietly every day during peak hours.

Read Post

Obkio

Read more about What Is Internet Congestion and How to Fix It

OpenTelemetry Monitoring with Netdata

May 26, 2026 By Netdata In netdata

If you've standardized on OpenTelemetry (or you're heading that way), you probably know the collector gets your data out, but where it lands and how useful it is once it gets there are separate problems. Netdata now ingests both OTLP metrics and OTLP logs natively, so your OTel pipelines feed directly into the same monitoring experience as everything else in your infrastructure: same dashboards, same alerting, same query interface. No separate backends, no context switching.

View Video

netdata

Read more about OpenTelemetry Monitoring with Netdata

New Explore: Faster answers, less friction, and a better way to investigate your data

May 26, 2026 By Ofri Grushka In Coralogix

There is a moment every engineer knows too well. Something is wrong in production. You have an alert, a vague symptom, and pressure to find the one signal that explains what changed. You open your logs and traces, and you immediately hit the same two problems: the dataset is huge, and the path from “I see something odd” to “I understand why” is full of tiny, exhausting steps. Meet new Explore, our redesigned investigation experience for logs, traces, and spans.

Read Post

Coralogix

Read more about New Explore: Faster answers, less friction, and a better way to investigate your data

Ameet Talwalkar on Building the AI Research Lab

May 26, 2026 By Datadog In Datadog

"We're doing cutting-edge AI, focused on real translational impact: getting our research over the wall and into production." Ameet Talwalkar, Datadog's Chief Scientist, shares what it took to build the AI Research Lab from the ground up — and what makes DAIR different from traditional research teams. At Datadog, research ships. Recent work from the lab includes Toto 2.0, open-weights time series forecasting models ranked on leading benchmarks, and ARFBench, a new benchmark for evaluating AI on real incident data.

View Video

Datadog

Read more about Ameet Talwalkar on Building the AI Research Lab

Your agent can't fix what it can't see

May 26, 2026 By Sergiy Dybskiy In Sentry

Agents are getting better and better at fixing bugs. They’re even getting better at testing their work, thanks to headless browsers, sandboxes, simulators, etc. But what about the bugs that only show up once you bring in different browsers, languages, extensions, internet speeds, and all the other variables that get mixed in the second you ship to prod? Or all the bugs that only show up when you account for… well, humans being humans and doing weird stuff you didn’t expect them to do?

Read Post

Sentry

Read more about Your agent can't fix what it can't see

We Built a Better DNS Propagation Checker. Here's What Makes It Different.

May 26, 2026 By DNS Spy In DNS Spy

Today we are launching the DNS Spy DNS Propagation Checker. It is free. It works on any domain. It shows you what is happening in more places, in more detail, and faster than the tools you have been using. You can try it right now: dnsspy.io/dns-tools/dns-propagation-checker.

Read Post

DNS Spy

Read more about We Built a Better DNS Propagation Checker. Here's What Makes It Different.

WHOIS & RDAP Domain Lookup & Expiry Check

May 26, 2026 By Uptime Website Monitoring In uptime

In this video, we’ll walk you through how to set up and configure your WHOIS and RDAP Domain Lookup & Expiry Checks in Uptime.com. Learn how to monitor and receive alerts before your domain expires, and protect your registration information from unauthorized modifications. We cover step-by-step instructions for setting up checks through the Uptime.com UI and via API.

View Video

uptime

Read more about WHOIS & RDAP Domain Lookup & Expiry Check

Future Solving with Brian Evergreen (Or: How to Escape those AI Career Jitters)

May 26, 2026 By Nexthink In Nexthink

Brian Evergreen joins the show to challenge the fear-driven narrative around AI and work. Rather than treating the future as something coming for us, Brian argues that leaders and individuals should decide what future they want to create, then work backwards. He explores why “start with the problem” thinking limits AI strategy, how visible strategy and relational leadership can unlock better transformation, and why human connection may become more valuable—not less—in an AI-enabled world. A thoughtful conversation on escaping AI career anxiety, building resilient networks, and creating value beyond efficiency.

View Video

Nexthink

Read more about Future Solving with Brian Evergreen (Or: How to Escape those AI Career Jitters)

Uber blew through entire 2026 AI budget in 4 months on Claude Code

May 25, 2026 By Coralogix In Coralogix

Welcome to Episode 2 of Live Laugh Logs, the podcast from Annie, Lewis, and Andre from the Coralogix Developer Relations team, where we get together and recap everything going on in our worlds!

View Video

Coralogix

Read more about Uber blew through entire 2026 AI budget in 4 months on Claude Code

Inside the Grafana AI Team Weekly: Guards for AI Observability (May 5, 2026)

May 25, 2026 By Grafana In Grafana

This is an excerpt from a real AI team weekly meeting where we talk about the stuff we build and occasionally also demo it! In this one, Principal Software Engineer Sven Großmann shows a new feature he's working on for AI Observability, called "guards". We're showing parts of our team meetings to build in public in some small way and give you a sneak preview of what's to come. But not all features we show may make it to production! You've been warned. :)

View Video

Grafana

Read more about Inside the Grafana AI Team Weekly: Guards for AI Observability (May 5, 2026)

Cloud MCP server and AI dashboards for metrics

May 25, 2026 By VictoriaMetrics In VictoriaMetrics

Resources for Further Learning.

View Video

VictoriaMetrics

Read more about Cloud MCP server and AI dashboards for metrics

DNS Monitoring for MSPs: A Complete Setup Guide

May 25, 2026 By DNS Spy In DNS Spy

If you run an MSP, this is the call that ages you. The fix is almost always small. A record was edited at the registrar. A vendor changed an MX target. A new tool added a TXT record and pushed SPF over the lookup limit. None of that should reach a client. With the right monitoring, none of it does. Here is a real one. A 40-person law firm renews their EV certificate. The vendor needs a CAA record cleaned up.
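The SPF lookup limit mentioned above comes from RFC 7208, which caps the DNS-querying terms evaluated for one SPF check at 10 (`include`, `a`, `mx`, `ptr`, `exists`, and the `redirect` modifier). A rough sketch of a first-pass check follows. It is written in Python under stated assumptions: the function name and the sample record are hypothetical, and it counts only the terms in a single record, without following `include` or `redirect` targets the way a full evaluator would.

```python
LOOKUP_LIMIT = 10  # maximum DNS-querying terms per SPF check (RFC 7208)
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_spf_lookups(record: str) -> int:
    """Count DNS-querying terms in one SPF record (non-recursive)."""
    count = 0
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        term = term.lower().lstrip("+-~?")  # drop the optional qualifier
        if term.startswith("redirect="):
            count += 1
            continue
        # Strip ":domain" and "/cidr" suffixes to get the mechanism name.
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


# Hypothetical record used only for illustration.
record = "v=spf1 a mx include:_spf.example.com ip4:192.0.2.0/24 -all"
lookups = count_spf_lookups(record)
within_limit = lookups <= LOOKUP_LIMIT
```

Because nested includes add their own lookups, a record that passes this single-record count can still exceed the limit once its includes are resolved, so the result is a lower bound rather than a verdict.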

Read Post

DNS Spy

Read more about DNS Monitoring for MSPs: A Complete Setup Guide

Exploring Powerful Power BI Dashboards for Smarter Decision-Making

May 25, 2026 By OpsMatters In OpsMatters

Operational dashboards help teams answer urgent business questions quickly. They show whether production is on track, inventory is healthy, downtime is rising, or resources are being stretched too thin. This article explores practical Power BI dashboard examples for operational efficiency across production, supply chain management, resource planning, and performance measurement. It also explains how to build dashboards that support real decisions rather than simply displaying data.

Read Post

OpsMatters

Read more about Exploring Powerful Power BI Dashboards for Smarter Decision-Making

Essential Mac Maintenance Tips for Operations Professionals

May 25, 2026 By OpsMatters In OpsMatters

Operations professionals rarely have the luxury of working slowly. Their day consists of managing deadlines and analyzing reports, communicating between teams, and organizing files. It also involves constantly switching between dozens of services. At this pace, the Mac becomes the hub of daily coordination. That's why performance speed, system stability, and macOS predictability have a direct impact on performance. Most Mac issues arise from a lack of regular maintenance. Chaotic background processes, overflowing storage, outdated security settings, and more can gradually turn even a powerful MacBook into an unstable device.

Read Post

OpsMatters

Read more about Essential Mac Maintenance Tips for Operations Professionals

Shopify outage on May 22, 2026 impacted merchants worldwide

May 23, 2026 By Colin Bartlett In StatusGator

On May 22, 2026, merchants using Shopify experienced a brief but widespread disruption that affected access to product pages, collections, and administrative tools. While the outage lasted less than an hour, it created immediate challenges for businesses that rely on Shopify to manage inventory, update products, and operate online stores. StatusGator detected the developing incident at 10:20 UTC using Early Warning Signals, 18 minutes before Shopify officially acknowledged the outage at 10:38 UTC.

Read Post

StatusGator

Read more about Shopify outage on May 22, 2026 impacted merchants worldwide

Your Microsoft Azure storage, our data lake power: The best of both worlds

May 23, 2026 By Cribl In Cribl

The wait is over for Azure-first organizations. Cribl just launched Cribl Lake Bring Your Own Storage (BYOS) for Microsoft Azure, giving you full data lake power without moving a single byte of telemetry out of your environment. Join us to see how you can finally get the flexibility of a modern data lake while keeping your data in Azure.

View Video

Cribl

Read more about Your Microsoft Azure storage, our data lake power: The best of both worlds

AI Won't Replace You. Someone Using It Will.

May 22, 2026 By Virtana In Virtana

AI isn’t about replacing engineers. It’s about leverage. The teams that win will be the ones that triage incidents faster, correlate signals automatically, reduce manual investigation, and automate repetitive operational work. In observability, that means asking: AI won’t eliminate expertise; it amplifies it. The real risk isn’t AI taking your job. It’s competitors using AI to operate at a speed and efficiency you can’t match.

View Video

Virtana

Read more about AI Won't Replace You. Someone Using It Will.

The New Agentic AI Job Roles IT Leaders Need

May 22, 2026 By Ella Drimer In Nexthink

CIOs are under pressure from every direction. Budgets remain tight, geopolitical uncertainty is forcing organizations to rethink resilience, and workforce expectations continue to evolve. At the same time, AI is accelerating a broader shift across enterprise IT – changing not only how organizations operate, but also the skills and roles they will increasingly depend on. The question is not whether AI will reshape IT teams, but how quickly organizations can adapt to these new ways of working.

Read Post

Nexthink

Read more about The New Agentic AI Job Roles IT Leaders Need

Anthropic Monitoring & Observability with OpenTelemetry and SigNoz

May 22, 2026 By SigNoz - Open Source Observability Platform In SigNoz

Learn how to implement end-to-end monitoring and observability for Anthropic (Claude) API-based applications using OpenTelemetry and SigNoz. In this video, we walk through instrumenting your Anthropic API calls, collecting traces, metrics, and logs, and visualizing everything in SigNoz to gain real-time visibility into performance, failures, and bottlenecks. You'll see how to move from basic logging to production-grade observability, so you can debug faster, optimize latency, and confidently run Claude-powered AI systems at scale.

View Video

SigNoz

Read more about Anthropic Monitoring & Observability with OpenTelemetry and SigNoz

Managed File Transfer Using Serv-U MFT Server

May 22, 2026 By solarwindsinc In SolarWinds

Serv-U presents a two-minute FTP server demo video, featuring Serv-U MFT from SolarWinds. This video will briefly show the FTP, SFTP, web transfer, mobile document viewing and administrative interfaces in action.

View Video

SolarWinds

Read more about Managed File Transfer Using Serv-U MFT Server

Monitor your Render services with AppSignal

May 22, 2026 By Karen Patteri de Souza In AppSignal

AppSignal now supports Render's Metrics Stream. Configure it once in your Render workspace and Render forwards OpenTelemetry metrics to the AppSignal Collector. From there, the metrics show up in your AppSignal app as host metrics and automated dashboards. You only have to set up the stream once per workspace.

Read Post

AppSignal

Read more about Monitor your Render services with AppSignal

Why Traditional Observability Breaks Down in Hybrid Cloud Environments

May 22, 2026 By Dallon Robinette In Selector

Hybrid cloud has reshaped the way enterprises build, run, and troubleshoot digital services. Applications now stretch across on-premises infrastructure, cloud platforms, regional services, interconnects, and distributed dependencies that change constantly. Operational complexity has expanded with that footprint, yet many observability practices still reflect assumptions from an earlier era of simpler architectures and clearer boundaries. That gap shows up fast during an incident.

Read Post

Selector

Read more about Why Traditional Observability Breaks Down in Hybrid Cloud Environments

How to measure developer experience (DevEx) in the AI era

May 22, 2026 By Datadog In Datadog

As AI coding assistants dramatically inflate PR counts, commit frequency, and lines of code, the limitations of individual output metrics have never been more apparent. A developer can now produce significantly more lines per session, but higher volume doesn’t guarantee that the code is stable, maintainable, or successfully running in production. GitClear analyzed over 200 million lines of code and found that code churn nearly doubled following widespread AI adoption.

Read Post

Datadog

Read more about How to measure developer experience (DevEx) in the AI era

How to Deploy Serv-U Gateway to Achieve Secure File Transfer in DMZ

May 22, 2026 By solarwindsinc In SolarWinds

Serv-U Gateway acts as a reverse proxy for secure file transfer in DMZ networks. Serv-U Gateway terminates all incoming connections to the DMZ and never stores any data at rest in the DMZ. Serv-U Gateway is supported on both Serv-U FTP Server and Managed File Transfer (MFT) Server. Comply with PCI DSS by using Serv-U Gateway.

View Video

SolarWinds

Read more about How to Deploy Serv-U Gateway to Achieve Secure File Transfer in DMZ

Episode 11 - Human Choices in an AI Future (Part 1)

May 22, 2026 By Digitate In Digitate

What if the biggest risk in the AI era isn't the technology, but waiting for someone else to tell you what to do with it? In this episode of The Intelligent Enterprise, host Tom Stoneman sits down with Karthik Ravindran, General Manager of Enterprise Data and AI at Microsoft, to unpack what it really takes to thrive alongside AI, not in spite of it.

View Video

Digitate

Read more about Episode 11 - Human Choices in an AI Future (Part 1)

Zero to Dashboard with Grafana Assistant and the Infinity datasource plugin

May 22, 2026 By Grafana In Grafana

Senior Developer Advocate Nicole van der Hoeven demonstrates how to go from zero to dashboard in a few minutes without writing any queries, with the help of Grafana Assistant and the Infinity datasource plugin for Grafana. Nicole uses the rawg.io video game database API to visualize games and get recommendations for what to play next!

View Video

Grafana

Read more about Zero to Dashboard with Grafana Assistant and the Infinity datasource plugin

The Checkly Playwright Reporter: Live Demo, Rocky AI RCA & Production Monitoring

May 22, 2026 By Checkly In Checkly

Your Playwright tests catch bugs. The hard part is figuring out what actually broke — and sharing that context with your team. This session shows exactly how the Checkly Playwright Reporter solves that: one shared home for all your test runs, AI-powered root cause analysis, and a direct path from failing test to production monitor. María de Antón, PM for Playwright features at Checkly, runs a live demo on a real app with real failures.

View Video

Checkly

Read more about The Checkly Playwright Reporter: Live Demo, Rocky AI RCA & Production Monitoring

A Runnable Reference Architecture for Industrial IoT on InfluxDB 3

May 22, 2026 By Ian Clark In InfluxData

Industrial teams keep telling us the same thing: the data is there, but the stack to act on it isn’t. PLCs, CNCs, SCADA systems, vibration sensors, and quality stations all generate high-frequency telemetry that gets stranded in proprietary historians or stitched together with point integrations nobody wants to own. By the time anyone looks at it, the moment to act has passed.

Read Post

InfluxData

Read more about A Runnable Reference Architecture for Industrial IoT on InfluxDB 3

Application Health and Scheduled tasks are now a breeze to set up

May 22, 2026 By Freek Van der Herten In Oh Dear

Two of our most powerful checks just got a lot easier to set up. Application Health and Scheduled task monitoring used to mean wiring up an endpoint or pings by hand. Now an AI prompt does that part for you.

Read Post

Oh Dear

Read more about Application Health and Scheduled tasks are now a breeze to set up

Sponsored Post

Multi-Cloud Monitoring And Why Status Pages Aren't Enough

May 21, 2026 By StatusGator In StatusGator

Multi-cloud environments make outage detection harder. Relying on individual status pages from Amazon Web Services, Google Cloud Platform, and Microsoft Azure often leads to delayed, incomplete, or conflicting signals during incidents. This article explains how fragmented visibility impacts incident response, and how aggregating status across cloud and SaaS dependencies helps DevOps teams detect outages faster and respond with confidence.

Read Post

StatusGator

Read more about Multi-Cloud Monitoring And Why Status Pages Aren't Enough

Meet the new Mobot: Your log analysis partner

May 21, 2026 By Margaret Selid In Sumo Logic

Every single day, the Sumo Logic Platform analyzes more than four exabytes of log data. The good news? The answers to your application performance, infrastructure health, and security incidents are hidden in those logs. The challenge? Historically, uncovering those answers required query language fluency. That’s why we built Mobot, our conversational interface that connects users to advanced AI capabilities using natural language.

Read Post

Sumo Logic

Read more about Meet the new Mobot: Your log analysis partner

Closing the Evidence Gap

May 21, 2026 By ScienceLogic In ScienceLogic

Compliance teams are entering a moment where the expectations placed on them far exceed the visibility tools they have available. AI-driven environments introduce new forms of variance, drift, and distributed decision-making that unfold across infrastructure, models, agents, and services. These patterns do not map cleanly to the evidence structures that compliance processes rely on.

Read Post

ScienceLogic

Read more about Closing the Evidence Gap

SIEM alerts: everything you need to know

May 21, 2026 By Muhammed Ali In Honeybadger

Let's walk through setting up SIEM (Security Information and Event Management) alerts to monitor security threats in applications. We will explain what SIEM alerts are, why they're relevant with regard to application security, and provide practical examples of common alerts a developer could implement. We will show how to configure simple alerts with Honeybadger Insights.

Read Post

Honeybadger

Read more about SIEM alerts: everything you need to know

Project and manage cloud spend with Datadog budget forecasting

May 21, 2026 By Katherine Broner In Datadog

Cloud and SaaS spending continues to grow across teams, services, and providers, changing too quickly for retrospective cost management workflows to keep up. Finance and engineering leaders often rely on last month’s reports or manually maintained spreadsheets, which don’t reflect current usage. As a result, teams lack context on how spend is trending and often discover budget overruns only after they’ve occurred.

Read Post

Datadog

Read more about Project and manage cloud spend with Datadog budget forecasting

Generate test scripts from natural language with Grafana Assistant: introducing k6 Script Authoring

May 21, 2026 By Vicente Ortega Torres In Grafana

Performance testing is critical to ensure your applications stay reliable under load, but writing the scripts themselves often feels like a chore. Most engineers already know the scenario they want to test; the hard part is translating that intent into a working performance test. Even experienced developers who use k6 can lose time looking up syntax, configuring load stages and thresholds, or debugging boilerplate code before they can run a meaningful test.

Read Post

Grafana

Read more about Generate test scripts from natural language with Grafana Assistant: introducing k6 Script Authoring

What keeps tech people motivated?

May 21, 2026 By VictoriaMetrics In VictoriaMetrics

What keeps tech people motivated? Is it a good tech challenge or work-life balance? Elif, CNCF Ambassador, explains. Resources for Further Learning.

View Video

VictoriaMetrics

Monitoring

Read more about What keeps tech people motivated?

Elevate Your MSP: From Reactive IT to Proactive Digital Experience Assurance

May 21, 2026 By LogicMonitor In LogicMonitor

Internet Performance Monitoring (IPM) is essential for MSPs to move from reactive support to proactive experience assurance. Green lights on your internal dashboard don’t mean users are having a good experience. That was the central tension in this conversation between LogicMonitor RVP of Managed Services, Daniel Gad, and Catchpoint Field CTO, Gerardo Dada, and it’s a problem most MSPs haven’t fully solved.

Read Post

LogicMonitor

Read more about Elevate Your MSP: From Reactive IT to Proactive Digital Experience Assurance

A Runnable Reference Architecture for Network Telemetry on InfluxDB 3

May 21, 2026 By Mike Devy In InfluxData

Networks generate the most data of any system in your stack and have the least patience for stale dashboards. Interface counters tick every second. BGP sessions flap. Flow records arrive in bursts. When something goes wrong, you don’t have 10 seconds to wait for an aggregation to finish.

Read Post

InfluxData

Read more about A Runnable Reference Architecture for Network Telemetry on InfluxDB 3

The product analytics you already have

May 21, 2026 By Rahul Chhabria In Sentry

You already have everything you need. If you’re using Sentry, you have traces, structured logs, and now application metrics. Most teams use that stuff for debugging and stop there. But get this: that same data can answer most of the product questions you’ve been sending to a separate analytics tool, maintained by a separate team, with a separate data model and a separate bill. (Not all of them.

Read Post

Sentry

Read more about The product analytics you already have

Using AI to Instrument Applications with OpenTelemetry

May 21, 2026 By Sematext In Sematext

OpenTelemetry is one of the best things that’s happened to observability in the last decade. It’s open. It has SDKs for every language that matters. It’s vendor neutral. The OTel community has been doing the hard work of standardizing how applications emit telemetry, so that you, the engineer, don’t have to learn five different agent formats to monitor five different services.

Read Post

Sematext

Read more about Using AI to Instrument Applications with OpenTelemetry

What is Patch Management and Why is It Important? A Complete Guide

May 21, 2026 By Jagdish Sajnani In Motadata

Patch management is one of the cheapest security steps you can take, and one of the most often ignored. Most IT teams know they are behind on patching. They just disagree on how far behind they actually are. Here is the simple truth: That waiting period is the problem patch management exists to solve. This guide covers what patch management actually is, how the full process runs from start to finish, where most teams quietly fall behind, and what to look for in a tool that holds up today.

Read Post

Motadata

Read more about What is Patch Management and Why is It Important? A Complete Guide

AI copilot in VMAnomaly UI faster setup and tuning

May 21, 2026 By VictoriaMetrics In VictoriaMetrics

vmanomaly now has an AI Copilot in the UI, with the help of MCP. Resources for Further Learning.

View Video

VictoriaMetrics

Read more about AI copilot in VMAnomaly UI faster setup and tuning

Inside the Grafana AI Team Weekly: Workspaces and Investigations (April 28, 2026)

May 21, 2026 By Grafana In Grafana

This is an excerpt from a real AI team weekly meeting where we talk about the stuff we build and occasionally also demo it! In this one, Staff Product Design Engineer Ben Darlow demos improvements to Workspace Home. Staff Software Engineer Sonia Aguilar and Principal Software Engineer Sven Großmann also demo a new dependency graph view for Investigations. We're showing parts of our team meetings to build in public in some small way and give you a sneak preview of what's to come. But not all features we show may make it to production! You've been warned. :)

View Video

Grafana

Read more about Inside the Grafana AI Team Weekly: Workspaces and Investigations (April 28, 2026)

Avantra Platform Overview

May 21, 2026 By Avantra In Avantra

Introducing Avantra: Purpose-built for SAP, Avantra's AIOps platform empowers the world’s top global enterprises and MSPs to run at their best — preventing costly unplanned downtime by automating the detection and resolution of operational issues before they impact the business. An SAP Partner Edge Build and Cloud ALM Silver Partner, Avantra partners with SAP to enable teams to observe everything, automate what matters, govern continuously, and navigate SAP transformation with clarity and confidence... anywhere SAP runs.

View Video

Avantra

Read more about Avantra Platform Overview

How to Overcome Government Payment Fraud with Speed and Scale

May 21, 2026 By Splunk In Splunk

Government payment fraud is a fast-growing risk for public sector organisations in Australia and globally. From welfare and healthcare payments to business grants and disaster relief, increasingly sophisticated organised criminal networks and other actors exploit complex, high-volume government programs to unlawfully access public funds. The impact is significant—billions lost, program integrity undermined, and essential resources diverted.

View Video

Splunk

Read more about How to Overcome Government Payment Fraud with Speed and Scale

What is Service Request Management? A Complete Guide

May 21, 2026 By Amartya Gupta In Motadata

If you run a service desk, you’ve likely seen this pattern: Service requests, incidents, and change requests often end up in the same queue under the same SLA, even though they require different handling. Many requests that could be resolved through self-service still go through manual intervention, while misclassification adds further delays and confusion. Service request management brings structure to this by defining how requests are handled end to end.

Read Post

Motadata

Read more about What is Service Request Management? A Complete Guide

Debugging Next.js Best Practices: Logs and Tracing

May 21, 2026 By Sentry In Sentry

Next.js applications can be challenging to debug in production. It’s not always clear where an issue originated or how it impacts users. Hydration errors, server component failures, and performance bottlenecks don’t always come with clear answers.

View Video

Sentry

Read more about Debugging Next.js Best Practices: Logs and Tracing

The "Single Pane of Glass" Is Dead - What Network Teams Actually Need Is Intelligence

May 21, 2026 By Justin Ryburn In Kentik

The infrastructure industry spent two decades chasing a single pane of glass. The future looks different: domain-expert AI platforms that reason deeply within their own data, connected through tool chaining when problems cross boundaries.

Read Post

Kentik

Read more about The "Single Pane of Glass" Is Dead - What Network Teams Actually Need Is Intelligence

Inside the Anthropic + Claude Code Hype at AWS Summit London: Live Laugh Logs ep. 2

May 21, 2026 By Coralogix In Coralogix

Are companies blowing through their entire 2026 AI budget in a matter of months? Welcome to Episode 2 of Live Laugh Logs, the podcast from Annie, Lewis, and Andre from the Coralogix Developer Relations team, where we get together and recap everything going on in our worlds!

View Video

Coralogix

Read more about Inside the Anthropic + Claude Code Hype at AWS Summit London: Live Laugh Logs ep. 2

What's New at Cribl 4.18: On release days, we wear teal.

May 21, 2026 By Cribl In Cribl

In this episode, Leon explores a bunch of BYO (bring your own) enhancements to Cribl including the ability to bring your own AI model, storage, and more.

View Video

Cribl

Read more about What's New at Cribl 4.18: On release days, we wear teal.

What is Log Management? The IT Team's Guide to Taming Log Data

May 21, 2026 By Motadata In Motadata

Understand what log management is and why it’s essential for troubleshooting, security, and observability across modern IT environments. Log management helps organizations collect, centralize, parse, and analyze logs from servers, applications, cloud platforms, containers, and network devices in one searchable platform. Learn how centralized log monitoring reduces mean time to resolution (MTTR), eliminates siloed troubleshooting, and helps IT teams detect anomalies faster using AI-powered analytics.

View Video

Motadata

Read more about What is Log Management? The IT Team's Guide to Taming Log Data

The Complete Guide to Observability Pipelines

May 21, 2026 By Mohana Ayeswariya J In Atatus

Modern engineering teams are drowning in telemetry data. A mid-sized Kubernetes cluster running 50 microservices can generate millions of log lines per minute. Add distributed traces, Prometheus metrics, cloud provider events, and application-level instrumentation and you're looking at terabytes of observability data every day. The problem isn't just volume. It's what you do with it.

Read Post

Atatus

Read more about The Complete Guide to Observability Pipelines

Error Budget in SRE: The Complete Guide (2026)

May 20, 2026 By Nuno Tomas In isDown

An error budget is the acceptable amount of unreliability permitted by your SLO over a defined time window. It is not a target. It is not a stretch goal. It is a hard ceiling that, when breached, should trigger a pre-agreed organizational response: feature freezes, postmortems, or infrastructure investment. The formula is blunt:

Error Budget = 1 - SLO Target
Error Budget (time) = (1 - SLO Target) × Window Duration

For a 30-day window: That last number should make you uncomfortable.
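The time-based formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original guide: the function name `error_budget` and the three SLO targets are assumptions chosen for the example.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Error Budget (time) = (1 - SLO Target) x Window Duration."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * window


# Illustrative targets only; substitute your own SLOs.
window = timedelta(days=30)
for target in (0.99, 0.999, 0.9999):
    minutes = error_budget(target, window).total_seconds() / 60
    print(f"SLO {target:.2%}: {minutes:.1f} minutes of budget over 30 days")
```

For a 99.9% target over a 30-day window, the budget works out to 43.2 minutes.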

Read Post

isDown

Read more about Error Budget in SRE: The Complete Guide (2026)

Automation will reshape IT operations within three years, say a third of teams

May 20, 2026 By SolarWinds In SolarWinds

SolarWinds research reveals growing confidence in automation; however, concerns around accuracy, skills, and oversight remain.

Read Post

SolarWinds

Read more about Automation will reshape IT operations within three years, say a third of teams

Sponsored Post

Proactive Monitoring for NetApp ONTAP

May 20, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

This whitepaper explores how proactive monitoring, using Microsoft SCOM enhanced with the NiCE NetApp ONTAP Management Pack, enables IT teams to detect issues early, optimize storage usage, and ensure reliable, predictable performance across both on-premises and hybrid-cloud infrastructures.

Read Post

NiCE IT Mgmt

Read more about Proactive Monitoring for NetApp ONTAP

Ask anything about your application - Seer Agent answers

May 20, 2026 By Sentry In Sentry

In this workshop, Paul Jaffre will show you how to query Sentry’s telemetry using natural language with Seer Agent.

View Video

Sentry

Read more about Ask anything about your application - Seer Agent answers

How to Create Your Own Plugins and Check Commands in Icinga 2

May 20, 2026 By Sukhwinder Dhillon In Icinga

If you’ve been using Icinga 2 for a while, you probably know the built-in checks cover a lot of ground: disk space, CPU, memory, ping. But sooner or later you’ll run into something specific to your setup that no existing plugin handles. That’s where writing your own plugin comes in. The good news? It’s simpler than it sounds. Icinga 2 doesn’t care what language your plugin is written in. It just runs the script, reads the exit code, and displays the output. That’s it.
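As a rough illustration of the "exit code plus output" contract described above, here is a minimal Python sketch of a plugin. It assumes the Monitoring Plugins exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the disk check, the thresholds, and the function names are hypothetical and not taken from the original post.

```python
import shutil

# Assumed convention: the check's state is communicated through its exit code.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a measurement to (exit_code, one-line plugin output)."""
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {LABELS[code]} - {used_percent:.1f}% used"


def main(path="/"):
    """Measure disk usage for `path`, print the output line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code


# A real plugin file would end with: sys.exit(main())
exit_code = main()
```

Keeping the threshold logic in `evaluate` separate from the measurement makes the state mapping easy to test without touching the filesystem.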

Read Post

Icinga

Read more about How to Create Your Own Plugins and Check Commands in Icinga 2

The Productivity Tax of Repeat IT Failures in Technology Companies

May 20, 2026 By Chanté Frazer In Nexthink

Technology companies are being pushed to deliver faster outcomes while justifying growing investment in AI, SaaS, and digital infrastructure. But productivity does not improve just because new tools are deployed. It improves when employees can use those tools without the constant drag of slow devices, unstable applications, and fixes that do not fully solve the problem. That is the productivity tax of digital friction.

Read Post

Nexthink

Read more about The Productivity Tax of Repeat IT Failures in Technology Companies

Anomaly Detection in HEAL Software AIOps

May 20, 2026 By HEAL Software In HEAL Software

Every week, thousands of engineers, SREs, and IT leaders type questions about anomaly detection into ChatGPT, Reddit, and Stack Overflow. They are all trying to answer the same underlying question: why do production incidents keep catching us off guard, and how do we stop them?

Read Post

HEAL Software

Read more about Anomaly Detection in HEAL Software AIOps

3 DNS Records Most Companies Forget to Monitor

May 20, 2026 By DNS Spy In DNS Spy

Here are the three records most teams forget to monitor — and what happens when they break.

Read Post

DNS Spy

Read more about 3 DNS Records Most Companies Forget to Monitor

Unlock telemetry value with a well-planned data lake

May 20, 2026 By Cribl In Cribl

Your SIEM only holds a slice of your telemetry. Your data lake holds the rest. We'll show you how to use that to your advantage for investigations, threat hunting, and reporting. Why your data lake beats your SIEM for investigations – Your SIEM keeps a short window of expensive, filtered data. Your data lake keeps everything. When something goes wrong, that difference matters more than you think. Threat hunting without the handcuffs – Hunting across months of data in a SIEM is painful and costly. We'll show you how a well-planned lake makes broad, deep searches practical and affordable.

View Video

Cribl

Read more about Unlock telemetry value with a well-planned data lake

Teach Your AI Coding Agent to Instrument, Monitor, and Troubleshoot Infrastructure with netdata/skills

May 20, 2026 By Shyam Sreevalsan In netdata

There’s a growing ecosystem of AI coding agents: Claude Code, Cursor, Copilot, Codex, Gemini CLI, Windsurf, and others. They’re good at writing code, but they don’t inherently know how to instrument that code for observability, configure monitoring infrastructure, or troubleshoot production systems using real telemetry data. That knowledge lives in documentation, runbooks, and the heads of your senior SREs.

Read Post

netdata

Read more about Teach Your AI Coding Agent to Instrument, Monitor, and Troubleshoot Infrastructure with netdata/skills

AI Powered IT Operations & Autonomous Resilience | Full SolarWinds Day Q2 2026 Event Replay

May 20, 2026 By solarwindsinc In SolarWinds

Watch the full SolarWinds Day 2026 event on-demand and discover how AI is transforming IT operations, observability, and incident response. In this exclusive event, SolarWinds CEO Sudhakar Ramakrishna and product leaders unveil the company’s vision for Autonomous Operational Resilience—powered by AI, automation, and unified visibility across hybrid and multi-cloud environments.

View Video

SolarWinds

Read more about AI Powered IT Operations & Autonomous Resilience | Full SolarWinds Day Q2 2026 Event Replay

Honeycomb Canvas: The Multiplayer Workspace for the Agentic Era

May 20, 2026 By Kale Bogdanovs In Honeycomb

Last week, we launched a major update to Canvas, our investigation workspace. The new Canvas has evolved from an AI co-pilot you chat with to a place where your whole team, human and agent, can work the same problem on the same surface. Auto-investigations begin the moment a trigger, SLO, or anomaly fires. Custom skills encode your team's runbooks so every agent investigates with your team's expertise built in.

Read Post

Honeycomb

Read more about Honeycomb Canvas: The Multiplayer Workspace for the Agentic Era

How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability

May 20, 2026 By Datadog In Datadog

Without experiment infrastructure to help you test your LLM applications, every research session starts with the same questions: What have we tried previously? What were the numbers? Which prompt version produced that result? Why did we discard that approach? The answers live in scattered notes, terminal history, and half-remembered conversations. Each handoff between sessions loses context. In practice, iteration can slow down as teams get bogged down in testing and analysis.

Read Post

Datadog

Read more about How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability

How to audit and clean up monitors effectively

May 20, 2026 By Capucine Marteau In Datadog

Alert fatigue and blind spots develop together. Monitoring stacks that generate noise while missing critical issues may have incomplete coverage or poorly configured alerts. As they grow reactively and without structured coverage assessment, both issues worsen. Teams will often add monitors when something breaks and tune thresholds when alerts become unbearable, but rarely audit their overall setup to see if it works.

Read Post

Datadog

Read more about How to audit and clean up monitors effectively

Introducing Atatus Sensitive Data Classifier

May 20, 2026 By Mohana Ayeswariya J In Atatus

Your logs know too much. Every debug statement, every traced request, every APM span can carry the risk of capturing something it shouldn't. A customer email. A JWT token. A credit card number. An API key that was never meant to leave your payment service. It doesn't look like a breach. There's no alert. Your observability platform just quietly accumulates sensitive data: indexed, replicated, and accessible to every engineer with log query access.

Read Post

Atatus

Read more about Introducing Atatus Sensitive Data Classifier

Building a CloudWatch metrics pipeline: parsing OpenTelemetry data

May 20, 2026 By Jeff Kreeftmeijer In AppSignal

AWS delivers CloudWatch metrics in OpenTelemetry format via Firehose, but AppSignal uses its own internal format. Building the parser to bridge these two formats presented several technical challenges. The metrics arriving through this pipe power AWS automated dashboards. When AppSignal detects metrics from a supported AWS service, it creates a dashboard for it automatically, with pre-built charts grouped by category: compute, databases, networking, messaging, storage, and others.

Read Post

AppSignal

Read more about Building a CloudWatch metrics pipeline: parsing OpenTelemetry data

How Airbnb Built a High-Volume Metrics Pipeline with OpenTelemetry and vmagent

May 20, 2026 By Pablo Fernandez In VictoriaMetrics

We always knew that Airbnb’s engineering is operating on a completely different scale, and their new high-volume metrics pipeline is proof of that. This is one of those rare stories where scale and efficiency go hand in hand - they modernized their observability stack with open source components and reduced cost by an order of magnitude. Airbnb is now processing more than 100 million samples per second on a single production cluster.

Read Post

VictoriaMetrics

Read more about How Airbnb Built a High-Volume Metrics Pipeline with OpenTelemetry and vmagent

From Signal Corps to Space: Building Networks That Can't Fail with Troy MacDonald

May 20, 2026 By Selector In Selector

What does it take to succeed in networking when complexity is constantly increasing, and change never slows down? In this episode of Next-Gen Network Heroes, host Bob Slevin sits down with Troy (David) MacDonald, a network engineer at Blue Origin and former U.S. Army Chief Warrant Officer, to explore a career that spans from infantry beginnings to designing and managing large-scale, mission-critical networks.

View Video

Selector

Read more about From Signal Corps to Space: Building Networks That Can't Fail with Troy MacDonald

Optimizing Team Strengths for Effective Operations

May 20, 2026 By Selector In Selector

Most people think great network engineers are defined by technical expertise. This episode challenges that idea. Troy MacDonald shows that the real differentiator isn’t just technical skill; it’s the ability to translate complexity into clarity. From military operations to enterprise networks, one lesson keeps showing up.

View Video

Selector

Read more about Optimizing Team Strengths for Effective Operations

Microsoft Fabric outage disrupted analytics workloads on May 18, 2026

May 19, 2026 By Andy Libby In StatusGator

On May 18, 2026, organizations using Microsoft Fabric experienced a multi-hour outage that disrupted analytics workloads, reporting systems, and access to platform services across several regions. StatusGator detected the developing incident at 14:00 UTC using Early Warning Signals, 37 minutes before Microsoft officially acknowledged the outage at 14:37 UTC.

Read Post

StatusGator

Read more about Microsoft Fabric outage disrupted analytics workloads on May 18, 2026

The $600 billion wake-up call: New Splunk research reveals downtime is a systemic business crisis

May 19, 2026 By Splunk In Splunk

$600 billion annual impact: Aggregate downtime costs for the Global 2000 have soared 50% in two years. $15,000 per minute: The average cost of downtime for organisations, highlighting the immediate financial impact of service disruptions. 3.4% stock price drop: The average decline in shareholder value following a single downtime incident.

Read Post

Splunk

Read more about The $600 billion wake-up call: New Splunk research reveals downtime is a systemic business crisis

Reality Bytes: The Birth of Mobile DEX (Opening the Black Box)

May 19, 2026 By Nexthink In Nexthink

On this edition of Reality Bytes, Dina and Tom welcome Rose Cicala, Director of Product Marketing, and Mile Djokic, Senior Product Manager, to discuss the launch of Mobile Experience — and what it means for the future of Digital Employee Experience. Together, they explore why mobile devices have become mission-critical for frontline and hybrid workforces, why mobile visibility has remained a major blind spot for IT, and how Mobile DEX changes that. The conversation covers healthcare, retail and manufacturing use cases, AI compliance, application insights, VDI convergence, and the growing shift toward mobile-first work strategies.

View Video

Nexthink

Read more about Reality Bytes: The Birth of Mobile DEX (Opening the Black Box)

Multiple API Keys Are Here - More Keys, Better Control, Stronger Security

May 19, 2026 By Brian Gardner In ObservIQ

Today we're rolling out a major upgrade to API Keys in Bindplane. You can now create up to 25 API keys per project, give each one a description, set an expiration date, and delete keys you no longer need. Under the hood, every key is now hashed with Argon2, the modern standard for credential storage. If you've been working around the old single-key limit by sharing one key across CI jobs, scripts, and teammates, this release is for you.

Read Post

ObservIQ

Read more about Multiple API Keys Are Here - More Keys, Better Control, Stronger Security

AI-Powered Anomaly Detection in Observability

May 19, 2026 By Mohana Ayeswariya J In Atatus

Your dashboards are green. Your thresholds are calm. Then a cascade failure starts and you don't know until users flood your status page. Traditional monitoring is reactive by design. Anomaly detection in observability changes that equation entirely.

Read Post

Atatus

Read more about AI-Powered Anomaly Detection in Observability

Diagnose slow PostgreSQL queries faster with explain plan correlation

May 19, 2026 By Allen Zhou In Datadog

When a PostgreSQL query runs slowly, engineers often start with EXPLAIN ANALYZE. The output is a tree of plan nodes, each one describing a step the database took to execute it. A query with several joins and a subquery can produce 20 or more nodes. But the plan gives no visual indication of which node corresponds to each clause in the SQL text. Diagnosing the problem means viewing the plan in one window and the query in another, manually tracing connections between them.
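To make the "tree of plan nodes" concrete, the sketch below walks a plan in the JSON shape that PostgreSQL emits for `EXPLAIN (ANALYZE, FORMAT JSON)`, where each node has a `"Node Type"` and optional child `"Plans"`. The sample plan, table names, and timings are invented for illustration and are not from the original post.

```python
import json

# Hypothetical, hand-written plan in the EXPLAIN (FORMAT JSON) shape.
SAMPLE = json.loads("""
[{"Plan": {"Node Type": "Hash Join", "Actual Total Time": 12.5,
   "Plans": [
     {"Node Type": "Seq Scan", "Relation Name": "orders",
      "Actual Total Time": 9.1},
     {"Node Type": "Hash", "Actual Total Time": 1.2,
      "Plans": [
        {"Node Type": "Index Scan", "Relation Name": "customers",
         "Actual Total Time": 0.8}
      ]}
   ]}}]
""")


def walk(node, depth=0):
    """Yield (depth, node) for every plan node, parents before children."""
    yield depth, node
    for child in node.get("Plans", []):
        yield from walk(child, depth + 1)


nodes = list(walk(SAMPLE[0]["Plan"]))
for depth, node in nodes:
    relation = node.get("Relation Name", "")
    print(f"{'  ' * depth}{node['Node Type']} {relation}".rstrip())
print(f"{len(nodes)} plan nodes")
```

Even this small join produces four nodes, none of which says which clause of the SQL text it came from.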

Read Post

Datadog

Read more about Diagnose slow PostgreSQL queries faster with explain plan correlation

Explore Datadog metrics with Natural Language Queries

May 19, 2026 By Racheal Ou In Datadog

Metric exploration often begins with a simple question, but answering that question can require deep familiarity with metric names, tag structures, and query syntax. Experienced users spend time refining queries through trial and error, and newer users struggle to get started. As a result, teams face delays in troubleshooting and analysis. Valuable observability data, including metrics that are difficult to discover and query, also goes underused.

Read Post

Datadog

Read more about Explore Datadog metrics with Natural Language Queries

Autonomous IT for Life Sciences: From Downtime to Compliance-Ready Resilience

May 19, 2026 By Varun Sharma In Digitate

“Autonomous” doesn’t mean uncontrolled. AI-driven IT in life sciences needs guardrails based on GxP, risk, and validation. Low-risk systems can automate; high-risk systems need human approval and documentation.

Read Post

Digitate

Read more about Autonomous IT for Life Sciences: From Downtime to Compliance-Ready Resilience

Community Spotlight: A Native iOS App for Your InfluxDB Data

May 19, 2026 By Ashley Fowler In InfluxData

One of the things we love most about building an open source platform is seeing what the community creates with it, and independent developer Anton Havekes recently built something we just had to share. Anton put together Influx Dashboard, a native iOS app that connects to your InfluxDB instance and brings your time series data straight to your phone. We’re genuinely thrilled to see this kind of work come out of the community.

Read Post

InfluxData

Read more about Community Spotlight: A Native iOS App for Your InfluxDB Data

12 IT Infrastructure Best Practices Every IT Leader Should Follow

May 19, 2026 By Jagdish Sajnani In Motadata

Why do IT infrastructure issues continue to slow down teams even when tools keep improving? In most IT environments, the challenge is not a single failure. It is a set of ongoing operational gaps that are easy to overlook but difficult to control over time. A few of the common challenges include: In 2026, IT environments are more distributed and fast-changing than before. Hybrid infrastructure, cloud adoption, and strict compliance requirements make consistency harder to maintain.

Read Post

Motadata

Read more about 12 IT Infrastructure Best Practices Every IT Leader Should Follow

The New Compliance Crisis: AI Is Outrunning Its Controls

May 19, 2026 By ScienceLogic In ScienceLogic

Enterprises have spent decades refining compliance frameworks around workflows that were linear, predictable, and well-documented. These frameworks were built for systems that executed actions deterministically and for human operators who made decisions slowly enough for oversight to keep up. In that environment, compliance could function as a retrospective discipline because the evidence required to validate behavior generally existed in complete, stable form.

Read Post

ScienceLogic

Read more about The New Compliance Crisis: AI Is Outrunning Its Controls

What's New in Graylog V7.1 Webinar

May 19, 2026 By Graylog In Graylog

What to Expect? Graylog 7.1 is built for lean security and IT operations teams who need real outcomes, not more tools, more add-ons, or more manual work. This 30-minute deep dive session covers what's new and what it means for your team. What you'll learn: See Graylog 7.1 in action: detection, triage, and documentation without compromise.

View Video

Graylog

Read more about What's New in Graylog V7.1 Webinar

Why SRE agents need orchestration, not just more tools

May 19, 2026 By Mezmo In Mezmo

Single agents are a useful starting point for SRE workflows. They are not where the architecture should end. The first version is simple enough: connect an LLM to a few tools, give it a system prompt, and point it at your infrastructure. It can summarize an alert, pull logs, answer questions, and draft a useful next step. Then the workflow gets real. You add GitHub for runbooks, Kubernetes for cluster state, PagerDuty for incident context, Prometheus for metrics, and Mezmo for telemetry.

Read Post

Mezmo

Read more about Why SRE agents need orchestration, not just more tools

Media Monitoring Evolved: How AI Makes Website Tracking Tools Essential

May 19, 2026 By ChangeTower In ChangeTower

The average person would need 180 million years to read everything published online in a single day. For organizations trying to track what people say about their brand, manual monitoring stopped being viable somewhere around 2015. AI-powered media monitoring tools now process this impossible volume automatically, detecting brand mentions, analyzing sentiment, and flagging potential crises before they spiral.

Read Post

ChangeTower

Read more about Media Monitoring Evolved: How AI Makes Website Tracking Tools Essential

Agent Timeline: The Flight Recorder for Your AI Agents

May 19, 2026 By Dan Juengst In Honeycomb

Last week, we introduced Agent Timeline, a powerful new observability experience purpose-built for debugging AI agent workflows in production. Agent Timeline uniquely connects AI-layer visibility to full-stack observability by organizing telemetry around an agentic conversation. A conversation contains one or more agent executions, each of which may contain LLM calls, tool invocations, handoffs, retries, human escalations, and downstream system calls.

Read Post

Honeycomb

Read more about Agent Timeline: The Flight Recorder for Your AI Agents

How Ecommerce Brands Track Regional Price Differences Online

May 19, 2026 By OpsMatters In OpsMatters

Many online stores display different prices depending on the user's location. The same product may cost less in Eastern Europe, more in the United States, and have completely different discounts in Germany or France. There are several reasons for this: This is especially common in marketplaces, electronics, fashion, and travel-related ecommerce. For international brands, understanding these pricing differences has become an important part of market analytics.

Read Post

OpsMatters

Read more about How Ecommerce Brands Track Regional Price Differences Online

Commercial Trucking Technology for Better Driver Awareness

May 19, 2026 By OpsMatters In OpsMatters

Modern highways demand constant focus from professional drivers. New tools help fleets stay safe on long trips across the country. Fleet operators can monitor road hazards much better than in past decades. New onboard systems protect both the driver and the cargo from unexpected road events. High highway speeds mean split-second decisions dictate safety margins. Stay aware of your surroundings to prevent severe accidents before they happen. New updates give teams better visibility than ever. Drivers feel more secure when they have technology backing them up on dark roads.

Read Post

OpsMatters

Read more about Commercial Trucking Technology for Better Driver Awareness

The Importance of Time Synchronization in Windows Authentication

May 18, 2026 By Babu Sundaram In eG Innovations

Kerberos is a secure network authentication protocol that allows users and systems to prove their identity over a network without sending passwords in plain text. It is widely used in enterprise environments (for example, in Windows domains) to enable single sign-on (SSO). At its core, Kerberos uses a trusted authority called the Key Distribution Center (KDC) to issue encrypted “tickets” that verify identity.

Read Post

eG Innovations

Read more about The Importance of Time Synchronization in Windows Authentication

Status Page Snapshot

May 18, 2026 By Uptime Website Monitoring In uptime

Get a quick tour of the Uptime.com Status Page solution. Uptime.com has you covered regardless of your needs, from SLA accountability to Public Status updates to Internal communications.

View Video

uptime

Read more about Status Page Snapshot

Cache-busting magic variables for uptime checks

May 18, 2026 By Mattias Geniar In Oh Dear

Over the weekend, my own site went down and Oh Dear didn't catch it. The origin server had fallen over, but Cloudflare happily kept serving the cached HTML. Everything looked fine from the outside. Embarrassing. Scratching our own itch here, we just shipped magic variables: short placeholders you can drop into your monitor URL, request headers, or POST payload. Right before each check, we replace them with fresh values, so every request is unique enough to slip past any cache and actually hit your origin.
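The substitution idea described above can be sketched in a few lines. This is only an illustration of the technique, not Oh Dear's implementation: the `{{timestamp}}` and `{{uuid}}` placeholder names and the `expand_placeholders` helper are hypothetical, and the real product's syntax may differ.

```python
import re
import time
import uuid

# Hypothetical placeholder names, chosen for illustration only.
GENERATORS = {
    "timestamp": lambda: str(int(time.time())),
    "uuid": lambda: str(uuid.uuid4()),
}

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_placeholders(template: str) -> str:
    """Replace each known {{name}} placeholder with a freshly generated value."""

    def substitute(match: re.Match) -> str:
        generator = GENERATORS.get(match.group(1))
        # Unknown placeholders are left untouched rather than guessed at.
        return generator() if generator else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Each call yields a different URL, so a cache keyed on the full URL
    # cannot answer on behalf of the origin.
    print(expand_placeholders("https://example.com/health?cb={{uuid}}"))
```

Whether a unique query string actually bypasses a given cache depends on how that cache builds its keys, so a header or payload placeholder may be needed in some setups.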

Read Post

Oh Dear

Read more about Cache-busting magic variables for uptime checks

Get Lightrun AI Skills: Expert Workflows for AI Agents

May 18, 2026 By Gidi Freud In Lightrun

Today we’re launching Lightrun AI Skills, structured, repeatable investigation workflows built for AI coding agents. With Lightrun MCP, agents like Claude Code, Codex, and Cursor can already instrument live production services and reason over live runtime evidence without a redeployment. But AI agents remain non-deterministic by design, using the same tool differently every session.

Read Post

Lightrun

Read more about Get Lightrun AI Skills: Expert Workflows for AI Agents

SOA Expire Value Out of Recommended Range: What It Means and How to Fix It

May 18, 2026 By DNS Spy In DNS Spy

The Start of Authority record is the first record in any DNS zone file. It's the record that says "this zone exists, this is the primary nameserver in charge, and here are the timing rules that govern how this zone behaves." A full SOA record looks like this when you query it: Each of those numbers does something different. The one that triggered your warning is the Expire value, the fourth number. In this example, 1209600 seconds, which is exactly 14 days.
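To make the field order concrete, here is a minimal sketch that splits an SOA record's data into its seven fields and checks the Expire value. The record string is a made-up example for `example.com`, and the default bounds are an assumption taken from the 2 to 4 weeks suggested in RFC 1912; the tool that raised your warning may use different limits.

```python
from typing import NamedTuple


class SoaRecord(NamedTuple):
    mname: str    # primary nameserver
    rname: str    # responsible mailbox, written with dots
    serial: int   # first number
    refresh: int  # second number
    retry: int    # third number
    expire: int   # fourth number
    minimum: int  # fifth number (negative-caching TTL)


SECONDS_PER_DAY = 86400
# Assumed bounds: the 2 to 4 weeks suggested by RFC 1912.
EXPIRE_MIN = 14 * SECONDS_PER_DAY
EXPIRE_MAX = 28 * SECONDS_PER_DAY


def parse_soa(rdata: str) -> SoaRecord:
    """Parse the data part of an SOA record as printed by a DNS query tool."""
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    numbers = [int(value) for value in fields[2:]]
    return SoaRecord(fields[0], fields[1], *numbers)


def expire_in_range(record: SoaRecord,
                    low: int = EXPIRE_MIN,
                    high: int = EXPIRE_MAX) -> bool:
    """Return True when Expire falls inside the (assumed) inclusive range."""
    return low <= record.expire <= high


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    soa = parse_soa(
        "ns1.example.com. hostmaster.example.com. 2026051801 7200 3600 1209600 3600"
    )
    print(soa.expire, soa.expire / SECONDS_PER_DAY, expire_in_range(soa))
```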

Read Post

DNS Spy

Read more about SOA Expire Value Out of Recommended Range: What It Means and How to Fix It

Reverse DNS Does Not Match SMTP Banner: What It Means and How to Fix It

May 18, 2026 By DNS Spy In DNS Spy

When your mail server connects to a recipient server to deliver email, the very first thing it does after the TCP connection is established is introduce itself. That introduction happens through the EHLO command (or its older predecessor HELO), and it looks like this: That hostname in the EHLO line is your SMTP banner. It is what your server claims to be.
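The comparison behind this warning can be sketched as follows: take the hostname your server announces in EHLO, look up the PTR record for its sending IP address, and compare the two. The helper names here are hypothetical, and the comparison rule (case-insensitive, ignoring a trailing dot) is an assumption about what counts as a match; individual checkers may be stricter or looser.

```python
import socket
from typing import Optional


def normalize(hostname: str) -> str:
    """Lowercase a hostname and drop surrounding whitespace and a trailing dot."""
    return hostname.strip().rstrip(".").lower()


def banner_matches_ptr(ehlo_hostname: str, ptr_hostname: str) -> bool:
    """Return True when the announced hostname equals the reverse DNS name."""
    return normalize(ehlo_hostname) == normalize(ptr_hostname)


def lookup_ptr(ip_address: str) -> Optional[str]:
    """Return the reverse DNS hostname for an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except (socket.herror, socket.gaierror):
        return None


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address, so this lookup is expected to
    # return None; substitute your mail server's real sending IP.
    ptr = lookup_ptr("192.0.2.10")
    announced = "mail.example.com"
    if ptr is None:
        print("no PTR record found")
    else:
        print("match" if banner_matches_ptr(announced, ptr) else "mismatch")
```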

Read Post

DNS Spy

Read more about Reverse DNS Does Not Match SMTP Banner: What It Means and How to Fix It

How Honeycomb Is Embracing the Challenges of End-to-End Observability with Embrace

May 18, 2026 By Howard Yoo In Honeycomb

Customers regularly come to us looking to solve their observability problem by connecting the dots from frontend to backend. It sounds straightforward in theory, but in practice it's one of the hardest problems in modern application monitoring. The frontend monitoring tools they already have in place tend to be proprietary or narrowly scoped to frontend needs, leaving them without the context-rich backend data that makes real triage possible.

Read Post

Honeycomb

Read more about How Honeycomb Is Embracing the Challenges of End-to-End Observability with Embrace

Cribl Notebook templates in Cribl Search

May 18, 2026 By Cribl In Cribl

Investigations are time-sensitive, and analysts shouldn’t waste time recreating the same workflows or rewriting familiar queries. Whether troubleshooting infrastructure, investigating suspicious IPs, or analyzing host activity, teams often rely on duplicating old processes and copying query snippets — a slow, inconsistent approach that’s hard to scale.

View Video

Cribl

Read more about Cribl Notebook templates in Cribl Search

Server Monitoring: The Complete Guide to Metrics, Tools, and Best Practices

May 18, 2026 By Motadata Team In Motadata

If you run IT operations, you already know servers carry most of what your business depends on. When a server slows down or goes offline, the impact spreads fast, and the team feels it before the dashboard does. That's the core problem server monitoring is built to solve. It watches the health and performance of your servers continuously, so issues get caught early instead of becoming outages. The cost of getting these wrong keeps climbing.

Read Post

Motadata

Read more about Server Monitoring: The Complete Guide to Metrics, Tools, and Best Practices

Autonomous IT Needs Internet Performance Monitoring: Why Internal Visibility Alone Is No Longer Enough

May 18, 2026 By LogicMonitor In LogicMonitor

Internal visibility isn’t enough for modern incident response. Your app team has three dashboards open and everything looks fine. CPU is healthy, memory is stable, the application servers are responding normally. But users are still complaining. The checkout page is slow. Logins are timing out. Support tickets are piling up. And your monitoring tools have nothing useful to say about why.

Read Post

LogicMonitor

Read more about Autonomous IT Needs Internet Performance Monitoring: Why Internal Visibility Alone Is No Longer Enough

Slack outage on May 14, 2026

May 15, 2026 By Colin Bartlett In StatusGator

On May 14, 2026, users across multiple regions began reporting problems with Slack, including messaging failures, sign-in issues, and problems loading attachments and images. While the outage did not affect every user, reports quickly showed the issue was widespread enough to disrupt business communication for organizations around the world. StatusGator identified the incident through customer outage reports and triggered an Early Warning Signals alert at 14:21 UTC.

Read Post

StatusGator

Read more about Slack outage on May 14, 2026

How to embed Grafana dashboards into web applications

May 15, 2026 By David Allen In Grafana

Note: This post was originally published in October 2023 and was updated in May 2026 to include new methods and options for embedding Grafana dashboards. Grafana dashboards are powerful and flexible tools for observing applications and infrastructure, so it’s no surprise we get a lot of questions from the community about how to embed them into their web applications.

Read Post

Grafana

Read more about How to embed Grafana dashboards into web applications

Web API: your complete guide for custom integrations

May 15, 2026 By Blog In Squared Up

Data is almost always scattered across too many tools. Usually, if you want to see it all in one place, you're stuck building messy pipelines or paying for a warehouse you don't really want. SquaredUp is a window into all those tools. It lets you see what’s happening across your entire stack in real time without moving any of the data. Think of it as a universal translator that lets your tools talk to each other so you can stop the manual digging and just see the big picture.

Read Post

Squared Up

Read more about Web API: your complete guide for custom integrations

OpenTelemetry and Netdata, Today

May 15, 2026 By Netdata Team In netdata

OpenTelemetry has become the default way to instrument applications and ship telemetry. The hard part has never been the data model. It’s been picking a backend that handles OTLP without quietly turning into a per-metric bill or a black box that swallows your data. Netdata is a native OTLP backend.

Read Post

netdata

Read more about OpenTelemetry and Netdata, Today

Product Update - May 2026

May 15, 2026 By Hrishikesh Barua In IncidentHub

IncidentHub's latest product updates include a new Business plan with Teams support, early outage detection v1, and more integrations with ticketing systems. The public status now includes a disable feature. As before, many features are driven by feedback, and I am grateful to all our customers who have shared their feedback with us.

Read Post

IncidentHub

Read more about Product Update - May 2026

Best Network Monitoring Tools for 2026 (Top 12 Compared)

May 15, 2026 By Kentik In Kentik

In 2026, the best network monitoring tools are Kentik, Datadog, SolarWinds NPM, LogicMonitor, Cisco ThousandEyes, Dynatrace, Auvik, Paessler PRTG, ManageEngine OpManager, Zabbix, OpenNMS, and WhatsUp Gold — spanning four overlapping categories: network intelligence platforms, full-stack observability, digital experience monitoring (DEM), and traditional network performance monitoring (NPM), including open-source tools.

View Video

Kentik

Read more about Best Network Monitoring Tools for 2026 (Top 12 Compared)

Action trails: The missing link between AI and human trust

May 15, 2026 By David Girvin In Sumo Logic

When people talk about trusting AI, they usually focus on the interface. It summarizes and uses confident language with a level of clarity that feels reliable. But that’s all window dressing. None of it builds trust. Trust doesn’t come from what the AI says. A verifiable record of what the AI did makes it trustworthy.

Read Post

Sumo Logic

Read more about Action trails: The missing link between AI and human trust

Proactive vs Reactive Monitoring: What are the Differences?

May 15, 2026 By Jagdish Sajnani In Motadata

A single hour of unplanned downtime can cost a mid-sized enterprise more than $300,000, according to an ITIC report. Most of that cost comes from one place: teams find out about the problem after users do. That is the core limitation of reactive monitoring. It tells you something has failed, but doesn't tell you something is about to fail. This guide is for IT operations leads, platform and SRE engineers, and IT directors deciding how to evolve their monitoring practice.

Read Post

Motadata

Read more about Proactive vs Reactive Monitoring: What are the Differences?

Building Real-Time Telemetry Pipelines for IRIG 106 compliance

May 15, 2026 By Allyson Boate In InfluxData

Every second of a flight test produces a torrent of telemetry from engines, sensors, and control systems. Aerospace teams have captured this data for decades to verify performance and maintain safety, yet analysis often happens long after the mission ends. Engineers wait for downloads, conversions, and compliance checks before they can interpret results. That delay turns telemetry into a historical record instead of a feedback loop.

Read Post

InfluxData

Read more about Building Real-Time Telemetry Pipelines for IRIG 106 compliance

Transforming Managed File Transfer into a Strategic Business Asset

May 15, 2026 By meshIQ In meshIQ

meshIQ MFT Flow Intelligence converts fragmented file transfers into observable transaction ecosystems, ensuring secure, timely delivery across hybrid environments while reducing operational risk and enhancing regulatory compliance for modern enterprise operations.

Read Post

meshIQ

Read more about Transforming Managed File Transfer into a Strategic Business Asset

When your agents hallucinate at 2 am, it is not a model problem

May 15, 2026 By Mezmo In Mezmo

The first time an AI assistant suggests "restart the service" during a live incident and nobody on the bridge can tell whether that suggestion came from a current runbook, a stale wiki page, or thin air, you stop caring about model benchmarks. You start caring about what the agent actually knew, where that knowledge came from, and whether you can trust the chain of reasoning behind it.

Read Post

Mezmo

Read more about When your agents hallucinate at 2 am, it is not a model problem

How to Identify LAN Issues (Local Area Network Problems)

May 15, 2026 By Andrii Kernitskyi In Obkio

Here is a reality that every network admin eventually runs into: users report slow apps, dropped calls, and broken connections, and the first instinct is to blame the ISP or the cloud provider. The ticket gets escalated, the ISP pushes back, and hours later, you find out the problem was sitting inside your own building the whole time. A saturated switch port. A misconfigured VLAN. A flaky patch cable in the server room.

Read Post

Obkio

Read more about How to Identify LAN Issues (Local Area Network Problems)

ITSM Maturity Playbook Live, Episode 1: Incident Management Masterclass

May 15, 2026 By solarwindsinc In SolarWinds

Join this 5-part series designed to help IT teams move from reactive, fragmented processes to a more structured, connected way of working. Each session focuses on a core area, from incident resolution and CMDB visibility to employee experience, service catalog design, and change governance, giving you practical frameworks you can apply right away. You’ll walk away with: Faster, more consistent incident resolution.

View Video

SolarWinds

Read more about ITSM Maturity Playbook Live, Episode 1: Incident Management Masterclass

Why Network Operations Needs Data-Centric AI

May 15, 2026 By Dallon Robinette In Selector

The discussion around AI in infrastructure and operations has become increasingly model-centric. Teams want to know what model a platform uses, how current it is, how much reasoning capacity it has, and how quickly it can be updated as the model landscape shifts. Those are reasonable questions, but they tend to arrive too early. In production operations, the more consequential question is what happens to the data before any model is asked to interpret it.

Read Post

Selector

Read more about Why Network Operations Needs Data-Centric AI

Add your logo to StatusGator emails

May 14, 2026 By Valeria Kurolapova In StatusGator

You can now customize the logo shown in your email header on all paid plans. We’ve also updated how email customization works in StatusGator to make branding options clearer and easier to manage.

Read Post

StatusGator

Read more about Add your logo to StatusGator emails

7 Proven Steps to Maintain Operational Continuity During S/4HANA Migration

May 14, 2026 By Avantra Team In Avantra

Migrating to SAP S/4HANA is one of the most consequential system changes your organization will undertake. The technical complexity alone is significant. But the real risk is operational: maintaining uninterrupted service delivery while transforming the core systems your business depends on. Failure to manage this well causes outages, data inconsistencies, user disruption, and cost overruns. None of those are acceptable outcomes. The good news is these risks are manageable.

Read Post

Avantra

Read more about 7 Proven Steps to Maintain Operational Continuity During S/4HANA Migration

Getting started with Checkly dashboards

May 14, 2026 By Blog In Squared Up

Checkly is a modern reliability platform that combines testing, monitoring and observability in one place. Its integration with Playwright and languages such as TypeScript means that developers can write tests using tools they are familiar with and then run them in Checkly. Its Monitoring as Code philosophy also means that Checkly tests can be incorporated into CI/CD pipelines.

Read Post

Squared Up

Read more about Getting started with Checkly dashboards

From Phishing to SQL Injection: How Breaches Actually Happen

May 14, 2026 By Bindplane In ObservIQ

Critical vulnerabilities are critical because they're easy to exploit — but most breaches don't even need them. Tony explains why phishing remains the dominant attack vector, why strong instrumentation matters for forensics (tracing an API call through a database to see exactly what was leaked), and how observability data becomes security data when something goes wrong. The system is harder to breach than the human. And that's the whole game.

View Video

ObservIQ

Read more about From Phishing to SQL Injection: How Breaches Actually Happen

One Collector, Two Teams: How Bindplane Bridges Security and Observability with OpenTelemetry

May 14, 2026 By Bindplane In ObservIQ

Observability engineers will spend weeks tuning instrumentation. Security engineers? They want a collector installed and logs flowing yesterday. And that's actually the magic of OpenTelemetry + Bindplane: from day one you're routing firewall logs, endpoint data, and server logs straight into your SIEM with zero instrumentation lift. One toolchain. Two teams. No silos. Filmed at Google Cloud Next '26 in Las Vegas.

View Video

ObservIQ

Read more about One Collector, Two Teams: How Bindplane Bridges Security and Observability with OpenTelemetry

Getting started with Zabbix dashboards

May 14, 2026 By Blog In Squared Up

Zabbix is one of the most popular open source monitoring platforms in the world, with first class capabilities for monitoring networks and infrastructure at scale. It also boasts a rich ecosystem of adapters and plugins for collecting telemetry from thousands of device types.

Read Post

Squared Up

Read more about Getting started with Zabbix dashboards

Best APM for Small Development Teams in 2026

May 14, 2026 By Sarah Morgan In Scout

Last updated: May 2026. If your team is 2 to 20 developers and you do not have dedicated DevOps, SRE, or platform engineering, most APM tools were not built for you. They were built for the team that has you: a team with specialists who can tune dashboards, configure alerting pipelines, manage data retention policies, and explain the monitoring system to everyone else. You do not have that team. You have developers who also handle deploys, on-call, and debugging production issues between writing features.

Read Post

Scout

Read more about Best APM for Small Development Teams in 2026

Honeycomb Innovation Week: Announcing Our Partnership With Embrace

May 14, 2026 By Honeycomb In Honeycomb

Honeycomb and Embrace are extending the rigorous, data-driven practice that Honeycomb pioneered for foundational to mobile and web, giving site reliability and platform teams a complete, correlated picture of system health. The strategic partnership makes understanding performance and reliability for every user and every screen part of the observability practice, bringing new depth and standardization to how teams measure end user impact.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Announcing Our Partnership With Embrace

New ways to agentically build and edit dashboards

May 14, 2026 By Ben Coe In Sentry

The traditional dashboard workflow, teams slowly handcrafting visualizations to track critical KPIs, is dying in a world of AI agents. A few years ago, in a pre-agentic-everything world, we tried to make it easier for developers to monitor critical experiences. We introduced Insights pages, which were pre-configured dashboards any Sentry user could adopt instantly that surfaced common health signals, like Web and Mobile Vitals.

Read Post

Sentry

Read more about New ways to agentically build and edit dashboards

Simplify micro-frontend observability with Datadog RUM

May 14, 2026 By Stella Ma In Datadog

Micro-frontend architectures, where independent teams build and deploy separate parts of a frontend application, introduce an observability challenge: Telemetry data is fragmented across services, making it difficult to determine which micro-frontend caused a performance degradation or error spike.

Read Post

Datadog

Read more about Simplify micro-frontend observability with Datadog RUM

Attribute AI costs across providers with Datadog Cloud Cost Management

May 14, 2026 By Katherine Broner In Datadog

AI adoption is accelerating across organizations, and spending often follows a similar pattern: rapid growth, multiple providers, and limited visibility into where costs originate. Each provider exposes billing data differently, with distinct schemas, dimensions, and interfaces. FinOps and engineering teams often spend significant time consolidating fragmented data, only to end up with partial attribution and limited context about who or what generated the AI spending.

Read Post

Datadog

Read more about Attribute AI costs across providers with Datadog Cloud Cost Management

Improvements to our status pages as we tackle a DDoS

May 14, 2026 By Mattias Geniar In Oh Dear

The uptime & availability of our status pages hasn't been great these past few days. The root cause is a persistent and pretty aggressive DDoS attack targeted at our own status page, status.ohdear.app. As a result, the overload on our systems also affected all other status pages we host for clients. We're not yet at GitHub or Claude levels of uptime sadness, but this isn't acceptable to us. In this post, I'll share what's happening and what steps we've already taken.

Read Post

Oh Dear

Read more about Improvements to our status pages as we tackle a DDoS

You Are Building With AI. Who Is Watching What It Ships?

May 14, 2026 By Sarah Morgan In Scout

AI coding assistants have made it possible for a single developer to build and ship a production application in a weekend. Claude Code, Cursor, GitHub Copilot, and similar tools can scaffold a Rails app, write the models, generate the views, wire up the API, and push to production before Monday. This is genuinely exciting. It is also genuinely dangerous if you do not have monitoring in place before you ship.

Read Post

Scout

Read more about You Are Building With AI. Who Is Watching What It Ships?

Honeycomb Innovation Week: Day 3 Replay

May 14, 2026 By Honeycomb In Honeycomb

Watch a full replay of all sessions on Day 3 of Honeycomb's Innovation Week.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Day 3 Replay

Honeycomb Innovation Week: Honeycomb and AWS

May 14, 2026 By Honeycomb In Honeycomb

Honeycomb has shipped a production integration with Amazon Bedrock AgentCore, surfacing agent telemetry directly in Agent Timeline, Honeycomb's trace view for behavior. It's available now and built on.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Honeycomb and AWS

Honeycomb Achieves the AWS Financial Services Competency

May 14, 2026 By Matthew Scott In Honeycomb

Honeycomb is proud to share that we have achieved the Amazon Web Services (AWS) Financial Services Competency. This recognition validates our technical expertise and proven customer success in assisting financial services organizations with building, running, and understanding their production systems on AWS. Securing this competency is a direct response to our customers’ feedback in this space: observability in regulated, high-stakes environments requires more than dashboards and alerts.

Read Post

Honeycomb

Read more about Honeycomb Achieves the AWS Financial Services Competency

3 things you need to know about headless observability

May 14, 2026 By Coralogix In Coralogix

If you're building agents and trying to figure out the best way to actually make them successful in production, you're going to want to know about headless observability. Headless observability means an agent can access information about the health of your system through a CLI instead of clicking around dashboards. It's the data layer that's going to unlock serious autonomy and allow you to scale with agentic workloads.

View Video

Coralogix

Read more about 3 things you need to know about headless observability

Cloud Outage History: Six Years of Recurring Failures

May 13, 2026 By Nuno Tomas In isDown

Cloud infrastructure has never been more reliable in theory. In practice, the last six years of cloud outage history have delivered some of the most disruptive incidents on record. Not because cloud providers got worse, but because the systems built on top of them got larger, more interconnected, and more brittle in ways that don't show up until everything breaks at once.

Read Post

isDown

Read more about Cloud Outage History: Six Years of Recurring Failures

Get deeper insights with historical outage reports

May 13, 2026 By Valeria Kurolapova In StatusGator

StatusGator now includes a new Outage Reports tab on the service monitor detail page, giving users more visibility into recent service disruptions directly where they monitor services. Users can now quickly review recent outage activity for a specific monitored service without leaving the detail page.

Read Post

StatusGator

Read more about Get deeper insights with historical outage reports

How to Monitor Applications and End User Experiences

May 13, 2026 By ScienceLogic In ScienceLogic

In this video, see how Skylar One helps you understand the impact of changes on application performance and the end user experience. By tracking service level metrics across an e-commerce environment, you can quickly identify when performance degrades and how it affects user behavior. With Skylar One, teams can quickly connect performance changes to real user impact, helping ensure a consistent and reliable digital experience.

View Video

ScienceLogic

Read more about How to Monitor Applications and End User Experiences

Total Economic Impact study finds LogicMonitor Edwin AI delivered a 313% ROI and payback in 6 months or less

May 13, 2026 By Margo Poda In LogicMonitor

Forrester Consulting’s Total Economic Impact study found that a composite organization based on interviewed customers achieved 313% ROI and payback in less than 6 months with LogicMonitor Edwin AI. AI for IT operations has a credibility problem. The market is crowded with claims about speed, automation, and intelligence, while buyers are left doing the harder work of separating measurable impact from vendor language.

Read Post

LogicMonitor

Read more about Total Economic Impact study finds LogicMonitor Edwin AI delivered a 313% ROI and payback in 6 months or less

Poller Checker | Favorite Forgotten Feature

May 13, 2026 By solarwindsinc In SolarWinds

Learn how Poller Checker helps identify polling issues, improve monitoring accuracy, and streamline SolarWinds troubleshooting. Discover key features, common use cases, and how to keep your environment running smoothly.

View Video

SolarWinds

Read more about Poller Checker | Favorite Forgotten Feature

Seer Agent: Analyze & Fix Sentry Issues from Slack

May 13, 2026 By Sentry In Sentry

Sentry already alerts your team in Slack, but with Seer, you can investigate and fix issues without ever leaving. It's like having the Sentry MCP server living right in your Slack channel. See how the Seer Agent (now in open beta!) and actionable alert messages turn your Slack channel into a debugging workflow.

View Video

Sentry

Read more about Seer Agent: Analyze & Fix Sentry Issues from Slack

True Visibility: How Liang Chen is Rethinking Network Monitoring

May 13, 2026 By Selector In Selector

What happens when deep networking expertise meets low-level programming and a passion for invention? In this episode of Next-Gen Network Heroes, host Bob Slevin sits down with Liang Chen, Senior Network Architect at Texas Children's Hospital and a true innovator in network performance and visibility. With more than 25 years of experience in networking, plus advanced expertise in programming languages like C and Assembly, Liang has built his own next-generation traffic analysis platform from the ground up—designed to provide real-time, packet-level visibility at massive scale.

View Video

Selector

Read more about True Visibility: How Liang Chen is Rethinking Network Monitoring

Enhancing Your Search Skills with Liang Chen

May 13, 2026 By Selector In Selector

What does it take to reinvent network visibility from the ground up? In this episode of Next-Gen Network Heroes, Bob sits down with Liang Chen, Senior Network Architect at Texas Children’s Hospital and creator of a next-generation network traffic analyzer built for real-time, packet-level visibility. Liang shares how he built a platform capable of analyzing traffic at up to 200Gbps with zero packet loss—unlocking deeper network forensics and faster troubleshooting in mission-critical environments.

View Video

Selector

Read more about Enhancing Your Search Skills with Liang Chen

Cribl Search demo: Getting data in

May 13, 2026 By Cribl In Cribl

Cribl's Product Manager David Cavuto walks through how quick and easy it is to get data ingested into Cribl Search's lakehouse engine.

View Video

Cribl

Read more about Cribl Search demo: Getting data in

Tips and Tricks for Handling Secrets in Icinga 2

May 13, 2026 By Julian Brost In Icinga

Today, we are going to look at a few things related to handling secrets. While Icinga 2 has no dedicated mechanisms for secret handling, there are a few tricks you can do with standard features. This is not meant as a step-by-step tutorial, but rather as an inspiration where you can adopt the ideas that make sense in your setup.

Read Post

Icinga

Read more about Tips and Tricks for Handling Secrets in Icinga 2

Observability for the Agent Era: Day 1 | Keynotes

May 13, 2026 By Honeycomb In Honeycomb

Honeycomb's Innovation Week: Observability for the Agent Era (May 12-14). For Day 1 of Innovation Week, Honeycomb co-founders Christine Yen and Charity Majors will share what it actually takes to understand and debug systems in the agent era, and what the best engineering teams are doing differently. A 3-Day Virtual Event for Teams Building the Future. May 12: Get insights on how the best engineering teams are tackling the challenges of the agentic era.

View Video

Honeycomb

Read more about Observability for the Agent Era: Day 1 | Keynotes

Redgate Monitor | AWS Database Migration Readiness

May 13, 2026 By Redgate Software In Redgate

In this demo, we explore the AWS Database Migration and Modernization (D2M) framework, from Align and Assess through to Optimize, and show how Redgate Monitor helps you to establish performance baselines, right-size target environments, and continuously optimize RDS and Aurora spend for full cloud cost visibility. Learn how Redgate Monitor can give you a single view of your entire AWS and on-premises, multi-database environment.

View Video

Redgate

Read more about Redgate Monitor | AWS Database Migration Readiness

Why Siloed Monitoring Increases Your MTTR and How to Resolve It

May 13, 2026 By Jagdish Sajnani In Motadata

Are you spending more time figuring out whose problem it is than actually fixing it? If that feels familiar, you are not alone. Many IT teams start their day with multiple dashboards and tools, yet still struggle to understand what is wrong when something breaks. Everything may look fine in one view, and fine in another, but the customer impact tells a different story. Incidents end up taking longer to resolve than they should. This is not about effort or capability.

Read Post

Motadata

Read more about Why Siloed Monitoring Increases Your MTTR and How to Resolve It

May 2026 at Bindplane: Global Pipeline Intelligence and the CLI AI Skill

May 13, 2026 By Adnan Rahic In ObservIQ

The Google Cloud Next 2026 recap covers booth demos, OpenTelemetry and SecOps, and talks from Craig Lee and Laura Luttmer. Watch the recap.

Read Post

ObservIQ

Read more about May 2026 at Bindplane: Global Pipeline Intelligence and the CLI AI Skill

Builder in the loop: Henry Andrews on building AURA like production software

May 13, 2026 By Mezmo In Mezmo

An interview series with the people building Mezmo’s open-source agentic harness for production operations. Builder in the loop is a Mezmo interview series focused on the engineers, product leaders, and operators shaping AURA, our open-source, MCP-native agentic harness for production operations. The goal is to get past the polished product layer and talk through the decisions that matter when AI starts interacting with real systems. What should agents be allowed to do?

Read Post

Mezmo

Read more about Builder in the loop: Henry Andrews on building AURA like production software

ActiveMQ Message Persistence: KahaDB, Artemis Journal & JDBC

May 13, 2026 By meshIQ In meshIQ

Every persistent message in ActiveMQ must survive a broker restart. That guarantee, the contract behind DeliveryMode.PERSISTENT, is what separates a messaging system from a memory buffer. It is also what makes message persistence configuration the most consequential decision in ActiveMQ architecture.

Read Post

meshIQ

Read more about ActiveMQ Message Persistence: KahaDB, Artemis Journal & JDBC

Turn StatusCake into a verified alerting and escalation flow with Hermes

May 13, 2026 By Daniel In StatusCake

Most monitoring setups have the same weak spot. Detection is easy. Decision-making is not. StatusCake is good at telling you that something might be wrong. What happens next is where things sometimes get messy. One alert goes straight to a chat room. Another wakes the wrong person. A third ends up getting missed because the site had a brief wobble and recovered before anyone looked. Hermes is useful in that gap.

Read Post

StatusCake

Read more about Turn StatusCake into a verified alerting and escalation flow with Hermes

Here's How to Access Grafana Assistant from Self-Managed Environments

May 13, 2026 By Grafana In Grafana

You just need a free Grafana Cloud account. It's generous and free forever.

View Video

Grafana

Read more about Here's How to Access Grafana Assistant from Self-Managed Environments

What Is Log Monitoring? Pipeline, Pitfalls, and Practices for 2026

May 13, 2026 By Coralogix Team In Coralogix

Catching a cascading failure in the first 90 seconds is one of the better feelings in production engineering, and it almost always comes back to your log monitoring pipeline doing its job upstream of the alert. The teams that land there consistently treat log monitoring as a real-time detection layer in its own right, and the choices you make in that pipeline shape how every incident plays out for years.

Read Post

Coralogix

Read more about What Is Log Monitoring? Pipeline, Pitfalls, and Practices for 2026

From Monitoring to Observability: How DEX Integrations Strengthen IT Visibility and User Productivity

May 13, 2026 By Teneo In Teneo

When I started working in IT in the late 90s, IT performance was always measured by the health of infrastructure: CPU utilization, network latency, server uptime. For many organizations, little has changed in the last 30+ years. We became very good at keeping systems alive, yet users still struggled to get work done. That disconnect is exactly why Digital Employee Experience (DEX) has emerged as a critical discipline. But DEX on its own is not the end goal.

Read Post

Teneo

Read more about From Monitoring to Observability: How DEX Integrations Strengthen IT Visibility and User Productivity

What Is APM? A Guide to Application Performance Monitoring

May 13, 2026 By Coralogix Team In Coralogix

A well-instrumented service tells your on-call engineer which deploy broke checkout, which span ate the latency budget, and which line to revert before the support queue fills up. Getting there depends on how cleanly your application performance monitoring layer turns telemetry into answers. The sections ahead walk through how APM works, the metrics and components worth tracking, the cloud-native challenges at scale, and how to evaluate APM tooling against your real workload.

Read Post

Coralogix

Read more about What Is APM? A Guide to Application Performance Monitoring

What Is an Incident Commander? Role, Skills, and Best Practices

May 13, 2026 By Coralogix Team In Coralogix

The fastest incident response teams treat coordination as a craft. Someone owns the call, drives the decisions, and keeps everyone moving in the same direction while the team puts the system back together. That person is the incident commander (IC), and getting the role right is what separates your 15-minute fix from a four-hour war room where nobody’s sure who’s making the call.

Read Post

Coralogix

Read more about What Is an Incident Commander? Role, Skills, and Best Practices

Log-based metrics, now in AppSignal Labs

May 13, 2026 By Serena Chou In AppSignal

A lot of what's useful in a high-volume log source is a count, a rate, or a measurement — 5xx responses per minute, p95 request duration, job retry rate. You don't need every line to track those. You need the metric. Log-based metrics is now in beta as part of AppSignal Labs.

Read Post

AppSignal

Read more about Log-based metrics, now in AppSignal Labs

Honeycomb Innovation Week: Debugging Agentic Workflows with Ken Rimple

May 13, 2026 By Honeycomb In Honeycomb

Canvas skills are how your team's runbooks and tribal knowledge become an active part of the investigation instead of a document someone has to remember to open. Pre-built skills cover the most common investigation patterns out of the box. Custom skills let you encode the specific context, thresholds, and decision logic your team has accumulated, so every auto-investigation starts with your best thinking already applied.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Debugging Agentic Workflows with Ken Rimple

Observability for the Agent Era: Day 2 | Launches

May 13, 2026 By Honeycomb In Honeycomb

Honeycomb's Innovation Week: Observability for the Agent Era (May 12-14). For Day 2 of Innovation Week, Honeycomb's product and engineering teams will take you inside the new capabilities purpose-built for the agent era. Expect live demos, real scenarios, and a hands-on look at what it means to own observability for the agentic era, with AI in Honeycomb to observe AI in production. A 3-day virtual event for teams building the future. May 12: Get insights on how the best engineering teams are tackling the challenges of the agentic era.

View Video

Honeycomb

Read more about Observability for the Agent Era: Day 2 | Launches

Innovation Week Day 2: Observability for AI, and Observability With AI

May 13, 2026 By Shabih Syed In Honeycomb

AI is reshaping the SDLC in two directions at once. AI-generated code is shipping faster and with less human supervision than ever before, while agents and LLMs are running directly in production, where they behave very differently from traditional software: non-deterministic, with a wider blast radius than any single function or component, with no stack trace to catch when something goes wrong.

Read Post

Honeycomb

Read more about Innovation Week Day 2: Observability for AI, and Observability With AI

Why Some Roles Care About Open Source & Why Others Don't: 4th Annual Observability Survey | Grafana

May 13, 2026 By Grafana In Grafana

Note: We're happy to share that since the recording of this video, OpenTelemetry *has* graduated from the CNCF! SREs, developers, and CTOs say open source is essential to observability. Engineering managers and directors? Not so much. Grafana's 4th annual observability survey — 1,363 responses — reveals a split inside the same orgs that's worth a conversation.

View Video

Grafana

Read more about Why Some Roles Care About Open Source & Why Others Don't: 4th Annual Observability Survey | Grafana

Honeycomb Innovation Week: Observability With AI With Kale and Taylor

May 13, 2026 By Honeycomb In Honeycomb

Watch this video to see the re-imagined Canvas in action, where auto-investigation has already ranked your hypotheses before you open the tab, multiplayer agents build on each other's work in real time, and a custom skill encoding your team's own runbook can reprioritize the entire incident before you've had your morning coffee.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Observability With AI With Kale and Taylor

Honeycomb Innovation Week: Observability for AI with Dan and Shashank

May 13, 2026 By Honeycomb In Honeycomb

Watch this video to see Agent Timeline in action: one conversation ID, one view, every agent invocation, LLM call, tool call, and downstream trace, so you stop stitching tabs together and start finding the failure in seconds.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Observability for AI with Dan and Shashank

Honeycomb Innovation Week: Day 2 Replay

May 13, 2026 By Honeycomb In Honeycomb

Watch a full replay of all sessions and demos on Day 2 of Honeycomb's Innovation Week.

View Video

Honeycomb

Read more about Honeycomb Innovation Week: Day 2 Replay

Sponsored Post

The SDLC: phases, popular models, benefits & more

May 12, 2026 By Dave Swersky In Raygun

The Software Development Life Cycle (SDLC) describes the process we follow to deliver software to customers. It captures each step of creating software, from ideation to delivery and eventually to maintenance. In this post, we've broken down everything you need to understand the SDLC.

Read Post

Raygun

Read more about The SDLC: phases, popular models, benefits & more

Stop Guessing, Start Fixing: AI Root Cause Analysis

May 12, 2026 By Brenton O'Callaghan In Avantra

Automating root cause analysis is often regarded as the holy grail of IT operations. A solution capable of automatically identifying issues, resolutions and even prevention. Performed correctly, automated root cause analysis accelerates MTTI (Mean Time to Identify) and MTTR (Mean Time to Resolution). But for many platforms, this goal remains elusive: complexity, differences between deployments and different architectures make automating root cause challenging.

Read Post

Avantra

Read more about Stop Guessing, Start Fixing: AI Root Cause Analysis

Contributing Distributed Partition Ownership to the Azure Event Hub Receiver

May 12, 2026 By Dylan Strohschein In ObservIQ

If you're running OpenTelemetry collectors against Azure Event Hubs, distributed partition ownership and checkpointing just got significantly better. Your fleet now self-organizes. Failover is automatic. Restarts don't lose data. Here's how we got here.

Read Post

ObservIQ

Read more about Contributing Distributed Partition Ownership to the Azure Event Hub Receiver

Monitor CAA Records with DNS Check

May 12, 2026 By Matt Rideout In DNS Check

DNS Check now supports monitoring CAA records. A CAA record (Certification Authority Authorization record) tells public certificate authorities (CAs) which of them, if any, are allowed to issue TLS/SSL certificates for your domain. Public CAs have been required to honor these records since 2017, so CAA records act as an access control list for certificate issuance.

Read Post

DNS Check

Read more about Monitor CAA Records with DNS Check

AI-assisted testing, extensions updates, and more: k6 2.0 is here

May 12, 2026 By Théo Crevon In Grafana

For years, teams have relied on k6 to take a more proactive approach to performance testing, ensuring they can catch issues early and deliver more reliable user experiences. That approach has helped make k6 one of the most widely used performance testing tools in the open source community today, with more than 30k stars on GitHub. Last year, we introduced k6 1.0, a major release that brought TypeScript support, native extensions, revamped test insights, and production-grade stability guarantees.

Read Post

Grafana

Read more about AI-assisted testing, extensions updates, and more: k6 2.0 is here

Innovation Week Day 1: The SDLC Is Collapsing, and Observability Has Never Mattered More

May 12, 2026 By Shabih Syed In Honeycomb

The software development lifecycle is collapsing. The multi-stage pipeline that defined how software got built and shipped for decades is compressing into rapid loops of intent and validation, with agents now part of the teams building and running it. Day 1 of Innovation Week was about what that shift means for how software gets validated, where observability fits, and the problems that have always been hard but are now genuinely urgent.

Read Post

Honeycomb

Read more about Innovation Week Day 1: The SDLC Is Collapsing, and Observability Has Never Mattered More

Dashboard Playlists: Cycle Through Dashboards in TV Mode

May 12, 2026 By Shyam Sreevalsan In netdata

When we shipped TV mode, we heard almost immediately: “Great, but I have five dashboards and one screen.” A single dashboard on a wall display covers one view of your infrastructure. If you want to rotate between your network overview, database health, application metrics, and infrastructure summary, someone has to walk over and click, or you’re buying more screens. Dashboard playlists solve this.

Read Post

netdata

Read more about Dashboard Playlists: Cycle Through Dashboards in TV Mode

What is the Mean Time to Resolution (MTTR)? Why It Matters and How to Resolve

May 12, 2026 By Jagdish Sajnani In Motadata

How quickly can you restore service when an incident hits your system? Most IT teams are not slowed down by detecting incidents. The challenge starts after something breaks, when the goal is to bring services back online as quickly as possible. Modern systems are highly distributed. Alerts arrive from multiple tools, dependencies are complex, and it is often difficult to immediately understand what actually failed.

Read Post

Motadata

Read more about What is the Mean Time to Resolution (MTTR)? Why It Matters and How to Resolve

What Leading Engineering Teams Teach Us About Operational Truth

May 12, 2026 By ScienceLogic In ScienceLogic

Modern operational environments are intricate ecosystems shaped by distributed architectures, accelerating change cycles, and a constant influx of telemetry. The complexity itself is not the issue. The issue is how teams construct understanding inside that complexity. After years of expansion across cloud, edge, third-party services, and internal modernization efforts, many organizations now have abundant data but limited confidence in the meanings behind it.

Read Post

ScienceLogic

Read more about What Leading Engineering Teams Teach Us About Operational Truth

Getting Started with XcodeBuildMCP: Let AI Agents Debug Your iOS Apps

May 12, 2026 By Sentry In Sentry

XcodeBuildMCP gives AI agents the ability to build, test, and debug native iOS and macOS apps. In this hands-on workshop, we show you how to use the open source MCP server to unlock the full developer loop — build, run, debug, interact, and verify — without leaving your preferred AI coding environment.

View Video

Sentry

Read more about Getting Started with XcodeBuildMCP: Let AI Agents Debug Your iOS Apps

Your AI Strategy Has a Blind Spot: The Network

May 12, 2026 By Justin Ryburn In Kentik

Enterprises are pouring billions into GPUs and AI compute, but most are overlooking the infrastructure that connects it all. Justin Ryburn, field CTO at Kentik, makes the case that the network is the most underestimated variable in whether AI initiatives succeed or fail.

Read Post

Kentik

Read more about Your AI Strategy Has a Blind Spot: The Network

OpenTelemetry Fleet Management: Scalable Control

May 12, 2026 By Coralogix In Coralogix

OpenTelemetry has turned observability pipelines into production infrastructure, but managing them at scale often creates a massive operational burden. In this demo, we show how Coralogix Fleet Management acts as the central control plane for your OTel ecosystem, providing the governance and orchestration required for modern DevOps. Stop the "manual marathon" of PRs and Helm upgrades. Move toward a safer, more predictable operating model where telemetry is consistent, audited, and scalable.

View Video

Coralogix

Read more about OpenTelemetry Fleet Management: Scalable Control

Turn Noisy Logs Into Structured Data with Uptrace Grouping Rules

May 12, 2026 By Uptrace In Uptrace

Same log pattern. Hundreds of useless groups. In this video, we show how to use Uptrace Grouping Rules to automatically turn noisy logs into structured, searchable data without changing application code. Perfect for OpenTelemetry users, backend engineers, SREs, and anyone dealing with noisy logs.

View Video

Uptrace

Read more about Turn Noisy Logs Into Structured Data with Uptrace Grouping Rules

AI Word of the Day: Confabulation - SolarWinds TechPod 109

May 12, 2026 By solarwindsinc In SolarWinds

Sean Sebring and Chrystal Taylor chat with Josh Stageberg, SolarWinds GVP of Product Management, about the future of Agentic AI.

View Video

SolarWinds

Read more about AI Word of the Day: Confabulation - SolarWinds TechPod 109

Security Integrations in Observability Self-Hosted

May 12, 2026 By solarwindsinc In SolarWinds

Integrating security data with observability data provides a comprehensive view for better threat detection and response. Security observability helps connect the dots between seemingly innocent events that, when correlated, reveal complex attack patterns. SolarWinds security products integrate into observability self-hosted, including Security Event Manager for log data and event correlation, Access Rights Management for identifying potential attack vectors, configuration management for compliance monitoring, and Patch Manager for tracking critical updates.

View Video

SolarWinds

Read more about Security Integrations in Observability Self-Hosted

SWQL Admin Console | Favorite Forgotten Feature

May 12, 2026 By solarwindsinc In SolarWinds

Get a quick walkthrough of the SWQL Admin Console and see how it simplifies querying, managing, and troubleshooting within SolarWinds environments. Review key features, practical use cases, and tips to help you work smarter and faster.

View Video

SolarWinds

Read more about SWQL Admin Console | Favorite Forgotten Feature

Honeycomb Innovation Week - Day 1 Replay

May 12, 2026 By Honeycomb In Honeycomb

Watch a full replay of all keynotes on Day 1 of Honeycomb's Innovation Week.

View Video

Honeycomb

Read more about Honeycomb Innovation Week - Day 1 Replay

Why the Operational Complexity of E-Commerce Reaches a Critical Point in 2025

May 12, 2026 By OpsMatters In OpsMatters

Modern webshops no longer run on a single system. Behind the digital storefront lies an architecture made up of dozens of components: from product information management to caching layers, from search engines to payment providers. For operations teams, this means the classic LAMP stack from 2010 is now a distant memory.

Read Post

OpsMatters

Read more about Why the Operational Complexity of E-Commerce Reaches a Critical Point in 2025

From vibe code to production-ready: observability for Next.js and Supabase apps

May 11, 2026 By Sergiy Dybskiy In Sentry

The way we build software has drastically changed over the past few years. What hasn’t changed is that this software ends up in front of real people: you, me, my mom. And when those users inevitably run into something broken, you as the application’s developer need to be equipped with the right tools, context and understanding of what broke, where it broke, and how to fix it as quickly as possible. Every day we’re inching closer to self-healing software.

Read Post

Sentry

Read more about From vibe code to production-ready: observability for Next.js and Supabase apps

Why Alert Fatigue Solutions Still Miss the Root Cause

May 11, 2026 By Lightrun Team In Lightrun

Alert fatigue solutions have never been better, but on-call engineers are still burning out. Threshold tuning, AI triage, and alert correlation reduce the noise, but every alert that clears filtering lands with the same incomplete telemetry and triggers the same manual investigation cycle. This post explains why the evidence gap survives every fix, and how runtime context changes that.

Read Post

Lightrun

Read more about Why Alert Fatigue Solutions Still Miss the Root Cause

The Best Kubernetes Monitoring Tools of 2026

May 11, 2026 By Libi Michelson In logz.io

Effective Kubernetes monitoring in 2026 is critical due to increased cluster scale and microservices complexity, demanding a shift toward unified observability (logs, metrics, and traces). The core focus is leveraging AI-driven features to automate anomaly detection, correlate diverse data, and significantly reduce Mean Time to Recovery (MTTR).

Read Post

logz.io

Read more about The Best Kubernetes Monitoring Tools of 2026

Best Elixir APM Tools in 2026: A Developer's Guide

May 11, 2026 By Sarah Morgan In Scout

Last updated: May 2026. Elixir applications have performance characteristics that are genuinely different from Ruby or Python. The BEAM virtual machine handles concurrency through lightweight processes, supervision trees restart failed processes automatically, and Phoenix channels can hold tens of thousands of persistent connections on a single node. These are strengths, but they also mean that the performance problems you encounter are different from what most APM tools were built to detect.

Read Post

Scout

Read more about Best Elixir APM Tools in 2026: A Developer's Guide

What is an Enterprise Knowledge Graph? Definition, Benefits, and Use Cases

May 11, 2026 By Jagdish Sajnani In Motadata

Are your AI systems giving answers your teams cannot trust? Most enterprises deploy LLMs expecting reliable outputs, but the results often feel inconsistent or incomplete. The problem is the missing structure behind it. Enterprise data is usually fragmented across multiple systems, teams, and tools. Your AI does not understand how customers, products, policies, and operations connect. Without that context, it fills gaps with assumptions, which leads to unreliable results.

Read Post

Motadata

Read more about What is an Enterprise Knowledge Graph? Definition, Benefits, and Use Cases

Making Semantic Conventions Work for You With OpenTelemetry Weaver

May 11, 2026 By Mike Goldsmith In Honeycomb

Your dataset has hundreds of attributes. Some are self-explanatory: http.response.status_code, server.address. Others are not: meta.refinery.reason, dataset.slug, sli.latency_target_ms. If you don't know what an attribute means, you can't write a good query. And if an AI agent doesn't know what it means, it guesses.

Read Post

Honeycomb

Read more about Making Semantic Conventions Work for You With OpenTelemetry Weaver

Easily connect any AI assistant (Claude, Codex, ...) to your Oh Dear data

May 11, 2026 By Freek Van der Herten In Oh Dear

Oh Dear keeps a watchful eye on your websites: uptime, performance, SSL certificates, broken links, DNS, cron jobs. If something can quietly break, we're already checking it for you. Today we're connecting that data to a new place: your AI assistant. We just shipped an MCP integration. If you use Claude, Cursor, or any other client that speaks the Model Context Protocol, you can now ask questions like "any broken links on my site?" or "when does my certificate expire?" in plain language.

Read Post

Oh Dear

Read more about Easily connect any AI assistant (Claude, Codex, ...) to your Oh Dear data

Migrating Your DX NetOps Integrations from OData 2 to OData 4

May 11, 2026 By Helen Burke In Broadcom

If you integrate DX NetOps with external dashboards, reporting engines, or IT service management tools, you likely rely on our API framework. We are currently migrating this framework from OData 2 to OData 4. This transition requires you to update your existing integrations so they continue to function properly. Let me walk you through exactly what is changing, how to identify your active API queries, and the specific adjustments you need to make to your setup.

Read Post

Broadcom

Read more about Migrating Your DX NetOps Integrations from OData 2 to OData 4

What is AI Agent Orchestration? Concept + How It Works

May 11, 2026 By Jagdish Sajnani In Motadata

Have you tried using AI at work and felt it works well for small tasks, but not beyond that? It can handle simple things like creating a summary, writing a draft, or answering a question. This works because the task is clear. But most tasks are not that simple. They involve multiple steps. One step depends on another. Data comes from different systems, and some decisions need checks before moving ahead. This is where a single AI system starts to struggle.

Read Post

Motadata

Read more about What is AI Agent Orchestration? Concept + How It Works

Monitoring Your Azure to Azure Local Migration: One Dashboard for Both Sides

May 11, 2026 By Satyadeep Ashwathnarayana In netdata

More organizations are moving workloads from Azure public cloud to Azure Local (formerly Azure Stack HCI) than most people realize. The reasons vary: data sovereignty requirements, latency-sensitive workloads that need to be closer to the edge, cost optimization for predictable workloads where reserved cloud capacity doesn’t make financial sense, or regulatory constraints that require data to stay on-premises.

Read Post

netdata

Read more about Monitoring Your Azure to Azure Local Migration: One Dashboard for Both Sides

ActiveMQ on Kubernetes: Production Deployment Guide

May 11, 2026 By meshIQ In meshIQ

Kubernetes is now the default deployment substrate for most enterprise platform teams. But ActiveMQ on Kubernetes presents a specific challenge that pure stateless workloads do not: message brokers are stateful.

Read Post

meshIQ

Read more about ActiveMQ on Kubernetes: Production Deployment Guide

Scheduled Maintenance Windows YouTube Video

May 11, 2026 By Uptime Website Monitoring In uptime

Learn how to create and manage your scheduled maintenance windows at Uptime!

View Video

uptime

Monitoring

Read more about Scheduled Maintenance Windows YouTube Video

AURA in Practice: Mezmo's SRE bot, demo walkthrough

May 11, 2026 By Mezmo In Mezmo

A walkthrough of the Slack-based SRE bot Mezmo's engineering team built on AURA, the open-source agent harness, running against Mezmo's own production tooling. Adrian Furlong shows the bot answering questions in a DM with tool calls visible inline, then in a shared channel where it reads the conversation before responding. He opens a fresh PagerDuty incident on camera. The webhook fires AURA, and within seconds, the agent posts a triage note back on the incident and a structured analysis in the dedicated incident channel.

View Video

Mezmo

Read more about AURA in Practice: Mezmo's SRE bot, demo walkthrough

Managing OpenTelemetry at Scale: Why OTel Pipelines Need a Control Plane

May 10, 2026 By Jonny Steiner In Coralogix

OpenTelemetry made telemetry possible everywhere – turning observability pipelines into distributed production infrastructure. Distributed infrastructure requires a control plane for inventory, governance, and safe change. At 500 collectors across hybrid environments, operational overhead becomes a production risk. The moment telemetry pipelines become a distributed infrastructure, they inherit the operational problems of one.

Read Post

Coralogix

Read more about Managing OpenTelemetry at Scale: Why OTel Pipelines Need a Control Plane

Geo Maps: See Where Your Infrastructure Lives

May 9, 2026 By Shyam Sreevalsan In netdata

When your infrastructure is spread across regions, data centers, branch offices, or edge locations, knowing where a node is physically located matters more than people usually admit. During an incident, “the node in the Singapore POP” communicates faster than a hostname. When you’re planning capacity, seeing geographic clustering tells you something that a flat list of nodes doesn’t.

Read Post

netdata

Read more about Geo Maps: See Where Your Infrastructure Lives

Uptime.com Basic Solution: Uptime.com

May 9, 2026 By Uptime Website Monitoring In uptime

Learn how to create and manage a Status Page on Uptime.com, including setting up public, SLA, and internal pages. Covering components, linking checks, incidents, maintenance, customization, subscribers, external users, and configuring SSO.

View Video

uptime

Monitoring

Read more about Uptime.com Basic Solution: Uptime.com

AWS outage takes down more than 150 cloud services

May 8, 2026 By Colin Bartlett In StatusGator

On May 7th and 8th, 2026, Amazon Web Services (AWS) experienced an outage affecting Amazon Elastic Compute Cloud (EC2) in the dreaded US East 1 region. The original region of AWS located in Northern Virginia, us-east-1 or just “US East” as it is known, has been the subject of some of the internet’s most high profile and destructive outages and remains Amazon’s least reliable region.

Read Post

StatusGator

Read more about AWS outage takes down more than 150 cloud services

A Runnable Reference Architecture for Battery Energy Storage Systems on InfluxDB 3

May 8, 2026 By InfluxData Team In InfluxData

A battery is a complex electrochemical system where safety and revenue are decided in milliseconds. Cell temperatures, voltages, and state of charge change in real-time; dispatch decisions and thermal alarms must fire in real-time. Anything in between—your data pipeline, your historian, your alerting layer—has to disappear into the background.

Read Post

InfluxData

Read more about A Runnable Reference Architecture for Battery Energy Storage Systems on InfluxDB 3

Federated Search | From Silos to Insight | AWS S3 Schema Discovery with Splunk-Managed Tables

May 8, 2026 By Splunk In Splunk

This walk-through shows how Splunk's crawler, available through the Data Management app, can discover schema and partition keys for S3 backed datasets and create Splunk managed catalog tables. Once the data is mapped, analysts can search AWS S3 data through Splunk and bring it into broader security, observability, and operational workflows.

View Video

Splunk

Read more about Federated Search | From Silos to Insight | AWS S3 Schema Discovery with Splunk-Managed Tables

How Modern Ops Lost Their Bearings

May 8, 2026 By ScienceLogic In ScienceLogic

Modern operations carry a quiet contradiction. Organizations have never had more data, more dashboards, or more instrumentation, yet teams increasingly struggle to gain a reliable sense of what the environment is actually doing. The problem is not the absence of information. It is the absence of bearings. This drift did not happen suddenly. It accumulated across years of transformation.

Read Post

ScienceLogic

Read more about How Modern Ops Lost Their Bearings

Multi-tiered Observability: A Practical Way to Handle Diverse Workloads

May 8, 2026 By Pablo Fernandez In VictoriaMetrics

Observability in large companies is rarely one-size-fits-all. The VictoriaMetrics topologies guide shows why different deployment patterns are needed as scale, isolation, and reliability requirements grow. Different workloads require different trade-offs: some need long retention for audits and trend analysis, while others need higher resolution for debugging. Business-critical systems also demand dependable alerting and high availability, often with several 9s of reliability.

Read Post

VictoriaMetrics

Read more about Multi-tiered Observability: A Practical Way to Handle Diverse Workloads

Diagnose and resolve database performance issues faster with Database Investigator

May 8, 2026 By Ethan Perez In Datadog

When your database performance degrades, diagnosing the root cause is rarely quick or straightforward. Your existing tools might surface metrics like CPU utilization, wait events, and query duration, but then leave you to correlate the data and identify what went wrong. Worse, what first appears to be the root cause can often just be a downstream effect of multiple interrelated issues.

Read Post

Datadog

Read more about Diagnose and resolve database performance issues faster with Database Investigator

Using Grafana Assistant Inside Claude Code

May 8, 2026 By Grafana In Grafana

Shawn Pitts demos how you can use Grafana Assistant inside Claude Code to get an analysis and code recommendation to fix an issue.

View Video

Grafana

Read more about Using Grafana Assistant Inside Claude Code

Zero-Code OpenTelemetry for Vert.x

May 8, 2026 By Prathamesh Sonpatki In Last9

Drop a JAR on the JVM. Get distributed tracing, RxJava context propagation, log-trace correlation, and Vert.x internal metrics. No code changes. No Maven dependency. Java 8–21. Inside the design of last9/vertx-opentelemetry v2.3.4. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Read Post

Last9

Read more about Zero-Code OpenTelemetry for Vert.x

From noise to knowledge: How GenAI is revolutionizing log management and analytics

May 8, 2026 By Elastic Observability Team In Elastic

Focusing on GenAI and logs for IT efficiency: Efficiency is everything for managing today’s digital systems. Technology is constantly transforming, and expanding operations are driving an explosion in data. Consequently, data ingest and storage costs have soared. But it’s not just data storage costs that keep teams behind. The challenge of managing all that observability data forces IT teams to choose between efficiency and the bottom line.

Read Post

Elastic

Read more about From noise to knowledge: How GenAI is revolutionizing log management and analytics

Monitor Unreal Engine Game Performance with Application Metrics

May 8, 2026 By Ivan Tustanivskyi In Sentry

Your Unreal game can ship with zero errors and still not feel great. Stutters during combat, a frame-rate cliff on the big boss, rubber-banding in multiplayer, none of it shows up as a crash and none of it shows up in Sentry, leaving you without any visibility into what your players are actually experiencing in the wild. Well, until now. Unreal Engine already gives you plenty of tools to measure game performance and collect runtime stats, but all that data stays on the dev’s machine.

Read Post

Sentry

Read more about Monitor Unreal Engine Game Performance with Application Metrics

The Journey to Production AI: Five Steps for SRE and Platform Teams

May 8, 2026 By Mezmo In Mezmo

In a recent webinar, The Journey to Production AI, Andre Elizondo walked through what separates a working agent demo from an agent worth trusting on a 2 a.m. page. Live polls during the session put numbers behind a pattern most platform teams already feel. Most teams are early. The ones who are further along did not get there by shipping a flashier demo. They got there by treating production AI as a platform problem.

Read Post

Mezmo

Read more about The Journey to Production AI: Five Steps for SRE and Platform Teams

Operational Intelligence and the Hidden Structure in System Logs

May 8, 2026 By Bob Slevin In Selector

Most IT teams do not suffer from a lack of data. They suffer from the amount of effort required to make sense of it. Every network device, application, cloud service, and infrastructure component generates a constant stream of machine output. Logs capture state changes, failures, retries, warnings, and thousands of other small signals about how systems behave. The problem is that raw logs are hard to use at operational speed.

Read Post

Selector

Read more about Operational Intelligence and the Hidden Structure in System Logs

Retroactive sampling reduce trace traffic and costs

May 8, 2026 By VictoriaMetrics In VictoriaMetrics

In this short, our software engineer Zhu Jiekun explains how retroactive sampling can reduce trace traffic and ingestion costs by sending minimal data for sampling decisions and retrieving full spans only when needed, at the cost of added system complexity.
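The idea can be illustrated with a minimal sketch. It is not VictoriaMetrics code: `Span`, `LocalBuffer`, `decide`, and the 500 ms threshold are all hypothetical names and values chosen for illustration. Full spans stay in a local buffer, only a small per-trace summary is sent for the sampling decision, and full spans are retrieved only for the traces that were selected.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False
    attributes: dict = field(default_factory=dict)


class LocalBuffer:
    """Hypothetical agent-side buffer that holds full spans until a decision arrives."""

    def __init__(self):
        self._by_trace = {}

    def add(self, span):
        self._by_trace.setdefault(span.trace_id, []).append(span)

    def summaries(self):
        # Only this small per-trace summary is sent for the sampling decision.
        return [
            {
                "trace_id": trace_id,
                "max_duration_ms": max(s.duration_ms for s in spans),
                "has_error": any(s.error for s in spans),
            }
            for trace_id, spans in self._by_trace.items()
        ]

    def fetch(self, trace_id):
        # Full spans leave the buffer only for traces that were selected.
        return self._by_trace.pop(trace_id, [])


def decide(summaries, slow_ms=500.0):
    """Hypothetical decision rule: keep traces that errored or were slow."""
    return [
        s["trace_id"]
        for s in summaries
        if s["has_error"] or s["max_duration_ms"] >= slow_ms
    ]


buffer = LocalBuffer()
buffer.add(Span("t1", "GET /health", 3.0))
buffer.add(Span("t2", "POST /checkout", 820.0, attributes={"cart_items": 4}))
buffer.add(Span("t3", "GET /profile", 40.0, error=True))

wanted = decide(buffer.summaries())
full_spans = [span for trace_id in wanted for span in buffer.fetch(trace_id)]
```

The added complexity mentioned in the short shows up even here: the buffer has to hold spans until a decision arrives, and something must eventually evict the traces nobody asked for.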

View Video

VictoriaMetrics

Monitoring

Read more about Retroactive sampling reduce trace traffic and costs

Sponsored Post

How to Reduce MTTR When Third-Party Services Go Down

May 7, 2026 By Nuno Tomas In isDown

Most MTTR guides assume the problem is in your infra. For modern apps, it's often not - it's Stripe, AWS, Auth0, or another vendor. Vendor status pages lie by omission. The lag between impact and acknowledgment can stretch to an hour or more. You need two runbooks, proactive vendor monitoring, and graceful degradation baked in before the 3 AM page hits. This post shows you exactly how.

Read Post

isDown

Read more about How to Reduce MTTR When Third-Party Services Go Down

Why Blast Radius Analysis Does Not End When Alerts Fire

May 7, 2026 By Lightrun Team In Lightrun

Modern distributed systems fail in ways that can bypass even well-designed isolation patterns. When a failure is actively propagating across services at four in the morning, the question shifts from “how do we limit the blast radius” to “how do we confirm what it actually is.” Monitoring shows which services are in the impact zone, but it cannot show what code path caused the failure to spread, or whether it has stopped.

Read Post

Lightrun

Read more about Why Blast Radius Analysis Does Not End When Alerts Fire

Data Sovereignty: How to Keep All of Your Services in Europe (AppSignal + Hatchbox)

May 7, 2026 By Julian Rubisch In AppSignal

Over the last decade, a great many data privacy regulations have been passed in the European Union. Like it or not, measures like GDPR, the Digital Services Act, and the upcoming Artificial Intelligence Act are exerting increasing influence across industries over how and especially where the data of European customers is stored. In this article, we will explore the ways to keep the simplicity of a Platform as a Service (PaaS) while utilizing only European providers.

Read Post

AppSignal

Read more about Data Sovereignty: How to Keep All of Your Services in Europe (AppSignal + Hatchbox)

Choosing Enterprise Observability: Compliance, Security, Visibility and the Rise of the Enterprise Control Tower

May 7, 2026 By david.arrowsmith In Interlink

For IT professionals evaluating Observability for large, complex enterprise environments, the shortlist usually starts with technical depth and ends with operational reality.

Read Post

Interlink

Read more about Choosing Enterprise Observability: Compliance, Security, Visibility and the Rise of the Enterprise Control Tower

Versions and Deploys Now Live on Your Dashboard

May 7, 2026 By Rollbar In Rollbar

Your version and deploy health data no longer lives on separate pages. Versions and Deploys are now Dashboard cards, visible by default the next time you open Rollbar. Errors, versions, and deploys in one view, no more tab-hopping to piece together release health.

Read Post

Rollbar

Read more about Versions and Deploys Now Live on Your Dashboard

Span or Attribute in OpenTelemetry Custom Instrumentation

May 7, 2026 By Jessica Kerr (Jessitron) In Honeycomb

TL;DR: Attribute. More information on one event gives us more correlation power. It’s also cheaper. When you want to add some information to your tracing telemetry, you could emit a log, create a span, or add a piece of data to your current span. Adding a piece of data to your current span is the best! Usually.

Read Post

Honeycomb

Read more about Span or Attribute in OpenTelemetry Custom Instrumentation

How one partnership powers search for over 2 million WP Engine users

May 7, 2026 By Sunile Manjee In Elastic

How do you make search faster, smarter, and more scalable? During our recent webinar, I sat down with Luke Patterson, senior product manager at WP Engine, and Delphin Barankanira, independent software vendor partner engineering lead and data & AI specialist at Google Cloud, to answer that question. We dug into the mechanics behind WP Engine’s ability to deliver near-instant updates to over 2 million users.

Read Post

Elastic

Read more about How one partnership powers search for over 2 million WP Engine users

Faster OpenTelemetry Migrations from Splunk to SecOps with Bindplane

May 7, 2026 By Laura Luttmer In ObservIQ

Many security teams are looking to move off Splunk, whether to reduce licensing costs, consolidate their SIEM, or take advantage of Google SecOps' built-in threat intelligence and YARA-L detection capabilities. But migrations aren’t easy, and no one wants to run blind while they evaluate and move to a new platform. With OpenTelemetry and Bindplane, you can easily make the switch to SecOps without impacting your existing stack.

Read Post

ObservIQ

Read more about Faster OpenTelemetry Migrations from Splunk to SecOps with Bindplane

Eliminate noisy log lines with Adaptive Logs drop rules

May 7, 2026 By Steven Dungan In Grafana

Most platform and observability teams have logs they know are noise. These could be throwaway health check logs, forgotten DEBUG logs, or verbose INFO logs from little-used services that only serve to inflate your bill. Regardless of what they contain and why they're there in the first place, the hard part is getting rid of them. Centralized teams want to easily and quickly prevent these logs from being ingested, without having to work with toilsome infrastructure change management to do so.

Read Post

Grafana

Read more about Eliminate noisy log lines with Adaptive Logs drop rules

Fixing JavaScript observability, one library at a time

May 7, 2026 By Abdelrahman Awad In Sentry

Over the past few weeks, we have been driving a cross-ecosystem effort to replace the “monkey-patching” that powers all JavaScript APM tools today with something built into the runtime. Here is why, how, and where it stands. This applies to server-side JavaScript only (Node.js, Bun, Deno, Cloudflare Workers). Browsers do not have diagnostics_channel and lack the async context propagation primitives needed to polyfill it.

Read Post

Sentry

Read more about Fixing JavaScript observability, one library at a time

7 Best Practices to Improve Digital Employee Experience in Modern IT Environments

May 7, 2026 By Teneo In Teneo

Digital employee experience isn’t just a nice-to-have anymore. In hybrid, SaaS-heavy IT environments, Digital Employee Experience (DEX) is where productivity can live or die. Employees don’t care whether the culprit is Wi‑Fi connectivity, CPU/RAM load, poor battery life, or a misbehaving cloud app. They just know work got harder.

Read Post

Teneo

Read more about 7 Best Practices to Improve Digital Employee Experience in Modern IT Environments

ActiveMQ Monitoring & Alerting Setup: The Complete 2026 Guide

May 7, 2026 By meshIQ In meshIQ

Most ActiveMQ outages are not sudden failures. They are visible in the metrics for minutes, sometimes hours, before they become incidents. A memory usage graph climbing past 60%. A queue depth that isn't draining. An enqueue time that doubled after a deployment. A consumer count that dropped from 3 to 1 at 2 AM.
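The four warning signs above can be sketched as simple checks over two metric snapshots. This is only an illustration: `QueueSnapshot`, `warning_signs`, and the field names are hypothetical and do not correspond to ActiveMQ's own metric names, and the thresholds are taken from the examples in the excerpt rather than from any recommended configuration.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    # Hypothetical fields; real broker metrics are named differently.
    memory_percent: float
    queue_depth: int
    avg_enqueue_ms: float
    consumer_count: int


def warning_signs(previous, current):
    """Compare two snapshots and list early signs of trouble."""
    signs = []
    if current.memory_percent > 60:
        signs.append("memory usage above 60%")
    if current.queue_depth > 0 and current.queue_depth >= previous.queue_depth:
        signs.append("queue depth not draining")
    if previous.avg_enqueue_ms > 0 and current.avg_enqueue_ms >= 2 * previous.avg_enqueue_ms:
        signs.append("enqueue time doubled")
    if current.consumer_count < previous.consumer_count:
        signs.append("consumer count dropped")
    return signs


before = QueueSnapshot(memory_percent=45.0, queue_depth=100, avg_enqueue_ms=5.0, consumer_count=3)
after = QueueSnapshot(memory_percent=62.0, queue_depth=180, avg_enqueue_ms=11.0, consumer_count=1)
signs = warning_signs(before, after)
```

Comparing against a previous snapshot rather than a fixed limit is what lets the checks catch trends such as a queue that is not draining, which a static threshold would miss.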

Read Post

meshIQ

Read more about ActiveMQ Monitoring & Alerting Setup: The Complete 2026 Guide

Auvik Aurora and the Future of AI in IT Operations

May 7, 2026 By Antonio Neglia In Auvik

We built something called Auvik Aurora, and before you scroll any further, I can already hear your thoughts. “Wait a second, Anto. Is this going to be another blog post giving me the hard sell on using AI?” Fair enough, I don’t think anyone would blame you, especially when we’re seeing AI adoption across nearly every industry, tool, hobby, workflow, or even ____. The blank is intentional: AI is everywhere, and chances are that you already know that it matters.

Read Post

Auvik

Read more about Auvik Aurora and the Future of AI in IT Operations

Observability and Security for the AI Era

May 7, 2026 By Datadog In Datadog

Datadog has always been driven by a broader vision of helping teams understand and operate complex systems. In this session, you’ll hear from Michael Whetten, Product SVP, and Abrar Hussain, Senior Director, Product Management, as they share the latest updates across the Datadog product suite and discuss how that vision continues to shape the platform’s evolution and support the next generation of AI-driven applications.

View Video

Datadog

Read more about Observability and Security for the AI Era

Major .de Outage: DNSSEC Failure at DENIC Takes Down German Domains

May 6, 2026 By Colin Bartlett In StatusGator

On May 5, 2026, a major .de outage disrupted access to websites across Germany and Europe. The incident, caused by a failure at DENIC, the operator of the .de top-level domain, resulted in widespread DNS resolution failures. This was not a typical service outage. It was a failure at the DNS layer that made entire domains unreachable. As DNS caches expired, more services went offline, creating the appearance of a spreading outage across unrelated companies.

Read Post

StatusGator

Read more about Major .de Outage: DNSSEC Failure at DENIC Takes Down German Domains

UK Public Sector IT teams face mounting AI pressures amid 'do more with less' reality

May 6, 2026 By SolarWinds In SolarWinds

New SolarWinds research reveals that AI is adding pressure and complexity for public sector IT teams, despite being positioned as a solution to ease workload.

Read Post

SolarWinds

Read more about UK Public Sector IT teams face mounting AI pressures amid 'do more with less' reality

Extended URL & Log Monitoring for Microsoft SCOM

May 6, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

NEW + FREE Community Management Packs for Enhanced Monitoring.

Read Post

NiCE IT Mgmt

Read more about Extended URL & Log Monitoring for Microsoft SCOM

Powering Autonomous IT with Edwin AI in ServiceNow Now Assist

May 6, 2026 By Margo Poda In LogicMonitor

Edwin AI extends ServiceNow Now Assist with real-time incident intelligence, acting as a context broker between observability data and ServiceNow incidents. Responders get the context they need inside the IT operations workflow they already use. Edwin AI now: The Edwin AI Agent for ServiceNow brings real-time incident intelligence into Now Assist and Workspace, giving ITOps teams root cause, impact, and recommended next steps directly inside the ServiceNow incident record.

Read Post

LogicMonitor

Read more about Powering Autonomous IT with Edwin AI in ServiceNow Now Assist

How to Prevent AI Agents From Deleting Production Data

May 6, 2026 By Lightrun Team In Lightrun

There’s a new question teams are asking: how can we prevent AI agents from deleting production? When Cursor deleted PocketOS’s entire production database in nine seconds, the agent wasn’t malfunctioning. It had full technical capability, but it was inferring operational authority from static code rather than live environment state. That gap between capability and context is the root cause. This article breaks down exactly how that happens, and what runtime visibility does to stop it.

Read Post

Lightrun

Read more about How to Prevent AI Agents From Deleting Production Data

Get Valid TLS Certificates for Icinga Web Despite a Firewall

May 6, 2026 By Alexander Klimov In Icinga

Lots of big companies lock down their IT infrastructure in the internal network; sometimes they even use only locally mirrored repositories. I totally understand this, especially since our CVE-2024-49369. Nowadays, when LLMs find security holes even in OpenBSD, you definitely shouldn’t expose any services to the public without need.

Read Post

Icinga

Read more about Get Valid TLS Certificates for Icinga Web Despite a Firewall

Building with Bots - Agent Monitoring in Sentry

May 6, 2026 By Sentry In Sentry

In this workshop, we’ll show you how to use Sentry Agent Monitoring to crack open the black box that is AI — inputs and outputs, token usage, model performance — so you actually know what the robots in your application are doing.

View Video

Sentry

Read more about Building with Bots - Agent Monitoring in Sentry

Troubleshoot performance issues faster with the new Grafana Assistant integration for Database Observability

May 6, 2026 By Jeremy Heller In Grafana

So your database is slow. Now what? Grafana Cloud Database Observability already gives you visibility into your SQL queries with RED metrics, individual execution samples, wait event breakdowns, table schemas, and visual explain plans. But visibility is just the starting point. You can see that a query's P99 latency spiked, but what should you do about it? You can see wait events like wait/synch/mutex/innodb firing, but what does that actually mean?

Read Post

Grafana

Read more about Troubleshoot performance issues faster with the new Grafana Assistant integration for Database Observability

Elasticsearch 9.4 powers the next phase of the Elastic AI Ecosystem: Dell AI Data Platform with NVIDIA

May 6, 2026 By Sunnie Weber In Elastic

AI is moving fast. Enterprise adoption needs to move with purpose. Over the past year, one thing has become clear: Organizations are not looking for more AI hype. They are looking for a path to production — one that connects infrastructure, data, and intelligence in a way that delivers real business value. That is exactly what the Elastic AI Ecosystem is built to do. At Elastic, we believe AI is only as powerful as the data foundation behind it. Great models matter.

Read Post

Elastic

Read more about Elasticsearch 9.4 powers the next phase of the Elastic AI Ecosystem: Dell AI Data Platform with NVIDIA

Datadog for Government achieves FedRAMP High certification

May 6, 2026 By Geoffrey Carlisle In Datadog

Modern government missions depend on software platforms that can perform under demanding conditions. As agencies update systems that support public safety, benefits delivery, financial operations, and national priorities, they face security and compliance requirements that shape how technology is adopted as well as how it is built, operated, and evolved over time.

Read Post

Datadog

Read more about Datadog for Government achieves FedRAMP High certification

Analyze cloud costs with flexible spreadsheets in Datadog Sheets

May 6, 2026 By Katherine Broner In Datadog

Cloud cost data is most useful when teams can adapt it to their own reporting and planning needs. In addition to viewing cost breakdowns, FinOps teams often need to calculate forecasts, reshape datasets, and present tailored views to finance and leadership teams. In many workflows, those steps happen outside the observability platform. Once the data is exported, it quickly becomes outdated and requires repeated manual updates.

Read Post

Datadog

Read more about Analyze cloud costs with flexible spreadsheets in Datadog Sheets

Navigating the Middleware Maze: How meshIQ 12.1 Redefines Scale and Simplicity with Agentic AI

May 6, 2026 By Greg DeaKyne In meshIQ

meshIQ v12.1 transforms middleware management with petabyte-scale data processing and agentic AI. The new intelligent launchpad, simplified onboarding, and context-aware safeguards move teams from reactive monitoring to proactive, AI-driven operations across the enterprise.

Read Post

meshIQ

Read more about Navigating the Middleware Maze: How meshIQ 12.1 Redefines Scale and Simplicity with Agentic AI

Inside the .de DNS Outage: Real-World Data from UptimeRobot.

May 6, 2026 By Tomas Koprusak In Uptime Robot

On the evening of May 5th, 2026, large parts of the German web briefly went dark. For a few hours, anyone trying to load a .de address through a major DNS resolver got errors instead of websites. Bahn.de, Amazon.de, and Spiegel.de were among the affected. Major brands like Telekom, DHL, and Sparkassen felt it too, along with hosting providers Hetzner, Strato, and Ionos.

Read Post

Uptime Robot

Read more about Inside the .de DNS Outage: Real-World Data from UptimeRobot.

What kind of correlations become impossible without depth and breadth?

May 6, 2026 By Virtana In Virtana

Most teams don’t have a data problem. They have a correlation problem. When visibility is fragmented, marketing sees a conversion drop and engineering sees API latency, so the wrong call gets made. Example: checkout drops, pricing gets blamed, discounts get applied. Reality: a backend API timeout was killing transactions. That’s what happens when you can’t connect user impact (what) to system behavior (why).

View Video

Virtana

Read more about What kind of correlations become impossible without depth and breadth?

Improved debugging for Expo apps with the React Native SDK

May 6, 2026 By Aleksandr Pantiukhov In Sentry

Events from Expo apps account for about 75% of the total event volume we receive from React Native apps. That number made it an easy decision to invest in updates to the Sentry React Native SDK to improve the debugging and performance workflow for your Expo apps. With these updates, you can now.

Read Post

Sentry

Read more about Improved debugging for Expo apps with the React Native SDK

VictoriaMetrics April 2026 Ecosystem Updates

May 6, 2026 By Pablo Fernandez In VictoriaMetrics

We’re excited to learn that our vmagent helped Airbnb migrate its high-volume metrics pipeline from StatsD and Veneur to OpenTelemetry. Airbnb is now handling 100 million samples per second. You can read more about the migration in these articles: In other news, April saw releases across the VictoriaMetrics Observability Stack. We have released several important bugfixes for VictoriaMetrics and many new features in VictoriaLogs. This release round-up covers updates for.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics April 2026 Ecosystem Updates

Monitoring from Private Locations

May 6, 2026 By Checkly In Checkly

Not everything worth monitoring is on the public internet. In this 30-minute hands-on session, Daniel Paulus deploys four Checkly private location agents on AWS EKS with Terraform, then uses a coding agent to scaffold 200 internal checks in seconds — uptime, TCP, DNS, ICMP, and Playwright browser checks against legacy apps that never leave the firewall.

View Video

Checkly

Read more about Monitoring from Private Locations

The cost of knowledge

May 6, 2026 By Ofri Grushka In Coralogix

In the world of observability, “cardinality” has become a heavy word. It is a ghost used to justify skyrocketing bills or degraded query performance. When cardinality rises, the advice is almost always the same: reduce it. Drop your labels, or reduce the dimensions. It is usually framed as “optimization.” Every label you add to a metric is a dimension of knowledge. Each one gives you a way to slice, compare, and explain the chaos of production.
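A small sketch of why labels and cost are linked: each label multiplies the number of distinct time series a metric can produce, so the upper bound is the product of the distinct values per label. The function name `max_series` and the label sets below are hypothetical and only serve the illustration.

```python
from math import prod


def max_series(label_values):
    """Upper bound on time series for one metric: the product of distinct values per label."""
    return prod(len(values) for values in label_values.values())


labels = {
    "region": ["eu", "us", "ap"],
    "status": ["2xx", "4xx", "5xx"],
    "endpoint": ["/login", "/checkout", "/search", "/profile"],
}

baseline = max_series(labels)

# Adding one more label with ten distinct values multiplies the bound by ten.
with_customer_tier = dict(labels, customer_tier=[f"tier-{i}" for i in range(10)])
expanded = max_series(with_customer_tier)
```

The same multiplication is what makes each label useful: every extra factor is another way to slice and compare the data, which is the trade-off the article is pointing at.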

Read Post

Coralogix

Read more about The cost of knowledge

Driving Innovation: A Bias Towards Action with Greg Freeman

May 6, 2026 By Selector In Selector

AI is changing network operations faster than ever. In the latest episode of Next-Gen Network Heroes, Bob sits down with Greg Freeman of Lumen Technologies to talk about what it takes to innovate across one of the world’s largest telecommunications networks. From deterministic workflows to agentic AI, Greg shares how his team is using automation, analytics, and AI to improve network reliability, customer experience, and operational efficiency at scale.

View Video

Selector

Read more about Driving Innovation: A Bias Towards Action with Greg Freeman

Bias Toward Action: Driving AI Innovation Across Global Networks with Greg Freeman

May 6, 2026 By Selector In Selector

What does it take to lead innovation across one of the world’s largest telecommunications networks? In this episode of Next-Gen Network Heroes, host Bob Slevin sits down with Greg Freeman, Vice President of Network and Customer Transformation at Lumen Technologies, to explore how AI, automation, and curiosity are reshaping the future of network operations.

View Video

Selector

Read more about Bias Toward Action: Driving AI Innovation Across Global Networks with Greg Freeman

How to Measure your Most Expensive Milliseconds

May 6, 2026 By Datadog In Datadog

In the fast-paced world of mobile development, reliability rarely fails with a loud crash; instead, it degrades quietly through micro-regressions that erode user trust and engagement. While most companies track backend health and API latency, they often fly blind regarding the actual screen-level responsiveness that defines the true user experience. When Expedia Group underwent a major technical evolution, the team realized they lacked a consistent baseline to compare performance across platforms, leaving them unable to validate improvements before rollout.

View Video

Datadog

Read more about How to Measure your Most Expensive Milliseconds

Introducing AppSignal Labs

May 6, 2026 By Serena Chou In AppSignal

We've been shipping faster. A dark mode for the UI, AppSignal MCP, the AWS dashboard templates — things we would have kept internal a year ago until everything was polished. Now we don't. A v1 in your hands beats a v3 in our heads. We learn more from a week of real use than from a quarter of internal review. So we're giving that work a home. AppSignal Labs is where you'll find the earlier versions. Real software, available today, with a direct line to the team building it.

Read Post

AppSignal

Read more about Introducing AppSignal Labs

Bindplane at Google Cloud Next 2026: OpenTelemetry, SecOps, and Smarter Telemetry Pipelines

May 6, 2026 By Bindplane In ObservIQ

Bindplane at Google Cloud Next 2026. A quick recap of our booth, demos, conversations, and how Bindplane helps teams build smarter OpenTelemetry pipelines for Google SecOps and modern telemetry operations.

View Video

ObservIQ

Read more about Bindplane at Google Cloud Next 2026: OpenTelemetry, SecOps, and Smarter Telemetry Pipelines

IT Monitoring News | May '26 Edition

May 5, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Latest releases, resources, and events focused on Microsoft SCOM and modern ITOps & DataOps.

Read Post

NiCE IT Mgmt

Read more about IT Monitoring News | May '26 Edition

About TazamaTech Software

May 5, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

NiCE and TazamaTech Announce Strategic Partnership.

Read Post

NiCE IT Mgmt

Read more about About TazamaTech Software

The World Beneath The Dashboards

May 5, 2026 By ScienceLogic In ScienceLogic

Most people assume the modern enterprise runs cleanly on the dashboards and cloud consoles that dominate today’s digital workspaces. Anyone who operates these environments understands a more complicated truth. The real work happens beneath those surfaces, in systems few people notice until something slips. Across industries, engineers face the same recurring scenario: a routine shift disrupted by signals of degradation somewhere in the environment.

Read Post

ScienceLogic

Read more about The World Beneath The Dashboards

SmartAssist and SQL Analytics - AI-powered querying

May 5, 2026 By Blog In Squared Up

SQL Analytics has always been one of my favourite SquaredUp features. That's not just because I can use raw SQL to achieve complex data transformations. The fact that I can run SQL queries over data from all sorts of sources, not just relational databases, gives incredible power and flexibility. The great news is that SQL Analytics now ships with our AI-driven SmartAssist technology.

Read Post

Squared Up

Read more about SmartAssist and SQL Analytics - AI-powered querying

What Is a Linux Server? Everything You Need to Know (2026)

May 5, 2026 By LogicMonitor In LogicMonitor

An open-source foundation for resilient infrastructure: on-prem, cloud, and hybrid. IT downtime costs organizations an average of $9,000 per minute, or more than $1 million per hour. That’s real money lost when websites crash, transactions fail, or internal systems go offline. For many organizations, avoiding those losses starts with choosing the right server operating system (OS). Why? The OS sets the foundation for how stable, secure, and cost-efficient your infrastructure will be.

Read Post

LogicMonitor

Read more about What Is a Linux Server? Everything You Need to Know (2026)

How to Monitor Your Node.js App on Hetzner with AppSignal

May 5, 2026 By Dejan Lukić In AppSignal

More and more developers are choosing self-hosting over traditional PaaS. At first, self-hosting may seem like unnecessary heavy lifting, especially when you can deploy as fast as creating a repo. However, with correct tooling, it’s easy to see why devs are moving away from PaaS. You get dedicated resources and (if needed) a European data center at a fraction of the cost.

Read Post

AppSignal

Read more about How to Monitor Your Node.js App on Hetzner with AppSignal

How Scalability Works in SolarWinds Observability Self-Hosted

May 5, 2026 By solarwindsinc In SolarWinds

Cheryl Nomanson, SolarWinds staff technical trainer, provides a comprehensive overview of SolarWinds architecture and scaling options for self-hosted deployments. She explains the centralized deployment model starting with a single SolarWinds server that handles polling, web console, and database connections. The presentation covers key scaling indicators including polling thresholds that warn users at 85% capacity and alert at 100%. She demonstrates how to add up to 100 polling engines per server and additional web servers to handle more concurrent users.

View Video

SolarWinds

Read more about How Scalability Works in SolarWinds Observability Self-Hosted

Moving Beyond SolarWinds: A Guide to Modern Observability

May 5, 2026 By Sofia Burton In LogicMonitor

Industry-leading observability experts provide strategic guidance on why and how modern IT teams are successfully moving beyond SolarWinds to more resilient, cloud-native platforms. IT teams running SolarWinds often know the pain points well before they start evaluating alternatives: separate modules for different monitoring needs, a self-hosted deployment model that requires ongoing maintenance, and pricing that gets harder to predict after each acquisition.

Read Post

LogicMonitor

Read more about Moving Beyond SolarWinds: A Guide to Modern Observability

Ep 41: The cost of not thinking: Who's responsible when AI agents get it wrong?

May 5, 2026 By Sumo Logic, Inc. In Sumo Logic

In this episode of Masters of Data, we get into the messier side of AI adoption, tackling questions like who actually owns the output when AI gets it wrong, and whether chasing efficiency is making us forget what it means to be human in the first place. We discuss tech CEOs proudly announcing they no longer think for themselves and debate whether AI is quietly eroding our critical thinking skills. We make the case that purpose-built, narrow AI is genuinely exciting, but that no efficiency gain is worth losing the human touch that makes work, connection, and creativity meaningful.

View Video

Sumo Logic

Read more about Ep 41: The cost of not thinking: Who's responsible when AI agents get it wrong?

Observability vs Monitoring: What's the Real Difference in 2026?

May 5, 2026 By Motadata In Motadata

Understand the real difference between observability and monitoring, and why modern IT teams in 2026 need both. Monitoring tells you something is broken; observability explains why. See real examples, faster troubleshooting workflows, and how Motadata ObserveOps unifies both in one platform.

View Video

Motadata

Read more about Observability vs Monitoring: What's the Real Difference in 2026?

Launching Application Metrics

May 5, 2026 By Sentry In Sentry

We just launched Application Metrics! Track the signals you care about, and when something spikes, click straight into the trace, logs, and errors that caused it. No more guesswork. Counters, gauges, and distributions. One line of code. Full trace context.

View Video

Sentry

Monitoring

Read more about Launching Application Metrics

The one pipeline change that cuts costs AND improves visibility

May 5, 2026 By Cribl In Cribl

Your SIEM and observability tools are expensive, and you're probably using them for things a data lake could handle for a fraction of the cost. Let's fix that.

View Video

Cribl

Read more about The one pipeline change that cuts costs AND improves visibility

Introducing the Coralogix CLI: Headless Observability for Every Agent

May 5, 2026 By Coralogix Team In Coralogix

This article is a high-level overview of the Coralogix CLI. For a deeper look at how it works in practice, read the full technical deep dive here. Agent-driven investigation sounds simple: read the alert, query the data, return the cause. In reality, most agents either overload their context window with raw logs or guess at queries and return incorrect results.

Read Post

Coralogix

Read more about Introducing the Coralogix CLI: Headless Observability for Every Agent

Application Metrics - Getting Started

May 5, 2026 By Sentry In Sentry

We've officially launched Application Metrics! In this video, @nikolovlazar introduces them and shows you how to use them. Each plan gets 5GB of Application Metrics for free.

View Video

Sentry

Monitoring

Read more about Application Metrics - Getting Started

ActiveMQ JMS 2.0 Implementation Guide: Simplified API, Transactions & Spring

May 5, 2026 By meshIQ In meshIQ

For most of JMS's lifetime, writing a simple producer required creating a ConnectionFactory, creating a Connection, starting it, creating a Session, creating a MessageProducer, creating a Message, calling send(), and then closing the producer, session, and connection with the close calls safely wrapped in finally blocks to prevent resource leaks. Every developer knew the pattern. Every developer wrote it slightly differently. Every code review had the same comments about resource management.

Read Post

meshIQ

Read more about ActiveMQ JMS 2.0 Implementation Guide: Simplified API, Transactions & Spring

AI Factory Observability

May 5, 2026 By Virtana In Virtana

View Video

Virtana

Read more about AI Factory Observability

Introducing Application Metrics: Track the signal, see the spike, jump to the trace

May 5, 2026 By Ben Coe In Sentry

A few weeks ago we had a bug with Session Replay. Replays were failing in some browsers once more than 1,000 video segments loaded. We had no idea how often it happened or who was hitting it, and because the failure didn’t always produce an error, we had no way to find affected users to reproduce it. Before, we could’ve answered this with spans or logs, but it’s clunky — spans are often sampled, so you can miss outliers; logs are less structured and tend to change over time.

Read Post

Sentry

Read more about Introducing Application Metrics: Track the signal, see the spike, jump to the trace

April 2026 product updates

May 4, 2026 By Valeria Kurolapova In StatusGator

We’ve released a new set of updates to help you reduce noise, customize your experience, and get clearer insights into your services. Here’s what’s new this month.

Read Post

StatusGator

Read more about April 2026 product updates

Putting Bolt in The Hot Seat | HotFix by Sentry

May 4, 2026 By Sentry In Sentry

Welcome to HotFix, the show where Sentry customers return the favor. We helped Bolt fix their broken code. Now, it's their turn to fix something for us. Featuring Bolt Co-Founders, Eric Simons and Albert Pai.

View Video

Sentry

Monitoring

Read more about Putting Bolt in The Hot Seat | HotFix by Sentry

May the Logs Be With You: Graylog 7.1 Is Here

May 4, 2026 By The Graylog Product Team In Graylog

A long time ago, in a SOC far, far away…analysts were drowning in alerts, chasing context across fragmented screens, and watching real threats slip past detection gaps. Today, the Rebellion fights back. This isn’t a release built around a single marquee feature. It’s the result of our team listening to you on the front lines with an ear for removing the friction that makes your jobs harder than they need to be.

Read Post

Graylog

Read more about May the Logs Be With You: Graylog 7.1 Is Here

Monitor and optimize Supabase query performance with Datadog Database Monitoring

May 4, 2026 By Kyra Abbu In Datadog

Built on Postgres, Supabase is an open source, all-in-one backend platform for developers who want to ship applications without managing infrastructure. This makes it especially popular with frontend developers and vibe coders who may have little to no database expertise. Datadog's Supabase integration provides high-level infrastructure metrics, but developers also need query-level visibility to easily diagnose, optimize, and trace performance issues back to their source.

Read Post

Datadog

Read more about Monitor and optimize Supabase query performance with Datadog Database Monitoring

What Is AWS EKS, and How Does It Work with Kubernetes?

May 4, 2026 By LogicMonitor In LogicMonitor

Amazon Elastic Kubernetes Service (Amazon EKS) is AWS’s managed Kubernetes service that simplifies deploying, scaling, and running containerized applications on AWS and on-premises. EKS automates Kubernetes control plane management, ensuring high availability and seamless integration with AWS services like IAM, VPC, and ALB.

Read Post

LogicMonitor

Read more about What Is AWS EKS, and How Does It Work with Kubernetes?

Obkio Microsoft Teams Monitoring vs. Microsoft Teams Admin Center

May 4, 2026 By Andrii Kernitskyi In Obkio

Most IT teams rely on Microsoft Teams Admin Center as their default monitoring tool to find and fix Microsoft Teams issues, but there's a gap between what it shows and what actually causes call quality problems. Teams Admin Center gives you Microsoft's perspective on what happened after an MS Teams call ended. It doesn't tell you what was happening on your network, on your users' devices, or in the five minutes before the complaints started coming in.

Read Post

Obkio

Read more about Obkio Microsoft Teams Monitoring vs. Microsoft Teams Admin Center

NVIDIA DCGM Collector: Deep GPU Monitoring for Data Center and AI Infrastructure

May 4, 2026 By Shyam Sreevalsan In netdata

GPU infrastructure is expensive and increasingly central to production workloads. Whether you’re running ML training jobs, inference serving, video transcoding, or HPC workloads, understanding what your GPUs are actually doing, and what’s going wrong when performance degrades, is not optional.

Read Post

netdata

Read more about NVIDIA DCGM Collector: Deep GPU Monitoring for Data Center and AI Infrastructure

Taming Log Noise With the OpenTelemetry Collector's Drain Processor

May 4, 2026 By Mike Goldsmith In Honeycomb

Do you receive 50 million log lines per day and struggle to see what actually matters? Health checks, heartbeat pings, connection pool messages—they all drown out the errors and anomalies you're trying to find. Most teams deal with this by writing filter rules to drop the noisy patterns. But those rules are manual, per-pattern, and brittle. A new deployment changes a log format and the filter misses it. A new service starts logging a chatty startup sequence nobody thought to exclude.

Read Post

Honeycomb

Read more about Taming Log Noise With the OpenTelemetry Collector's Drain Processor

What's New with Progress WhatsUp Gold 2026.0

May 4, 2026 By Progress WhatsUp Gold In WhatsUp Gold

Progress WhatsUp Gold 2026.0 helps IT teams improve network visibility, strengthen security and work more efficiently. In this recorded webinar, explore what’s included in this free upgrade for customers with an active service agreement. Learn how Progress WhatsUp Gold 2026.0 can deliver proactive visibility with trusted security across your IT infrastructure.

View Video

WhatsUp Gold

Read more about What's New with Progress WhatsUp Gold 2026.0

This Month in Datadog - April 2026

May 4, 2026 By Datadog In Datadog

In the latest episode of This Month in Datadog, Jeremy shares how to run autonomous Cloud SIEM investigations, remediate vulnerabilities with auto-generated fixes, and use natural language to explore Datadog. Later, Sumedha Mehta spotlights the Datadog MCP Server, which gives AI agents real-time access to Datadog’s observability data. Then, Chetan Sharma walks through Datadog Experiments, which measures how product changes impact the user journey.

Read Post

Datadog

Read more about This Month in Datadog - April 2026

AI Supply Chain Attacks Are Here. And Most Organizations Aren't Ready

May 4, 2026 By Teneo In Teneo

When I read about the Vercel breach tied to a Context AI compromise, I wasn’t surprised. I’ve been talking with customers for a while now about how AI was going to introduce a new kind of supply chain risk. This is exactly what that looks like. What stands out to me is how familiar the pattern is. We saw it with open source, then again with SaaS, and again with cloud.

Read Post

Teneo

Read more about AI Supply Chain Attacks Are Here. And Most Organizations Aren't Ready

What is Cloud Threat Detection? An Ultimate Guide for 2026

May 4, 2026 By Jagdish Sajnani In Motadata

What if the next breach in your cloud is already in motion, and your team has no idea how to see it? Cloud workloads are growing fast. APIs, identities, and data are spread across AWS, Azure, GCP, and on-prem systems all at once. Every layer creates its own logs, its own alerts, and its own blind spots. Most security teams are short on visibility, context, and time. That is the gap cloud threat detection is built to close.

Read Post

Motadata

Read more about What is Cloud Threat Detection? An Ultimate Guide for 2026

How the Coralogix CLI Adds Production Intelligence to Any Agent for Any Use Case

May 4, 2026 By Chris Cooney In Coralogix

The new interface into production telemetry is a tool call, made from whichever agent runtime the operator happens to be using at that moment. A finance lead in Claude Code, a product manager in Cursor, an engineer in Codex. Three different jobs, three different agents, three different reasoning loops. The thing they have in common is the data layer underneath.

Read Post

Coralogix

Read more about How the Coralogix CLI Adds Production Intelligence to Any Agent for Any Use Case

Why Does MTTD Stay High Despite Observability Tools Running?

May 4, 2026 By Lightrun Team In Lightrun

Monitoring coverage, anomaly detection, and SLO-based alerting have significantly narrowed detection windows for most failure types, but MTTD remains stubbornly high for a specific class of silent failures. This blog covers why type mismatches, swallowed exceptions, and values that pass validation occur without triggering errors, and what changes when your monitoring stack can generate those signals without waiting for a failure to surface them.

Read Post

Lightrun

Read more about Why Does MTTD Stay High Despite Observability Tools Running?

What's New In Graylog V7.1

May 4, 2026 By Graylog In Graylog

Check out the latest version of Graylog!

View Video

Graylog

Read more about What's New In Graylog V7.1

Federated Search | From Silos to Insight | Unified Datasets in AWS S3 with Ingest Processor

May 4, 2026 By Splunk In Splunk

Are storage costs and data silos slowing down your investigations? In this video, we dive into the Unified Dataset Experience to show you how to search data where it lives. Learn how to use the Splunk Ingest Processor to route high volume logs directly to AWS S3 while maintaining instant visibility via Federated Search. No more re-hydrating data, just fast cost-effective insights.

View Video

Splunk

Read more about Federated Search | From Silos to Insight | Unified Datasets in AWS S3 with Ingest Processor

ActiveMQ Security Hardening: TLS, JAAS, LDAP & CVE Patch Guide

May 4, 2026 By meshIQ In meshIQ

In October 2023, security researchers published CVE-2023-46604, a CVSS 10.0 remote code execution vulnerability in Apache ActiveMQ. Within days, it was being actively exploited in ransomware campaigns. The attack required nothing more than network access to port 61616. No authentication, no credentials, no social engineering. The attacker connected to the standard ActiveMQ port and executed arbitrary code on the server.

Read Post

meshIQ

Read more about ActiveMQ Security Hardening: TLS, JAAS, LDAP & CVE Patch Guide

OpenTelemetry VM Setup Guide: SigNoz Collection Agents Explained

May 4, 2026 By SigNoz - Open Source Observability Platform In SigNoz

About This Video: If you're working with OpenTelemetry, managing collector configurations across environments like VMs can quickly become difficult. In this video, we focus on VM-based setups and walk through how to configure SigNoz Collection Agents step by step. We start with an introduction to VM collection agents, then move into a practical project walkthrough using the OpenTelemetry demo. From there, we explore the documentation, set up configurations, run the collector, and finally validate everything inside SigNoz.

View Video

SigNoz

Read more about OpenTelemetry VM Setup Guide: SigNoz Collection Agents Explained

Vault, Enrollment Templates, and Expanded Connectivity

May 4, 2026 By VirtualMetric In VirtualMetric

The latest VirtualMetric DataStream release (version 1.10.1) focuses on three things security and infrastructure teams consistently need: better credential management, faster agent deployment at scale, and broader connectivity. Here’s what’s new.

Read Post

VirtualMetric

Read more about Vault, Enrollment Templates, and Expanded Connectivity

Get Observability in the Terminal, for You and Your Agents: gcx

May 4, 2026 By Grafana In Grafana

The way you write code is changing, which means the way you observe your systems and respond to issues needs to change, too. Engineers today spend much of their day working via command line, as agentic tools like Cursor and Claude Code have become highly effective at handling many day-to-day engineering tasks. This greatly accelerates code generation, but it doesn't solve for the context switching that comes when you have to jump into another tool that's not part of this new, faster workflow.

View Video

Grafana

Read more about Get Observability in the Terminal, for You and Your Agents: gcx

Accelerating MTTR with Faster Root Cause Diagnosis: AI Advisor Now Supports On-Demand Connectivity, Config Context, and Device Diagnostics

May 4, 2026 By Eric Hian-Cheong In Kentik

Knowing something is broken is easy. Figuring out why is hard. Introducing three new, native AI diagnostic capabilities in the Kentik Network Intelligence Platform to accelerate root cause analysis and keep your network running better.

Read Post

Kentik

Read more about Accelerating MTTR with Faster Root Cause Diagnosis: AI Advisor Now Supports On-Demand Connectivity, Config Context, and Device Diagnostics

AI Diagnostics in Kentik NMS (Network Monitoring System)

May 4, 2026 By Kentik In Kentik

Network problems are easy to spot. Proving root cause is the hard part — and it’s where most of MTTR gets burned. Kentik’s new AI diagnostics in the Network Monitoring System (NMS) close the gap between detection and diagnosis by bringing three capabilities directly into Kentik AI Advisor.

View Video

Kentik

Read more about AI Diagnostics in Kentik NMS (Network Monitoring System)

April 2026: IsDown Users Saved 16.5 Hours with Early Outage Detection

May 3, 2026 By Nuno Tomas In isDown

In April 2026, IsDown's early detection system gave users a 3.6-hour head start on a major outage — plenty of time to implement workarounds before the vendor even acknowledged the problem. Across 45 early detections, our users saved a collective 16.5 hours by knowing about outages an average of 22 minutes before official status pages were updated.

Read Post

isDown

Read more about April 2026: IsDown Users Saved 16.5 Hours with Early Outage Detection

Real-Time Database Monitoring: Solving Database Latency with Zero-Code eBPF Tracing

May 3, 2026 By Jonny Steiner In Coralogix

In high-throughput database environments, a latency spike is rarely a simple story. Modern data layers are distributed, stateful, and constantly changing as shards move, nodes rebalance, caches warm, queries evolve, and connections churn. In practice, spikes usually come from one of three places. For many SRE and Platform teams, the real challenge is disconnected tooling. As one engineering lead recently shared during a technical workshop: “It’s all disconnected.”

Read Post

Coralogix

Read more about Real-Time Database Monitoring: Solving Database Latency with Zero-Code eBPF Tracing

What Is SNMP? Gain Real-Time Insights Into Network Performance (2026)

May 2, 2026 By LogicMonitor In LogicMonitor

SNMP is the universal protocol for monitoring network infrastructure, but its real value depends on which version you run, how you secure it, and how well your monitoring tool handles the OID work for you. SNMP (Simple Network Management Protocol) is the standard protocol IT teams use to monitor and manage network devices.

Read Post

LogicMonitor

Read more about What Is SNMP? Gain Real-Time Insights Into Network Performance (2026)

Dark Mode Has Arrived

May 2, 2026 By Matt Rideout In DNS Check

It's 2026, and DNS Check now has a dark mode. Yes, we noticed the year. Better late than dazzling our users at 2 a.m. when an MX record decides to misbehave.

Read Post

DNS Check

Read more about Dark Mode Has Arrived

Kubernetes Monitoring Tools: What Actually Works at Scale

May 2, 2026 By Faiz Shaikh In Last9

What actually works for Kubernetes monitoring at scale — not what looks good in a vendor demo with a five-pod cluster.

Read Post

Last9

Read more about Kubernetes Monitoring Tools: What Actually Works at Scale

Stop ECS Containers From Collapsing Into One Service in OpenTelemetry

May 2, 2026 By Prathamesh Sonpatki In Last9

Why ECS containers collapse under service.name = aws_ecs and how to fix it for both EC2 launch type and Fargate, including the resource-vs-log-record pitfall that quietly breaks log filtering. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Read Post

Last9

Read more about Stop ECS Containers From Collapsing Into One Service in OpenTelemetry

April 2026 Early Warning Signals

May 1, 2026 By Colin Bartlett In StatusGator

April saw widespread disruptions across SaaS platforms, developer tools, and cloud services, with login failures, pipeline issues, and general service outages among the most common problems. StatusGator’s Early Warning Signals consistently identified these incidents ahead of official provider updates. In several cases, the lead time was significant. Bitbucket pipeline failures were detected 1 hour 17 minutes before acknowledgment, while Claude performance issues surfaced 59 minutes early.

Read Post

StatusGator

Read more about April 2026 Early Warning Signals

Telemetry Talks ep 4: Retroactive sampling and OpenTelemetry

May 1, 2026 By VictoriaMetrics In VictoriaMetrics

This episode of Telemetry Talks explores the evolution of an OTLP/gRPC tracing pipeline for VictoriaTraces within OpenTelemetry and VictoriaMetrics, including a shift from standard gRPC-Go to a simplified HTTP/2-based implementation to reduce complexity and improve flexibility. Together with our guest, Jiekun, we revisited the VictoriaMetrics KubeCon talk ideas on tail-based and retroactive sampling — and their impact on the broader OpenTelemetry community.

View Video

VictoriaMetrics

Read more about Telemetry Talks ep 4: Retroactive sampling and OpenTelemetry

When Dashboards Start Teaching the System: Why Selector's Natural Language Querying Matters

May 1, 2026 By Bob Slevin In Selector

Operations teams have lived with the same frustrating tradeoff for years: the data exists, but getting to the right answer often takes too much time and too much expertise. Engineers are expected to know platform-specific query languages, navigate layers of dashboards, and understand exactly where the right visualization lives before they can even begin troubleshooting. That approach can work in smaller environments, but as infrastructure grows more distributed and complex, it becomes a bottleneck.

Read Post

Selector

Read more about When Dashboards Start Teaching the System: Why Selector's Natural Language Querying Matters

Grafana K6 Community Call - Secrets Management & K6 AI Script Authoring

May 1, 2026 By Grafana In Grafana

In this community call, we'll be discussing recent K6 updates, from the new secrets management feature to Grafana AI assistant integration with K6. Hosts: Bukola Ayodele, Nicole van der Hoeven. Experts: Facundo Batista, Vicente Ortega Torres.

View Video

Grafana

Read more about Grafana K6 Community Call - Secrets Management & K6 AI Script Authoring

ActiveMQ Slow Consumer: Detection, Strategy & Prevention Guide

May 1, 2026 By meshIQ In meshIQ

One of the most counterintuitive failure modes in enterprise ActiveMQ deployments is this: a single application team deploys a new consumer for a high-volume market data topic. Their consumer is slow: maybe they added a database write on every message, or their processing thread pool is undersized.

Read Post

meshIQ

Read more about ActiveMQ Slow Consumer: Detection, Strategy & Prevention Guide

Add dynamically updating context to logs with Reference Tables and Observability Pipelines

May 1, 2026 By Micah Kim In Datadog

Security and platform engineering teams rely on context-rich logs to investigate threats, prioritize incidents, and meet compliance requirements. Context is often stored separately from applications that generate logs, in sources like threat intelligence feeds in Snowflake, asset lists in Amazon S3, ownership data in ServiceNow CMDB, and risk scores produced in Databricks.

Read Post

Datadog

Read more about Add dynamically updating context to logs with Reference Tables and Observability Pipelines

Operations | Monitoring | ITSM | DevOps | Cloud