Monthly Archive

Introducing the StatusGator MCP Server

Mar 31, 2026 By Colin Bartlett In StatusGator

Your AI agents can now monitor, triage, and respond to cloud outages autonomously. The way enterprises manage cloud infrastructure incidents is changing. AI agents are no longer just chatbots answering questions — they’re becoming first responders in your incident management pipeline. Today, we’re launching the StatusGator MCP Server, giving AI agents direct, structured access to the full power of StatusGator’s cloud status monitoring platform.

Read Post

New Beta feature: Google Cloud Private Status integration!

Mar 31, 2026 By Valeria Kurolapova In StatusGator

We are excited to announce that Google Cloud is the latest addition to our suite of Enterprise infrastructure integrations! While StatusGator has long monitored the public status of Google Cloud services, this new integration goes deeper. You can now monitor the personalized health of your specific Google Cloud projects directly within your StatusGator dashboard.

Read Post

Sponsored Post

From Silos to Collaboration: How to Democratize Data in Product Analytics

Mar 31, 2026 By David Bunting In ChaosSearch

Companies that develop software products generate massive quantities of product performance and user engagement data that can be analyzed to support decision-making about everything from feature planning and UX design to sales, marketing, and customer support. Leveraging product data throughout the enterprise represents a significant opportunity to achieve a competitive advantage, but challenges like siloed data systems, poor data literacy, and the complexity of data analytics in the cloud can prevent organizations from making full use of their raw data.

Read Post

API Latency Monitoring: Metrics, Percentiles, and Alerting Best Practices

Mar 31, 2026 By Dotcom-Monitor In Dotcom-Monitor

APIs power modern applications. Every login request, product search, payment authorization, and mobile app refresh depends on an API responding quickly and reliably. When latency increases, users feel it immediately. Pages stall. Transactions hang. Confidence drops. Most engineering teams measure API latency. Fewer truly monitor it. There is a difference. Many teams track average latency in dashboards and assume performance is healthy.
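
To illustrate why an average can hide a latency problem, here is a minimal sketch. It is not taken from the post itself: the sample latencies are invented, and the `percentile` helper is a hypothetical nearest-rank implementation written only for this illustration.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 95 fast requests and 5 very slow ones.
latencies_ms = [100] * 95 + [3000] * 5

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.0f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

With this made-up data the mean is 245 ms and the median is 100 ms, yet the 99th percentile is 3000 ms, so a dashboard that tracks only the average would look acceptable while one request in twenty takes three seconds.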

Read Post

API Endpoint Monitoring: How to Ensure Reliability, Performance & Functional Accuracy

Mar 31, 2026 By Dotcom-Monitor In Dotcom-Monitor

APIs sit at the core of modern digital infrastructure. From e-commerce checkouts and payment processing to SaaS platforms and mobile applications, APIs move the data that keeps systems running. But APIs do not operate as a single unit. They are made up of individual endpoints, and each endpoint represents a specific function or resource that users depend on. As organizations shift toward microservices, cloud native applications, and third party integrations, the number of endpoints increases rapidly.

Read Post

What is MRO? Maintenance, Repair, and Operations Explained

Mar 31, 2026 By Charles Mahler In InfluxData

MRO stands for maintenance, repair, and operations. It refers to the activities, supplies, and services that keep equipment, facilities, and infrastructure running safely and efficiently. Every industry that relies on physical assets depends on MRO, whether that means replacing a worn bearing on a production line, restocking safety gloves in a warehouse, or servicing an HVAC system in a hospital.

Read Post

Pull Request Velocity as a Proxy for AI Usage for Software Development

Mar 31, 2026 By Sematext In Sematext

While AI usage has been growing steadily for the last several years, LLMs noticeably improved around the end of 2025. Specifically, they became more viable for software development, and we are seeing the results: feature and product delivery has picked up. One way to visualize this is to look at the number of pull requests for your organization or software development teams. This chart shows the number of GitHub pull requests created by a team. Can you spot when AI usage increased?

Read Post

Monitoring a Roborock Robot Vacuum With Healthchecks.io

Mar 31, 2026 By Pēteris Caune In Healthchecks

I semi-recently bought a used Roborock S5 Max robot vacuum, and installed Valetudo on it. The installation process involves rooting the robot, and gaining SSH access to it. Which got me thinking, could I get the robot to ping Healthchecks.io at regular intervals? When the robot runs into a problem (closes a door after itself, gets stuck, chokes on a loose wire) and cannot return to the base, it eventually shuts down.

Read Post

Accelerate Your OpenTelemetry Migrations With Honeycomb's Agent Skills

Mar 31, 2026 By Austin Parker In Honeycomb

Since releasing our hosted MCP server last year, we've been thrilled to see customers not just adopt it but build Honeycomb deeply into their agentic development and observability workflows. Users have embraced it, leveraging Honeycomb to stay in conversation with their code and understand how it runs in production.

Read Post

What feels fundamentally different about the problems enterprises are bringing up to you today?

Mar 31, 2026 By Virtana In Virtana

AI Doesn’t Add Complexity. It Multiplies It.

View Video

Episode 7 - Shatter Silos with an AI-Centric Enterprise

Mar 31, 2026 By Digitate In Digitate

In this episode of The Intelligent Enterprise, host Tom Stoneman steps outside the day-to-day noise to get inside a challenge a lot of leaders are feeling right now: AI that stays stuck in pockets of the business.

View Video

AI Needs Better Inputs: Why Observability Is Becoming the Foundation of Enterprise AI Maturity

Mar 31, 2026 By ScienceLogic In ScienceLogic

Organizations across industries are accelerating their investments in AI for operations, yet the path to meaningful impact is proving far more complex than early expectations suggested. Analysts at Gartner, Forrester, Deloitte, and McKinsey continue to highlight the same structural barrier. AI cannot produce accurate predictions or safe automation when the operational data feeding it is fragmented, incomplete, or inconsistent.

Read Post

Mastering the Trace Drilldown: How to Reduce MTTR with Coralogix

Mar 31, 2026 By Coralogix In Coralogix

Stop the "Scavenger Hunt" during incidents. In this video, we walk through the new Coralogix Trace Drilldown, now GA for all customers. Learn how to move from high-level trace views to deep span insights in a single, unified workspace—without ever losing context. Whether you're investigating a latency spike or a failing microservice, the Trace Drilldown helps you answer "Where is the bottleneck?" from three different perspectives in one frame. What you’ll learn.

View Video

Balancing personal brand, company goals and open source in DevRel can be tricky

Mar 31, 2026 By VictoriaMetrics In VictoriaMetrics

DevRel often means juggling goals that feel completely opposite: building trust while driving adoption, serving developers while supporting business growth. In this short, we explore why these “contradictions” are actually the secret to great Developer Relations.

View Video

European Compliance Requirements 2026: Key Regulations and Implementation Steps

Mar 31, 2026 By ChangeTower In ChangeTower

The European regulatory landscape in 2026 looks less like a single finish line and more like a marathon with multiple checkpoints happening simultaneously. Organizations that spent years preparing for GDPR now face overlapping deadlines for AI governance, digital accessibility, operational resilience, and supply chain due diligence all converging within the same twelve-month period.

Read Post

Ep 36: Do not resuscitate: Legacy tech in modern medicine

Mar 31, 2026 By Sumo Logic, Inc. In Sumo Logic

In this episode of Masters of Data, we dig into the cybersecurity nightmare that is modern healthcare IT, from ransomware attacks shutting down entire hospitals to IoT medical devices running software older than some of our passwords. We explore why healthcare organizations make such attractive targets for cybercriminals, and why the combination of life-or-death stakes, skeleton-crew security teams, and Windows-95-era equipment is a recipe for chaos.

View Video

Website Performance Monitoring, Site Speed and SEO

Mar 31, 2026 By Dotcom-Monitor In Dotcom-Monitor

Site speed is no longer a secondary SEO concern — it’s a confirmed ranking factor. Here’s how continuous website monitoring keeps your Core Web Vitals healthy, your uptime reliable, and your search visibility strong.

Read Post

What is Error Tracking? A Beginner's Guide to Monitoring Errors in Production

Mar 31, 2026 By Rollbar In Rollbar

Every app breaks eventually. A button stops working. A checkout flow throws an exception. An API returns a 500 error at 2 AM on a Saturday. The question isn't whether your app will have bugs; it's whether you'll find out before your users do. That's exactly what error tracking is for.

Read Post

Digital Trading: Why "Healthy Systems" Still Lose Trades

Mar 31, 2026 By Lily Waldorf In Coralogix

Digital trading firms operate in environments where milliseconds determine profit and loss. During volatile market conditions, platforms can appear fully operational while execution quality quietly degrades. When prices shift so quickly, even a minor drift in your order-routing path means your competitors are exploiting the delta while your platform appears perfectly green. For trading firms, observability is not just about uptime.

Read Post

Improved Iframe integration

Mar 30, 2026 By Valeria Kurolapova In StatusGator

We’ve rolled out a set of improvements to our iframe embed integration to give you more flexibility and a better out-of-the-box experience. Whether you’re embedding your status page into a dashboard, help center, or internal tool, these updates make customization easier and results cleaner.

Read Post

Incident Management in 2026: Best Practices, Tools Guide & More

Mar 30, 2026 By Leo Baecker In Hyperping

When systems go down, every minute counts. You need more than just quick fixes. You need a solid system to spot problems early, take action fast, and learn from each incident to keep your users happy. That's what incident management is. In this guide, we'll walk through everything you need to know about incident management, from basic concepts to advanced strategies used by top DevOps teams.

Read Post

Website Maintenance Plans: Checklist, Tools, ROI & Cost Breakdown (2026)

Mar 30, 2026 By Leo Baecker In Hyperping

While most businesses invest heavily in website creation, many overlook the ongoing website maintenance plans needed to keep their digital presence performing at its peak. Data from recent studies reveals a harsh truth: 88% of online consumers won't return to a website after encountering technical issues or outdated information.

Read Post

Incident Response Automation Guide: Cut MTTR by 33% in 2026

Mar 30, 2026 By Leo Baecker In Hyperping

Every minute matters when you're dealing with a security incident. The longer a breach goes undetected and unresolved, the more damage it can cause to your systems, data, and reputation. But traditional incident response is plagued with challenges: alert fatigue, manual processes, skill shortages, and the sheer complexity of modern IT environments. Security teams are drowning in alerts while struggling to respond quickly enough to the threats that matter.

Read Post

DevOps Workflow Strategy for Startups: 7-Step Guide (2026)

Mar 30, 2026 By Leo Baecker In Hyperping

Reliability is the foundation of successful startups. Your product could have the most innovative features, but if it's plagued by downtime or performance issues, customers will eventually jump ship. Fortunately, creating an effective DevOps workflow strategy doesn't have to be complicated. This guide breaks down the essential components and implementation steps that startup DevOps and SRE teams need to focus on.

Read Post

What is the Citrix License Activation Service (LAS)?

Mar 30, 2026 By Wendy Howard In eG Innovations

One of the hot topics from our recent Citrix-focused webinar was the Citrix License Activation Service (LAS). I had the chance to present alongside George Spiers, Citrix Expert and EUC Architect, and we walked through what LAS is, how it works, and what teams should be aware of.

Read Post

Logging in Next.js is hard (But it doesn't have to be)

Mar 30, 2026 By Kyle Tryon In Sentry

A typical Next.js deployment can execute code in up to three different runtimes: Edge, Node.js, and the browser. You may already be capturing logs from server-side code, but if you are not capturing the full request from middleware through server rendering to the browser, you are missing a lot of debugging info when things go wrong. TL;DR: A typical Next.js deployment can run in up to three environments: Node, Edge, and the browser.

Read Post

Using Loggly for Amazing Customer Service

Mar 30, 2026 By solarwindsinc In SolarWinds

Learn how you can empower your customer support team to trace a customer's problem through log data.

View Video

Lightrun AI SRE: Quick Look

Mar 30, 2026 By Lightrun In Lightrun

In this video, Dan Putman, Solution Architect at Lightrun, walks you through the power of Lightrun AI SRE. He shows how it transforms automated incident response and platform reliability by correlating signals from Monitoring tools and Incident management systems with live runtime code execution to identify and verify root causes in real time.

View Video

Grafana Cloud Demo in Under 5 minutes | Full Stack Observability and more

Mar 30, 2026 By Grafana In Grafana

Overview & demo of how Cloud provides an end to end Observability Platform that empowers users who have adopted open standards like or to improve their systems reliability using & a shift left approach with performance testing while optimizing their observability costs.

View Video

From Trace to Root Cause: Mastering the new Trace Drilldown

Mar 30, 2026 By Jonny Steiner In Coralogix

In a high-pressure incident, speed is everything. During incidents, engineers often jump between tools like traces, logs, and metrics, losing context at every step. This ruins the investigation state, slows down the process, and increases MTTR.

Read Post

OpenClaw Monitoring & Observability with OpenTelemetry and SigNoz

Mar 30, 2026 By SigNoz - Open Source Observability Platform In SigNoz

Learn how to implement monitoring and observability for OpenClaw systems using OpenTelemetry and SigNoz. In this video, we cover how to instrument OpenClaw, collect traces, metrics, and logs, and visualize everything in SigNoz for real-time insights into performance and reliability. You’ll see how to quickly identify bottlenecks, debug issues, and improve system stability in production.

View Video

Detecting, Investigating, and Responding to Threats: Best Practices | WhatsUp Gold

Mar 30, 2026 By Progress WhatsUp Gold In WhatsUp Gold

As the speed of cyberattacks accelerates through the use of generative AI, traditional static playbooks are no longer sufficient to maintain organizational resilience. This webinar provides a deep exploration of modern security operations center methodologies that unify detection, investigation, and response into a single, seamless motion. By focusing on practical strategies for reducing alert fatigue and closing visibility gaps at the edge, this session equips decision-makers with the technical criteria to evaluate solutions that offer true forensic clarity.

View Video

Analyzing round trip query latency

Mar 27, 2026 By Alex Weisberger In Datadog

It’s an all too common scenario: You get paged for some queries timing out, but when you investigate, the database performance looks unchanged. Something must have changed, though. If the database doesn’t look overloaded, where are these timeouts coming from? The answer often lies outside the database itself. Round trip query latency includes every hop between your application and the database, including connection pools, load balancers, and proxies.

Read Post

Real-Time Visibility, Orchestrated Deployments, and More

Mar 27, 2026 By VirtualMetric In VirtualMetric

The latest VirtualMetric DataStream release brings a significant step forward in platform observability and deployment flexibility. Version 1.9.0 gives security and infrastructure teams direct visibility into what’s happening across their pipelines in real time while expanding support for cloud-native environments and broadening connectivity options. Here’s what’s new.

Read Post

Enhancing our API for better agentic consumption

Mar 27, 2026 By Mattias Geniar In Oh Dear

AI coding agents like Claude Code and Codex are becoming a real part of developer workflows. They don't just write code, they call APIs, interpret responses, and take action based on what they find. That means the quality of your API responses directly affects how useful an agent can be. We've shipped a series of improvements to the Oh Dear API with this in mind. Every change helps humans too, but we specifically optimized for how agents consume and reason about data.

Read Post

When IT instability becomes a patient safety risk in healthcare

Mar 27, 2026 By Chanté Frazer In Nexthink

Inside hospitals and health systems, the performance of clinical technology underpins nearly every care workflow and directly influences the timeliness and quality of patient care. Electronic health records sit at the center of admissions, discharge, imaging, lab coordination, and prescribing, so even minor technology friction can become a patient safety and operational risk. At scale, reliability becomes a prerequisite for consistent care.

Read Post

k8s-monitoring-helm Chart Office Hours (March 2026)

Mar 27, 2026 By Grafana In Grafana

In the March edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 4.0 major release, the upcoming 4.1 release and features, and we discuss the upcoming deprecation of the 1.x and 2.0 versions.

View Video

What are #containers ? #containersecurity #containerization #devops #devsecops

Mar 27, 2026 By Sysdig In Sysdig

View Video

Solving the Ticket Noise Problem: What We Learned from Our ServiceNow Webinar

Mar 27, 2026 By Dallon Robinette In Selector

On March 18th, we hosted a session focused on a challenge that continues to undermine even the most mature IT operations teams: ticket noise. It’s easy to dismiss noise as just “too many alerts”. But as we explored in the webinar, the real issue runs deeper. Ticket noise is a symptom of something more fundamental — a lack of correlation, context, and shared visibility across the stack.

Read Post

The Benefits of Historical Data for Network Monitoring

Mar 27, 2026 By Andrii Kernitskyi In Obkio

Your phone rings. A user is complaining that "the network was slow" or "had issues around 3pm." You run a speed test. Green across the board. No active alerts. Everything looks fine. So what do you tell them? If you don't have a continuous, time-stamped record of what your network was doing at 3pm, you can't tell them anything, not with confidence. You're stuck choosing between "I didn't see anything" and "I'll keep an eye on it," neither of which fixes the problem or satisfies the user.

Read Post

Configuring JavaScript caches for better performance

Mar 27, 2026 By Addie Beach In Datadog

Caching is critical to modern apps, helping you serve data more quickly and improve your app’s overall performance. Effective caches can result in better Core Web Vitals (CWV), improved search visibility, and a smoother user experience.

Read Post

Observability and Security for the AI Era

Mar 27, 2026 By Datadog In Datadog

Datadog has always been driven by a broader vision of helping teams understand and operate complex systems. In this session, you’ll hear from Yrieix Garnier, VP of Product, and Hugo Kaczmarek, Senior Director of Product, as they share the latest updates across the Datadog product suite and discuss how that vision continues to shape the platform’s evolution and support the next generation of AI-driven applications.

View Video

Setting up NTP Check on Uptime.com

Mar 27, 2026 By Uptime Website Monitoring In uptime

Welcome to Uptime.com! In this video, we'll guide you through the process of setting up and configuring an NTP Check. Learn how to log in, navigate the Monitoring section, and complete the setup, including intervals, contacts, locations, and advanced settings. Also, find out how to handle NTP alerts and verify time offsets.

View Video

Sponsored Post

Did we miss the end of LaMa?

Mar 26, 2026 By Avantra Team In Avantra

In mid-2024, SAP announced the discontinuation of SAP Landscape Management ("LaMa"). This wasn't a huge surprise, as the 2027 end-of-support date aligns neatly with Solution Manager, Focused Run and standard support for ECC. SAP also terminated work on SAP Landscape Management Cloud, discontinuing that product immediately. Responses to the post were as expected: customers asking, "What now?" and even expressing a bit of dismay. One response from SAP was especially telling: "Moving the ERP system to the cloud hands over the tasks realized with SAP Landscape Management to SAP as cloud vendor.

Read Post

The Observability Gap: Why Monitoring Data Should Drive Tests

Mar 26, 2026 By Matt LeRay In Speedscale

Most teams already know a lot about production. They have dashboards. They have traces. They have alerts. They have enough telemetry to explain what happened after an incident and enough graphs to argue about it for the rest of the week. Then they go to test a change and start from scratch. The integration tests hit a hand-written mock that returns {"status": "ok"}. The load tests replay a CSV somebody exported months ago. Staging is close enough to production right up until it matters.

Read Post

Observability Is Now a Boardroom Priority Even If Nobody Wants to Say It Out Loud

Mar 26, 2026 By ScienceLogic In ScienceLogic

Executives rarely state the full truth publicly, but inside boardrooms the conversation has changed. Observability, once viewed as a technical capability deep within operations, has become a strategic requirement for understanding business performance. Leaders may not always use the term itself, yet they focus intensely on the outcomes it promises. Their environments have grown too fast, too fragmented, and too interdependent for traditional visibility approaches to keep pace.

Read Post

Debunking the Myth of the Homogeneous Network

Mar 26, 2026 By Mehul Patel In Broadcom

If you have been in network operations for more than a week, you know the dream of the single vendor shop is exactly that, just a dream. In the practical reality of your daily job, the network is a diverse, chaotic ecosystem. It is a complex stack in which layers of technology from different times and vendors coexist, often uneasily.

Read Post

Monitoring Your App Without Running Your Own Prometheus Stack

Mar 26, 2026 By Osinachi Okpara In AppSignal

Prometheus and Grafana are the default monitoring recommendations across DevOps blogs, Reddit, and Hacker News, and for good reason. Prometheus is open-source and backed by the CNCF, but it’s not actually a complete monitoring system. It’s more of a metric collection engine.

Read Post

Beyond the Queue: Modernizing Legacy Middleware with Apache Kafka 4.x

Mar 26, 2026 By meshIQ In meshIQ

Apache Kafka 4.x eliminates the final barriers to legacy middleware modernization. With KRaft mode removing ZooKeeper dependency and native queue semantics bridging the gap, enterprises can finally transition from point-to-point messaging to event-driven architectures.

Read Post

Olivier Pomel and Alexis Lê-Quôc on Datadog's origin, AI, and more | This Month in Datadog

Mar 26, 2026 By Datadog In Datadog

Get an insider’s view of Datadog from the people who built it. On a special episode of This Month in Datadog, co-founders Olivier Pomel and Alexis Lê-Quôc sit down for a rare, in-depth look at the challenge that inspired them to build the Datadog platform, what the company is working on today, AI, and more. This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

View Video

60 Second Segment on Incident Management releases

Mar 26, 2026 By Datadog In Datadog

Every second counts during an incident. In 60 seconds, see how five new Incident Management releases can help you more easily stay up to date and collaborate. Check out these announcements and more on This Month in Datadog.

View Video

What Are Containers? (And Why "It Works on My Machine" Finally Dies)

Mar 26, 2026 By Sysdig In Sysdig

What are containers in DevOps—and why do they solve the classic “it works on my machine” problem? In this episode of Cloud Security in a Minute, Sysdig breaks down containers in simple terms: what they are, how they work, and why they’ve become the backbone of modern cloud applications. You’ll learn: Containers package everything an application needs—code, dependencies, and system tools—so it runs consistently anywhere: your laptop, the cloud, or at massive scale.

View Video

What is in #kubernetes #kubernetescourse #kubernetestraining #kubernetescluster #kubernetestutorial

Mar 26, 2026 By Sysdig In Sysdig

View Video

Monitor Nutanix clusters, hosts, and VMs with Datadog

Mar 26, 2026 By Mahip Deora In Datadog

Nutanix is a hyperconverged infrastructure (HCI) platform that combines compute, storage, and virtualization into a single software-defined stack. By collapsing traditional infrastructure tiers into one platform, Nutanix simplifies provisioning and operations for virtualized workloads. Clusters are managed through Prism Central, which provides visibility into health, performance, capacity, and operational activity across hosts and VMs.

Read Post

Datadog achieves ISO 42001 certification for responsible AI

Mar 26, 2026 By Aaron Ta In Datadog

As AI-powered products and services become central to how organizations operate, the need for responsible AI governance has never been greater. Customers, partners, and regulators are seeking assurance that AI systems are built, managed, and monitored responsibly and effectively. Datadog is committed to the responsible use of AI, both in how we build our products and in how we help customers observe their AI workloads.

Read Post

Introducing Bits AI Dev Agent for Code Security

Mar 26, 2026 By Kassen Qian In Datadog

As organizations adopt AI-assisted development and increase their release velocity, they are not only generating more code but also finding more vulnerabilities from static analysis. The traditional remediation workflow of manually triaging issues, creating tickets, and opening individual pull requests (PRs) cannot keep pace. Fixing tens of thousands of vulnerabilities one by one is not a viable remediation strategy.

Read Post

Datadog

Read more about Introducing Bits AI Dev Agent for Code Security

Automate Your Monitoring and Incident Handling: How Agents Dominate the Checkly CLI

Mar 26, 2026 By Checkly In Checkly

50% of Checkly's CLI users are already coding agents. We predict that agents will become dominant by the end of 2026. This video demonstrates an agentic workflow where an alert reports a broken Shopify store login flow, and Claude Code, using the installed Checkly Skill and the Checkly CLI, pulls monitoring results, identifies a Playwright test failure, investigates the codebase, finds and fixes a bug, and then updates a Checkly status page by creating an incident.

View Video

Checkly

Read more about Automate Your Monitoring and Incident Handling: How Agents Dominate the Checkly CLI

How to Reduce MTTR with AI

Mar 26, 2026 By Margo Poda In LogicMonitor

The quick download: AI reduces MTTR by helping teams detect issues sooner, pinpoint root causes faster, and resolve incidents with less manual effort. IT downtime costs organizations an average of $9,000 per minute. AI-powered observability can cut incident resolution time by up to 70%. Here’s what it takes to get there. Every minute an incident goes unresolved, the meter is running.

Read Post

LogicMonitor

Read more about How to Reduce MTTR with AI

Checkly and the Agentic Software Layer

Mar 26, 2026 By Hannes Lenke In Checkly

On November 24th, the Opus 4.5 release turned the entire tech industry around. This was the moment when agents became capable. Capable enough to write solid staff-level code. Capable enough to reason about alerts, investigate root causes much faster than most engineers, and set up the reliability layer faster. For me, this feels like an iPhone moment on steroids; the adoption of AI is accelerating much faster than any adoption curve I’ve seen over the past few decades.

Read Post

Checkly

Read more about Checkly and the Agentic Software Layer

One CLI, Two Audiences: How We Built for Agents and Humans

Mar 26, 2026 By Stefan Judis In Checkly

Half of the Checkly CLI users are already coding agents. This is not a prediction — it's what the data shows today. Since February, more and more agents have been using the CLI to manage and configure their Checkly monitoring setups. Right now, we're at 50% human and 50% agentic CLI users. And we predict that by the end of 2026, it won't be humans using the CLI; the agents will have taken over. The terminal became the primary interface for AI agents doing real work in the Checkly ecosystem.

Read Post

Checkly

Read more about One CLI, Two Audiences: How We Built for Agents and Humans

Telegraf Enterprise Beta is Now Available: Centralized Control for Telegraf at Scale

Mar 26, 2026 By Scott Anderson In InfluxData

Telegraf is incredibly good at what it does: collecting metrics, logs, and events from just about anywhere and sending them wherever you need. But once Telegraf becomes part of your production telemetry pipeline, spread across environments, teams, regions, and edge locations, the hard part isn’t installing agents; it’s operating them. Configs drift. “Temporary” overrides linger. Rolling out changes across hundreds (or thousands) of agents becomes a careful, manual process.

Read Post

InfluxData

Read more about Telegraf Enterprise Beta is Now Available: Centralized Control for Telegraf at Scale

The AI Partner Transforming IT Operations

Mar 26, 2026 By ScienceLogic In ScienceLogic

What if your IT operations platform didn’t just alert you to problems but actually understood, explained, and guided you to the best outcomes? In this video, ScienceLogic CEO Dave Link dives into Skylar Advisor, an AI-native partner designed to transform how teams manage complex IT environments.

View Video

ScienceLogic

Read more about The AI Partner Transforming IT Operations

Finding performance bottlenecks with Pyroscope and Alloy: An example using TON blockchain

Mar 26, 2026 By Anatoly Korniltsev In Grafana

Performance optimization often feels like searching for a needle in a haystack. You know your code is slow, but where exactly is the bottleneck? This is where continuous profiling comes in. In this blog post, we’ll explore how continuous profiling with Alloy and Pyroscope can transform the way you approach performance optimization.

Read Post

Grafana

Read more about Finding performance bottlenecks with Pyroscope and Alloy: An example using TON blockchain

Uptime.com Mobile App Walkthrough

Mar 26, 2026 By Uptime Website Monitoring In uptime

This video is an overview of the Uptime.com Mobile App.

View Video

uptime

Read more about Uptime.com Mobile App Walkthrough

Getting Scout Data Into Your AI Workflow

Mar 26, 2026 By Quinn Milionis In Scout

If you’ve spent any time in developer tooling lately, you’ve probably noticed a pattern: every product is rushing to add a chatbot, an AI summary, or some kind of “magic” button. We get it — it’s tempting. But at Scout, we’ve been deliberately taking a different approach. Instead of building AI into our product first, we’ve focused on making Scout’s data accessible to the AI tools you’re already using.

Read Post

Scout

Read more about Getting Scout Data Into Your AI Workflow

Securing the Future: Scaling AI, Sovereignty, and Resilience in ANZ ITOps

Mar 25, 2026 By solarwindsinc In SolarWinds

Enterprises in Australia and New Zealand are accelerating AI adoption, driven by strong digital trust frameworks. To remain competitive and compliant, the IT Operations (ITOps) landscape must evolve to manage hybrid complexity and persistent cyber risks. Join us for an exclusive, in-depth webinar as IDC and SolarWinds explore the strategic investments and unique challenges shaping future-proof ITOps across the ANZ region.

View Video

SolarWinds

Read more about Securing the Future: Scaling AI, Sovereignty, and Resilience in ANZ ITOps

Scary Things Happen in Production. Context Helps You Find Them.

Mar 25, 2026 By Charity Majors In Honeycomb

Production is a rowdy place of chaos, especially at scale. When you have millions of requests per second flowing through your system, weird things are always happening. Outliers, unusual request patterns, spikes and pulses of traffic from unknown sources, port scanning…it’s all there. To the naked eye, it looks like noise. If you know what you are looking for…patterns emerge. The night sky: every dot is a request. Without intent, it's an undifferentiated field of light.

Read Post

Honeycomb

Read more about Scary Things Happen in Production. Context Helps You Find Them.

A new Host Map for modern infrastructure

Mar 25, 2026 By Amy Zhou In Datadog

A host map is a visual representation of your infrastructure that displays hosts and related resources such as clusters, pods, and containers in a single, interactive view. We introduced the Datadog Host Map more than a decade ago to help you “know thy infrastructure” and answer critical questions: Does everything look healthy? Has anything changed? Does the shape of my environment match what I expect?

Read Post

Datadog

Read more about A new Host Map for modern infrastructure

Monitor Juniper Mist in Datadog

Mar 25, 2026 By Angelina Jin In Datadog

From point-of-sale (POS) terminals to cloud-based applications and mobile devices, reliable connectivity is critical to business operations. Even brief disruptions can negatively impact user experiences, resulting in failed transactions, delayed application responses, or repeated attempts to reconnect. Juniper Mist is an AI-powered networking platform that provides insight into wireless environments, including access point performance and radio frequency health.

Read Post

Datadog

Read more about Monitor Juniper Mist in Datadog

An Oh Dear skill for use in Claude Code or Codex

Mar 25, 2026 By Mattias Geniar In Oh Dear

AI coding agents are getting good at calling tools. Claude Code, Codex, and others can run shell commands, parse JSON, and reason about the results. But they need to know what tools are available and how to use them. That's what skills are for. A skill is a small package of documentation that teaches an AI agent how to use a specific tool. We've built one for Oh Dear.

Read Post

Oh Dear

Read more about An Oh Dear skill for use in Claude Code or Codex

Smarter Alerts, Faster Root Cause, & Proactive IT Ops with SolarWinds AI Observability

Mar 25, 2026 By solarwindsinc In SolarWinds

Discover how AI is transforming IT operations with SolarWinds Observability. In this video, we showcase powerful new AI-driven features designed to help you detect issues faster, reduce alert noise, and stay ahead of performance problems across your entire stack. From applications and databases to networks, cloud infrastructure, and end-user experience, SolarWinds AI delivers deep insights where it matters most.

View Video

SolarWinds

Read more about Smarter Alerts, Faster Root Cause, & Proactive IT Ops with SolarWinds AI Observability

When Code Becomes Cheap: The New Reliability Constraint in Software Engineering

Mar 25, 2026 By James Barnes In StatusCake

For most of the history of software engineering, the primary constraint was production. Code was expensive, skilled engineers were scarce, and shipping features required concentrated human effort. Velocity was limited by how fast people could reason, implement, test, and deploy. That constraint shaped everything from team size, architecture, and release cadence through to how we thought about technical debt. When production is expensive, you optimise for output. You remove friction from shipping.

Read Post

StatusCake

Read more about When Code Becomes Cheap: The New Reliability Constraint in Software Engineering

From raw data to flame graphs: A deep dive into how the OpenTelemetry eBPF profiler symbolizes Go

Mar 25, 2026 By Marc Sanmiquel In Grafana

Imagine you're troubleshooting a production issue: your application is slow, the CPU is spiking, and users are complaining. You turn to your profiler for answers—after all, this is exactly what it's built for. The profiler runs, collecting thousands of stack samples. eBPF profilers, including the OpenTelemetry eBPF profiler, operate at the kernel level, so they capture raw program counters: memory addresses pointing into your binary.

Read Post

Grafana

Read more about From raw data to flame graphs: A deep dive into how the OpenTelemetry eBPF profiler symbolizes Go

Sentry Demo: Debugging for Game Developers

Mar 25, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io/welcome
Docs: https://docs.sentry.io

View Video

Sentry

Read more about Sentry Demo: Debugging for Game Developers

How to Measure MOS Score for VoIP (Step-by-Step)

Mar 25, 2026 By Alyssa Lamberti In Obkio

Poor voice call quality isn't just annoying, it's a productivity killer. Dropped calls mid-negotiation, garbled audio on client meetings, and one-sided conversations where half the words don't make it through: these aren't random technical glitches. They're symptoms of network performance problems that haven't been identified, measured, or fixed. And when your business runs on VoIP, Microsoft Teams, or any cloud-based communication platform, unmeasured voice quality is a liability.

Read Post

Obkio

Read more about How to Measure MOS Score for VoIP (Step-by-Step)

Beyond the Data Lake: Leading Cross-Domain Operational Intelligence

Mar 25, 2026 By Kamal Hathi In Splunk

As we wrap up RSAC, one theme that repeatedly emerged in conversations with security leaders is that the modern enterprise has reached a critical inflection point where the velocity of machine-generated telemetry has outpaced the capacity of traditional architectures. This trend requires an approach that moves beyond the storage of information to the activation of it in ways that don’t simply exacerbate alert fatigue.

Read Post

Splunk

Read more about Beyond the Data Lake: Leading Cross-Domain Operational Intelligence

Migrating from ManageEngine OpManager to WhatsUp Gold: A Practical, No Nonsense Guide

Mar 25, 2026 By Jirka Knapek In WhatsUp Gold

If you’re planning to move from ManageEngine OpManager to the Progress WhatsUp Gold solution, this guide outlines key differences, recommended migration steps, and practical checks to help you transition with minimal disruption. It also includes an example script you can use to start monitoring imported devices in the WhatsUp Gold solution.

Read Post

WhatsUp Gold

Read more about Migrating from ManageEngine OpManager to WhatsUp Gold: A Practical, No Nonsense Guide

ROI of AI: How CIOs Measure Real Business Impact

Mar 25, 2026 By Arpit Sharma In Motadata

Since its advent, Artificial Intelligence (AI) has become the buzzword for modern-day businesses. Its tremendous benefits have lured enterprises into investing heavily with a view to getting ahead of their competitors. Yet many CIOs are still figuring out how to get the best ROI from AI for their businesses. While many initial programs and proofs of concept show promise, in the long run they fail to deliver on it.

Read Post

Motadata

Read more about ROI of AI: How CIOs Measure Real Business Impact

How to Automate Your Entire Cloud Deployment Lifecycle with IaC

Mar 25, 2026 By OpsMatters In OpsMatters

In today's digital world, businesses depend on cloud infrastructure to run applications, manage data, and deliver services smoothly. However, managing cloud environments manually can quickly become complex and time-consuming. Teams often deal with repeated tasks, inconsistent setups, and unexpected errors.

Read Post

OpsMatters

Read more about How to Automate Your Entire Cloud Deployment Lifecycle with IaC

Cribl Search Demo: Security Investigation

Mar 24, 2026 By Cribl In Cribl

In this demo, Nate Zemanek, Staff Solutions Engineer, shows how Cribl Search runs fast investigations. As an open data platform, Cribl Search lets you pull data from multiple sources and query everything from a single pane of glass. You’ll see how to run fast queries with the new lakehouse engine, search historical data with a federated approach, and bring everything together for full context. Then, use Notebooks to collaborate and share findings across teams to understand what happened, faster.

View Video

Cribl

Read more about Cribl Search Demo: Security Investigation

Next.js observability gaps and how to close them

Mar 24, 2026 By Sergiy Dybskiy In Sentry

This blog is based on a recent live workshop. You can watch the full livestream on YouTube. Next.js gives you a lot for free: server-side rendering, file-based routing, edge runtimes. What it doesn’t give you is a clear picture of what’s actually happening in production.

Read Post

Sentry

Read more about Next.js observability gaps and how to close them

How a Runtime Aware AI SRE Agent Transforms System Reliability

Mar 24, 2026 By Lightrun Team In Lightrun

A runtime-aware AI SRE extends existing AI SRE approaches by moving beyond telemetry correlation into runtime-validated reliability. While the majority of AI SRE tools accelerate incident triage using logs, metrics, and traces, they cannot confirm execution behavior if critical runtime signals were never captured. By generating on-demand evidence inside running services, AI SREs can eliminate slow redeploy cycles, ensuring your distributed systems remain resilient under real-world traffic conditions.

Read Post

Lightrun

Read more about How a Runtime Aware AI SRE Agent Transforms System Reliability

Top Root Cause Analysis Tools Built for Runtime Context

Mar 24, 2026 By Lightrun Team In Lightrun

Root cause analysis tools are designed to help engineering teams understand why failures happen in production and other remote environments. As modern systems become more distributed and input-dependent, many incidents cannot be reproduced outside live environments. The stakes are significant: high-impact IT outages cost organizations a median of $2 million per hour, with annual downtime costs reaching $76 million per organization.

Read Post

Lightrun

Read more about Top Root Cause Analysis Tools Built for Runtime Context

The Hidden Tax of Complexity: Why Modern Environments Cost More Than Leaders Realize

Mar 24, 2026 By ScienceLogic In ScienceLogic

Enterprises rarely notice the moment complexity begins to reshape their environment. Growth initiatives move forward. New cloud services are adopted. Modernization programs introduce new architectures. Business units implement tools that solve immediate problems. Acquisitions add their own ecosystems. Each change is logical in isolation. The cumulative effect becomes something else entirely.

Read Post

ScienceLogic

Read more about The Hidden Tax of Complexity: Why Modern Environments Cost More Than Leaders Realize

AI, Anxiety & 400 Open Windows: GEOFF WRIGHT RETURNS

Mar 24, 2026 By Nexthink In Nexthink

Geoff Wright returns to unpack the messy reality of work in the AI era. From having 400 windows open and feeling less productive, to explaining why AI should fuel curiosity rather than replace human judgment, Geoff brings his usual mix of optimism, humor, and hard-earned perspective. The conversation explores prompt engineering, digital overwhelm, enterprise adoption, and why “being human first” matters more than ever. It is a wide-ranging, thoughtful discussion on anxiety, complexity, and the promise of AI, with a surprisingly funny detour into why the robots might eventually just leave Earth for Pluto.

View Video

Nexthink

Read more about AI, Anxiety & 400 Open Windows: GEOFF WRIGHT RETURNS

How to Protect Website Monitoring from Cloud Disruptions

Mar 24, 2026 By Pingdom In SolarWinds

The cloud is often spoken of as a separate realm where data exists safely away from the messy realities of the physical world. But as the events of March 2026 have reminded us, the cloud has a physical home, and that home is susceptible to the same disruptions as any other infrastructure. Here’s how diversified monitoring across independent data centers can keep visibility intact when cloud services go down.

Read Post

SolarWinds

Read more about How to Protect Website Monitoring from Cloud Disruptions

Mastering DX NetOps Upgrade Automation

Mar 24, 2026 By Saurabh Sharma In Broadcom

Upgrading a large DX NetOps environment with multiple components across distributed infrastructure can be a challenging endeavor. Network interruptions, time-consuming validations, and the need for detailed diagnostics have been persistent pain points for administrators. With the release of version 25.4.6 of the DX NetOps Upgrade Automation Tool, we've addressed these challenges head-on. This release introduces powerful new capabilities that fundamentally change how you approach upgrade operations.

Read Post

Broadcom

Read more about Mastering DX NetOps Upgrade Automation

Coralogix Earns 196 Badges in G2 Spring 2026 Reports Across 15 Categories

Mar 24, 2026 By Coralogix Team In Coralogix

We’re proud to announce that Coralogix has earned 196 badges across 15 categories in the G2 Spring 2026 Reports, our strongest G2 performance to date. We placed in 369 reports, a significant leap from Spring 2025, when we placed in 318 reports and earned 141 badges. These results are a direct reflection of the trust our customers place in Coralogix and their willingness to share honest feedback on the world’s largest software review platform.

Read Post

Coralogix

Read more about Coralogix Earns 196 Badges in G2 Spring 2026 Reports Across 15 Categories

Dashboard Server 7.1 Release Webinar

Mar 24, 2026 By SquaredUp In Squared Up

Discover the latest innovations in Dashboard Server 7.1 — including powerful new Dashboard Variables, an all-new tile, enhancements to the Alert Tile, and much more.

View Video

Squared Up

Read more about Dashboard Server 7.1 Release Webinar

DHCP Ports 67 and 68 Explained: How DHCP Works

Mar 24, 2026 By Arpit Sharma In Motadata

Every device that connects to a network needs a unique IP address to communicate. Manually assigning addresses to hundreds or thousands of endpoints is impractical and error-prone. This is where the Dynamic Host Configuration Protocol (DHCP) comes in, automating IP address management.

Read Post

Motadata

Read more about DHCP Ports 67 and 68 Explained: How DHCP Works

The Role of Employee Monitoring in Securing Remote Teams: A Comprehensive Guide

Mar 24, 2026 By Prateek Arora In OpsMatters

How secure is your organisation when employees work from anywhere? Remote work has transformed how modern teams collaborate. It offers flexibility, broader talent pools, and improved productivity. Still, it has also introduced new cybersecurity challenges. 92% of IT professionals believe remote work has increased cybersecurity threats, even as organisations struggle to secure remote access points, home networks, and personal devices.

Read Post

OpsMatters

Read more about The Role of Employee Monitoring in Securing Remote Teams: A Comprehensive Guide

Observability Lessons From OpenAI

Mar 23, 2026 By Pablo Fernandez In VictoriaMetrics

Writing code is moving from the good old IDE into the realm of autonomous AI agents. One example of this is OpenAI, which has been developing internally with 0 lines of manually written code. You can read about their workflow in their engineering blog: Harness engineering: leveraging Codex in an agent-first world. For me, the main takeaway of OpenAI’s article is how AI has rewritten the constraints equation.

Read Post

VictoriaMetrics

Read more about Observability Lessons From OpenAI

Leveraging Cognitive Diversity to Tackle System Complexity

Mar 23, 2026 By Nick Travaglini In Honeycomb

Most engineering leaders today understand that diversity matters. They've built teams that reflect a range of backgrounds, functions, and experience levels. They run postmortems, retrospectives, and architecture reviews that bring multiple voices to the table. They believe, not unreasonably, that this variety of perspectives leads to better decisions. But there's a problem hiding inside that assumption that can undermine everything: who people are is a surprisingly poor predictor of how they think.

Read Post

Honeycomb

Read more about Leveraging Cognitive Diversity to Tackle System Complexity

Dark mode is now available for the Oh Dear dashboard

Mar 23, 2026 By Mattias Geniar In Oh Dear

Oh Dear's dashboard now supports dark mode. You can choose between light, dark, or system-based theming, and your preference is saved to your profile so it follows you everywhere.

Read Post

Oh Dear

Read more about Dark mode is now available for the Oh Dear dashboard

Icinga Installation Guide - Part 2 - Installing Icinga Director and configuring your first objects

Mar 23, 2026 By Icinga In Icinga

Take the next step with Icinga by adding the powerful configuration management tool Icinga Director to your setup. In this second part of our installation guide, we focus on simplifying and scaling your configuration using the Director. You’ll learn how to connect it to your existing Icinga 2 instance, create reusable templates, and start monitoring hosts and services through a more flexible, web-based interface.

View Video

Icinga

Monitoring

Read more about Icinga Installation Guide - Part 2 - Installing Icinga Director and configuring your first objects

Icinga Installation Guide - Part 1 - Getting started with a base Icinga Installation

Mar 23, 2026 By Icinga In Icinga

Get up and running with Icinga 2 and Icinga Web in this step-by-step installation guide. In this video, we walk you through a complete base installation of Icinga, covering everything from setting up the database to accessing the web interface for the first time. This will help you get to the point of a working installation, especially if you're new to Icinga. We take you through the full process, including installing required components, configuring databases, enabling services, and completing the web setup wizard.

View Video

Icinga

Monitoring

Read more about Icinga Installation Guide - Part 1 - Getting started with a base Icinga Installation

Internet Speed Monitoring - How to Proactively Test Your Internet Connections

Mar 23, 2026 By Babu Sundaram In eG Innovations

Recent enhancements to eG Enterprise let you proactively test your internet speed with synthetic monitoring (“robot” tests that simulate real user activity). Using the new functionality, you can proactively monitor internet speeds 24×7 from any location. The performance and quality of an Internet connection play a major role in any IT environment. Use cases for this new functionality include.

Read Post

eG Innovations

Read more about Internet Speed Monitoring - How to Proactively Test Your Internet Connections

How to Communicate the Value of DEX Across Your Organization

Mar 23, 2026 By Megan Brake In Nexthink

For many EUC and Digital Workplace leaders, the challenge with digital employee experience (DEX) isn’t the technology; it’s building alignment. You can see the data. You know where friction exists. You can quantify disruption, productivity loss, and inefficiencies. But you struggle to achieve your targets because you need buy-in from other teams, and right now, they don’t want to hear anything about DEX. Security has different priorities. Application owners are focused on releases.

Read Post

Nexthink

Read more about How to Communicate the Value of DEX Across Your Organization

Monitor Oracle Fusion Cloud Applications with Datadog

Mar 23, 2026 By Ellie Cohen In Datadog

Many organizations rely on Oracle Fusion Cloud Applications to run core business workflows across finance, HR, and supply chain operations. Because these SaaS-based applications run on Oracle Cloud Infrastructure (OCI), engineering teams have limited visibility into their performance. Without direct access to the underlying stack, they often lack the signals needed to detect regressions or investigate degraded user experience.

Read Post

Datadog

Read more about Monitor Oracle Fusion Cloud Applications with Datadog

Explore Kubernetes with native OpenTelemetry data

Mar 23, 2026 By Allie Rittman In Datadog

Kubernetes environments generate a constant stream of signals across clusters, nodes, pods, and workloads. For teams that have standardized on OpenTelemetry (OTel), maintaining ownership of that data is critical. But in practice, many observability platforms require translation into vendor-specific data formats, leading to fragmented product experiences, blank dashboards, and uncertainty about data integrity.

Read Post

Datadog

Read more about Explore Kubernetes with native OpenTelemetry data

Annotate traces to improve LLM quality with Datadog LLM Observability

Mar 23, 2026 By Rashel Hoover In Datadog

LLM applications rarely crash. They degrade quietly. Once these applications are shipped to production, subtle quality failures become harder to catch with traditional signals. Tone shifts, hallucinated details, off-topic responses, and incomplete reasoning can emerge while latency and token usage look stable.

Read Post

Datadog

Read more about Annotate traces to improve LLM quality with Datadog LLM Observability

Autonomous IT: What It Is and How to Get Started

Mar 23, 2026 By Sofia Burton In LogicMonitor

Autonomous IT is the operating model where systems detect, decide, and act so your engineers spend less time fighting fires and more time defining what ‘good’ looks like. On a typical day, a mid-size enterprise generates tens of thousands of alerts across on-prem infrastructure, multiple clouds, and AI workloads, including every endpoint. Most of them don’t need a human. A few of them do, and telling the difference, fast enough to matter, is where IT teams are losing ground.

Read Post

LogicMonitor

Read more about Autonomous IT: What It Is and How to Get Started

How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Mar 23, 2026 By Chris Watts In Grafana

Chris Watts is Head of Enterprise Engineering at OpenRouter, building infrastructure for AI applications. Previously at Amazon and a startup founder. As large language models become core infrastructure for more and more applications, teams are discovering a familiar challenge in a new context: you can't improve what you can't see.

Read Post

Grafana

Read more about How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Bridging the gap between mobile experience and technical reality

Mar 23, 2026 By Ofri Grushka In Coralogix

For mobile-first organizations, the distance between a “slow app” and a “resolved ticket” is often filled with guesswork. Mobile performance is notoriously difficult to capture because it lives at the intersection of device hardware, network stability, and local code execution. Today, we are closing that gap with the launch of Coralogix Mobile Performance.

Read Post

Coralogix

Read more about Bridging the gap between mobile experience and technical reality

Making encrypted Java traffic observable with eBPF

Mar 23, 2026 By Nikolay Sivko In Coroot

Coroot's node agent uses eBPF to capture network traffic at the kernel level. It hooks into syscalls like read and write, reads the first bytes of each payload, and detects the protocol: HTTP, MySQL, PostgreSQL, Redis, Kafka, and others. This works for any language and any framework without touching application code. For encrypted traffic, we attach eBPF uprobes to TLS library functions like SSL_write and SSL_read in OpenSSL, crypto/tls in Go, and rustls in Rust.

Read Post

Coroot

Read more about Making encrypted Java traffic observable with eBPF

What is Virtana Application Observability and how is it different?

Mar 23, 2026 By Virtana In Virtana

Application Observability, Built for Hybrid Reality: Modern applications don’t live in one place. A single transaction might span: Traditional APM shows you the trace. But hybrid reality doesn’t stop at the service layer. True application observability ties transactions to the infrastructure that actually delivered them across cloud, on-prem, and everything in between. Because in hybrid environments, the root cause rarely lives in just one tier.

View Video

Virtana

Read more about What is Virtana Application Observability and how is it different?

New API update: Filter incidents by phase & severity

Mar 21, 2026 By Valeria Kurolapova In StatusGator

We’ve enhanced our API for custom monitors with more powerful filtering options – giving you better control over how incidents are tracked and surfaced.

Read Post

StatusGator

Read more about New API update: Filter incidents by phase & severity

Grafana Campfire - Release Pipelines - (Grafana Community Call - March 2026)

Mar 21, 2026 By Grafana In Grafana

In this Campfire Community call, we'll be exploring Grafana's release pipelines - covering both our on-prem (public and private) artifact delivery and our Rolling Release Channels for building Grafana Cloud. We'll walk through the fundamentals of how our pipelines work, including how ICs can patch branches and manage their own core Grafana releases, and where we're headed in the future. Plus much more!

View Video

Grafana

Read more about Grafana Campfire - Release Pipelines - (Grafana Community Call - March 2026)

API Availability Monitoring: How to Measure True API Availability

Mar 21, 2026 By Dotcom-Monitor In Dotcom-Monitor

APIs are no longer just integration layers. They power customer logins, payment processing, SaaS workflows, partner ecosystems, and mobile applications. When an API becomes unavailable, revenue stops, user trust declines, and service level agreements are immediately at risk. Yet many teams still define API availability in the simplest possible way. If an endpoint responds with a 200 OK, the API is considered available. Monitoring dashboards stay green. Alerts remain silent. Everything appears healthy.

Read Post

Dotcom-Monitor

Read more about API Availability Monitoring: How to Measure True API Availability

API Error Monitoring: A Complete Guide to Detecting and Resolving API Failures

Mar 21, 2026 By Dotcom-Monitor In Dotcom-Monitor

APIs power nearly every modern digital experience. From mobile apps and SaaS platforms to payment gateways and internal microservices, APIs handle authentication, transactions, content delivery, and system-to-system communication. When an API fails, users often experience broken features, slow responses, or complete service outages. In many cases, they leave before your team even realizes something is wrong. The business impact of API failures is significant.

Read Post

Dotcom-Monitor

Read more about API Error Monitoring: A Complete Guide to Detecting and Resolving API Failures

Applications Manager now officially supports Podman monitoring!

Mar 20, 2026 By Sujitha Paduchuri In ManageEngine

As organizations shift away from traditional container engines to embrace Podman’s rootless and daemon-less design, visibility often becomes a challenge. Because Podman doesn't rely on a central background service, traditional monitoring tools can leave you in the dark. Applications Manager's new Podman monitoring feature bridges that gap, giving you total visibility into your Podman workloads without compromising the security model you worked so hard to build.

Read Post

ManageEngine

Read more about Applications Manager now officially supports Podman monitoring!

Sponsored Post

The AI Readiness Paradox: The Agentic Value Gap And The Agentic Operational Model

Mar 20, 2026 By Shailesh Manjrekar In Fabrix

The disconnect between enterprise confidence and AI capability is real. MIT reports fewer than 5% of enterprises have achieved measurable ROI from AI, yet Cisco claims 13% feel ready. The gap isn’t about AI technology—it’s about organizational rigidity and change management. More importantly, most studies focus on business intelligence rather than operational use cases, which are far less risky and more measurable.

Read Post

Fabrix

Read more about The AI Readiness Paradox: The Agentic Value Gap And The Agentic Operational Model

Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

Mar 20, 2026 By Vadim Alekseev In VictoriaMetrics

At VictoriaMetrics, we built vlagent as a high-performance log collector for VictoriaLogs. To validate its performance and correctness under a real production-like load, we developed a benchmark suite and ran it against 8 popular log collectors. This post covers the methodology, throughput results, resource usage, and delivery correctness. Collectors under the test: We’ve made all benchmark configurations and source code public, so you can reproduce and verify the results independently.

Read Post

VictoriaMetrics

Read more about Benchmarking Kubernetes Log Collectors: vlagent, Vector, Fluent Bit, OpenTelemetry Collector, and more

What is Kubernetes? Explained in 2 Minutes

Mar 20, 2026 By Sysdig In Sysdig

What is Kubernetes, and how do companies like Netflix handle millions of users without crashing? In this quick guide, we break down Kubernetes in simple terms — from containers to pods, nodes, and the control plane — so you can understand how modern cloud applications stay reliable and scalable. Kubernetes acts like an air traffic controller for your apps, automatically managing where they run, restarting them if they fail, and balancing traffic across machines. Whether you're new to cloud computing or brushing up on DevOps basics, this video gives you a clear, beginner-friendly explanation.

View Video

Sysdig

Read more about What is Kubernetes? Explained in 2 Minutes

Announcing the Datadog Terraform provider v4.0.0

Mar 20, 2026 By David Iparraguirre In Datadog

Datadog supports managing Datadog configuration as code through the Datadog Terraform provider. As platform engineering practices evolve, we are focused on making this provider more reliable and trustworthy at enterprise scale.

Read Post

Datadog

Read more about Announcing the Datadog Terraform provider v4.0.0

Instrument zero-code observability for LLMs and agents on Kubernetes

Mar 20, 2026 By Ishan Jain In Grafana

Building AI services with large language models and agentic frameworks often means running complex microservices on Kubernetes. Observability is vital, but instrumenting every pod in a distributed system can quickly become a maintenance nightmare. OpenLIT Operator solves this problem by automatically injecting OpenTelemetry instrumentation into your AI workloads—no code changes or image rebuilds required.

Read Post

Grafana

Read more about Instrument zero-code observability for LLMs and agents on Kubernetes

Monitor Model Context Protocol (MCP) servers with OpenLIT and Grafana Cloud

Mar 20, 2026 By Ishan Jain In Grafana

Large language models don’t work in a vacuum. They often rely on Model Context Protocol (MCP) servers to fetch additional context from external tools or data sources. MCP provides a standard way for AI agents to talk to tool servers, but this extra layer introduces complexity. Without visibility, an MCP server becomes a black box: you send a request and hope a tool answers. When something breaks, it’s hard to tell if the agent, the server or the downstream API failed.

Read Post

Grafana

Read more about Monitor Model Context Protocol (MCP) servers with OpenLIT and Grafana Cloud

Observe your AI agents: End-to-end tracing with OpenLIT and Grafana Cloud

Mar 20, 2026 By Ishan Jain In Grafana

In another post in this series, we discussed how to instrument large language model (LLM) calls. This can be a good starting point, but generative AI workloads increasingly rely on agents, which are systems that plan, call tools, reason, and act autonomously. And their non‑deterministic behavior makes incidents harder to diagnose, in part, because the same prompt can trigger different tool sequences and costs.

Read Post

Grafana

Read more about Observe your AI agents: End-to-end tracing with OpenLIT and Grafana Cloud

How to monitor LLMs in production with Grafana Cloud, OpenLIT, and OpenTelemetry

Mar 20, 2026 By Ishan Jain In Grafana

Moving a large language model (LLM) application from a demo to a production‑scale service raises very different questions than the ones you ask when playing with an API key in a notebook. In production, you have to answer: How much is each model costing us? Are we keeping latency within our service‑level objectives? Are we accidentally returning hallucinations or toxic content? Is the system vulnerable to prompt‑injection attacks?

Read Post

Grafana

Read more about How to monitor LLMs in production with Grafana Cloud, OpenLIT, and OpenTelemetry

Balancing Data Locality, Data Sovereignty, and Data Replication

Mar 20, 2026 By Datadog In Datadog

Modern distributed systems must simultaneously respect where data must live, where it should live for performance, and where it needs to live for resilience. Data sovereignty and residency requirements increasingly affect technical design decisions, not only in regulated industries, but in any global product that must navigate regional expectations, latency constraints, cost structures, and operational realities.

View Video

Datadog

Read more about Balancing Data Locality, Data Sovereignty, and Data Replication

Datadog Data Observability enables you to detect data quality and pipeline issues early.

Mar 20, 2026 By Datadog In Datadog

See our latest episode of This Month in Datadog for a spotlight on Datadog Data Observability, which enables you to detect data quality and pipeline issues early, as well as remediate those issues with end-to-end lineage. We also cover: This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

View Video

Datadog

Read more about Datadog Data Observability enables you to detect data quality and pipeline issues early.

Seer fixes Seer: How Seer pointed us toward a bug and helped fix an outage

Mar 20, 2026 By Kush Dubey In Sentry

Seer is our AI agent that takes bugs and uses all of the context Sentry has to find the root cause and suggest a fix. We use it all the time to help us improve Sentry. Seer fixes Sentry. More recently, Seer has been helping us fix itself — Seer fixing Seer. An upstream outage triggered a bit of an avalanche, revealing a bug that had been hiding away for months. When it came time to fix it, Seer pointed us exactly where we needed to look.

Read Post

Sentry

Read more about Seer fixes Seer: How Seer pointed us toward a bug and helped fix an outage

Error Monitoring for Elixir: Now in Scout APM

Mar 20, 2026 By Lance Erickson In Scout

Elixir’s “let it crash” philosophy is one of the best ideas in modern software design. Supervisors restart failed processes, the system self-heals, and life goes on. It’s like having a really good immune system. The problem is that a really good immune system can also hide chronic conditions. A GenServer crashing and restarting is working as designed.

Read Post

Scout

Read more about Error Monitoring for Elixir: Now in Scout APM

AURA in practice: real-world use cases for production AI agent infrastructure

Mar 20, 2026 By Mezmo In Mezmo

How platform and SRE teams are using Mezmo's open-core agent framework — with any LLM, any tools, any observability backend.

Read Post

Mezmo

Read more about AURA in practice: real-world use cases for production AI agent infrastructure

API Response Time Monitoring: Metrics, SLAs & Optimization Guide

Mar 20, 2026 By Dotcom-Monitor In Dotcom-Monitor

Modern applications are powered by APIs. Every login request, checkout transaction, mobile interaction, and third-party integration depends on APIs responding quickly and reliably. When an API slows down, the entire user experience suffers. Even a one-second delay in response time can: For ecommerce platforms, fintech systems, SaaS products, and real-time applications, slow APIs do not simply create inconvenience. They directly affect revenue, customer retention, and operational stability.

Read Post

Dotcom-Monitor

Read more about API Response Time Monitoring: Metrics, SLAs & Optimization Guide

API Observability Tools: Complete Guide to Platforms, Features & Use Cases (2026)

Mar 20, 2026 By Dotcom-Monitor In Dotcom-Monitor

Modern software runs on APIs. Whether you are operating microservices, integrating third party services, or building customer facing platforms, APIs are the backbone of your architecture. As systems become more distributed, simply knowing whether an endpoint is up or down is no longer enough. Teams need deeper visibility into performance, reliability, latency, and behavior across environments. That is where API observability tools come in. API observability goes beyond basic health checks.

Read Post

Dotcom-Monitor

Read more about API Observability Tools: Complete Guide to Platforms, Features & Use Cases (2026)

API Status Monitoring: Real-Time Health & Uptime Tracking

Mar 20, 2026 By Dotcom-Monitor In Dotcom-Monitor

APIs sit at the center of modern digital infrastructure. Mobile applications, SaaS platforms, microservices, and third party integrations all depend on APIs to exchange data and execute business logic in real time. When an API becomes unavailable, slows down, or returns incorrect data, users feel it immediately. Transactions fail. Dashboards stop updating. Logins break. Revenue and trust are affected within minutes.

Read Post

Dotcom-Monitor

Read more about API Status Monitoring: Real-Time Health & Uptime Tracking

Getting started with Azure dashboards

Mar 20, 2026 By SquaredUp In Squared Up

Sameer Mhaisekar, DevRel Engineer, gives a brief demonstration of the SquaredUp plugin for Azure.

View Video

Squared Up

Read more about Getting started with Azure dashboards

VirtualMetric DataStream + Splunk: Pre-Ingest CIM Normalization Without the TA Tax

Mar 20, 2026 By VirtualMetric In VirtualMetric

Splunk is built around a deceptively simple premise: get your data in, search it, and act on it. In practice, the gap between “get your data in” and “data that actually works in Splunk ES” is where most of the engineering effort goes. CIM normalization is non-trivial. Technology Add-on development is slow. Volume-based licensing penalizes growth. And the combination means that as environments expand, Splunk becomes harder to operate efficiently.

Read Post

VirtualMetric

Read more about VirtualMetric DataStream + Splunk: Pre-Ingest CIM Normalization Without the TA Tax

Sponsored Post

Why Microsoft SCOM Dashboards Often Fail

Mar 19, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

SCOM dashboards should be the most accurate representation of operational health in an enterprise. They have access to rich, stateful monitoring data, deep object models, and years of operational context. Yet in many environments, dashboards are ignored, distrusted, or reduced to wall art.

Read Post

NiCE IT Mgmt

Read more about Why Microsoft SCOM Dashboards Often Fail

What Engineers Want from AI in Observability... According to the 2026 Observability Survey Report

Mar 19, 2026 By Grafana In Grafana

The results show strong interest in AI for forecasting, root cause analysis, onboarding, and generating dashboards, alerts, and queries. But when it comes to autonomous action, practitioners are more cautious — and 95% say AI needs to show its work to earn trust.

View Video

Grafana

Read more about What Engineers Want from AI in Observability... According to the 2026 Observability Survey Report

Unifying Telemetry in Battery Energy Storage Systems

Mar 19, 2026 By Allyson Boate In InfluxData

Battery energy storage systems (BESS) play a critical role in modern energy infrastructure. Utilities rely on these systems to balance renewable generation, stabilize grid operations, and respond to changing electricity demand. As deployments scale in size and complexity, operators require continuous insight into battery health, system performance, and grid interaction. Operators rely on telemetry generated across several operational platforms.

Read Post

InfluxData

Read more about Unifying Telemetry in Battery Energy Storage Systems

Fireside Chat with Datadog CTO Alexis Lê-Quôc

Mar 19, 2026 By Datadog In Datadog

Join Datadog CTO and Co-Founder Alexis Lê-Quôc and a special guest as they discuss emerging technologies and innovation, how they impact businesses today, and the new opportunities they create for you.

View Video

Datadog

Read more about Fireside Chat with Datadog CTO Alexis Lê-Quôc

Architecting Log Management for Privacy and Scale without the Headache

Mar 19, 2026 By Datadog In Datadog

As companies grow, they inevitably hit a wall: observability data explodes while privacy requirements become stricter. For years, engineers have faced a painful tradeoff—either ship petabytes of sensitive data to a central cloud (incurring egress costs and compliance risks) or manage a complex self-hosted stack that is painful to scale.

View Video

Datadog

Read more about Architecting Log Management for Privacy and Scale without the Headache

The Cognitive Ceiling: Why Modern Environments Outgrew Human Interpretation

Mar 19, 2026 By ScienceLogic In ScienceLogic

For more than a decade, organizations invested in tools and telemetry with the belief that more visibility would create more control. Monitoring expanded across cloud, application, network, and infrastructure layers. Observability platforms entered the mainstream. Automation tools promised faster detection and improved coordination. Yet despite these advancements, incidents are not easier to understand. War rooms still fill with conflicting interpretations. Signals generate more questions than answers.

Read Post

ScienceLogic

Read more about The Cognitive Ceiling: Why Modern Environments Outgrew Human Interpretation

You're probably overdue for a Sentry SDK upgrade

Mar 19, 2026 By Sergiy Dybskiy In Sentry

Session Replay. Structured logs. AI monitoring. Automatic OpenTelemetry tracing. Feature flag tracking. If you haven't seen these in your Sentry dashboard, your SDK version is probably the reason. Whether you're on @sentry/react, @sentry/nextjs, @sentry/vue, @sentry/angular, @sentry/sveltekit, or any other @sentry/* package, they all version together. When we say v10, we mean all of them.

Read Post

Sentry

Read more about You're probably overdue for a Sentry SDK upgrade

How to Perform a Network Health Check: Step-by-Step Guide

Mar 19, 2026 By Andrii Kernitskyi In Obkio

Your apps are slow. Users are complaining. You're staring at a dashboard trying to figure out what broke and when. Sound familiar? This is the reality of reactive network monitoring. By the time someone opens a ticket, the issue has already been affecting performance for minutes, sometimes hours. A network health check flips that script. Instead of chasing problems after the fact, you're catching them before users ever notice.

Read Post

Obkio

Read more about How to Perform a Network Health Check: Step-by-Step Guide

Claude Code is running bash commands on your infrastructure. Here's how to watch it.

Mar 19, 2026 By David Girvin In Sumo Logic

I’ve been staring at Claude Code telemetry for the past few weeks, and I keep noticing the same thing: most teams drop it into their environment, say “it’s amazing,” and have absolutely no idea what it’s actually doing at the system level. That’s fine for a personal dev tool. It’s not fine when you’ve rolled it out to 50 engineers.

Read Post

Sumo Logic

Read more about Claude Code is running bash commands on your infrastructure. Here's how to watch it.

The Invisible Thread: Why Your "Post Office" Needs a Master Courier

Mar 19, 2026 By meshIQ In meshIQ

When Apache Artemis and Apache Camel work separately, critical transactions can vanish into the "Integration Blind Spot." Learn how unified visibility transforms these invisible handshakes into clear, trackable flows that protect revenue and prevent outages.

Read Post

meshIQ

Read more about The Invisible Thread: Why Your "Post Office" Needs a Master Courier

Claude Code + Lightrun MCP: Your AI Agent Now Has Live Runtime Vision

Mar 19, 2026 By Lightrun Team In Lightrun

Claude Code, Anthropic’s coding agent, now integrates with Lightrun through MCP. AI code assistants have been flying blind. Google’s DORA 2025 report found it is causing an almost 10% increase in code instability. Even with up to 1M tokens of context available in Claude, this powerful agent cannot see how the code it writes actually behaves inside a live system under real traffic, real dependencies, and a load of 10,000 requests per second.

Read Post

Lightrun

Read more about Claude Code + Lightrun MCP: Your AI Agent Now Has Live Runtime Vision

How to Manage Icinga with Ansible Webinar

Mar 19, 2026 By Icinga In Icinga

Managing monitoring environments shouldn’t be a manual chore. In this hands-on webinar, we show you how to fully automate your Icinga infrastructure using the Ansible Collection for Icinga. We take you step by step through everything from installing Icinga 2 to configuring master instances, setting up monitoring agents, building core objects, and integrating common components like Icinga Web, all driven by Ansible.

View Video

Icinga

Read more about How to Manage Icinga with Ansible Webinar

The AI-Driven Security Pipeline: Bindplane at RSAC 2026 Conference

Mar 19, 2026 By Laura Luttmer In ObservIQ

RSAC 2026 Conference is right around the corner, and we're unveiling new security capabilities at Booth N-5285. Book a 15-minute slot if you want an in-person walkthrough!

Read Post

ObservIQ

Read more about The AI-Driven Security Pipeline: Bindplane at RSAC 2026 Conference

Bridging the Gaps in Modern Operations: How Real-Time Messaging Improves System Reliability

Mar 19, 2026 By OpsMatters In OpsMatters

In modern IT environments, reliability is no longer defined solely by system uptime or infrastructure resilience. It is equally shaped by how effectively systems, teams, and processes communicate under pressure. As architectures become more distributed and operations more complex, the gaps between tools, teams, and data streams have become one of the most persistent challenges in maintaining consistent performance.

Read Post

OpsMatters

Read more about Bridging the Gaps in Modern Operations: How Real-Time Messaging Improves System Reliability

Cloud Migration Statistics for 2026

Mar 18, 2026 By Dana Krook In Auvik

Cloud adoption has officially crossed a tipping point. In 2026, the conversation is shifting from whether companies are moving to the cloud to how complicated things are getting once they’ve moved. Hybrid architectures, multi-cloud strategies, AI workloads, and rising security pressure are turning “the cloud” into a web of interconnected environments. For IT and network teams, that creates huge opportunity—and plenty of room for chaos if visibility doesn’t keep pace.

Read Post

Auvik

Read more about Cloud Migration Statistics for 2026

Buy vs Build in the Age of AI (Part 3)

Mar 18, 2026 By James Barnes In StatusCake

In Part 1, we looked at how AI has reduced the cost of building monitoring tools. Then in Part 2, we explored the operational and economic burden of owning them. Now we need to talk about something deeper. Because the real shift isn’t just economic; it’s structural. AI isn’t just helping engineers write code faster. It’s accelerating the entire software ecosystem, including how monitoring tools are built, maintained, and trusted.

Read Post

StatusCake

Read more about Buy vs Build in the Age of AI (Part 3)

How to Use Git Bisect to Pinpoint Bugs Precisely

Mar 18, 2026 By Alexander Rieß In Icinga

A feature that used to work suddenly broke. The problem? There were 300 commits since the last time I knew it worked. Checking each commit manually would take forever. Fortunately, Git has a tool designed exactly for this situation: git bisect.

Read Post

Icinga

Read more about How to Use Git Bisect to Pinpoint Bugs Precisely

Production Is Where the Rigor Goes

Mar 18, 2026 By Charity Majors In Honeycomb

In early February, Martin Fowler and the good folks at Thoughtworks sponsored a small, invite-only unconference in Deer Valley, Utah—birthplace of the Agile Manifesto—to talk about how software engineering is changing in the AI-native era. They recently published a summary of key insights and themes from the summit, sorted into ten topical buckets.

Read Post

Honeycomb

Read more about Production Is Where the Rigor Goes

Netdata Cloud MCP: Give Your AI Agents Full Infrastructure Context

Mar 18, 2026 By Netdata In netdata

Netdata has shipped MCP servers on every Agent since v2.6.0. Now we're taking the next step: a cloud-hosted MCP endpoint that gives AI agents and assistants infrastructure-wide visibility through a single connection.

View Video

netdata

Read more about Netdata Cloud MCP: Give Your AI Agents Full Infrastructure Context

Take the next steps for observability with autonomous IT platforms and Elastic

Mar 18, 2026 By Elastic Observability Team In Elastic

TL;DR: Autonomous IT platforms combine observability data and AI to automatically detect, diagnose, and resolve issues—shifting operations from reactive monitoring to predictive, self-healing systems.

Read Post

Elastic

Read more about Take the next steps for observability with autonomous IT platforms and Elastic

AppSignal's MCP Server: Connect AI Agents to Your Monitoring Data

Mar 18, 2026 By Serena Chou In AppSignal

Your AI coding assistant already knows your codebase. Now it can know your production environment too. AppSignal's MCP server gives AI agents and AI code editors direct access to your monitoring data — errors, performance metrics, and more — so they can help you debug, investigate and resolve issues without switching context. And with our new public endpoint, getting started is simpler than ever.

Read Post

AppSignal

Read more about AppSignal's MCP Server: Connect AI Agents to Your Monitoring Data

Scaling Kubernetes workloads on custom metrics

Mar 18, 2026 By Datadog In Datadog

The 2025 State of Containers and Serverless report found that 64% of organizations use the Kubernetes Horizontal Pod Autoscaler (HPA) to manage Kubernetes workload capacity. But only 20% of those deployments scale on custom metrics. The other four-fifths of organizations rely on resource metrics—CPU and memory utilized by their pods—to trigger autoscaling activity.

Read Post

Datadog

Read more about Scaling Kubernetes workloads on custom metrics

How to design cloud environments for AI-powered threat analysis

Mar 18, 2026 By Datadog In Datadog

Cloud environments generate high volumes of security signals every day. With each one, you have to determine if it’s benign, a clear false positive, or something worth investigating. The challenge is needing to make these calls continuously, often without knowing whether any single event is part of a larger attack. Spending too much time investigating benign activity reduces the ability to detect threats elsewhere, and missing a legitimate threat has clear consequences.

Read Post

Datadog

Read more about How to design cloud environments for AI-powered threat analysis

Migrating from SolarWinds to WhatsUp Gold

Mar 18, 2026 By Progress WhatsUp Gold In WhatsUp Gold

This video guides you through the steps to migrate your network monitoring from SolarWinds to WhatsUp Gold, emphasizing key differences, benefits, and best practices to ensure a smooth transition.

View Video

WhatsUp Gold

Read more about Migrating from SolarWinds to WhatsUp Gold

AI in observability in 2026: Huge potential, lingering concerns

Mar 18, 2026 By Trevor Jones In Grafana

The role of AI in observability is evolving rapidly, but the data from our fourth annual Observability Survey makes one thing abundantly clear: the potential is real, and so are the reservations. Practitioners overwhelmingly see value in using AI to help surface anomalies, forecast and spot trends, assist with root cause analysis, and get new users up to speed quicker.

Read Post

Grafana

Read more about AI in observability in 2026: Huge potential, lingering concerns

Open standards in 2026: The backbone of modern observability

Mar 18, 2026 By Trevor Jones In Grafana

Open source software and open standards are now an essential part of how organizations maintain their systems. That's not to say they haven't always been important, but the fourth annual Observability Survey, brought to you by Grafana Labs, shows just how deeply the shift to open has taken hold, with 77% of respondents saying open source and open standards are important to their observability strategy.

Read Post

Grafana

Read more about Open standards in 2026: The backbone of modern observability

OTEL Collector: Customizable YAML Configuration

Mar 18, 2026 By Sumo Logic, Inc. In Sumo Logic

This video shows you how custom YAML configuration works with the Sumo Logic OpenTelemetry collector.

View Video

Sumo Logic

Read more about OTEL Collector: Customizable YAML Configuration

Track and Fix NodeJS errors Using Rollbar

Mar 18, 2026 By Rollbar In Rollbar

Setting up Rollbar with your NodeJS application with custom parameters and people tracking.

View Video

Rollbar

Read more about Track and Fix NodeJS errors Using Rollbar

Engineers Want AI in Observability - With One Catch: 4th Annual Observability Survey by Grafana Labs

Mar 18, 2026 By Grafana In Grafana

Actually useful AI is welcome in observability. AI for the sake of AI is not. In this overview of Grafana Labs’ 4th annual Observability Survey, Marc Chipouras shares what 1,300+ respondents from 76 countries told us about the current state of observability — and what comes next. This year’s survey explores four major themes: The results show strong interest in AI for forecasting, root cause analysis, onboarding, and generating dashboards, alerts, and queries. But when it comes to autonomous action, practitioners are more cautious — and 95% say AI needs to show its work to earn trust.

View Video

Grafana

Read more about Engineers Want AI in Observability - With One Catch: 4th Annual Observability Survey by Grafana Labs

The World's Best Infrastructure Teams Trust Kentik

Mar 18, 2026 By Kentik In Kentik

Why do network and infrastructure teams at leading enterprises including Canva, Dropbox, Google, ConocoPhillips, and ServiceNow choose Kentik? In their own words, customers describe epic cost savings, dramatic return on investment, and blockbuster efficiency improvements that only Kentik can deliver. Learn why Kentik is the must-see network intelligence solution for any enterprise that depends on reliable connectivity.

View Video

Kentik

Read more about The World's Best Infrastructure Teams Trust Kentik

What Happens When You Replace Legacy Network Tools With AI Advisor?

Mar 18, 2026 By Kentik In Kentik

What happens when you replace legacy network tools with Kentik AI Advisor?

View Video

Kentik

Read more about What Happens When You Replace Legacy Network Tools With AI Advisor?

Monitor schema health with engine.schema_fields: Structure, Drift, and Volatility

Mar 18, 2026 By Coralogix Team In Coralogix

If you’ve worked with an observability pipeline, you’ve probably experienced schema problems: a field disappears, a type shifts from string to number, or a new label quietly appears. The causes are everywhere. Different teams adopt different naming conventions. A dependency upgrade changes the shape of a library’s log output. Over time, these small, reasonable decisions compound into schema sprawl: dashboards break, alerts misfire, and teams scramble to find out what happened.

Read Post

Coralogix

Read more about Monitor schema health with engine.schema_fields: Structure, Drift, and Volatility

Flow State in an AI Workplace - Digital Friction 1:1 with Mike Lovewell

Mar 18, 2026 By Nexthink In Nexthink

Tom welcomes Mike Lovewell to explore how digital friction continues to shape the modern workplace. From early days of low awareness to today’s complex, AI-influenced environments, Mike shares how friction has evolved in scale rather than cause. They discuss the growing importance of flow state, the measurable business impact of small disruptions, and why adoption—not just technology—is the key to success. AI emerges as both a solution and a new source of friction, depending on trust and usability.

View Video

Nexthink

Read more about Flow State in an AI Workplace - Digital Friction 1:1 with Mike Lovewell

Product Update - March 2026

Mar 18, 2026 By Hrishikesh Barua In IncidentHub

IncidentHub's latest product updates focus on improving the public status page, adding integrations with ticketing systems, private status page ingestion, and making the notifications more useful to the end user. Some of these improvements are driven by user feedback. Feedback is what makes the product better, and I am personally grateful to all our customers who have shared their feedback with us.

Read Post

IncidentHub

Read more about Product Update - March 2026

Network Monitoring as Code

Mar 18, 2026 By Checkly In Checkly

Dangling DNS, TCP handshake failures, packet loss: your network has blind spots that application-level dashboards miss. In this session, Daniel Paulus (VP Engineering, Checkly) sets up DNS, TCP, and ICMP monitors from scratch and deploys them as code using the Checkly CLI. You'll see how to import checks from the UI to a code project, use coding agents to build monitors, and debug network failures with Rocky AI, trace routes, and packet captures.

View Video

Checkly

Read more about Network Monitoring as Code

Free escalation procedure template (download & customize)

Mar 17, 2026 By Leo Baecker In Hyperping

Your monitoring fires at 2 AM. The on-call engineer picks up but doesn't know who to call next, what information to include, or which Slack channel to use. Sound familiar? That's what happens when escalation procedures exist only in people's heads — or worse, don't exist at all. The fix isn't complicated: a documented escalation procedure that every team member can follow under pressure. The problem is building one from scratch takes hours.

Read Post

Hyperping

Read more about Free escalation procedure template (download & customize)

Complete HTTP Status Codes List & Reference (2026)

Mar 17, 2026 By Leo Baecker In Hyperping

This is a comprehensive reference of every HTTP status code defined in the HTTP specification (RFC 9110) and common extensions. Use it as a quick lookup when you encounter a status code in your browser, server logs, or API responses. For a beginner-friendly guide to the most common codes, see From 200 to 503: Understanding the Most Common HTTP Status Codes.

Read Post

Hyperping

Read more about Complete HTTP Status Codes List & Reference (2026)

Bridge the DevSec divide: Using Grafana Cloud and Miggo for runtime protection

Mar 17, 2026 By Jonathan Price In Grafana

Note: This blog post is co-authored by Daniel Shechter, CEO and co-founder of Miggo Security. Modern runtime security is critical to understand complex systems and detect and protect against attacks, especially in rapidly evolving cloud native architectures. For many security teams, however, achieving deep visibility into runtime risks remains a moving target.

Read Post

Grafana

Read more about Bridge the DevSec divide: Using Grafana Cloud and Miggo for runtime protection

Debug while you build with Seer via MCP

Mar 17, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Read more about Debug while you build with Seer via MCP

5 Database Monitoring Tips Every DBA Should Use to Reduce Firefighting

Mar 17, 2026 By udara.ratnakumara In Redgate

This is a guest post from udara.ratnakumara. In a recent webinar I hosted with my colleague Chris Hawkins, Inside a DBA’s Day: What Really Happens and How to Stay Ahead, we talked through the realities of a typical DBA day and the practical ways teams can stay ahead of issues rather than constantly reacting. For many DBAs, the day doesn’t start with coffee. It starts with an alert. A report is suddenly slow. An application query is timing out.

Read Post

Redgate

Read more about 5 Database Monitoring Tips Every DBA Should Use to Reduce Firefighting

From Data Chaos to Results: The New Data Strategy for the Agentic Era

Mar 17, 2026 By Kamal Hathi In Splunk

The world is generating data at a pace that defies the human ability to comprehend it and draw insights from it. By 2028, we’ll reach almost 400 zettabytes of global data, with over 55% of it coming from machines talking to machines. For enterprises, this isn’t just a storage problem; it’s an existential challenge.

Read Post

Splunk

Read more about From Data Chaos to Results: The New Data Strategy for the Agentic Era

How Does Skylar Advisor Cut Alert Noise?

Mar 17, 2026 By ScienceLogic In ScienceLogic

What if you could start your day without hundreds of alerts? Skylar Advisor transforms noisy event streams into a short list of prioritized advisories by grouping related alerts and signals together. It shows what is happening in your environment, explains why it matters, and provides clear next steps, so instead of chasing alerts, IT teams get guidance focused on real operational impact.

View Video

ScienceLogic

Read more about How Does Skylar Advisor Cut Alert Noise?

How GDIT Automated Early Response to Preserve Critical Event Context

Mar 17, 2026 By ScienceLogic In ScienceLogic

In this video, Jason Boig, Solutions Engineer at GDIT, shares how his team uses ScienceLogic to streamline network infrastructure monitoring and improve response times. Instead of relying on manual processes after an alert is triggered, ScienceLogic helps automate the initial response and capture critical data the moment an event occurs. This ensures nothing is lost as conditions change and gives teams immediate visibility into issues.

View Video

ScienceLogic

Read more about How GDIT Automated Early Response to Preserve Critical Event Context

Fair Source Software in the AI age

Mar 17, 2026 By Chad Whitacre In Sentry

Have you noticed AI recently? Yeah, us too. Generative AI is wreaking havoc on the software status quo, and that includes licensing, and that generates … opinions. Sentry has a long history of having opinions about software licensing. We started life as an unlicensed side project in 2008, then went through BSD, to BSL, to writing our own license, FSL.

Read Post

Sentry

Read more about Fair Source Software in the AI age

The Hidden Crisis in Modern IT: Interpretation Risk

Mar 17, 2026 By ScienceLogic In ScienceLogic

Technology leaders spent the past decade investing heavily in visibility. They expanded monitoring footprints, adopted cloud-native observability tools, integrated analytics dashboards, and layered on automation intended to streamline detection. Every addition promised deeper insight. Every initiative aimed to bring clarity to increasingly complex environments. Yet operations feel more chaotic, not less. Outages move faster. Incidents cross more boundaries. Signals appear without context.

Read Post

ScienceLogic

Read more about The Hidden Crisis in Modern IT: Interpretation Risk

Shifting Metrics Right

Mar 17, 2026 By Martin Thwaites In Honeycomb

In the shift left era where it feels like we’re pushing everything as far to the start of the SDLC as we can, it may seem counterintuitive to shift anything right. That is, however, exactly what I suggest when it comes to generating metrics. How far you go to the right of the SDLC is a much more nuanced question and is dependent on a lot of factors, and on what metrics you’re talking about.

Read Post

Honeycomb

Read more about Shifting Metrics Right

Instrumenting Rust TLS with eBPF

Mar 17, 2026 By Nikolay Sivko In Coroot

Coroot is an open source observability tool that uses eBPF to collect telemetry directly from applications and infrastructure. One of the things it does is capture L7 traffic from TLS connections without any code changes, by hooking into TLS libraries and syscalls. Works great for OpenSSL. Works for Go. Then rustls enters the picture and everything stops being obvious. With OpenSSL, everything is nicely wrapped: From eBPF’s point of view this is perfect: Everything happens inside one call.

Read Post

Coroot

Read more about Instrumenting Rust TLS with eBPF

Event Intelligence for Agentic IT Operations

Mar 17, 2026 By david.arrowsmith In Interlink

Modern IT teams are experimenting with AI agents. But individual agents working in isolation are not enough. To truly achieve Agentic IT Operations, organisations need a platform, one that coordinates, governs, and contextualises AI-driven actions across the entire IT landscape. That’s where Interlink Software comes in.

Read Post

Interlink

Read more about Event Intelligence for Agentic IT Operations

Monitor Aruba Central in Datadog

Mar 17, 2026 By Datadog In Datadog

Modern organizations often operate from multiple locations. From retail stores to global enterprises, many companies rely on distributed wired and wireless networks to keep business-critical applications online. Aruba Central provides a centralized, cloud-based platform for managing that infrastructure at scale.

Read Post

Datadog

Read more about Monitor Aruba Central in Datadog

Monitor your application and network load balancer logs

Mar 17, 2026 By Datadog In Datadog

Load balancers are the primary entry points to distributed applications. By strategically directing the flow of incoming web traffic to specific endpoints, load balancers help optimize throughput and ensure the horizontal scalability of applications. In modern systems, load balancers often do more than their name suggests: Beyond basic load distribution, they analyze requests and route traffic based on a wide range of variables, such as client identity.

Read Post

Datadog

Read more about Monitor your application and network load balancer logs

Cost Optimization in Action: How We Cut Amazon SQS Costs by 87%

Mar 17, 2026 By Jean-Charles Thouin In LogicMonitor

JC, the Director of Software Engineering, Cloud at LogicMonitor, shares how Cost Optimization enabled his team to shift to Cost-Intelligent Observability and tackle an unexpected and growing cloud bill. As engineers, we live and breathe performance. We obsess over latency, reliability, and uptime, the hallmarks of a healthy system. But there’s another metric that’s becoming just as critical: cost.

Read Post

LogicMonitor

Read more about Cost Optimization in Action: How We Cut Amazon SQS Costs by 87%

A New Scale Tier for Amazon Timestream for InfluxDB

Mar 17, 2026 By InfluxData In InfluxData

InfluxDB 3 on Amazon Timestream for InfluxDB now scales to 15-node clusters, unlocking higher ingestion, greater query concurrency, and real-time performance at scale. In this video, PM Pete Barnett breaks down what this means for high-resolution, high-velocity workloads, and how you can scale from Core to Enterprise with zero downtime or data migration.

View Video

InfluxData

Read more about A New Scale Tier for Amazon Timestream for InfluxDB

Taming the Broker Network: Achieving Reliable Apache ActiveMQ Operations

Mar 17, 2026 By meshIQ In meshIQ

Broker networks grow from success but often become fragile webs. A global retailer's journey from Apache ActiveMQ chaos to reliable operations shows how unified visibility, automation, and governed self-service transform messaging from liability to strategic asset.

Read Post

meshIQ

Read more about Taming the Broker Network: Achieving Reliable Apache ActiveMQ Operations

Captur: Observability-First Mobile ML Inference for Better Customer Confidence

Mar 17, 2026 By Datadog In Datadog

Captur builds a mobile SDK that brings real-time image recognition and actionable feedback directly into customers’ apps, running complex machine learning models entirely on device without cloud inference. This architecture delivers privacy and performance, but also creates unique challenges when it comes to observability and debugging, especially as crashes can originate from the host app rather than the SDK itself.

View Video

Datadog

Read more about Captur: Observability-First Mobile ML Inference for Better Customer Confidence

Episode 8 - The Rise of Autonomous Teams

Mar 17, 2026 By Digitate In Digitate

In this episode of The Intelligent Enterprise, host Tom Stoneman takes us inside the evolving use-cases for AI across different enterprises. Digitate recently conducted a survey of over 600 IT decision makers from across North America. The aim was to get a better sense of how AI tools are being implemented across workplaces — and the results are fascinating.

View Video

Digitate

Read more about Episode 8 - The Rise of Autonomous Teams

AIOps vs Observability vs Monitoring: What Enterprises Actually Need

Mar 17, 2026 By Renuka Suresh In HEAL Software

IT Directors, CTOs, and Application Heads managing complex, multi-tool enterprise environments.

Read Post

HEAL Software

Read more about AIOps vs Observability vs Monitoring: What Enterprises Actually Need

DevEx Talks episode 2 - Women in DevRel: What Matters in Open Source?

Mar 17, 2026 By VictoriaMetrics In VictoriaMetrics

In this DevEx Talks episode, Adriana Villela and Cortney Nickerson explore what truly matters in open source through the lens of women in Developer Relations and Community roles. From diverse career paths to navigating DevRel as women in tech, they share honest reflections on impact, feedback, and long-term motivation in cloud native ecosystems.

View Video

VictoriaMetrics

Read more about DevEx Talks episode 2 - Women in DevRel: What Matters in Open Source?

Role of Control Room Design in Improving Monitoring Accuracy

Mar 17, 2026 By OpsMatters In OpsMatters

Monitoring mistakes rarely happen randomly. Most of them originate in control rooms where operators struggle with poorly positioned screens, awkward equipment placement, or lighting that makes critical data difficult to see. In high-stakes environments like power grids, security operations, transportation systems, and manufacturing plants, monitoring accuracy directly affects operational stability and safety. Even highly skilled operators can make mistakes when their workspace works against them.

Read Post

OpsMatters

Read more about Role of Control Room Design in Improving Monitoring Accuracy

How to Set Up Heatmaps on Your Website with Hotjar

Mar 16, 2026 By Super Monitoring In Super Monitoring

Heatmaps are visual tools that show where people click, scroll, or spend time on your web pages. Hotjar is among the most popular tools for creating heatmaps, and it is not hard to install as long as you follow the steps below. At the conclusion of this guide, you will know.

Read Post

Super Monitoring

Read more about How to Set Up Heatmaps on Your Website with Hotjar

A New Scale Tier for Time Series on Amazon Timestream for InfluxDB

Mar 16, 2026 By Pat Walsh In InfluxData

When we first announced the availability of InfluxDB 3 Core and Enterprise on Amazon Timestream for InfluxDB last year, we set a new standard for managed time series on AWS. We gave developers a simple way to harness high performance at scale while removing the burden of infrastructure management. But as our customers have taught us, “at scale” is a moving target. Across Industrial IoT, physical AI, and real-time observability, data is growing in both volume and resolution.

Read Post

InfluxData

Read more about A New Scale Tier for Time Series on Amazon Timestream for InfluxDB

Powering enterprise AI at scale: The Elastic and NVIDIA cuVS integration

Mar 16, 2026 By Brian Bergholm In Elastic

Seamlessly vectorize high-volume data and accelerate your time to production with the new gold standard for GPU-accelerated vector search.

Read Post

Elastic

Read more about Powering enterprise AI at scale: The Elastic and NVIDIA cuVS integration

Quickly go from exploration to action with new one-click integrations in Grafana Drilldown

Mar 16, 2026 By Brendan O'Handley In Grafana

The Grafana Drilldown apps give you a queryless, point-and-click way to explore your metrics, logs, traces, and profiles. But finding an insight is only half the job: you still need to act on it. Previously, that meant leaving Drilldown, manually copying queries, and navigating through Grafana's dashboards, Alerting, and "Explore" interfaces to pick up where you left off.

Read Post

Grafana

Read more about Quickly go from exploration to action with new one-click integrations in Grafana Drilldown

Choosing a JavaScript logging library: The 2026 definitive guide

Mar 16, 2026 By Kyle Tryon In Sentry

With AI writing more and more of our code, properly monitoring and debugging that code has become an increasingly critical part of the development workflow that can't be ignored. Luckily, we have more time than ever to implement the right tools to do so. Implementing a production-ready logging solution is easy to do, and provides you and your LLM Agents with a wealth of debugging information from your app, across users and environments.

Read Post

Sentry

Read more about Choosing a JavaScript logging library: The 2026 definitive guide

Monitoring Phoenix LiveView Performance With Scout APM

Mar 16, 2026 By Lance Erickson In Scout

Phoenix LiveView is one of those technologies that feels like cheating. You get rich, interactive UIs without writing JavaScript, and the server handles the state. It’s elegant. But that elegance comes with a trade-off that’s easy to forget: all that interactivity runs on your server.

Read Post

Scout

Read more about Monitoring Phoenix LiveView Performance With Scout APM

Architecting the Future: The evolution of Apache ActiveMQ for enterprise messaging and the path to mission control

Mar 16, 2026 By meshIQ In meshIQ

Apache ActiveMQ is evolving from simple transport to intelligent fabric. Key shifts include replicated KahaDB for cloud-native resilience, Spring decoupling in v7, and OpenTelemetry observability—transforming messaging infrastructure for modern enterprise needs.

Read Post

meshIQ

Read more about Architecting the Future: The evolution of Apache ActiveMQ for enterprise messaging and the path to mission control

The Context Window: Agentic LLMs 101

Mar 16, 2026 By Grafana In Grafana

In this episode, Mat Ryer and Cyril Tovena will talk about agentic LLMs with host Tiffany Jernigan.

View Video

Grafana

Read more about The Context Window: Agentic LLMs 101

Mastering Trace Highlights: How to Solve Complex System Outliers in Seconds

Mar 16, 2026 By Coralogix In Coralogix

Transform millions of spans into a clear visual map. In this demo, we use Coralogix Trace Highlights to isolate a performance regression and pivot from 400k spans down to the exact root cause in just a few clicks.

View Video

Coralogix

Read more about Mastering Trace Highlights: How to Solve Complex System Outliers in Seconds

How to Spot Vulnerabilities in Your Supply Chain Quickly

Mar 16, 2026 By OpsMatters In OpsMatters

Ensuring shipments are secure before leaving a warehouse is essential for preventing losses and delays. Essential checks before approving a shipment for dispatch include verifying documentation, inspecting packaging, and confirming that transport processes are properly followed. Completing these checks helps logistics teams detect potential problems before they escalate into costly issues. Supply chain vulnerabilities can disrupt operations, create financial risks, and damage a company's reputation. Taking proactive steps ensures that goods reach their destination safely and efficiently.

Read Post

OpsMatters

Read more about How to Spot Vulnerabilities in Your Supply Chain Quickly

Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive

Mar 15, 2026 By Sentry In Sentry

Livestream from New Systems (Toronto) — Pi is the extensible coding agent behind OpenClaw. Featuring a talk from Daniel Griesser on his journey with Pi, a live AMA with Pi creator Mario Zechner and Armin Ronacher, plus a Pi extensions tour with Ben Vinegar. March 14, 12pm–4pm EST.

View Video

Sentry

Monitoring

Read more about Pi Day: AMA with Pi's Creator + Talks & Extensions Deep Dive

Golang memory arenas [101 guide]

Mar 14, 2026 By Vladimir Mihailenco In Uptrace

Go 1.20 introduced an experimental arena package that lets you allocate many objects from a contiguous region of memory and free them all at once — bypassing the garbage collector entirely. The package remains experimental and its future is uncertain, but arenas are a valuable concept for understanding Go memory management and writing high-performance code. The arena package is experimental and on hold indefinitely. The Go team has made no guarantees about compatibility or its continued existence.

Read Post

Uptrace

Read more about Golang memory arenas [101 guide]

Sponsored Post

Top infrastructure monitoring mistakes (and how to avoid them)

Mar 13, 2026 By Site24x7 In Site24x7

Infrastructure monitoring is meant to simplify operations, not overwhelm teams with noise. Yet the average IT team receives more than 10,000 alerts every day. Despite this constant stream of notifications, critical issues still slip through the cracks. This volume of fragmented data creates a dangerous visibility gap across the infrastructure. As a result, teams can spend more time sorting through alerts than actually resolving issues.

Read Post

Site24x7

Read more about Top infrastructure monitoring mistakes (and how to avoid them)

The Obkio Story: Building a Network Observability & Monitoring Solution

Mar 13, 2026 By Alyssa Lamberti In Obkio

In 2016, before Obkio existed, we ran a market audit. We interviewed banks, manufacturing companies, and service providers, and asked them one simple question: Why aren't you using a Network Performance Monitoring solution? The answer was unanimous: the tools were too complex, and nobody had the internal resources to run them full-time. If that was true for enterprises with dedicated networking staff, it was even more true for smaller businesses with generalist IT teams.

Read Post

Obkio

Read more about The Obkio Story: Building a Network Observability & Monitoring Solution

Reduce alert noise with Site24x7's Event Correlation

Mar 13, 2026 By ManageEngine Site24x7 In Site24x7

Alert fatigue remains one of the most underestimated problems in IT operations. Srinivasa Raghavan, director of product management, explains how event correlation addresses it. Event correlation is the process of grouping related alerts from across your infrastructure into a single, contextual incident to reduce the volume of noise during an outage or service degradation. In this short clip, Srinivasa walks through how the feature functions and why high-volume alert environments make this kind of signal-to-noise reduction operationally significant.

View Video

Site24x7

Read more about Reduce alert noise with Site24x7's Event Correlation

OpAMP for OpenTelemetry: Managing Collector Fleets and Introducing the New OpAMP Gateway Extension

Mar 13, 2026 By Andy Keller In ObservIQ

Today, Bindplane is launching the OpAMP Gateway Extension in alpha — a new component that extends OpAMP fleet management into network-segmented and firewalled environments where direct agent-to-server connectivity is not possible. It also addresses fleet scaling by fanning many agent connections into a small upstream pool, reducing connection load on the OpAMP server. We also hope to donate the OpAMP Gateway Extension upstream to the OpenTelemetry project and welcome community contributions.

Read Post

ObservIQ

Read more about OpAMP for OpenTelemetry: Managing Collector Fleets and Introducing the New OpAMP Gateway Extension

Cloud Observability Is Broken - Hybrid Operations Need a New Intelligence Model

Mar 13, 2026 By Dallon Robinette In Selector

Cloud adoption was supposed to simplify operations. Infrastructure would become programmable, scalability would become elastic, and distributed architectures would enable resilience at global scale. In practice, cloud has delivered extraordinary flexibility, but it has also introduced a level of operational complexity that traditional observability approaches were never designed to handle.

Read Post

Selector

Read more about Cloud Observability Is Broken - Hybrid Operations Need a New Intelligence Model

What is Industry 4.0? Everything You Need to Know in 2026

Mar 13, 2026 By Company In InfluxData

Industry 4.0 is the term used to describe the fourth industrial revolution, a name given to the integration of physical and digital systems, which includes the internet of things (IoT) and artificial intelligence that are transforming a huge number of industries. At a high level, its goal is to create an efficient, automated process for creating products or services that can be adapted quickly and efficiently to changing customer needs.

Read Post

InfluxData

Read more about What is Industry 4.0? Everything You Need to Know in 2026

Digital Adoption + AI: The Secret Route to Zero Tickets

Mar 13, 2026 By Ella Drimer In Nexthink

Generative AI has the potential to transform workplace productivity – but do organizations know how to deliver on that promise? New research shows that employees who use generative AI tools engage with them up to ten times per day, spending over three hours per week interacting with AI at work. And yet within the same organizations, large groups of employees have never meaningfully engaged with these tools at all.

Read Post

Nexthink

Read more about Digital Adoption + AI: The Secret Route to Zero Tickets

4 Key DEXOps Process Improvements

Mar 13, 2026 By Megan Brake In Nexthink

Most IT organizations want to improve the digital employee experience. But good intentions alone rarely move the needle. The real shift happens when organizations evolve how IT operates. Traditional IT operations are built around reacting to incidents. But ticket-based operations, or operations based on poor data, lack the ability to create truly predictive ways of working.

Read Post

Nexthink

Read more about 4 Key DEXOps Process Improvements

Why the New Normal in Cyberattacks Demands Network Intelligence

Mar 13, 2026 By Steve Stover In Kentik

As cyberattacks evolve into “machine-speed” disruption campaigns that span cloud, identity, and network planes, traditional monitoring is no longer enough to protect modern enterprise infrastructure. Shifting to a network intelligence model, powered by real-time telemetry and AI-driven reasoning, enables security teams to detect weak signals and automate defenses before an incident becomes systemic.

Read Post

Kentik

Read more about Why the New Normal in Cyberattacks Demands Network Intelligence

MCP and A2A: What They Are and Why They Matter for Autonomous IT

Mar 13, 2026 By Margo Poda In LogicMonitor

MCP and A2A are the two protocols that make agentic AI governable at enterprise scale. One controls how agents use tools, and the other controls how agents work together. AI in the enterprise is no longer confined to chat windows. It’s operating inside incident queues and automation pipelines. Increasingly, teams are using AI agents to take action: detecting incidents, executing remediations, updating tickets, coordinating across systems.

Read Post

LogicMonitor

Read more about MCP and A2A: What They Are and Why They Matter for Autonomous IT

What is SSL Certificate Monitoring?

Mar 13, 2026 By Dotcom-Monitor In Dotcom-Monitor

SSL Certificate Monitoring is the automated process of validating the integrity, trust chain, and expiration status of TLS certificates across network endpoints to prevent connection failures. SSL/TLS certificates are required for encrypted data transmission and server authentication. If a certificate is expired or fails validation (hostname, trust chain, issuer, etc.), properly configured clients will terminate the connection.

Read Post

Dotcom-Monitor

Read more about What is SSL Certificate Monitoring?

Nexthink Flow: Product Overview

Mar 13, 2026 By Nexthink In Nexthink

Nexthink Flow combines AI-powered data with a real-time, low code orchestration engine to continuously optimize complex workflows, monitor progress, handle exceptions, and ensure that all tasks are completed as intended. Repurpose hours spent on recurring issues to optimize resources, save costs, and improve IT and employee productivity.

View Video

Nexthink

Read more about Nexthink Flow: Product Overview

From signals to savings: Optimizing cloud costs with Grafana Assistant and MCP servers

Mar 13, 2026 By Daniel Fitzgerald In Grafana

In today's cloud-native environments, managing resource waste and optimizing costs can feel like a constant battle. Operators, along with their fearless FinOps teams, spend countless hours hunting down unused resources, deciphering complex telemetry data, and manually implementing code or configuration changes to try to reduce cloud costs. But what if you could automate the entire process, from identifying waste to implementing the fix, all based on actual production telemetry?

Read Post

Grafana

Read more about From signals to savings: Optimizing cloud costs with Grafana Assistant and MCP servers

Why Miami Businesses Need IT Support That Sees Problems Coming

Mar 13, 2026 By Vince Louie Daniot In OpsMatters

In Miami, downtime rarely stays small for long. A dropped connection in Brickell can stall a sales call. A failed backup in Coral Gables can turn into a compliance headache. A slow server in Doral can drag down an entire team before anyone even realizes what is happening. That is why more companies are moving away from reactive, break-fix support and looking for Miami-based IT services with proactive monitoring.

Read Post

OpsMatters

Read more about Why Miami Businesses Need IT Support That Sees Problems Coming

Adapting Your Mobile Device Management for Evolving Cyber Threats

Mar 13, 2026 By OpsMatters In OpsMatters

You can reduce this risk with multifactor authentication, where users confirm their identity through a second step, such as a mobile notification or biometric verification. Even if credentials are compromised, attackers cannot easily gain access to your systems.

Read Post

OpsMatters

Read more about Adapting Your Mobile Device Management for Evolving Cyber Threats

Observability vs Monitoring: Why the Difference Still Matters in Complex Systems

Mar 13, 2026 By OpsMatters In OpsMatters

In modern infrastructure, the words observability and monitoring are often used as if they mean the same thing. That shortcut sounds harmless, but it creates real confusion inside technical teams and business discussions. The two ideas are connected, yet they solve different problems. In simple systems, the gap may feel small. In complex systems, the gap becomes impossible to ignore because the cost of misunderstanding it usually appears during failure, not during routine operation.

Read Post

OpsMatters

Read more about Observability vs Monitoring: Why the Difference Still Matters in Complex Systems

Microsoft March 2026 Patch Tuesday:

Mar 12, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

SCOM Vulnerability: CVE-2026-20967.

Read Post

NiCE IT Mgmt

Read more about Microsoft March 2026 Patch Tuesday:

Improved Azure status integration

Mar 12, 2026 By Valeria Kurolapova In StatusGator

Monitoring Azure health across large environments should not require complicated setup. Until recently, connecting Azure to StatusGator required configuring access at the subscription level, which could become difficult for organizations managing dozens or even hundreds of subscriptions. We redesigned the Azure integration to make it simpler, more scalable, and easier to manage.

Read Post

StatusGator

Read more about Improved Azure status integration

Apple Developer outage on March 10th

Mar 12, 2026 By Andy Libby In StatusGator

On March 10, 2026, developers around the world began experiencing issues with Apple Developer services that prevented apps from being verified or launched on physical devices. For many teams building and testing iPhone apps, the outage disrupted development workflows and blocked deployment to test devices. The issue appeared to involve Apple’s developer certificate verification systems.

Read Post

StatusGator

Read more about Apple Developer outage on March 10th

Updated status page privacy settings

Mar 12, 2026 By Valeria Kurolapova In StatusGator

We’ve updated the Status Page privacy settings to make access control clearer and more flexible. The new layout groups related options together and introduces email-based authentication for enterprise customers.

Read Post

StatusGator

Read more about Updated status page privacy settings

Evaluating Observability Tools for the AI Era

Mar 12, 2026 By Kale Bogdanovs In Honeycomb

Every observability vendor has an AI story right now. Most have an MCP. Many have a chatbot. All have a demo where the AI finds the root cause of an incident in thirty seconds and everyone in the room nods. In the context of a public demo, these tools look almost identical. Ask the AI a question, the tool returns an answer, and the engineer fixes the bug. Impressive. But if you buy based on the demo, you may end up with an AI layer that looks great on a call and disappoints in production.

Read Post

Honeycomb

Read more about Evaluating Observability Tools for the AI Era

Bindplane Community Call in March 2026

Mar 12, 2026 By Bindplane In ObservIQ

Tune in for the Bindplane Community Call in March to learn more about SSO going GA, a wave of new updates, connectors, sources, and destinations, including a VictoriaMetrics partner integration — and a preview of what we're building next. We'll also share details on meeting the Bindplane team at KubeCon + CloudNativeCon Europe in Amsterdam. As always, hands-on demos and a live Q&A at the end.

View Video

ObservIQ

Read more about Bindplane Community Call in March 2026

Why Generic AI Fails in Ops: What Trustworthy Actually Requires

Mar 12, 2026 By ScienceLogic In ScienceLogic

Enterprise operations reached a point where complexity outpaced human interpretation and outgrew the capabilities of generic AI. As environments became more distributed and interdependent, every incident, anomaly, and degradation produced ripple effects across systems that require context, lineage, and reasoning. Yet most AI models were not built for this reality. They were trained for general knowledge tasks, not the deeply connected operational truths that define enterprise performance.

Read Post

ScienceLogic

Read more about Why Generic AI Fails in Ops: What Trustworthy Actually Requires

Mastering the Diagnostic pivot from Health Policy to Pod

Mar 12, 2026 By Jonny Steiner In Coralogix

In the world of modern microservices, scale is a necessary challenge. Enterprise service inventories start modestly with a handful of components, only to balloon to hundreds over time. Traditional monitoring approaches cannot support that weight. The more organizations build, the more work they create, often only to keep systems running.

Read Post

Coralogix

Read more about Mastering the Diagnostic pivot from Health Policy to Pod

Actually Useful AI: Troubleshoot Issues Fast with Grafana Assistant

Mar 12, 2026 By Grafana In Grafana

From "something's off" to "here's why (and how to fix it)." Grafana Assistant goes full detective mode: pulls the clues, connects the dots, and recommends a fix... with receipts.

View Video

Grafana

Read more about Actually Useful AI: Troubleshoot Issues Fast with Grafana Assistant

Syncing LDAP Users & Groups with the Icinga Notifications Web API

Mar 12, 2026 By Johannes Meyer In Icinga

If you’re running Icinga in a mid-to-large organization, chances are your users and teams are already defined in LDAP or Active Directory. Manually re-creating contacts and contact groups in Icinga Notifications Web is tedious and error-prone, but thankfully, it doesn’t have to be that way. The Icinga Notifications Web REST API gives you everything you need to automate this synchronization. In this post, we’ll walk through how to build a reliable LDAP-to-Icinga sync using the v1 API.

Read Post

Icinga

Read more about Syncing LDAP Users & Groups with the Icinga Notifications Web API

How to Reduce MTTR with AI-Powered Runtime Diagnosis

Mar 12, 2026 By Lightrun Team In Lightrun

Reducing Mean Time to Resolution (MTTR) in production systems requires understanding failure behavior in real time. While AI code agents have significantly accelerated software development and deployment, incident resolution has remained constrained by incomplete pre-captured telemetry. AI SRE tools improve signal correlation, but MTTR reduction requires runtime-verified diagnosis that confirms execution behavior directly in production systems.

Read Post

Lightrun

Read more about How to Reduce MTTR with AI-Powered Runtime Diagnosis

How to Solve "Cannot Reproduce" Bugs That Cost Support Teams Hours

Mar 12, 2026 By Maor Yaffe In Lightrun

Support teams frequently face vague customer reports and incomplete data but need to offer fast resolutions autonomously without escalating to developers. In this article, learn how to equip support engineers with tools to diagnose root causes in minutes, increasing self-sufficient issue resolution. We explore eliminating the ‘Reproduction Tax’ for ‘cannot reproduce’ bugs using runtime context to achieve technical certainty at scale.

Read Post

Lightrun

Read more about How to Solve "Cannot Reproduce" Bugs That Cost Support Teams Hours

6 Key Roles Every DEX Team Needs

Mar 12, 2026 By Megan Brake In Nexthink

Digital employee experience doesn’t fail because of technology. It fails because of operating models. Many digital workplace leaders invest in visibility tools, dashboards, automation capabilities, and sentiment platforms. And yet, months later, they’re still stuck in reactive mode. Tickets are down slightly. Reporting is better. But the organization hasn’t fundamentally shifted.

Read Post

Nexthink

Read more about 6 Key Roles Every DEX Team Needs

Native OpenTelemetry inside Alloy: Now you can get the best of both worlds

Mar 12, 2026 By Bejal Lewis In Grafana

We're big proponents of OpenTelemetry, which has quickly become a new unified standard for delivering metrics, logs, traces, and even profiles. It's an essential component of Alloy, our popular telemetry agent, but we're also aware that some users would prefer to have a more "vanilla" OpenTelemetry experience.

Read Post

Grafana

Read more about Native OpenTelemetry inside Alloy: Now you can get the best of both worlds

Monitoring Your Node.js App Health on Fly.io

Mar 12, 2026 By Tarun Singh In AppSignal

The Node.js service has just been containerized and deployed with a single fly deploy command across continents. Everything seems to be alright, but then a week later, a user messages you saying the app is slow. You run the fly logs command and scroll through some logs, and find nothing out of the ordinary. The Fly.io dashboard says the app is running and healthy, but something behind the scenes is slowing down the app, and you have no idea what. You don’t even know where to start.

Read Post

AppSignal

Read more about Monitoring Your Node.js App Health on Fly.io

Mastering Root Cause Analysis with Monitoring & Traffic Insights

Mar 12, 2026 By Progress WhatsUp Gold In WhatsUp Gold

IT teams today face increasing pressure to resolve issues quickly, but hybrid environments, rising complexity, and endless alerts often slow everything down. In this expert‑led 30‑minute webinar, you’ll see how combining Progress WhatsUp Gold infrastructure monitoring with deep traffic analysis delivers the visibility needed to diagnose problems faster and significantly reduce time‑to‑resolution.

View Video

WhatsUp Gold

Read more about Mastering Root Cause Analysis with Monitoring & Traffic Insights

Let's Encrypt 45-Day Certificate Expiration: Monitoring & More

Mar 12, 2026 By Dotcom-Monitor In Dotcom-Monitor

TLS certificate lifetimes are shrinking fast — and that changes how every organization handles renewals, validation, and outage prevention. Let’s Encrypt has confirmed it will move from 90-day certificates to 45-day certificates (with staged rollouts) and dramatically shorten authorization reuse windows. At the same time, the CA/Browser Forum’s Ballot SC-081v3 has adopted a broader industry schedule that ultimately caps public TLS certificates at 47 days by March 15, 2029.
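As a rough illustration of what shorter lifetimes mean for monitoring, the sketch below computes how many days remain before a certificate's notAfter timestamp and flags it for renewal. The function names and the 15-day renewal threshold are assumptions made for this example, not values taken from Let's Encrypt or the CA/Browser Forum.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter time.

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Mar 15 00:00:00 2029 GMT". `now` must be timezone-aware.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, renew_within_days: int = 15) -> bool:
    """Flag a certificate once it is inside the renewal window.

    The 15-day default is an illustrative assumption; pick a window that
    suits your own renewal automation.
    """
    return days_until_expiry(not_after, now) <= renew_within_days


if __name__ == "__main__":
    not_after = "Mar 15 00:00:00 2029 GMT"
    checked_at = datetime(2029, 1, 29, tzinfo=timezone.utc)
    print(days_until_expiry(not_after, checked_at))  # 45.0
    print(needs_renewal(not_after, checked_at))      # False
```

With 45-day certificates, a fixed alert threshold that was comfortable for 90-day certificates covers a much larger share of the lifetime, so the window is worth revisiting.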

Read Post

Dotcom-Monitor

Read more about Let's Encrypt 45-Day Certificate Expiration: Monitoring & More

Claude Agent SDK Monitoring & Observability with OpenTelemetry and SigNoz

Mar 12, 2026 By SigNoz - Open Source Observability Platform In SigNoz

Learn how to implement monitoring and observability for the Claude Agent SDK using OpenTelemetry and SigNoz. In this video, we walk through instrumenting your Claude-based agents, capturing traces, metrics, and logs, and visualizing everything in SigNoz for real-time insights. You’ll learn how to debug agent behavior, identify latency bottlenecks, and monitor performance in production environments.

View Video

SigNoz

Read more about Claude Agent SDK Monitoring & Observability with OpenTelemetry and SigNoz

Blind spots in hybrid IT: SolarWinds report finds 77% of IT teams lack full visibility across on-prem and cloud

Mar 11, 2026 By SolarWinds In SolarWinds

New data shows AI is accelerating incident response, reducing noise, and closing visibility gaps across increasingly complex IT environments.

Read Post

SolarWinds

Read more about Blind spots in hybrid IT: SolarWinds report finds 77% of IT teams lack full visibility across on-prem and cloud

Multi-Language Status Page Widgets: Customize Widget Messages in Any Language

Mar 11, 2026 By Nuno Tomas In isDown

If your product serves users in multiple regions, your status page widget shouldn't be stuck in English. A customer in São Paulo seeing "All Systems Operational" when they expect "Todos os Sistemas Operacionais" is a small friction, but small frictions compound. It signals that their language isn't a priority, and it adds cognitive load during the exact moment they're checking whether something is broken. Until now, IsDown widgets shipped with hardcoded English messages. That's changed.

Read Post

isDown

Read more about Multi-Language Status Page Widgets: Customize Widget Messages in Any Language

Claude outage analysis: What happened on March 11

Mar 11, 2026 By Andy Libby In StatusGator

On March 11, 2026, users around the world began reporting problems with Claude, including login failures, API errors, and stalled responses. While the disruption did not affect every user, reports quickly showed that the issue was widespread. StatusGator began receiving outage reports at 13:56 UTC. Using its Early Warning Signals system, StatusGator detected the growing incident at 14:22 UTC. The provider officially acknowledged the outage later at 14:44 UTC.
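The gaps between those three timestamps can be worked out with a few lines of arithmetic. This is a minimal sketch using only the times quoted above; the helper name is made up for the example.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day HH:MM timestamps (UTC)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


first_report = "13:56"   # first outage reports received
detected = "14:22"       # Early Warning Signal detection
acknowledged = "14:44"   # official provider acknowledgement

print(minutes_between(first_report, detected))      # 26
print(minutes_between(detected, acknowledged))      # 22
print(minutes_between(first_report, acknowledged))  # 48
```

In other words, detection came 26 minutes after the first reports and 22 minutes before the official acknowledgement.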

Read Post

StatusGator

Read more about Claude outage analysis: What happened on March 11

Understanding Karpenter architecture for Kubernetes autoscaling

Mar 11, 2026 By David Lentz In Datadog

Karpenter is a fast, flexible Kubernetes autoscaler designed to improve cluster performance and cost efficiency. When the cluster doesn’t have capacity to schedule a pod, Karpenter requests additional compute from the cloud provider, specifying a right-sized instance that matches the preferences you’ve set (for example, instance family).

Read Post

Datadog

Read more about Understanding Karpenter architecture for Kubernetes autoscaling

Key metrics for monitoring Karpenter

Mar 11, 2026 By David Lentz In Datadog

In Part 1 of this series, we explored how Karpenter’s architecture enables just-in-time provisioning and active node consolidation. Because Karpenter is constantly making infrastructure decisions based on real-time scheduling pressure, its metrics can give you early warning of provisioning slowdowns, cloud API throttling, and misconfigurations that prevent it from scaling the way you expect.

Read Post

Datadog

Read more about Key metrics for monitoring Karpenter

Tools for collecting metrics and logs from Karpenter

Mar 11, 2026 By David Lentz In Datadog

In the first two parts of this series, we explored how Karpenter’s architecture enables just-in-time provisioning and active node consolidation, and we identified the key Karpenter metrics you should track to keep your cluster performant and cost-efficient. In this post, we’ll look at vendor-agnostic tools you can use to capture these signals.

Read Post

Datadog

Read more about Tools for collecting metrics and logs from Karpenter

Monitor Karpenter with Datadog

Mar 11, 2026 By David Lentz In Datadog

In this series, we’ve explored Karpenter’s architecture, the key metrics that reflect its health and performance, and the vendor-agnostic tools for collecting and analyzing its telemetry data. In this final post, we’ll show you how Datadog helps you monitor and alert on Karpenter alongside your Kubernetes cluster and the infrastructure that runs it.

Read Post

Datadog

Read more about Monitor Karpenter with Datadog

What your product data is actually saying

Mar 11, 2026 By Milene Darnis In Datadog

As tools such as AI agents become more integrated with the instrumentation, governance, and centralization of product analytics data, product managers (PMs) still own the meaning of those events and the connected outcomes. Knowing when to trust the data, forming strong hypotheses, and being able to act on the insights requires an expert in the loop.

Read Post

Datadog

Read more about What your product data is actually saying

Why DevOps and SRE Teams are replacing 3-4 monitoring tools with Atatus?

Mar 11, 2026 By Mohana Ayeswariya J In Atatus

Your on-call engineer gets paged. A critical service is down. Error rates are spiking. They open Sentry for errors. Flip to Grafana for metrics. Pivot to Kibana to search logs. Then jump to Lumigo, but that only covers the Lambda functions, not the Node.js backend throwing the actual errors. Three tabs become five. Five become eight. Half the incident is gone and your team is still piecing together what happened instead of fixing it. Sound familiar?

Read Post

Atatus

Read more about Why DevOps and SRE Teams are replacing 3-4 monitoring tools with Atatus?

Log Correlation for Security and Performance Monitoring

Mar 11, 2026 By Jeff Darrington In Graylog

International travel comes with amazing sights, cultural experiences, and local delicacies. However, most travelers know that it also comes with differing economies that affect money’s value, and with various currencies. When people need cash, they have to translate the money in their wallets to the local currency, which means different coins and bills. Depending on the exchange rate, the currency’s value can change as the person moves from one country to another.

Read Post

Graylog

Read more about Log Correlation for Security and Performance Monitoring

The future of Search is here: Faster, simpler, AI-driven

Mar 11, 2026 By Jack Coates and In Cribl

Do more with less. That’s the mandate we’re all hearing. AI has fundamentally changed how we work. Modern AI workloads generate 10-100x more queries than humans ever could, pushing legacy architectures past performance limits. And the audacity of it all? Legacy logging vendors continue to raise costs without delivering meaningful innovation. IT and security teams are still forced to choose between speed and retention. Investigations are still slow. Data onboarding is still painful.

Read Post

Cribl

Read more about The future of Search is here: Faster, simpler, AI-driven

Observability Where You Work: Introducing the Honeycomb Slackbot in Beta

Mar 11, 2026 By Mei Luo In Honeycomb

Engineers are constantly context switching between tools, adding cognitive overhead on top of already complex work. You're deep in an investigation, you need to analyze some data, pull up a runbook somewhere else, and share findings back in Slack. Context gets lost in the shuffle, correlating across data sources becomes painful, and everything just takes longer. In high-pressure situations like incidents, that friction has a real cost to the business.

Read Post

Honeycomb

Read more about Observability Where You Work: Introducing the Honeycomb Slackbot in Beta

Honeycomb Metrics Is Now Generally Available

Mar 11, 2026 By Toni Chou In Honeycomb

It’s Black Friday. Checkout latency is spiking. Your on-call engineer pulls up the dashboard and starts working through the list. Is it a regional issue? No, all regions look fine. A payment provider? Stripe, PayPal, Apple Pay all nominal. A bad deployment? Nothing shipped in the last six hours. All your infrastructure dashboards are showing green. But customers are complaining. Checkout is slow, carts are being abandoned and revenue is draining away.

Read Post

Honeycomb

Read more about Honeycomb Metrics Is Now Generally Available

Launching Virtana Application Observability

Mar 11, 2026 By Virtana In Virtana

Breaking down silos, unifying teams, and seeing problems before they impact performance—Virtana Application Observability empowers both application and infrastructure teams to collaborate smarter and act faster. This is the future of full-stack visibility.

View Video

Virtana

Read more about Launching Virtana Application Observability

Update Management, Content Hub Expansion, and KQL Support

Mar 11, 2026 By VirtualMetric In VirtualMetric

The latest VirtualMetric DataStream release introduces several important capabilities across platform security, data management, and operational workflows. This update strengthens access protection, simplifies infrastructure management, and expands the ways security teams can work with live telemetry. It also extends platform connectivity and improves the user experience across many areas of the interface. Let’s take a closer look.

Read Post

VirtualMetric

Read more about Update Management, Content Hub Expansion, and KQL Support

DNS Monitoring

Mar 11, 2026 By Leo Baecker In Hyperping

You can now monitor DNS records directly from Hyperping. DNS issues are often invisible until your users start complaining. With DNS monitoring, Hyperping checks that your records resolve correctly from multiple locations and alerts you the moment something goes wrong. Head to your monitors dashboard to create a DNS monitor. You can also manage DNS monitors via the API. Questions? Reach out via in-app chat or email us at hello@hyperping.io.
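To make the idea concrete, here is a minimal sketch of a single-location DNS check written with Python's standard library. It is not Hyperping's implementation or API: the function names, the injectable `resolver` parameter, and the example addresses are assumptions for illustration, and a real monitor would run the same check from several locations.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses the resolver reports for hostname."""
    _name, _aliases, addresses = resolver(hostname)
    return set(addresses)


def check_record(hostname, expected, resolver=socket.gethostbyname_ex):
    """Compare resolved addresses with the expected set.

    Returns (ok, message). `resolver` is injectable so the check can be
    exercised without network access.
    """
    try:
        actual = resolve_ipv4(hostname, resolver)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"resolution failed: {exc}"
    if actual != set(expected):
        return False, f"expected {sorted(expected)}, got {sorted(actual)}"
    return True, "ok"
```

Passing a stub resolver, for example `lambda host: (host, [], ["192.0.2.10"])`, lets the comparison logic be tested offline before pointing it at real hostnames.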

Read Post

Hyperping

Read more about DNS Monitoring

Why Your NOC Will Ignore AI

Mar 11, 2026 By Yann Guernion In Broadcom

Imagine you are driving to work and a yellow check engine light flickers on your dashboard. The car feels fine. It accelerates normally, there is no strange noise, and the temperature gauge is steady. What do you do? If you are like most people, you keep driving. You might make a mental note to look at it later, but you don't pull over on the highway and call a tow truck.

Read Post

Broadcom

Read more about Why Your NOC Will Ignore AI

Shadow AI and the Coming Workplace Reckoning (w/ Kay Firth-Butterfield)

Mar 11, 2026 By Nexthink In Nexthink

In this episode of The DEX Show, we’re joined by Kay Firth-Butterfield, the world’s first Chief AI Ethics Officer and former Head of AI at the World Economic Forum. From human rights law and human trafficking to Davos and large language models, Kay traces her remarkable journey into AI governance. We explore shadow AI, workplace “hallucinations,” AI companions, and the hidden risks leaders are underestimating. Kay shares why organizations need cross-functional AI governance, stronger guardrails, and far better training — and why the future of work may depend as much on the humanities as technology.

View Video

Nexthink

Read more about Shadow AI and the Coming Workplace Reckoning (w/ Kay Firth-Butterfield)

See what your users see: Delivering digital experiences that drive measurable outcomes

Mar 11, 2026 By ManageEngine Site24x7 In Site24x7

Learn about the nuances of digital experience monitoring from our expert Q&A format webinar.

View Video

Site24x7

Monitoring

Read more about See what your users see: Delivering digital experiences that drive measurable outcomes

Track and Fix NextJS errors Using Rollbar

Mar 11, 2026 By Rollbar In Rollbar

Setting up Rollbar with your NextJS application with custom parameters and people tracking.

View Video

Rollbar

Monitoring

Read more about Track and Fix NextJS errors Using Rollbar

Buy vs Build in the Age of AI (Part 2)

Mar 11, 2026 By James Barnes In StatusCake

In Part 1, we explored how AI has dramatically reduced the cost of building monitoring tooling. That much is clear. You can scaffold uptime checks quickly, generate alert logic in minutes, and set up dashboards faster than most teams used to schedule the kickoff meeting. So the barriers to entry have fallen. But there’s a quieter question that rarely gets asked in the excitement of building. Have you ever calculated what it would actually cost to replace your monitoring provider?

Read Post

StatusCake

Read more about Buy vs Build in the Age of AI (Part 2)

Unleashing Resilience: Why the Agentic Era Demands a Unified Data Fabric

Mar 11, 2026 By JK Lialias In Splunk

Imagine starting your day with a dozen disconnected apps where your calendar does not sync with your reminders, your maps do not know your appointments, and your contacts are not linked to your messages. You would constantly be scrambling, missing key details, and reacting late to what matters most. In our personal lives, we depend on tight integration to keep pace with the world. In business, the stakes are even higher.

Read Post

Splunk

Read more about Unleashing Resilience: Why the Agentic Era Demands a Unified Data Fabric

Find bugs before they ship with Seer code review

Mar 11, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Monitoring

Read more about Find bugs before they ship with Seer code review

Rising Demand for Elderly Care: Why Skilled Workers are in High Demand

Mar 11, 2026 By OpsMatters In OpsMatters

People are living longer lives, a trend that brings both joy and new logistical challenges. Families now face difficult decisions about how to support aging loved ones. A growing need for professional assistance is reshaping the job market and household budgets. Finding the right balance between medical needs and personal comfort is a major goal for millions.

Read Post

OpsMatters

Read more about Rising Demand for Elderly Care: Why Skilled Workers are in High Demand

Infrastructure Under Scrutiny: Turning Visibility into Cost Control

Mar 10, 2026 By Kristy Slimmer In Galileo

A practical discussion with infrastructure leaders on how visibility is shaping cost control, renewal planning, and financial accountability across hybrid environments. Runtime: 41:32. The conversation around infrastructure has shifted. IT teams are no longer measured only on uptime or performance.

Read Post

Galileo

Read more about Infrastructure Under Scrutiny: Turning Visibility into Cost Control

The hidden reason your reports don't match

Mar 10, 2026 By Harsitha P In ManageEngine

There is a quiet moment that sometimes happens right before a meeting begins. The slides are ready. Dashboards are open. The numbers look neat on the screen. But the revenue doesn’t match last week’s number. A trend line suddenly looks different. Someone says, "That’s strange." And the conversation shifts. Instead of talking about strategy or growth, the room starts trying to figure out what happened to the data. Moments like this rarely happen because someone made a mistake.

Read Post

ManageEngine

Read more about The hidden reason your reports don't match

Technology in the Workplace Statistics for 2026

Mar 10, 2026 By Dana Krook In Auvik

Workplace tech has officially entered high gear. AI is embedding itself into everyday operations, and the modern workplace is more distributed and demanding than ever. For network and IT teams, the upside is significant—but only with the visibility and control needed to keep everything running smoothly. Here are 20+ technology in the workplace statistics shaping 2026 that can give IT and network teams a glimpse into where we’re headed.

Read Post

Auvik

Read more about Technology in the Workplace Statistics for 2026

The best observability platforms for developers

Mar 10, 2026 By Julie Kent In Honeybadger

At some point, logs stop being enough. As applications grow more distributed, understanding what's actually happening in production becomes harder. That's what observability platforms are built for. The hard part is figuring out which one is actually right for your application — and your budget. This guide covers some popular options: what they do well, where they fall short, and who they're for.

Read Post

Honeybadger

Read more about The best observability platforms for developers

Unlocking the Power of SolarWinds Through Training - SolarWinds TechPod 107

Mar 10, 2026 By solarwindsinc In SolarWinds

In this SolarWinds TechPod episode, hosts Chrystal Taylor and Sean Sebring talk with Cheryl Nomanson, a SolarWinds Academy trainer with 14 years at the company. They discuss the importance of technical education for complex software and networks, exploring SolarWinds' comprehensive training offerings including the SolarWinds Academy with its on-demand courses, instructor-led virtual classes, and office hours format. Cheryl explains the SolarWinds Certified Professional (SCP) certification program and the newer SolarWinds Certified Instructor (SCI) program for training partners globally.

View Video

SolarWinds

Read more about Unlocking the Power of SolarWinds Through Training - SolarWinds TechPod 107

Olly for SREs: 3 ways I actually use it in production

Mar 10, 2026 By Coralogix Team In Coralogix

There’s a moment after an alert where you’re not fixing anything yet. You’re trying to answer a much simpler question: Is it actually down? Sometimes it’s obvious. Sometimes it’s 20 alerts at once with no clear starting point. Sometimes it’s a small upstream degradation that might cascade. Sometimes it’s just a spike that resolves on its own. That first phase is orientation. Is the signal real or transient? Is it isolated or spreading? Root cause or symptom?

Read Post

Coralogix

Read more about Olly for SREs: 3 ways I actually use it in production

Expanding Uptime Monitoring Down The Stack: ICMP Monitors Are Now Available In Checkly

Mar 10, 2026 By Susa Tünker In Checkly

When we started building Checkly's uptime monitoring suite, the goal was to give engineering teams complete visibility across every layer of their stack, from application down to network, in one place. URL, TCP, DNS, and Heartbeat monitors covered a lot of that ground. But one fundamental piece was missing: the ability to simply ping a host and know if it's reachable.

Read Post

Checkly

Read more about Expanding Uptime Monitoring Down The Stack: ICMP Monitors Are Now Available In Checkly

When Your Plant Talks Back: Conversational AI with InfluxDB 3

Mar 10, 2026 By Suyash Joshi In InfluxData

No one wants to stare at a plant and guess if it needs water. It’s much easier if the plant can say, “I’m thirsty.” A few years ago, we built Plant Buddy using InfluxDB Cloud 2.0. The linked article is still a great guide for cloud-first IoT prototyping as it shows how quickly you can connect devices, store time series data, and build dashboards in the cloud with the previous version of InfluxDB. But this time, the goal was different.

Read Post

InfluxData

Read more about When Your Plant Talks Back: Conversational AI with InfluxDB 3

Bring Clarity and Confidence Back to Ops: How Trustworthy Guidance Sets a New Standard

Mar 10, 2026 By ScienceLogic In ScienceLogic

For years, enterprises have chased the promise of artificial intelligence as a remedy for growing operational complexity. It seemed logical that if environments were expanding faster than teams could keep up, smarter models could fill the gap. But early deployments of generic AI proved a difficult truth. Intelligence alone does not create operational clarity. It does not guarantee safety.

Read Post

ScienceLogic

Read more about Bring Clarity and Confidence Back to Ops: How Trustworthy Guidance Sets a New Standard

ICMP Monitors Are Now Available in Checkly

Mar 10, 2026 By Checkly In Checkly

Checkly introduces ICMP monitoring to complement its existing uptime and synthetic monitoring (URL/HTTP, TCP, DNS, and heartbeat checks) for systems without HTTP endpoints, such as database hosts, VPN gateways, and load balancers.

View Video

Checkly

Read more about ICMP Monitors Are Now Available in Checkly

Release software with confidence using Datadog Feature Flags

Mar 10, 2026 By Datadog In Datadog

In this technical product demo, see how Datadog Feature Flags helps teams release software with confidence by connecting every feature flag to real-time observability data. Configure progressive, multi-step rollouts with automated guardrails tied to APM, RUM, and Product Analytics so you can pause or roll back instantly if latency, errors, or key business metrics degrade.

View Video

Datadog

Read more about Release software with confidence using Datadog Feature Flags

The architecture advantage: Why the data layer decides the AI race

Mar 10, 2026 By David Girvin In Sumo Logic

Dozens of startups are sprinting to build the next “agentic SIEM” that can autonomously detect, investigate, and respond to threats. They’re well-funded, well-marketed, but structurally hollow. Here’s what it usually looks like: an LLM layer on top of a thin orchestration engine on top of fragmented or customer-hosted data lakes. While it looks impressive in a demo, it quickly falls apart in production. Why? It’s not built on a strong foundation.

Read Post

Sumo Logic

Read more about The architecture advantage: Why the data layer decides the AI race

Root Cause Analysis in Software Testing: Methods, Techniques, and How AI Is Changing the Game

Mar 10, 2026 By Rollbar In Rollbar

If you've ever fixed a bug only to watch it come back two weeks later, you already understand why root cause analysis matters. Patching symptoms feels productive - it's not. Getting to the actual cause is what prevents the same issue from eating your team's time over and over again. This guide covers everything you need to know about root cause analysis (RCA) in software testing: what it is, how to do it, which tools help, and where AI is taking it next.

Read Post

Rollbar

Read more about Root Cause Analysis in Software Testing: Methods, Techniques, and How AI Is Changing the Game

How One ISP Kept Transit Costs Flat While Traffic Grew 7x (MetroNet + Kentik)

Mar 10, 2026 By Kentik In Kentik

"We pay less in transit IP costs today than we did when we turned up Kentik. Our cost per Mbps is lowered over the years — and our traffic is seven times what it was when we started." — Michael Leclaire, MetroNet.

View Video

Kentik

Read more about How One ISP Kept Transit Costs Flat While Traffic Grew 7x (MetroNet + Kentik)

What's New at Cribl 4.17: On release days, we wear teal.

Mar 10, 2026 By Cribl In Cribl

In this episode, Leon runs through all the updates in Cribl release 2603, which includes a massive update to Cribl Search, the ability to detect PII and secrets in the background as part of Cribl Guard, and two cool enhancements to Cribl Packs - monitoring and enhanced routing. Try Cribl Now! Sandboxes let you get hands-on experience with Cribl without the fuss or friction.

View Video

Cribl

Read more about What's New at Cribl 4.17: On release days, we wear teal.

What is Cribl Guard background detection?

Mar 10, 2026 By Cribl In Cribl

Security and compliance teams need to know exactly what sensitive data is flowing through their environments and where it’s going. Because surprise PII is no one’s favorite kind of surprise. Meanwhile, upstream teams are shipping new apps, changing schemas, adding fields, and generally moving fast. However, you can only manage and protect the data you currently know of and expect. But sensitive data has a habit of showing up where no one expected it…

View Video

Cribl

Read more about What is Cribl Guard background detection?

Meet the new Cribl Search: Faster investigations with AI

Mar 10, 2026 By Cribl In Cribl

Get a quick look at the new Cribl Search experience, built to help teams investigate faster, onboard data easily, and get answers from their logs without complex query languages. In this quick overview, we show how Cribl Search helps you move from raw data to insights in minutes. The result? Faster investigations, simpler workflows, and powerful AI-assisted analysis across your telemetry. Learn how the new Cribl Search makes exploring and analyzing data easier for everyone, from experienced analysts to teams just getting started.

View Video

Cribl

Read more about Meet the new Cribl Search: Faster investigations with AI

What is AI really going to bring to the table when it comes to migration?

Mar 10, 2026 By Elastic In Elastic

Explore the real capabilities and limitations of AI in system and SIEM migrations. Learn where AI accelerates processes and where human review remains essential. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about What is AI really going to bring to the table when it comes to migration?

Navigating Machine Data at Infinite Scale: Why the Modern Enterprise Demands a New Data Architecture

Mar 10, 2026 By Seth Brickman In Splunk

In the modern enterprise, data is no longer just a byproduct of business; it is the lifeblood. However, we have moved beyond the era of simple transactional data. We are now living in the age of machine data.

Read Post

Splunk

Read more about Navigating Machine Data at Infinite Scale: Why the Modern Enterprise Demands a New Data Architecture

Why status pages suck

Mar 9, 2026 By Andy Libby In StatusGator

Cloud status pages were supposed to bring transparency to outages. Instead, they’ve become one of the most frustrating parts of incident response. When a cloud service fails, status pages are often slow to update, incomplete, or missing information. Crowdsourced platforms are noisy and misleading.

Read Post

StatusGator

Read more about Why status pages suck

Improved SSO setup and logging

Mar 9, 2026 By Valeria Kurolapova In StatusGator

We’ve made several improvements to Single Sign-On (SSO) in StatusGator to make authentication easier to configure and easier to monitor. As a reminder, the StatusGator dashboard includes SAML-based SSO on all plan tiers, even our free plan. This update introduces a simplified SSO setup flow along with a new Audit logs tab that provides visibility into authentication activity.

Read Post

StatusGator

Read more about Improved SSO setup and logging

SharePoint Online outage on March 6, 2026

Mar 9, 2026 By Colin Bartlett In StatusGator

On March 6, 2026, SharePoint Online experienced a disruption that prevented some users from loading sites, accessing files, or authenticating successfully. The incident did not affect every user, but reports came in from multiple regions including North America and Europe. StatusGator detected the problem early through user outage reports and triggered an Early Warning Signal before Microsoft officially acknowledged the issue.

Read Post

StatusGator

Read more about SharePoint Online outage on March 6, 2026

Create a Custom Service Health Board With the Honeycomb MCP

Mar 9, 2026 By Jessica Kerr (Jessitron) In Honeycomb

Your software is sending data to Honeycomb. Now where is the dashboard you want? The best dashboard is one created just for your application, or your service, or your team. You can get that in minutes with the Honeycomb MCP. Open your coding agent in your IDE, or on the command line in your code repository. Configure the Honeycomb MCP and authenticate with Read and Write permissions. Now tell it what you want. You can be high-level: “Make me a service health board for the frontend service.”

Read Post

Honeycomb

Read more about Create a Custom Service Health Board With the Honeycomb MCP

Why You Should Automate Network Troubleshooting

Mar 9, 2026 By Andrii Kernitskyi In Obkio

It's 2 AM. The Network Is Down. Where Do You Start? You get the call. Users can't connect. VoIP is choppy. Something is broken somewhere between your office and the cloud. You open your monitoring dashboard and it says something is wrong, but not where. Not why. Not since when. So you do what IT teams have done for decades. You open a terminal, run a traceroute, SSH into the router, pull up SNMP, check the firewall logs.

Read Post

Obkio

Read more about Why You Should Automate Network Troubleshooting

Top 5 Web Applications for Reverse Phone Lookup & Contact Verification

Mar 9, 2026 By Super Monitoring In Super Monitoring

In today’s world, even a missed call can cause concern. Was it important? Who was trying to reach you? Or was it just another spammer? The question “Who called me from this phone number?” has become extremely relevant, not only for ordinary users but also for small businesses, security services, and companies that strive to maintain accurate contact databases. In response to this increase in unwanted calls and fraud, specialized web applications have emerged.

Read Post

Super Monitoring

Read more about Top 5 Web Applications for Reverse Phone Lookup & Contact Verification

How to Design Competitor Monitoring Reports That Drive Strategic Decisions

Mar 9, 2026 By ChangeTower In ChangeTower

Competitor monitoring reports often end up as data graveyards, filled with information nobody acts on. The difference between reports that gather dust and reports that drive decisions comes down to design choices made before the first data point gets collected. To get the most from your competitor monitoring, building a comprehensive and actionable report is key.

Read Post

ChangeTower

Read more about How to Design Competitor Monitoring Reports That Drive Strategic Decisions

Approaching your observability migration with the right mindset

Mar 9, 2026 By Nick Vecellio In Datadog

This guest blog post is authored by Nick Vecellio, Principal Engineer and Co-founder of NoBS, a Premier Datadog Partner specializing in hands-on Datadog migrations and optimizations. At NoBS, we help enterprises migrate their observability stack to Datadog. Teams often come to us after a migration has technically “worked,” but the new setup requires optimization tweaks to provide the clarity, reliability, or operational benefits they’re looking for.

Read Post

Datadog

Read more about Approaching your observability migration with the right mindset

Four ways engineering teams use the Datadog MCP Server to power AI agents

Mar 9, 2026 By Bowen Chen In Datadog

Since the Datadog Model Context Protocol (MCP) Server first launched in Preview, Datadog has experienced an overwhelming amount of interest and feedback from customers. We appreciate those who requested access to test our product, provided feedback, and shared their stories of how the MCP Server helped them overcome engineering challenges.

Read Post

Datadog

Read more about Four ways engineering teams use the Datadog MCP Server to power AI agents

Apono integration for Grafana: Enabling Just-in-Time access for data sources

Mar 9, 2026 By Ben Avner In Grafana

Ben Avner is the Head of Ecosystem and Strategic Alliances at Apono, where he leads the company’s global partner strategy and technology alliances. He focuses on building and scaling strategic partnerships that drive product innovation, partner-influenced pipeline, and long-term growth. A former founder and engineer, Ben brings a strong technical foundation and a builder’s mindset, combined with experience across marketing, product partnerships, and go-to-market strategy.

Read Post

Grafana

Read more about Apono integration for Grafana: Enabling Just-in-Time access for data sources

AI Systems Status Report - February 2026

Mar 8, 2026 By Nuno Tomas In isDown

This report covers the operational status of major AI systems during February 2026, including Anthropic, Cohere, DeepSeek, Google Gemini, Groq Cloud, OpenAI, Perplexity, Replicate, and xAI. The data includes official incidents reported on vendor status pages and unconfirmed incidents detected through IsDown's monitoring systems.

Read Post

isDown

Read more about AI Systems Status Report - February 2026

What to do during #llmjacking #aisecurity #devsecops #cloudsecurity #securecloud #llmsecurity

Mar 7, 2026 By Sysdig In Sysdig

View Video

Sysdig

Read more about What to do during #llmjacking #aisecurity #devsecops #cloudsecurity #securecloud #llmsecurity

Microsoft SCOM Myth

Mar 6, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Yes. We’ve Heard It All Before.

Read Post

NiCE IT Mgmt

Read more about Microsoft SCOM Myth

New API: Submit outage reports

Mar 6, 2026 By Valeria Kurolapova In StatusGator

We’ve added a new endpoint to the StatusGator API that allows you to submit outage reports for monitors on your board. With the new Outage Reports API, you can programmatically report issues you’re experiencing with a service. These reports help StatusGator detect outages faster and improve visibility for other users who rely on the same services.

Read Post

StatusGator

Read more about New API: Submit outage reports

Episode 6 - The evolution from automation to autonomy

Mar 6, 2026 By Digitate In Digitate

Tom and Akhilesh unpack why automation alone will never deliver autonomy, and why intelligence means anticipating change rather than constantly reacting to it. They explore the role of people in enterprise transformation, the limits of technology without trust and context, and why the most powerful use of AI is freeing humans to focus on what they do best. Plus, Akhilesh makes the case for ping pong as a surprisingly effective way to reset when the pressure is on.

View Video

Digitate

Read more about Episode 6 - The evolution from automation to autonomy

Accelerate Vulnerability Remediation with Atatus: From Detection to Secure Deployment

Mar 6, 2026 By Pavithra Parthiban In Atatus

In microservices and cloud-native environments, vulnerabilities buried in transitive dependencies or runtime behaviors can go undetected for weeks. During that time, your attack surface keeps expanding and production systems remain exposed. The longer remediation is delayed, the greater the risk of exploitation, compliance failures, and operational disruption.

Read Post

Atatus

Read more about Accelerate Vulnerability Remediation with Atatus: From Detection to Secure Deployment

Oh Dear is now mobile-friendly

Mar 6, 2026 By Mattias Geniar In Oh Dear

Oh Dear has always been a desktop-first tool. If you checked your monitors on your phone, you'd get the full desktop layout squeezed into a tiny screen, with lots of horizontal scrolling and tiny tap targets. That's fixed now. Every page in the app works on mobile.

Read Post

Oh Dear

Read more about Oh Dear is now mobile-friendly

Best Rails APM Tools in 2026: A Developer's Guide

Mar 6, 2026 By Sarah Morgan In Scout

Rails applications have a specific set of performance challenges that make monitoring genuinely useful rather than just box-checking. ActiveRecord is convenient to use and also convenient to accidentally write N+1 queries with. Memory bloat in long-running processes, particularly when Sidekiq or Action Cable is involved, is a recurring production problem for a lot of teams. Background job performance tends to degrade quietly until it becomes noticeable.

Read Post

Scout

Read more about Best Rails APM Tools in 2026: A Developer's Guide

How Autonomous Are Your IT Operations, Really?

Mar 6, 2026 By Margo Poda In LogicMonitor

This post introduces a six-level maturity model that defines what true autonomy looks like in IT operations, from basic AI chat interfaces to fully coordinated agent ecosystems. ITOps teams have more automation tooling than ever, and yet incident response still depends heavily on human judgment to hold it together. Alerts fire, engineers dig through dashboards, context gets assembled by hand, and someone at the end of the workflow makes the final call.

Read Post

LogicMonitor

Read more about How Autonomous Are Your IT Operations, Really?

What is Agentic Observability?

Mar 6, 2026 By LogicMonitor In LogicMonitor

Agentic observability is the instrumentation and correlation needed to explain and control agent behavior across multi-step workflows. Legacy observability focuses on runtime health and service behavior. You monitor metrics like CPU usage, memory, latency, and error rates to confirm that applications and infrastructure are functioning as expected. When a workflow degrades, the proximate cause is often a crash, timeout, permission error, or resource constraint.

Read Post

LogicMonitor

Read more about What is Agentic Observability?

Datadog Incident Response: One platform from alert to resolution

Mar 6, 2026 By Datadog In Datadog

When incidents strike, speed and clarity are critical. Datadog Incident Response brings the full incident lifecycle into one platform so teams can move from detection to resolution with confidence. Operate from a single, unified view of your systems, coordinate across the tools your teams already use, and leverage AI that analyzes incidents in real time to surface context, guide decisions, and accelerate resolution.

View Video

Datadog

Read more about Datadog Incident Response: One platform from alert to resolution

Observability for Azure Virtual Desktop with SquaredUp

Mar 6, 2026 By SquaredUp In Squared Up

Managing Azure Virtual Desktop doesn’t have to mean jumping between portal blades, logs, and metrics trying to piece together what’s happening. In this webinar, you’ll learn how to design and implement a single, operational observability dashboard for Azure Virtual Desktop (AVD) using SquaredUp Cloud — transforming fragmented telemetry into clear, actionable insight. Whether you're responsible for performance, user experience, or operational stability, this session will give you a structured, repeatable framework for monitoring your AVD estate with confidence.

View Video

Squared Up

Read more about Observability for Azure Virtual Desktop with SquaredUp

More details on #llmjacking #llmsecurity #llm #aisecurity #apikeys #cloudsecurity

Mar 6, 2026 By Sysdig In Sysdig

View Video

Sysdig

Read more about More details on #llmjacking #llmsecurity #llm #aisecurity #apikeys #cloudsecurity

Trends in Mainframe Modernization: Fresh Insights from SHARE Orlando

Mar 6, 2026 By Navdeep Sidhu In meshIQ

Fresh insights from SHARE Orlando reveal mainframe modernization isn't about replacement—it's evolution. From hybrid architectures to AI-driven automation, enterprises are transforming legacy systems into agile, integrated platforms while preserving core reliability.

Read Post

meshIQ

Read more about Trends in Mainframe Modernization: Fresh Insights from SHARE Orlando

Full-Stack Observability Is Becoming a Business Imperative

Mar 6, 2026 By Dallon Robinette In Selector

As enterprises accelerate digital transformation, technology performance has become inseparable from business performance. Customer experiences, revenue streams, and operational efficiency increasingly depend on the reliability of complex, distributed systems. In this environment, full-stack observability is no longer a technical aspiration — it is a strategic necessity.

Read Post

Selector

Read more about Full-Stack Observability Is Becoming a Business Imperative

7 Tech Tools to Help Monitor Your Loved One's Safety

Mar 6, 2026 By OpsMatters In OpsMatters

Staying connected with aging family members is a top priority for many households. Technology now offers many ways to keep tabs on health and safety without being intrusive. Choosing the right tools can provide comfort to both the senior and their caregivers. These devices help bridge the gap between independence and necessary support, creating a safer home.

Read Post

OpsMatters

Read more about 7 Tech Tools to Help Monitor Your Loved One's Safety

Sponsored Post

Build vs Buy Monitoring: The Real Cost Breakdown for IT Teams

Mar 5, 2026 By Nuno Tomas In isDown

Every IT team eventually faces this question: should we build our own monitoring system or buy an existing solution? On the surface, building seems attractive. You get complete control, no vendor lock-in, and the illusion of "free" since you're using internal resources. But the math rarely works out that way. Let's break down what it actually costs to build, when building genuinely makes sense, and how to make the right decision for your team.

Read Post

isDown

Read more about Build vs Buy Monitoring: The Real Cost Breakdown for IT Teams

From Reactive to Predictive: Preserving BESS Uptime at Scale

Mar 5, 2026 By Allyson Boate In InfluxData

Battery Energy Storage Systems (BESS) operate as revenue-generating grid assets that capture surplus electricity, deploy power during demand spikes, and support frequency control. By shifting energy across time, they stabilize grid conditions, enable renewable integration, and execute market dispatch commitments. When systems respond as designed, stored capacity becomes a flexible, monetizable supply. But BESS performance depends on precision and availability.

Read Post

InfluxData

Read more about From Reactive to Predictive: Preserving BESS Uptime at Scale

What Is LLMjacking? The New AI Cybercrime Stealing Cloud AI Compute

Mar 5, 2026 By Sysdig In Sysdig

LLMjacking is a new cybercrime where attackers steal access to cloud-hosted AI models and use them for free — while the victim pays the bill. In this video, we break down what LLMjacking is, how attackers exploit compromised credentials and exposed APIs, and why security teams should treat AI infrastructure as a high-value attack target. Discovered by the Sysdig Threat Research Team, LLMjacking is quickly becoming the AI-era equivalent of cryptojacking — except instead of mining cryptocurrency, attackers run expensive large language models (LLMs) at scale.

View Video

Sysdig

Read more about What Is LLMjacking? The New AI Cybercrime Stealing Cloud AI Compute

Meet the new Bits AI SRE: Deeper reasoning, twice as fast

Mar 5, 2026 By Dan Green In Datadog

When we announced Bits AI SRE at DASH 2025, we introduced an autonomous SRE agent that investigates alerts the moment they trigger. Bits AI SRE reads the same telemetry data as your team, understands your architecture, and follows your runbooks to identify likely root causes before you even open your laptop. It’s your AI teammate that’s always on call.

Read Post

Datadog

Read more about Meet the new Bits AI SRE: Deeper reasoning, twice as fast

How AI lets you talk to your company's data and get answers instantly

Mar 5, 2026 By Elastic In Elastic

In this conversation recorded at Elastic’s New York office, three product leaders discuss how AI agents are transforming enterprise software. The discussion features Steve Kearns (general manager, Search solutions at Elastic), Mike Nichols (general manager, Security solutions at Elastic), and Baha Azarmi (general manager, Observability at Elastic). They explain how Elastic Agent Builder allows teams to interact with their data using natural language instead of complex queries.

View Video

Elastic

Read more about How AI lets you talk to your company's data and get answers instantly

How LLMs can help boost productivity

Mar 5, 2026 By Elastic In Elastic

Learn how large language models (LLMs) are transforming productivity in business, coding, research, and daily workflows. Discover practical ways to use AI tools to automate tasks and improve efficiency.

View Video

Elastic

Read more about How LLMs can help boost productivity

Your Questions About AI-Assisted Development Answered

Mar 5, 2026 By Austin Parker In Honeycomb

We recently hosted a webinar on AI-assisted development with DORA, and the audience had a lot of questions—far more than we could get to in an hour. I picked out six that get at the stuff people are wrestling with day to day. These aren't the easy questions, and I don't think there are necessarily easy answers, but I've spent the past year building and shipping with AI coding tools and observing (literally) what happens when that code hits production. Here's what I have.

Read Post

Honeycomb

Read more about Your Questions About AI-Assisted Development Answered

Office Hours with David Girvin

Mar 5, 2026 By Sumo Logic, Inc. In Sumo Logic

Weekly office hours with David Girvin. Check out recent feature releases and updates, watch a quick live demo, and ask any questions with live Q&A.

View Video

Sumo Logic

Read more about Office Hours with David Girvin

You Can't Create Culture - But You Can Destroy It

Mar 5, 2026 By solarwindsinc In SolarWinds

Culture already exists in your team. Leaders don’t create it — they either nurture it or destroy it. This is why leadership presence matters.

View Video

SolarWinds

Read more about You Can't Create Culture - But You Can Destroy It

Routing OpenTelemetry logs to Sentry using OTLP

Mar 5, 2026 By James W. In Sentry

If you've already instrumented your app with OpenTelemetry, you don't have to rip it out to use Sentry. Set two environment variables and your logs start flowing into Sentry, with no SDK changes and no re-instrumentation. Here's how to set it up in a sample app, and when the native Sentry SDK might be the better call.

Read Post

Sentry

Read more about Routing OpenTelemetry logs to Sentry using OTLP

Best Python APM Tools in 2026: A Developer's Guide

Mar 5, 2026 By Sarah Morgan In Scout

Last updated: March 2026. Python applications built on Django, Flask, FastAPI, and other frameworks have the same monitoring needs as applications built in any other language: you want to know which endpoints are slow, why the database is getting hammered, what errors are firing in production, and ideally all of that in a form that does not require three separate tools to reconstruct a single incident.

Read Post

Scout

Read more about Best Python APM Tools in 2026: A Developer's Guide

How Imperva Gets Traffic Answers in Seconds with Kentik

Mar 5, 2026 By Kentik In Kentik

Imperva Network Architect, Wallace Lee, shares how Kentik helps teams drill deeper than traditional reporting tools to improve network and customer experience. Wallace shares how, during a live architecture review, Imperva’s Kentik power users answered a critical “are we safe?” traffic question in seconds. Kentik enables engineers to instantly understand prefix-level bandwidth and shows exactly which ASN and ISP traffic came from. Wallace also highlights how Kentik makes Anycast traffic visibility an “easy win,” helping teams move from questions to confident decision-making fast.

View Video

Kentik

Read more about How Imperva Gets Traffic Answers in Seconds with Kentik

Preventing SLA Breaches With Proactive Monitoring as MSPs Move Toward Autonomous IT

Mar 5, 2026 By Sofia Burton In LogicMonitor

AI-first hybrid observability with proactive monitoring helps MSPs protect SLAs as they move toward autonomous IT by getting engineers the right alerts before issues impact service. Managed services lives and dies on timing. The difference between a minor issue and a customer-facing incident often comes down to how early an engineer gets the right signal and how quickly they can act on it. That timing shows up in SLAs, service credits, escalations, and the trust you earn when customers feel taken care of.

Read Post

LogicMonitor

Read more about Preventing SLA Breaches With Proactive Monitoring as MSPs Move Toward Autonomous IT

SquaredUp vs Grafana: The Enterprise IT dashboard showdown

Mar 5, 2026 By Blog In Squared Up

Modern enterprises operate across an increasingly complex mix of hybrid cloud services and productivity platforms. As environments scale, stakeholders need a single pane of glass (SPoG) to understand what’s happening across IT operations without jumping across dozens of disconnected tools.

Read Post

Squared Up

Read more about SquaredUp vs Grafana: The Enterprise IT dashboard showdown

How Race Communications Automates DDoS Mitigation with Kentik

Mar 5, 2026 By Kentik In Kentik

Sorin Esanu, Director of Network Engineering at Race Communications, explains why deep, always-on network intelligence is essential when you have massive volumes of traffic moving in and out from many sources. After outgrowing an on-prem tool that required ongoing maintenance and didn’t deliver the analytics they needed, Race chose Kentik for richer visibility, daily traffic optimization, and improved security.

View Video

Kentik

Read more about How Race Communications Automates DDoS Mitigation with Kentik

Avoid the Swivel-Chair Tool Stack: Conway Corporation on Why Kentik Wins

Mar 5, 2026 By Kentik In Kentik

Everett Sinclair, Network Administrator at Conway Corporation, explains why Kentik became their “one pane of glass” for cloud-based network visibility, rapid troubleshooting, and smarter peering and caching decisions. With Kentik’s SaaS network intelligence platform, Conway gets updates automatically, avoids server rebuilds, and can deploy cloud agents remotely to run simple metric tests close to customer locations.

View Video

Kentik

Read more about Avoid the Swivel-Chair Tool Stack: Conway Corporation on Why Kentik Wins

Continuous Security Monitoring: The Practical Guide for Modern Ops Teams

Mar 5, 2026 By OpsMatters In OpsMatters

If you've ever been on call during a "nothing changed... except everything" incident, you already understand the real problem with traditional security checks: they're snapshots. And snapshots are useless the moment your infrastructure shifts, a new SaaS tool gets approved, a developer spins up a service in a different region, or a vendor quietly exposes an admin portal to the internet. Modern environments don't stay still. So security can't, either.

Read Post

OpsMatters

Read more about Continuous Security Monitoring: The Practical Guide for Modern Ops Teams

AWS Middle East data center strikes: 92 SaaS platforms report disruptions

Mar 4, 2026 By Colin Bartlett In StatusGator

StatusGator analysis identifies 92 cloud services that publicly acknowledged disruptions tied to the AWS Middle East incident. Over the weekend, Amazon confirmed that drone strikes damaged AWS facilities in the Middle East, disrupting cloud infrastructure across the region. The strikes affected AWS regions in the United Arab Emirates and Bahrain, causing outages and degraded performance across core cloud services including compute, storage, and databases.

Read Post

StatusGator

Read more about AWS Middle East data center strikes: 92 SaaS platforms report disruptions

Buy vs Build in the Age of AI (Part 1)

Mar 4, 2026 By James Barnes In StatusCake

A few months ago, I spoke to an engineering manager who proudly told me they had rebuilt their monitoring stack over a long weekend. They’d used AI to scaffold synthetic checks. They’d generated alert logic with dynamic thresholds. They’d then wired everything into Slack and PagerDuty, and built a clean internal dashboard. “It used to take us weeks to prototype something like this,” they said. “Now it’s basically instant.” They weren’t wrong.

Read Post

StatusCake

Read more about Buy vs Build in the Age of AI (Part 1)

Introducing Rocky AI to General Availability

Mar 4, 2026 By Dan Giordano In Checkly

After months of being available in Beta for our app users, Rocky AI is now generally available to all users and plans. Rocky AI is Checkly’s AI agent that works around the clock to make sure your application’s reliability is optimal. In this first release, Rocky AI ships with the ability to run continual analysis on test and check failures, giving your teams AI-powered root cause analysis, impact analysis, and more.

Read Post

Checkly

Read more about Introducing Rocky AI to General Availability

We Turned Our WireShark Wizard Into a Markdown File

Mar 4, 2026 By Tim Nolet In Checkly

Rocky AI — Checkly’s AI agent — is now Generally Available. We developed Rocky AI over the last ~6 to 8 months. This is an aeon in AI-years. During this period, we learned a ton: about AI, but mostly about how to fit it into an existing SaaS product as more than just another chat widget. This is my ramble…

Read Post

Checkly

Read more about We Turned Our WireShark Wizard Into a Markdown File

How to undo Git reset hard?

Mar 4, 2026 By Johannes Rauh In Icinga

You just finished a long interactive rebase. You hit enter. Your commit history looks… wrong. There are a bunch of things that could have gone wrong. You could dig through git reflog for 10 minutes. Or you could press Git’s hidden undo button: ORIG_HEAD.

Read Post

Icinga

Read more about How to undo Git reset hard?

Public Sector Observability: Service Experience and Reliability Are Now Mission-Critical

Mar 4, 2026 By LogicMonitor In LogicMonitor

Reliable digital services aren’t optional for public sector agencies. They’re essential to mission success. Across the U.S. public sector, service experience and reliability have moved from operational concerns to mission requirements. At the federal level, Executive Order 14058 makes improving service delivery and customer experience a priority, measured by real outcomes for the public. And for state and local governments, the bar is set by the private sector.

Read Post

LogicMonitor

Read more about Public Sector Observability: Service Experience and Reliability Are Now Mission-Critical

Use plain English to query your multi-cloud infrastructure in Resource Catalog

Mar 4, 2026 By Sriram Raman In Datadog

Modern cloud environments include thousands of resources across providers, teams, and accounts. Organizations need the ability to quickly locate the right resources so that they can manage resource compliance and troubleshoot issues. When engineers need to answer questions such as which databases are still on extended support or which storage buckets lack encryption, they often have to switch consoles, use provider-specific query languages, and know obscure version strings or configuration flags.

Read Post

Datadog

Read more about Use plain English to query your multi-cloud infrastructure in Resource Catalog

Generating metrics from traces with cardinality control: A closer look at HyperLogLog in Tempo

Mar 4, 2026 By Carles Garcia In Grafana

While tracing is a critical component of any observability strategy, metrics — especially RED metrics (request rate, error rate, and duration) — are widely considered the gold standard for monitoring service health. Tempo, the open source, easy-to-use, and highly scalable distributed tracing backend, is well known in the OSS community for storing and querying traces. It can also, however, generate RED metrics directly from those traces using the optional metrics-generator component.

Read Post

Grafana

Read more about Generating metrics from traces with cardinality control: A closer look at HyperLogLog in Tempo

7 Real Ways to Modernize NetOps with Kentik AI Advisor

Mar 4, 2026 By Eric Hian-Cheong In Kentik

Kentik’s AI Advisor acts as a virtual network engineer, helping teams of all skill levels troubleshoot, manage, and optimize their infrastructure with unprecedented speed and context. We explore seven practical NetOps use cases, from rapid incident triage and capacity planning to upcoming live-device command support, that demonstrate how using AI as a collaborative teammate dramatically reduces manual investigative work.

Read Post

Kentik

Read more about 7 Real Ways to Modernize NetOps with Kentik AI Advisor

Skills vs. MCP: You're probably reaching for the wrong one

Mar 4, 2026 By David Girvin In Sumo Logic

Everyone is adding Model Context Protocol (MCP) servers to everything right now. And I get it. MCP is clean. It’s standardized. You write a server, expose some tools, and suddenly your LLM can query your log platform, pull a dashboard, and fire an alert. It feels like the right abstraction. But I’ve watched teams at serious companies burn weeks building MCP integrations for workflows that should have been skills, and build skills for things that genuinely needed MCP.

Read Post

Sumo Logic

Read more about Skills vs. MCP: You're probably reaching for the wrong one

Is your search bar your competitor's best salesperson?

Mar 4, 2026 By Jeremy Pell In Elastic

New Australian research reveals poor website search is costing businesses revenue as AI raises the bar.

Read Post

Elastic

Read more about Is your search bar your competitor's best salesperson?

The Spark Avengers Unite: Dispatches on the FUTURE of IT (w/ Matt, Moe & Denis)

Mar 4, 2026 By Nexthink In Nexthink

Tom assembles the “Spark Avengers” for a deep dive into the most talked-about innovation in IT: Nexthink Spark, the personal AI agent for every employee. Joined by Moe Haidar, Denis Schertenleib and Matt Rose, the team unpacks how Spark evolved from early LLM experiments into an enterprise-ready, autonomous IT agent already delivering 70%+ first contact resolution. From printers and frozen cameras to complex root-cause analysis, Spark is transforming support from reactive to proactive.

View Video

Nexthink

Read more about The Spark Avengers Unite: Dispatches on the FUTURE of IT (w/ Matt, Moe & Denis)

Setting up Ping (ICMP) Check

Mar 4, 2026 By Uptime Website Monitoring In uptime

In this video, we’ll walk you through how to set up and configure your Ping (ICMP) check in Uptime.com.

View Video

uptime

Monitoring

Read more about Setting up Ping (ICMP) Check

Key learnings from the 2026 State of DevSecOps study

Mar 4, 2026 By Kennedy Toomey In Datadog

We recently released the 2026 State of DevSecOps study, in which we analyzed tens of thousands of applications and their respective supply chain and build system dependencies. Our research revealed trends in security posture and best practices across the software development life cycle.

Read Post

Datadog

Read more about Key learnings from the 2026 State of DevSecOps study

How does AI enhance search?

Mar 4, 2026 By Elastic In Elastic

Explore how artificial intelligence enhances search engines through semantic understanding, vector embeddings, and contextual retrieval. Learn how AI-powered search delivers faster and more accurate results. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about How does AI enhance search?

Centralizing Docker Logs for Observability and Security

Mar 4, 2026 By Jeff Darrington In Graylog

Most people can remember the old game of telephone, the stream of whispered sentences or phrases across a group of kids. At each transmission, a different piece of information gets lost or misheard, leaving the last person with an incomplete or incomprehensible statement. Managing Docker logs can feel the same way, especially when an error message is lost or lacks context.

Read Post

Graylog

Read more about Centralizing Docker Logs for Observability and Security

Sponsored Post

The art of software engineering management

Mar 3, 2026 By Sumitra Manga In Raygun

Like any leadership role, leading an engineering team in a mature, compact company like Raygun comes with both honor and responsibility. Leading a major development project is a bit like conducting a symphony orchestra, where every individual plays a crucial role and has a great impact on the work they release to customers and end-users.

Read Post

Raygun

Read more about The art of software engineering management

The Battle for Control: Introducing Avantra AIR

Mar 3, 2026 By Brenton O'Callaghan In Avantra

SAP operations teams are drowning. Every day is a battle against alert fatigue, complex root causes, and repetitive firefighting. And while vendor spin will tell you that moving to the cloud or adopting SAP RISE magically simplifies everything, the reality on the ground is entirely different. We call it the Hybrid Cloud Paradox: Different providers might own different parts of your critical business landscape, but you still own the business risk.

Read Post

Avantra

Read more about The Battle for Control: Introducing Avantra AIR

Did ChatGPT take down Claude?

Mar 3, 2026 By Colin Bartlett In StatusGator

On March 2, 2026, Claude experienced a widespread service disruption that affected users across North America, Europe, Asia, and Australia. The outage quickly drew significant media attention, with numerous technology news outlets reporting on user frustration and downtime. In the early hours of the incident, some commentators speculated that the disruption may have been caused by a sudden influx of new users migrating from OpenAI. However, there is no public evidence confirming that theory.

Read Post

StatusGator

Read more about Did ChatGPT take down Claude?

February 2026 product updates

Mar 3, 2026 By Valeria Kurolapova In StatusGator

February brought powerful new improvements to StatusGator – from better status page analytics and expanded API capabilities to smarter incident detection. We also published our latest Early Warning Signals report, highlighting major outages we detected before providers acknowledged them. Here’s everything that’s new.

Read Post

StatusGator

Read more about February 2026 product updates

What You Need to Know About Choosing a Data Center Location for SolarWinds Papertrail

Mar 3, 2026 By Rachel Revoy In SolarWinds

When signing up for SolarWinds Papertrail, you’ll see an option to choose where your data is stored. What does this mean? What should you consider when choosing a data center location? In this blog, we’ll explore how you can determine where to store your data. First off, the region you choose is the physical location where your data is stored. Once you select a region, you can’t migrate data from it, so it’s important to choose carefully.

Read Post

SolarWinds

Read more about What You Need to Know About Choosing a Data Center Location for SolarWinds Papertrail

Simplifying troubleshooting across the user journey with Datadog Synthetic Monitoring

Mar 3, 2026 By Lauren Zuniga In Datadog

Every digital experience is a chain reaction. A customer logs in to an application, an API authenticates the request, a backend call retrieves data, a page loads, and somewhere along the way, something might break. When it does, teams often chase symptoms while the root cause remains hard to find. The more distributed the system, the more difficult it becomes to see how one small failure can cascade into a visible outage.

Read Post

Datadog

Read more about Simplifying troubleshooting across the user journey with Datadog Synthetic Monitoring

Announcing Automated Diagnostics: Reduce MTTR with Instant, Data-Driven Troubleshooting

Mar 3, 2026 By Tom Chaves In LogicMonitor

Automated Diagnostics closes the gap between detection and diagnosis instantly. Every IT operations team knows the pressure. When an alert hits at 2 a.m., it’s a race against time to find the root cause before users feel the impact. But gathering diagnostic data such as logs, process stats, and thread dumps can eat up critical minutes. That manual lag is exactly what Automated Diagnostics eliminates.

Read Post

LogicMonitor

Read more about Announcing Automated Diagnostics: Reduce MTTR with Instant, Data-Driven Troubleshooting

How to create and manage secrets with Grafana Cloud Synthetic Monitoring

Mar 3, 2026 By Bukola Ayodele In Grafana

Observability isn’t just about collecting metrics and logs; it’s about proactively validating that your systems work as expected. Synthetic monitoring helps teams continuously test APIs, applications, and critical user journeys. But when those checks require the use of sensitive data, securely managing credentials becomes essential to maintain both reliability and security.

Read Post

Grafana

Read more about How to create and manage secrets with Grafana Cloud Synthetic Monitoring

Cloud Application Slowness: Root Cause Analysis & Observability Insights

Mar 3, 2026 By Arun Aravamudhan In eG Innovations

Discover how cloud application slowness occurs despite healthy metrics and how unified observability helps identify hidden bottlenecks and resolve issues faster.

Read Post

eG Innovations

Read more about Cloud Application Slowness: Root Cause Analysis & Observability Insights

The Speed of Clarity: How Grounded Context Transforms Triage and Strengthens Operational Decision-Making

Mar 3, 2026 By ScienceLogic In ScienceLogic

Modern operations move at a pace that leaves little room for ambiguity. When an incident emerges, teams must determine what is happening and how best to respond. Yet triage often slows under the weight of fragmented data, noisy alerts, and limited shared understanding across engineering groups. These conditions stretch routine issues into drawn-out investigations and delay action exactly when teams need to move with purpose.

Read Post

ScienceLogic

Read more about The Speed of Clarity: How Grounded Context Transforms Triage and Strengthens Operational Decision-Making

Responsible transformation: Agentic AI for the public sector

Mar 3, 2026 By Eduard van Mierlo In Elastic

The world is transforming, and artificial intelligence, especially agentic AI, is quickly becoming embedded across private and public sectors. For government agencies, law enforcement, and mission-critical organizations, embracing this new reality is uniquely challenging. On the one hand, agentic AI promises measurable improvements: modernized IT workflows, faster analysis, improved citizen services, and operational efficiency.

Read Post

Elastic

Read more about Responsible transformation: Agentic AI for the public sector

5 Essential Capabilities that Make Coralogix an Observability Powerhouse

Mar 3, 2026 By Jonny Steiner In Coralogix

Sometimes observability can feel like a second job. With many traditional tools, users must become experts in a proprietary language to ask a simple question. In these cases, developers or SREs can find themselves spending more time manually sifting through raw text, building complex data pipelines from scratch, and bouncing between fragmented dashboards than actually solving problems.

Read Post

Coralogix

Read more about 5 Essential Capabilities that Make Coralogix an Observability Powerhouse

A Practical Guide to SCADA Security

Mar 3, 2026 By Charles Mahler In InfluxData

Critical infrastructure is under siege. The systems that control our power grids, water treatment plants, and oil pipelines weren’t designed for a connected world. This post covers what security measures teams need to understand and how time series monitoring can help turn SCADA’s weaknesses into a security advantage.

Read Post

InfluxData

Read more about A Practical Guide to SCADA Security

OnlineOrNot updates from February 2026

Mar 3, 2026 By Max Rozen In OnlineOrNot

February was all about making OnlineOrNot better for dev teams. I shipped audit logs for teams, added support for two-factor authentication and passkeys, unified the webhooks system across uptime checks, heartbeats, and status pages, and made it possible to connect multiple Discord channels.

Read Post

OnlineOrNot

Read more about OnlineOrNot updates from February 2026

Saved queries now support template variables | Grafana Cloud

Mar 3, 2026 By Grafana In Grafana

In this video, Collin Fingar, Software Engineer at Grafana Labs, demonstrates how template variables can be used in saved queries, a feature that enables users to reuse queries they or others in their org have saved. You'll see how a query that contains variables can be reused, and how the variables can be replaced at the point of reuse.

View Video

Grafana

Read more about Saved queries now support template variables | Grafana Cloud

Dropbox Ingests 5GB-6GB of Logs per Second Using Loki. Here's How | Grafana Everywhere

Mar 3, 2026 By Grafana In Grafana

As part of the Grafana Everywhere series, Chris Hodges from Dropbox shares why they chose Loki for their logging stack after their data center went dark.

View Video

Grafana

Read more about Dropbox Ingests 5GB-6GB of Logs per Second Using Loki. Here's How | Grafana Everywhere

Why Website Change Monitoring Matters for Modern Brand Management

Mar 3, 2026 By ChangeTower In ChangeTower

A competitor quietly slashes their prices, and the sales team doesn’t find out until deals start falling through. A rogue plugin update changes the homepage headline to something nobody approved. These scenarios play out constantly for brands without visibility into what’s happening on their own websites and their competitors’ sites. Website change monitoring provides that visibility through automated tracking and alerts, turning potential blind spots into strategic advantages.

Read Post

ChangeTower

Read more about Why Website Change Monitoring Matters for Modern Brand Management

Root cause and fix production bugs with Seer

Mar 3, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io
Docs: https://docs.sentry.io

View Video

Sentry

Read more about Root cause and fix production bugs with Seer

Enabling Proactive ITOps with Skylar Advisor

Mar 3, 2026 By ScienceLogic In ScienceLogic

By continuously connecting signals across your IT environment, Skylar Advisor turns operational complexity into clear, prioritized guidance. It highlights potential impact, explains why it matters, and delivers clear next steps so IT teams can act early and stay ahead of alerts before they turn into issues.

View Video

ScienceLogic

Read more about Enabling Proactive ITOps with Skylar Advisor

When was the term artificial intelligence coined?

Mar 3, 2026 By Elastic In Elastic

Discover when the term artificial intelligence was first introduced and how it shaped the future of AI research and machine learning. This video breaks down the origin of AI and its historical significance in modern technology. About Elastic: Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

View Video

Elastic

Read more about When was the term artificial intelligence coined?

Telemetry Talks ep.2 - How to use OpenTelemetry in VictoriaMetrics Cloud

Mar 3, 2026 By VictoriaMetrics In VictoriaMetrics

Telemetry Talks – Episode 2 is here! In this episode, Diana and Jose introduce VictoriaMetrics Cloud, covering what it is, the problems it solves, and its pricing model, including how overages are handled. If you’re building or operating cloud-native systems and want a clearer, real-world understanding of OpenTelemetry and managed observability, this episode is for you.

View Video

VictoriaMetrics

Monitoring

Read more about Telemetry Talks ep.2 - How to use OpenTelemetry in VictoriaMetrics Cloud

What does investigation look like when data lives in multiple tools?

Mar 3, 2026 By Virtana In Virtana

War rooms don’t fix fragmentation. They expose it. Incident hits. App checks traces. Infra checks hosts. Cloud checks dashboards. Network checks packets. Everyone sees their layer. No one sees the system. So we guess. Rollback. Add capacity. Freeze change. The noise stops. The constraint doesn’t. Modern failures don’t live in tools. They live in dependencies. If your platform can’t follow a transaction across hybrid and AI infrastructure — to the exact constraint — you don’t have observability.

View Video

Virtana

Read more about What does investigation look like when data lives in multiple tools?

Why Small Businesses Still Underestimate Endpoint Monitoring - And What MSPs Can Do About It

Mar 3, 2026 By OpsMatters In OpsMatters

Small businesses tend to think of cybersecurity in terms of firewalls and antivirus software. If those two boxes are checked, the assumption is that the network is protected. But the threat landscape has shifted dramatically in the last few years, and endpoints - laptops, desktops, mobile devices, even printers - have become the primary attack surface. Most small businesses haven't adjusted their defenses accordingly.

Read Post

OpsMatters

Read more about Why Small Businesses Still Underestimate Endpoint Monitoring - And What MSPs Can Do About It

IT Monitoring News | March '26 Edition

Mar 2, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Latest releases, resources, and events focused on Microsoft SCOM and modern ITOps & DataOps.

Read Post

NiCE IT Mgmt

Read more about IT Monitoring News | March '26 Edition

Protecting sensitive PII data with effective log management

Mar 2, 2026 By Jenifer P In Site24x7

Organizations rely heavily on logs for tracking changes, troubleshooting issues, and addressing authentication attempts. Although these logs are essential for ensuring a smooth onboarding experience, they often contain users' personally identifiable information (PII), including names, email addresses, phone numbers, and sometimes location or device details. The following sample log illustrates this scenario: 2025-11-01 09:12:33 ACCOUNT_CREATED - New user registered: Name: Michael Scott, Email.

Read Post

Site24x7

Read more about Protecting sensitive PII data with effective log management

February 2026 Early Warning Signals

Mar 2, 2026 By Colin Bartlett In StatusGator

February 2026 saw another wave of impactful service disruptions across AI platforms, e-commerce infrastructure, developer tools, education providers, collaboration apps, and cloud services. Using StatusGator’s Early Warning Signals, we detected outages before providers publicly acknowledged them – and in several cases, providers never acknowledged them at all. Many services still lack transparent or timely status communication, leaving users with little visibility during critical incidents.

Read Post

StatusGator

Read more about February 2026 Early Warning Signals

System Datasets: From Alert Fatigue to Optimized Notifications

Mar 2, 2026 By Coralogix In Coralogix

Alert fatigue rarely begins as a single mistake. It grows as systems scale, teams grow, and “just in case” monitoring becomes the default. A few extra alerts, another threshold, and soon the on-call channel becomes overwhelmed. Engineers get interrupted for noise or stop trusting pages; either way, real signals get missed. Reliability drops, and productivity quietly declines. Most teams respond tactically: tune thresholds, change notifications, suppress noise.

Read Post

Coralogix

Read more about System Datasets: From Alert Fatigue to Optimized Notifications

Tech Talk | Application management with Targeted Application Install for Victoria Experience

Mar 2, 2026 By Splunk In Splunk

Apps create endless opportunities to leverage the strengths of the Splunk Cloud platform. Until now, you could only install Splunk apps across every search head on a Splunk Cloud Platform Victoria Experience deployment. With Targeted Application Install (TAI), you now have fine-grained control over which search head groups will run which apps.

View Video

Splunk

Read more about Tech Talk | Application management with Targeted Application Install for Victoria Experience

Tech Talk | Unlock Database Monitoring with Splunk Observability Cloud

Mar 2, 2026 By Splunk In Splunk

In this technical session, discover how Splunk Database Monitoring detects slow queries, speeds root cause analysis, and drives faster fixes with AI assistance.

View Video

Splunk

Read more about Tech Talk | Unlock Database Monitoring with Splunk Observability Cloud

Grafana Alerting: faster rules, personalized filters, and an operations workspace

Mar 2, 2026 By Deyan Halachliyski In Grafana

Alerts are only useful when you can quickly find and act on the right signal. That's why, over the past two years, we rebuilt Grafana Alerting’s UI to make it more reliable and efficient, especially at scale. The result: a faster, paginated alert rules page that handles tens of thousands of rules, with a powerful filter dropdown and saved searches so you can quickly get back to the views you care about most.

Read Post

Grafana

Read more about Grafana Alerting: faster rules, personalized filters, and an operations workspace

Why we open-sourced AURA: Infrastructure for production AI

Mar 2, 2026 By Henry Andrews In Mezmo

Over the last year, I’ve talked to dozens of SRE teams about AI. The excitement is real, but conversations hit a wall when we get to production reality. How does an agent manage complex context without losing the plot? How does it avoid hallucinating relationships between signals? Who owns the orchestration logic that ties it all together? We realized the bottleneck wasn’t model intelligence. It was the lack of a reliable logic layer between the data and the model.

Read Post

Mezmo

Read more about Why we open-sourced AURA: Infrastructure for production AI
