Sponsored Post

Understanding Amazon Security Lake: Enhancing Data Security in the Cloud

This year, Amazon Web Services (AWS), a leading cloud services provider, announced a comprehensive security solution called Amazon Security Lake. In this blog post, we will explore what Amazon Security Lake is, how it works, the benefits for organizations, and partners you can leverage alongside it to enhance security analytics and quickly respond to security events.

Zero code tracing: Kubernetes observability with Logz.io and eBPF

Distributed tracing is a core tool for operating modern microservices platforms. For SREs and DevOps teams, it is often the fastest way to understand latency issues, service dependencies, and unexpected failure modes. But achieving comprehensive tracing coverage is resource-intensive and time-consuming. It usually requires application changes, language-specific instrumentation, agent lifecycle management, and ongoing coordination with development teams.

Observability for Feature Flags

Some of your users are having a party; dancing away, having a great time. But a couple of users are stuck outside in the rain, knocking on the door, trying to get in. Unfortunately, you can’t hear them because of all the noise happening inside. That’s what it feels like when you gradually roll out new features across your user base without the right monitoring.

Sampled analysis of 10 billion spans with Coralogix highlight comparison

The CNCF reported that between 39% and 56% of organizations surveyed are now ingesting traces as part of their observability strategy. Tracing has become a cornerstone of any modern observability operation. Customers regularly handle tens of billions of spans every day, but at that volume, how can teams quickly figure out what is changing, what’s breaking, or what’s slowing down?

2026 Observability Predictions: What Lies Ahead?

What remains of the 2025 AI hype? After a year of “AI will fix everything” promises, engineering teams in 2025 hit a wall of reality: AI is a tool, not a magic bullet. We’re now seeing a more practical approach: identifying broken workflows and tasks where AI can help and leveraging AI strengths like data analysis at speed and scale to derive meaningful, valuable insights. Looking ahead, 2026 will reward organizations that combine AI innovation with a practical approach.

IoT Sensor Data into Graylog: A Lab Guide

Graylog has always been associated with log management, metrics, SIEM and security monitoring—but it’s also a great tool for creative, low-cost experiments in a home lab. I wanted to use it for real-world sensor data, so I built a DIY temperature and humidity monitor using an ESP-WROOM-32 development board and a DHT22 sensor.
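
As a rough illustration of what one sensor reading could look like on its way into Graylog, here is a minimal sketch that builds a GELF 1.1 message. The excerpt does not say which Graylog input the lab uses, so GELF is an assumption, and the host name `esp32-lab` and the `_temperature_c`, `_humidity_pct`, and `_sensor` field names are hypothetical.

```python
import json
import time


def build_gelf_message(host, temperature_c, humidity_pct, timestamp=None):
    """Build a GELF 1.1 message for one sensor reading (field names are hypothetical)."""
    return {
        "version": "1.1",
        "host": host,
        "short_message": f"DHT22 reading: {temperature_c:.1f} C, {humidity_pct:.1f} %",
        # Seconds since the epoch; use the current time when none is supplied.
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": 6,  # syslog "informational"
        # GELF additional fields must be prefixed with an underscore.
        "_temperature_c": temperature_c,
        "_humidity_pct": humidity_pct,
        "_sensor": "DHT22",
    }


if __name__ == "__main__":
    message = build_gelf_message("esp32-lab", 21.5, 48.2, timestamp=1700000000)
    print(json.dumps(message, indent=2))
```

Sending the numeric values as separate underscore-prefixed fields, rather than only inside the message text, is what would let them be charted or aggregated later.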

Elastic and Google Cloud's powerful partnership in 2025

In 2025, Elastic and Google Cloud created a powerhouse of AI-driven insights, providing an end-to-end search, observability, and security journey for our joint customers. We continue to partner on many opportunities for success and have made even further progress this year to empower all our users, especially around generative AI (GenAI). This blog highlights our collaboration with Google Cloud to help you harness the power of data at scale as well as our top moments from Google Cloud Next ’25.

The Observability Stack is Collapsing: Why Context-First Data is the Only Path to AI-Powered Root Cause Analysis

By Bill Balnave, VP of Customer Success at Mezmo. The core promise of modern observability is simple: cut Mean Time To Resolution (MTTR). Yet, despite a boom in tooling and investment over the last four years, the data tells a sobering story: our industry is actually getting worse at finding and resolving issues. Dashboards, once our trusted guide, have become the starting point for a chaotic "dashboard hunt" that rarely leads to the definitive root cause.

Confessions of a software engineer who enjoyed being paged at 5am

It’s 5:14am, and I wake up to the squawking geese sound of my PagerDuty alert (anyone else have this sound? No?). I’m four months into working for my new team as a junior software engineer, and this is my first time being paged in the middle of the night. Most software engineers probably dread this moment, but I kind of love it. Agile ceremonies and Jira tickets suddenly don’t matter, and you’re fully focussed on stopping a customer-impacting fire.

Elastic at AWS re:Invent: Concluding a year of partnership in agentic AI innovation

Highlights of another laudable year of customer-centric collaboration. The integration of Elastic’s capabilities, including vector databases and context engineering, with AWS services helps customers build intelligent, scalable, and secure applications faster and with greater flexibility. Our ongoing collaboration has resulted in another year of notable innovation with AWS. This blog highlights our continued collaboration with AWS throughout 2025 to help you capitalize on the power of AI.

Logging Best Practices (Grafana OpenTelemetry Community Call)

We’re back with a new Grafana OpenTelemetry Community Call episode, and this time we’re diving into logging with OpenTelemetry and Grafana Loki! Even better, we’re joined by two fantastic guests: Jack Berg, OTel logging expert, and Ed Welch, Loki guru. Getting both of them in one conversation makes for an amazing deep-dive into all things logging. Logs come in every shape and size, from simple CLI output to massive distributed systems generating petabytes of structured data. In this episode, we’ll talk about.

About us - Sumo Logic

Security teams are flooded with thousands, or even millions, of signals every day. Sumo Logic’s entity-based SIEM and Dojo AI agents automate the manual work of detection, triage, and remediation so you can act faster on the alerts that matter. Discover how Sumo Logic simplifies security operations, helping you cut through the noise and protect your digital world.

Improve log utilization with Datadog log exclusion filters | Datadog Tips & Tricks

Want to make your logs easier to work with? Excluding unneeded logs from indexing reduces noise and may reduce log management costs. In this video, you’ll learn how to: See for yourself how to improve log utilization with Datadog Log Patterns and log exclusion filters. Then set up an alert to track ingestion spikes.

Setting up OpenTelemetry Demo in Kubernetes with Splunk Observability Cloud

Are you looking to explore the power of OpenTelemetry and Splunk Observability Cloud in a Kubernetes environment? This video provides a comprehensive, step-by-step walkthrough on how to deploy the OpenTelemetry Demo application in Kubernetes and seamlessly integrate it with Splunk Observability Cloud for metrics, traces, and logs! In this tutorial, you'll learn.

Tech Talk - Splunk Observability for AI

In this Tech Talk, we’ll show you how Splunk’s agentic, AI observability delivers end-to-end visibility of the entire AI stack, from agents and large language models (LLMs) to the underlying infrastructure. You’ll see how AI Infrastructure Monitoring provides teams with data-dense dashboards and detectors for surfacing trends, patterns, and outliers to correlate application health with underlying AI infrastructure performance.

Tech Talk - Take action automatically on Splunk alerts with Red Hat Ansible Automation Platform

As digital and AI applications become more prevalent, the need for fast, efficient, and consistent management of IT operations is critical. This session will show you how to automate responses to Splunk Observability Platform alerts using Red Hat Ansible Automation Platform's Event-Driven Ansible.

Building visibility and resilience across Kubernetes

Kubernetes has transformed how modern applications are deployed and scaled. Its flexibility and automation power innovation but also expand the attack surface. From control plane access to runtime drift, Kubernetes introduces layers of complexity that can obscure visibility if not properly monitored. For security leaders, Kubernetes is both an opportunity and a risk. While it enables agility, it also decentralizes security responsibility across teams, tools, and cloud layers.

Introducing the Databricks Destination: Powering governed, scalable analytics from day one

Modern enterprises are generating more high-volume observability and security data than ever, which means the cost and complexity of getting analytics-ready data into Databricks are only growing. With the new Databricks Destination for Cribl Stream, organizations finally have a governed, scalable, and cost-efficient way to take full control of their data pipelines, accelerate AI-driven analytics, and unlock real business value from their Databricks investment.

Save the logs, save the planet: How to make your observability stack greener

If data centres were a country, they’d rank fifth in electricity consumption by 2026. Over the past few years, the resulting carbon footprint of the technology industry has sparked the fast-growing green software movement, led by the Green Software Foundation. How can we continue to innovate software in a way that also minimises its impact on the environment? This has been a fascinating problem I’ve been exploring for a few years now.

How to Use MCP to Optimize Your Graylog Security Detections

Security teams face a critical question: “What logs should we collect, and what detections should we enable to protect against threats targeting our industry?” For a bank in the northeast, this isn’t academic. Threat groups like FIN7, Lazarus Group, and Carbanak specifically target financial institutions with sophisticated attacks ranging from SWIFT compromise to ransomware.

AI Observability in 2026: Why the data layer means everything

If there was ever a year for AI observability, it was 2025. Vendors released assistants to cover a variety of use cases. Coralogix released the first agent (distinct from assistants!), Olly, an autonomous, multi-agent observability platform. The direction of travel is clear, but many vendors and users are about to run into some significant problems with their data layer.

Overcoming ClickHouse's JSON constraints to build a high-performance JSON log store

Customer log data is always messy. Being (and building!) an observability platform, we get to see all the beautiful, creative ways it can be messy, every single day. And yet, our customers expect, quite fairly, I might add, perfect query results and peak performance. SigNoz is an open-source observability platform that can be your one-stop solution for logs, metrics and traces.

Graylog Guided Demo

Have a sneak peek at Graylog V7.0. Graylog V7.0 introduces a major step forward in speed, usability, and visibility across your entire security and operations workflow. In this demo, we walk through the newest capabilities designed to help teams detect, investigate, and respond faster than ever. You’ll see how the updated interface streamlines daily tasks, how the enhanced search and pipeline tools simplify complex data handling, and how powerful additions like built-in correlation and modernized dashboards give you clearer insight with less effort.

Google SecOps Forwarder Deprecation: Migrate to Bindplane and OpenTelemetry

Google Cloud Security Operations is deprecating the legacy SecOps Forwarder, and OpenTelemetry with Bindplane is the official telemetry ingestion method. In this workshop, you’ll learn how to migrate from the SecOps Forwarder to Bindplane and OpenTelemetry Collectors, the officially supported ingestion model for Google SecOps going forward. We walk through the why, the what, and the how — with practical guidance you can apply immediately.

Agentic AI demands a new data architecture #ai #telemetry

Clint Sharp explains why traditional schema-on-read systems cannot handle the query loads of the future. Agentic telemetry requires a 360-degree view, but structuring data only when you read it is too slow for AI-driven workloads. The solution is using LLMs to drive the cost of building parsers to near zero. Tools like Copilot Editor allow teams to map data to OCSF instantly, effectively building factories of parsers to handle the scale of agentic AI.

How AI Agents automate incident response #ai #cybersecurity #telemetry

Clint Sharp demonstrates how Cribl Search leverages AI to streamline incident investigation. Starting from a Slack channel, the AI builds an interactive notebook, analyzes order processing logs, and identifies suspicious traffic spikes. It connects high CPU usage to a recent Jenkins deployment, hypothesizing a supply chain attack, and ultimately recommends a rollback. This isn't a far-off concept. It is the future of operations arriving right now.

Why AI agents need a common data model #ai #telemetry

Clint Sharp explains why a common model like OCSF is critical for the future of AI. Agents need standardized data to analyze information effectively on your behalf. He contrasts the traditional manual workflow of checking Slack, tickets, and wikis while asking colleagues with a future where AI fuses this human context with machine data. Instead of just search results, AI agents will hand you examined hypotheses so you know exactly where to take your investigation.

Elastic and Microsoft partnership achievements in 2025

Highlights of another successful year of customer-centric collaboration. Once again, our partnership delivered an impressive year of innovation with Microsoft Azure, Azure AI Foundry, and Azure OpenAI. This blog highlights our continued collaboration with Microsoft to better serve customers throughout 2025 and our key moments at Microsoft Ignite.

Bindplane Community Call in December 2025

Join us live on Wednesday, December 10th at 11am EDT for the December Community Call. We’ll cover hands-on demos of the new Bindplane features you’ve been asking for, recaps of KubeCon+CloudNativeCon NA in Atlanta, and new Bindplane feature guides and blog posts. As always, we’ll wrap with an interactive Q&A, so bring your questions!

Docker Logs Command Reference: tail, follow, since Options

Managing Docker container logs is essential for debugging and monitoring application performance. Tailoring Docker logs allows for real-time insights, quick issue resolution, and optimized performance. This guide focuses on efficient methods for tailing Docker logs, with clear examples and command options to streamline log management.
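
The three options named in the title can be combined in a single invocation. The sketch below assembles the argument list for `docker logs` using the `--tail`, `--follow`, `--since`, and `--timestamps` flags. The helper name `docker_logs_command` and the container name `web` are hypothetical, and the function only builds the command, it does not run it.

```python
import shlex


def docker_logs_command(container, tail=None, follow=False, since=None, timestamps=False):
    """Return the argument list for a `docker logs` invocation."""
    args = ["docker", "logs"]
    if tail is not None:
        # Show only the last N lines instead of the whole log.
        args += ["--tail", str(tail)]
    if since is not None:
        # Accepts a relative duration such as "10m" or an absolute timestamp.
        args += ["--since", since]
    if timestamps:
        args.append("--timestamps")
    if follow:
        # Keep streaming new output until interrupted.
        args.append("--follow")
    args.append(container)
    return args


if __name__ == "__main__":
    cmd = docker_logs_command("web", tail=100, follow=True, since="10m")
    print(shlex.join(cmd))  # docker logs --tail 100 --since 10m --follow web
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns.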

Observability trends for 2026: Maturity, cost control, and driving business value

The observability landscape has undergone a fundamental transformation over the past several years. In a recent report, The Landscape of Observability in 2026: Balancing Cost and Innovation conducted by Dimensional Research and sponsored by Elastic, over 500 IT decision-makers were surveyed. It revealed that observability has definitively transitioned from an optional capability to a mission-critical business function.

Become a 10x investigator with Cribl Notebooks

Cribl Notebooks aims to streamline the investigation process by bringing everything into a single interactive interface. It functions as a virtual war room where teams can collaborate in real time. You can view AI queries and code alongside charts without switching between scattered tabs or workstations. This persistence makes it easier to document the root cause and share the story behind the data.

Fixing Performance Issues Fast with Logs & Tracing

Learn how to quickly track down performance bottlenecks using Sentry Logs and Tracing. In this video, we walk through identifying a slow screen, jumping into the connected trace, and pinpointing slow backend steps, database calls, and AI/LLM operations. See how logs, issues, and traces work together to show the full picture of what happened in a single session.

Expose Hidden State Bugs with Sentry Logs

See how Sentry Logs can surface hidden state bugs that stack traces alone can’t explain. In this walkthrough, we debug a React Native app with an Express.js backend where a missing diet value causes a crash. We inspect the issue, pull in the connected logs, and confirm whether the problem comes from an initial render or from real backend data. By combining issues, traces, and logs from the same session, you get the full story—and a faster path to the fix.

Prioritizing Bugs with Sentry Logs

Learn how to use Sentry Logs to measure how often a bug occurs and which users it impacts. In this example, a React Native app with an Express.js backend crashes when the diet value becomes undefined. After identifying the root cause, we use Explore Logs to count how many times users switch their diet to “none,” filter the related log messages, and group results by user type to understand the impact.
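
The filter-then-group step described above can be sketched outside any particular tool. This is not Sentry's API: the record shape, the `user_type` values, and the message text are all hypothetical, and the example only shows the counting logic on a few in-memory records.

```python
from collections import Counter


def count_by_user_type(records, message_substring):
    """Count records whose message contains the substring, grouped by user type."""
    counts = Counter()
    for record in records:
        if message_substring in record.get("message", ""):
            counts[record.get("user_type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log records, for illustration only.
    sample_logs = [
        {"message": "diet changed to none", "user_type": "free"},
        {"message": "diet changed to vegan", "user_type": "free"},
        {"message": "diet changed to none", "user_type": "paid"},
        {"message": "diet changed to none", "user_type": "free"},
    ]
    for user_type, count in count_by_user_type(sample_logs, "diet changed to none").most_common():
        print(user_type, count)
```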

Bindplane | Notifications

Real-time alerts for your telemetry pipelines are here. In this quick overview, you’ll learn about the new Notifications panel in Bindplane. This update gives you real-time visibility into key changes across your configurations, fleets, and agents so nothing slips through the cracks. You’ll see how Notifications helps you stay ahead of: This new feature centralizes alerts you’d otherwise miss — making Bindplane easier to operate at scale. Email, Slack, and webhook notifications are also on the way.

Bindplane in 12 Minutes: A Complete Overview of the Telemetry Pipeline for OpenTelemetry at Scale

Bindplane is a unified telemetry pipeline that helps teams cut observability spend by 50% or more. In this overview, you will learn how to route telemetry from any source to any destination, manage large fleets of OpenTelemetry Collectors, and gain real visibility into collector health, state, throughput, and routing behavior. 

Elastic's move to free on-demand training

Students can now learn what they need within the Elastic stack anytime. The Elastic Training team has shifted its on-demand training strategy from paid to free! Yes, you heard that right — complimentary on-demand training is now readily available to everyone. The Elastic Training team is continuously developing and releasing bite-sized training modules designed to align with Elastic solutions and highlight key features.

Ep 22: re:Invent recap

In this episode of Masters of Data, we're breaking down AWS re:Invent 2025 through David's eyes (and probably a few cups of conference coffee). We dive into the massive crowds, killer customer conversations, and product demos that actually worked—because we're all about building real tech, not smoke-and-mirrors clickbait. David geeks out over Mobot, our AI tool that's making workflows smoother (not just another chatbot in disguise), and how attendees couldn't get enough of the live demos. We also throw some shade at the AI-washing epidemic and dig into why practical AI applications in security and observability actually matter.

Why FedRAMP In Process Matters for Federal Customers

Chris Ebley from Blackwood explains why FedRAMP In Process is a major milestone. It gives federal teams confidence that the product can handle sensitive data, meets strict security controls, and comes from a company committed to operating at the maturity level the government expects. This opens new go-to-market opportunities and makes it easier for agencies to move forward with Cribl.

Why Cribl Lake Delivers the Best Price Performance for AI Workloads #ai #telemetry

CMO Abby Strong explains how Cribl Lake is built for the real demands of modern AI. You get fast storage for high-performance workloads and an efficient architecture that scales without blowing up your budget. A smarter foundation for the AI era.

Coralogix in G2 Winter 2026: Momentum, Progress, and 192 Badges

As we wrap up 2025 and slowly come down from the re:Invent high, we’ve got one more reason to keep the celebration going. Coralogix has earned 192 badges in the G2 Winter 2026 reports and secured the position in the Momentum Grid Report for Observability Software. It is a strong finish to the year and a clear reflection of the steady progress the platform has been making.

Why should you demand OpAMP support from your vendor?

Fleet management is the practice of monitoring and configuring your fleet of agents and collectors. Key functionality includes: Fleet management is the hallmark of an organisation that has realised the great importance of a healthy telemetry pipeline, and has taken steps to ensure that collectors & agents are every bit as robust as the production architecture for which they are responsible.

Bindplane Onboarding | Install Your First OTel Collector & Send Windows Events to Google SecOps

In this 10-minute step-by-step walkthrough, Chelsea from the Bindplane Customer Success team shows you how to install your first Bindplane OpenTelemetry Collector and start sending Windows Event telemetry from a Windows VM directly into Google SecOps.

Bindplane in 200 Seconds: Windows Event Logs & Google SecOps

Learn how to configure Bindplane to collect and route Windows Event Logs from a Windows VM into Google SecOps. In this 200-second onboarding walkthrough, Chelsea shows how to build and configure a full SecOps-ready pipeline in just a few minutes. You’ll see how to create a configuration, add the Windows Event Log source, configure the Google SecOps destination, roll out the configuration to an agent running on a Windows VM, and start receiving security telemetry inside SecOps.

Using Traces, Metrics, and Logs All in One Place, as Demonstrated by Pipeline Builder

When troubleshooting complex software, it’s important to be able to gain insight via its telemetry quickly and precisely. No one wants to waste time switching between tools or worrying about how to interact with different types of data. At Honeycomb, all your data is available in one place, accessible via our fast query engine. But what does that look like in practice?

AI Agents Need Structured Telemetry. Are You Preparing? #telemetry #ai

Clint Sharp breaks down the shift from traditional observability to AI-ready telemetry. Agents need well-formed fields, consistent schemas, and predictable data models. If your environment is full of unstructured logs, agents will give inconsistent answers. The work starts now so your AI future can actually deliver value later.

AI Is Growing Your Data Faster Than Your Budget #telemetry #ai

Clint Sharp explains why data is growing at a 30% CAGR while budgets stay flat. Teams are already running infrastructure at 80 to 90% capacity, and AI agents multiply query volume by ten or fifty. What got you to 2025 will not get you to 2035. You need a new approach to handle AI scale without blowing up cost.
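
To put the 2025-to-2035 comparison in numbers, here is a small worked calculation. It assumes the 30% annual growth rate stays constant for the full ten years, which the talk summary does not claim. The function name is hypothetical.

```python
def compound_growth(rate, years):
    """Return the multiplier after compounding `rate` annually for `years` years."""
    return (1 + rate) ** years


if __name__ == "__main__":
    factor = compound_growth(0.30, 10)
    # Ten years at 30% a year multiplies the starting volume roughly 13.8 times.
    print(f"Data volume multiplier over 10 years: {factor:.1f}x")
```

Under that assumption, a flat budget would have to cover nearly fourteen times the data by 2035, before counting any extra query load from agents.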

Understanding How a Log Correlation Engine Enables Real-Time Insights

Tax season is notoriously many people’s least favorite time of year. For people who complete their own tax returns, the process becomes an agonizing one of looking at small pieces of paper, matching numbers to the lines that ask for information, and comparing various inputs. In essence, doing your taxes makes you a correlation engine. Now, imagine taking this tedious process and applying it to the terabytes of data that your environment generates daily.

Use Database Monitoring in Splunk Observability Cloud to Identify and Resolve Slow Queries

In this video, I introduce Database Monitoring in Splunk Observability Cloud. I'll demonstrate how to spot and resolve slow queries by leveraging rich metrics and correlating database performance directly with traces in Splunk Observability Cloud APM.

Cribl and Cloudflare give you full network visibility with real time telemetry

Glenn Block explains how the new Cloudflare source and R2 destination in Cribl Stream let you ingest WAF, DNS, and Zero Trust logs for full visibility and real-time intelligence. Better security, better performance, and lower cost for modern IT and security teams.

Ep 20: re:Invent FOMO? Dojo AI demo

Not heading to re:Invent this week? Don't worry—we've got you covered. In this episode, we welcome Architect Solutions Engineer, Jake Lee, to preview the exciting new Sumo Logic tools we are showcasing in Vegas. Our new SOC analyst agent acts as an AI partner that instantly assesses incident severity and recommends next steps—no more drowning in alerts. The MCP server breaks down barriers by letting you query Sumo Logic from Slack or integrate security insights directly into your IDE.

Why AI Will Push #Telemetry Budgets to the Breaking Point in 2026

Telemetry growth is about to hit a new level in 2026. Nick Heudecker from Cribl walks through our new predictions report and explains why observability costs are set to surge again, with more than a third of enterprises spending at least 15% of their IT budgets on telemetry alone. He also shares how agentic AI adds new risk to the data pipeline, why most AI workloads will struggle to scale, and how platform shifts and market forces will reshape the data landscape.

#AI Powered Data Protection Inside Cribl Guard

Cribl Guard uses an always-running AI agent to spot sensitive data as it moves through your environment and recommend the right protections in real time. In this demo, you will see how the agent samples live events, identifies patterns like credentials and credit cards, and turns them into one-click fixes that keep your destinations safe. Faster detection, smarter rule recommendations, and instant mitigation. This is what modern data protection looks like.

New agents in the Dojo: Expanded Sumo Logic Dojo AI

Back in September, we unveiled Sumo Logic Dojo AI, our agentic AI platform built to power intelligent security operations and incident response. With that launch, we introduced Mobot, our conversational interface, as well as our first agents designed to help automate routine tasks, streamline investigations, and give security teams the freedom and ability to focus on analyzing the highest value security issues facing their organization. Today, we’re excited to share the latest additions to Dojo AI.