Sponsored Post

6 Reasons Your Data Lake Isn't Working Out

Since the data lake concept emerged more than a decade ago, data lakes have been pitched as the solution to many of the woes surrounding traditional data management solutions, like databases and data warehouses. Data lakes, we have been told, are more scalable, better able to accommodate widely varying types of data, cheaper to build and so on. Much of that is true, at least theoretically. In reality, the way many organizations go about building data lakes leads to a variety of problems that undercut the value their data lakes are supposed to deliver.

Detecting Anomalous Spans at Scale with DataPrime

Tracing is one of the most transformative gifts of observability. It allows engineers to follow a single request through a distributed system and see every span and dependency along the way. However, even with that visibility, some of our most basic questions stay unanswered. Why did a specific span behave differently today than it did yesterday? Why did latency rise even when nothing “broke”?
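The question of why a span behaved differently today than yesterday comes down to comparing current span timings against a baseline. The sketch below is a minimal illustration in plain Python. It is not DataPrime syntax and not Coralogix's actual detection method. It assumes durations are already grouped by span name; the `baseline` and `current` dictionaries and the 3-sigma `threshold` are hypothetical. It flags span names whose mean duration today sits more than `threshold` baseline standard deviations from the baseline mean.

```python
from statistics import mean, pstdev


def anomalous_spans(baseline, current, threshold=3.0):
    """Compare today's span durations with a per-span baseline.

    baseline, current: dict mapping span name -> list of durations (ms).
    Returns {span name: z-score} for spans whose mean duration today lies more
    than `threshold` baseline standard deviations from the baseline mean.
    """
    flagged = {}
    for name, durations in current.items():
        history = baseline.get(name)
        if not durations or not history or len(history) < 2:
            continue  # not enough data to compare
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue  # constant baseline: a z-score is undefined
        z = (mean(durations) - mu) / sigma
        if abs(z) > threshold:
            flagged[name] = round(z, 2)
    return flagged


baseline = {"GET /cart": [100, 110, 90, 105, 95], "db.query": [20, 22, 18, 21, 19]}
today = {"GET /cart": [102, 98], "db.query": [60, 65]}
print(anomalous_spans(baseline, today))  # flags db.query only
```

Nothing in this example is "broken": both spans still complete. The slowdown only shows up because each span is judged against its own history rather than a global limit.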

Introducing Dataspaces & Datasets

Observability data has a habit of outgrowing everything else. As telemetry volume, variety, and velocity increase, staying organized gets harder. Governance becomes messy, and the cost of digging through “everything” keeps rising. Over the past year, Coralogix’s DataPrime engine has been addressing these challenges by laying a new foundation for observability at scale.

Stop Leaking PII in Your Telemetry with Cribl Guard

Sensitive data sneaks into destinations more often than teams realize. In this clip, we capture live events, spot emails and login tokens slipping through, and fix it instantly with Cribl Guard. A few clicks, a commit and deploy, and Guard redacts the data in real time. No complex configs. No regex nightmares. Just fast protection that keeps your telemetry clean and your security tight.

Pastries with SREs: Holding onto extra observability data and desserts

In this episode of Pastries with SREs, we dig into why you should keep all of your observability data, even if you don’t need it quite yet. With enriched logs and flexible, cost-effective storage, you can stop worrying about what you might need later and start answering questions with confidence, no matter when they arise.

New features: Introducing Metrics Usage and Query Usage analyzers

As teams grow and telemetry scales, it becomes harder to keep track of which metrics matter. Labels pile up, cardinality increases, and costs start rising faster than anyone expected. At the same time, dashboards often stay quiet and alerts go untouched. The truth is, most teams don’t actually know how, or how much, their metric data is being used, let alone which metrics are driving cost. This is exactly the problem we set out to solve.

In the age of AI, measurement becomes our superpower

The last few years have felt less like a product roadmap and more like a scene from science fiction. Artificial intelligence didn’t simply arrive, it erupted. In what feels like a blink, we’re building software by prompting instead of programming. Our words now generate code, compose music, translate languages, and create entire digital experiences.

How continuous profiling cut our cloud spend

At Coralogix, we’re constantly looking to evolve the measurements we take to better understand the efficiency of our infrastructure. We regularly assess and investigate sources of cost in our cloud infrastructure to ensure we’re getting the best return on investment. This activity, often referred to as FinOps, is becoming a cornerstone of engineering practice.

Mezmo + Catchpoint deliver observability SREs can rely on

For SREs juggling multiple services, third-party dependencies, and constant alerts, a critical service slowdown can quickly turn into chaos. APM dashboards may show everything is fine, yet users are still experiencing problems. That gap between application telemetry and real-world performance can turn a five-minute fix into a two-hour war room.

Pastries with SREs: Enriched logs and filled donuts

In this episode of Pastries with SREs, we take a sweet dive into one of the most exciting evolutions in observability: enriched logs, also known as wide events. Gone are the days of toggling between tools and stitching together logs, metrics, and traces. Enriched logs consolidate the context, providing everything you need to understand and resolve issues in a single log entry.

How to Reduce Log Data Costs Without Losing Important Signals

You can cut your log costs by removing repetitive, low-value logs early and keeping only the parts that genuinely help you understand issues. Modern systems generate logs far faster than you expect. Even when your workload stays stable, infrastructure components, retries, and background workers continue producing a steady stream of repeated entries.
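One common way to remove repetitive entries early is to collapse messages into templates by masking their variable parts (IDs, counters), then keep only the first few occurrences of each template. The sketch below is illustrative, not a specific product's reduction logic. It assumes a simple `(level, message)` record shape, a regex that treats hex IDs and digit runs as the variable parts, and a hypothetical per-template cap.

```python
import re

# Illustrative assumption: hex IDs and digit runs are the "variable" parts.
_VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def template(message):
    """Mask variable tokens so similar messages share one template."""
    return _VARIABLE.sub("<*>", message)


def reduce_logs(records, drop_levels=("DEBUG",), max_per_template=3):
    """Drop low-value levels and cap repeats of each message template.

    records: iterable of (level, message) tuples.
    Returns (kept, suppressed), where suppressed maps each template to the
    number of records dropped because it exceeded the cap.
    """
    counts = {}
    kept = []
    for level, message in records:
        if level in drop_levels:
            continue  # low-value level: drop outright
        key = template(message)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= max_per_template:
            kept.append((level, message))
    suppressed = {k: n - max_per_template for k, n in counts.items() if n > max_per_template}
    return kept, suppressed


logs = [
    ("DEBUG", "cache hit"),
    ("INFO", "retry 1 for job 7"),
    ("INFO", "retry 2 for job 7"),
    ("INFO", "retry 3 for job 7"),
    ("INFO", "retry 4 for job 7"),
    ("ERROR", "disk full on /dev/sda1"),
]
kept, suppressed = reduce_logs(logs)
```

Returning the suppressed counts instead of silently discarding them preserves the signal that a retry storm happened, even though most of its individual lines were dropped.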

Search Telemetry Without Limits in a Multi-Cloud and AI World

Cribl Search gives you one lens across all your telemetry data, no matter where it lives. Instead of forcing teams to move data into one system or jump between tools, you get a familiar pipe-based query experience with dashboarding and alerting built in. Storage and query processing stay separate, so you decide where your data lives while your users get fast, simple access in one place.

Use OpenTelemetry with Observability Pipelines for vendor-neutral log collection and cost control

Today, many DevOps and security teams operate in complex, hybrid, or multi-vendor environments. As more teams look to avoid lock-in by adopting open standards, OpenTelemetry (OTel) is quickly gaining adoption as the primary open-source method for DevOps and security teams to instrument and aggregate their telemetry data. However, OTel alone may lack the advanced processing functions, native volume control rules, and hybrid environment support that large organizations need.

Architecture for the agentic era: How AI will reshape data, security, and observability

As AI agents move from copilots to autonomous systems, they’re generating and consuming data at unprecedented scale. The result is a new kind of infrastructure pressure — one that’s quietly reshaping how organizations think about data, cost, and control. Across IT, Security, and Observability, leaders are realizing a hard truth: too much data is too costly.

AI-Suggested Alert Thresholds for Mobile Telemetry

Life is pretty good. I’ve shipped a mobile app and I’m (happily) drowning in telemetry. Battery impact, time in foreground/background per screen, crash rates, slow frames, network retries – the works. The data is brilliant; the challenge is turning signals into reliable alerts that catch real issues relevant to my app’s functions. So… what should I actually listen for, and where should I set the thresholds?

Why Gaining Control of Your Telemetry Data Is a Game Changer

Disconnected pipelines. Unknown data sources. Costs that do not add up. Many teams struggle to answer a simple question: what data do we have, and where is it going? In this clip, a Cribl customer explains how bringing all telemetry data together changed everything. With Cribl, their team can finally see what they collect, where it flows, and what it costs. That clarity unlocked smarter reduction, better routing decisions, and major optimization across security and observability workflows.

From Data Lake to Lakehouse: Why Cribl Is Preparing for the Agentic AI Era

Customers asked for a simpler way to store and access telemetry data, and Cribl delivered. First came Cribl Lake: cost-effective data storage, flexible access, and identity-based authorization instead of infrastructure-based access rules. It offered a simple way to retain data at rest and run slow, inexpensive analytics when needed. But the story did not end there.

KubeCon North America 2025: OpenTelemetry Recap from Atlanta

KubeCon + CloudNativeCon North America 2025 wrapped up in Atlanta last week, and it sure did feel like a big one for OpenTelemetry. Between Observability Day, the project updates, and the activity around the OpenTelemetry Observatory booth, you could feel how quickly the ecosystem is maturing.

Pastries with SREs: FinOps is to ROI as a coffee is to cannoli

In this episode of Pastries with SREs, our hosts tackle one of the hardest questions observability leaders face: "How do you prove the ROI of observability?" This isn’t just about uptime or dashboards. It’s also about aligning observability with business outcomes, cloud cost savings, and FinOps metrics that matter to leadership.

Observability needs more than tools. It needs the right data.

Good observability starts with good data. In this clip, we hear how Cribl gives teams real control over their data pipelines so they can collect, enrich, and route telemetry from any source to the right destination. It is not just about more dashboards or another platform. It is about building an observability ecosystem that connects IT, security, and the business through cleaner data and smarter AIOps. Tool rationalization and AI-driven pipelines are not future goals. They are happening right now.

Ep 18: AI has a memory problem, just like you do

In this episode of Masters of Data, we dive into how AI learns, examining both how we teach it and what it derives from human performance, as well as why context plays a crucial role in AI interactions. We break down five key components of AI training and talk about why we should view AI as a tool under human control rather than an autonomous entity. We explore the challenge of maintaining context in AI—much like our own memory struggles—and discuss methods, such as retrieval-augmented generation, that can help AI retain context more effectively.

Better together: Cribl and Microsoft Fabric just got radically simpler

In September, I wrote about how Cribl and Microsoft Fabric Real-Time Intelligence provide a powerful combination, unlocking new analytics capabilities for security and IT teams. I also said there was more to come… Today, Cribl is thrilled to announce a new Cribl Destination for Microsoft Fabric Real-Time Intelligence, marking another big step forward in our collaboration with Microsoft to make it much easier for Cribl customers to use Fabric.

How to Speed Up Incident Response With Guided Remediation

Most teams picture incident response as a linear sprint from alert to resolution. A notification appears, an analyst pivots across screens, a decision gets made, and the workflow moves on. It works, but it is mechanical, tiring, and fragile. Graylog 7.0 aims for something more impactful. Guided remediation gives analysts clarity during the moments when pressure rises and context usually scatters. It takes raw detection data and turns it into a clear path forward. No theatrics.

Elasticsearch: The context engine for grounding and orchestration in Microsoft Azure AI Foundry Agent Service

The rise of large language models (LLMs) and agentic applications promises to transform enterprise workflows. Yet, the core challenge remains: How do we ensure these powerful agents generate accurate, relevant, and trustworthy responses based on proprietary enterprise data rather than relying solely on their generic training knowledge? The answer lies in grounding — connecting the LLM to verified, trusted, and up-to-date information.

How to pair Grafana Drilldown with Loki for faster logging insights

Our logs can tell us so much about the state of our systems, but they can also be a bit overwhelming. Yes, Grafana Loki—and, by extension, Grafana Cloud Logs, which is powered by Loki—reimagined the way log aggregation systems could meet modern engineering demands, but logs, by their very nature, are still voluminous.

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA), a truly transformative leap forward for engineering and operations teams that is included in your existing subscription at no additional charge. We are paving the way for a new era of observability, moving beyond passive, reactive monitoring to proactive, AI-driven observability.

What Is a Data Pipeline

In today’s tech world, IT and security technologies are the functional equivalent of Pokemon. To gain the insights you need, you “gotta catch ‘em all” by ingesting, correlating, and analyzing as much security data as possible. Data pipelines organize chaotic information flows into structured streams, ensuring that data is reliable, processed, and ready for use.

MachineGPT: Speaking the Language of Machines to Shape the Future of AI

At .conf25, we took a bold step forward, introducing the concept of MachineGPT, which brings the power of generative AI to one of the most overlooked resources: machine data. MachineGPT speaks the language of machines. Just like ChatGPT learned the grammar of words and sentences to understand questions and respond in human language, MachineGPT can learn the hidden “grammar” of how systems behave through machine data.

Elastic named a Leader in the IDC MarketScape: Worldwide Observability Platforms 2025 Vendor Assessment

We're proud to share that Elastic has been named a Leader in the IDC MarketScape: Worldwide Observability Platforms 2025 Vendor Assessment (doc, November 2025). We believe this recognition validates our ongoing mission: to deliver an observability platform that is open, extensible, and AI-driven to power full-stack observability that unifies operational and business data at scale, allowing SRE teams to detect and resolve problems faster.

Customer panel: Transforming IT & security

In an era where telemetry data grows at a 28% compound rate while budgets remain flat, traditional IT and Security approaches are facing unprecedented pressure. Join our distinguished customer panel as they share their transformative journeys with Cribl's data engine solutions. Our panelists will discuss how Cribl's vendor-neutral portfolio has enabled them to regain control over their data infrastructure, achieving both immediate operational improvements and strategic long-term advantages.

Graylog MCP Integration: Real-Time LLM Access to Your Data

Graylog 7.0 supports integration with the Model Context Protocol (MCP), which allows large language models (LLMs) to access and interact with Graylog data and workflows in real time. Graylog exposes an MCP-compatible endpoint for LLM clients, such as Claude and LM Studio. MCP integration allows Graylog users to interact with their data through LLMs. With MCP, an LLM can connect directly to Graylog as a remote tool interface, performing queries, retrieving system information, and assisting with common administrative or investigative tasks. This capability may make it possible to.

Introducing the Splunk Technology Add-on for Ollama: Illuminating Shadow AI Deployments

Without strong visibility and governance, local LLMs risk replicating the fragmented, unsupervised sprawl once seen in shadow IT, complicating security postures and making it difficult for organizations to ensure proper oversight and compliance as these powerful AI tools become embedded in daily workflows. To address this challenge, the Splunk Threat Research Team has released the Splunk Technology Add-on for Ollama, which provides comprehensive monitoring and observability capabilities designed specifically for local LLM deployments.

Pastries with SREs: No compromises on cost-effective observability or donuts.

In this episode of Pastries with SREs, we dig into how vendor lock-in and sky-high observability costs are forcing teams to choose between coverage and budget, AND why you shouldn’t have to settle. With donuts in hand, we explore how to take back control of your observability strategy by making it cost-effective, comprehensive, and flexible.

Investigating SIEM Incidents with Logz.io

A short demo showing how Logz.io, powered by the AI Agent, helps investigate security incidents by analyzing and correlating data. The AI Agent uses natural language to query and correlate SIEM questions with related logs, detect anomalies and highlight unusual activity, summarize findings to speed up root cause analysis, and provide recommended actions. This video demonstrates a practical SIEM use case for the AI Agent inside Logz.io.

Conquer Complexity, Accelerate Resolution with the AI Troubleshooting Agent in Splunk Observability Cloud

The digital landscape has transformed dramatically, and with it, the demands on our systems have grown exponentially. Traditional monitoring tools struggle to provide sufficient insight into complex, distributed, cloud-native environments. Observability is the answer, moving beyond merely knowing "what" is happening to understanding "why" it's happening, and its impact on user experience and business outcomes.

What is Active Telemetry

Active Telemetry is the evolution in how organizations collect, process, and use observability data. In traditional observability, telemetry is passive: systems emit logs, metrics, and traces that are stored and visualized after the fact. This model worked when systems were simpler and changes were predictable. But in today’s world with distributed microservices, Kubernetes, and AI workloads, passive telemetry can’t keep up. Active Telemetry changes that.

Use Grok parsing to extract fields from logs | Datadog Tips & Tricks

When your logs don’t follow a standard format, it can be difficult to extract valuable information, like key-value pairs and nested JSON objects. Grok parsing lets you define flexible patterns that match unstructured log data so you can extract specific fields to query, filter, and visualize. In this video, you’ll see how refining your Grok parsers can make your logs more useful for analytics, dashboards, or alerts, and help you get even more value from them.
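Grok patterns are essentially named shortcuts for regular expressions: each `%{PATTERN:field}` token becomes a named capture group. The sketch below shows that expansion in plain Python. The small `PATTERNS` table is a hypothetical subset, and this is not Datadog's parser, whose rule syntax and built-in matchers differ.

```python
import re

# Hypothetical subset of grok-style named patterns (plain regex snippets).
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "NOTSPACE": r"\S+",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} or %{NAME:field}.
_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(pattern):
    """Expand %{NAME:field} tokens into named regex groups and compile."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Named group when a field is given, otherwise a non-capturing group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(_TOKEN.sub(expand, pattern))


def parse(pattern, line):
    """Return extracted fields as a dict, or None if the line doesn't match."""
    match = compile_grok(pattern).fullmatch(line)
    return match.groupdict() if match else None


rule = "%{IP:client} %{WORD:method} %{NOTSPACE:path} status=%{INT:status}"
fields = parse(rule, "10.0.0.5 GET /api/orders status=503")
```

As in most grok implementations, captured values come back as strings; converting `status` to an integer would be a separate step.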

What the RFC?! Making sense of syslog before you migrate

Syslog: it's everywhere, it’s ancient, and let’s be honest — it rarely shows up the way the RFC says it should. Before you cut over to Cribl Stream, it pays to understand exactly what you're dealing with and why it matters. In this talk, we’ll demystify the syslog format (yes, the actual RFC 3164 and 5424 stuff), look at what happens when data goes rogue, and explore how Cribl can help bring order to the chaos.
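The practical difference between the two RFCs shows up in the header. Both start with a `<PRI>` value that encodes 8 × facility + severity. RFC 5424 follows it with a version number, a full timestamp, and fixed fields (hostname, app-name, procid, msgid, structured data). RFC 3164 instead uses a `Mmm dd hh:mm:ss` timestamp followed by a hostname and a free-form message. The classifier below is a simplified sketch, not Cribl's parser. Its regexes do not handle every edge case, such as an escaped `]` inside structured data.

```python
import re

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2}) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app_name>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<structured_data>-|(?:\[.*?\])+)"
    r"(?: (?P<msg>.*))?$"
)
# RFC 3164 (BSD syslog): <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<msg>.*)$"
)


def classify_syslog(line):
    """Identify the syslog format and decode PRI into facility and severity."""
    for fmt, pattern in (("rfc5424", RFC5424), ("rfc3164", RFC3164)):
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            pri = int(fields.pop("pri"))
            fields.update(format=fmt, facility=pri // 8, severity=pri % 8)
            return fields
    return {"format": "unknown", "raw": line}  # "rogue" data that fits neither RFC
```

Trying RFC 5424 first is safe because its header requires a digit right after `<PRI>`, whereas an RFC 3164 timestamp starts with a month abbreviation.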

The Modern SOC: Transforming security operations with AI and automation

Security teams are dealing with massive data growth, siloed tools, and constant alert fatigue. All of this makes it harder to detect and respond to threats. AI has become a key part of the solution, but its effectiveness depends on having access to complete, high-quality data. In this session, Palo Alto Networks and Deloitte will explore how AI and automation are redefining the modern Security Operations Center (SOC). Learn how leading organizations are leveraging intelligent workflows, automated threat detection, and machine learning to accelerate response times, reduce analyst fatigue, and strengthen overall security posture.

SIEM Migration in 68 Days

In this session, we will discuss how the University of Pittsburgh was able to modernize their data processing strategy, migrate to a new SIEM solution, and avoid ballooning SIEM costs all within 68 days from the first install of a Cribl product. We will showcase how we were able to use Cribl's software to easily handle the following scenarios: 100% agent replacement and consolidation using Cribl Stream Workers and Edge.

Making Observability AI-Native with the Logz.io MCP Server

Now available: Secure, real-time access to your observability data via Logz.io’s Model Context Protocol (MCP) Server. The Logz.io MCP Server brings your logs, metrics, and telemetry data into the Model Context Protocol (MCP), an emerging open standard that lets AI systems query real data securely and contextually, in real time. That means any MCP-compatible LLM, like Claude Desktop, Cursor, your own AI agent… can now connect directly to your Logz.io environment.

How Smart Robots Work: AI Perception, Planning & Execution Explained

Imagine a future where machines not only perform physical tasks but also learn, adapt, and make intelligent decisions in dynamic environments. This future is rapidly becoming a reality with the advent of smart robots, poised to revolutionize industries from manufacturing to healthcare. In this article, we'll delve into smart robots: what makes these intelligent machines 'smart', how they perform tasks, and how they are reshaping the operational landscape.

Generation AI (Episode 5): How generative AI Is shaping the future of the marketing technology stack

The next golden age of artificial intelligence has arrived, but the path forward is far from certain. Technology leaders are presented with a tremendous opportunity to revolutionize their business, that is, if they can find a way to tap into the full potential of their organization's data. In Episode 5 of Elastic's new limited series, Generation AI, marketing and IT leaders share how they believe AI will shape the future of marketing technology and workflows.

Unify Observability, Surface Business Impact, and Solve Problems Using AI Agents with Latest Splunk Observability Innovations

In September at .conf25, we announced how Splunk is shaping the future of digital resilience in the age of AI. Agentic AI is rewriting what it takes to build a leading observability practice. As vibe coding gains steam, applications will be built with less human involvement. At the same time, the rise of AI agents demands specialized telemetry to ensure models are performing as intended, aligned to their business purpose and cost.

Splunk Advances the OpenTelemetry Project with Its Latest Donation, the OpenTelemetry Injector

Splunk is very excited to be sponsoring KubeCon North America once again, kicking off this week in Atlanta, GA. As many know, Splunk is one of the top contributors to the OpenTelemetry project. We’re happy to have sent many of the Splunkers who serve as project maintainers and contributors to lead SIG meetings and engage with the greater community in the OpenTelemetry Observatory, sponsored by Splunk.

Building the Next Generation of Defenders: From the Classroom to the SOC of the Future

Singapore’s digital economy is growing at a remarkable pace, but with that growth comes a challenge: the nation is on track to need more than a million additional digitally skilled workers by 2026, particularly in cybersecurity, data, and AI. This is not just about filling jobs — it’s about ensuring the country’s long-term digital resilience.

Splunk Developer Program

A short video that introduces the Splunk Developer Program, highlights the end-to-end support and tooling it offers, and showcases how developers can build, test, and grow impactful apps with confidence. The video will follow the journey of a first-time app builder who discovers the program, uses its resources, and becomes an active, recognized contributor in the Splunk community.

Unlock Faster Incident Resolution with PagerDuty + Logz.io

Join us live as we demo how PagerDuty and Logz.io work together to supercharge your Root Cause Analysis. See how real-time observability and enriched incident context can help your team detect, triage, and resolve issues in minutes—not hours. Don’t miss this chance to see the integration in action, ask questions, and learn how to keep your teams in sync while driving continuous improvement. Perfect for anyone looking to level up their incident response!

Choosing the Right Load Balancing Approach for Your Network: Static, Dynamic, & Advanced Techniques

Load balancing is the process of distributing network traffic among multiple server resources. Its objective is to spread the workload evenly among the computing resources; this “balanced load” improves application responsiveness and accommodates unexpected traffic spikes, all without compromising application performance. Let’s take a deeper look at this important networking function.
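Round robin and least connections are two widely used examples of the static and dynamic categories. Round robin rotates through servers in a fixed order regardless of load, while least connections routes each request to the server currently handling the fewest. The sketch below is a minimal in-process illustration under those assumptions. The class names are hypothetical, and real load balancers add health checks, weights, and concurrency control.

```python
from itertools import cycle


class RoundRobin:
    """Static: hand out servers in a fixed rotation, ignoring current load."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self):
        return next(self._rotation)


class LeastConnections:
    """Dynamic: route to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest-listed server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection to `server` closes."""
        self.active[server] -= 1


rr = RoundRobin(["a", "b", "c"])
lc = LeastConnections(["a", "b"])
```

The difference matters when requests have uneven cost: round robin keeps sending traffic to a server stuck on slow requests, while least connections steers new requests away from it.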

How to Use MetricFire Logging: Visualize Logs & Metrics Together in Grafana

Want full visibility into your systems? In this step-by-step tutorial, we show you how to use Grafana Loki with Promtail on Hosted Graphite by MetricFire to stream logs alongside your metrics. All visualized in Grafana dashboards. No more toggling between tools — get the full observability stack in one place.

Connecting the dots: Solving IT asset visibility with DataPrime

In large tech organizations, keeping track of every laptop, desktop, and endpoint is one of the IT department’s toughest challenges. Each device needs to be accounted for, properly assigned, and compliant with the organization’s policies, all while teams, offices, and contractors constantly change.

Pastries with SREs: From AIOps to GenAI and LLMs (lactose-free latte making)

In this episode of Pastries with SREs, we look at AIOps: where it fell short, where it worked, and how generative AI (GenAI) is reshaping what’s possible in observability today. If you’re wondering whether generative AI is different this time, this episode offers a grounded, practical look at how it’s evolving observability workflows.

What's New at Logz.io - October 2025

We’re expanding the Open 360 AI experience to more users with a modernized navigation and full access to Grafana and OSD dashboards. Your existing dashboards, alerts, bookmarks, and integrations remain unchanged, while new AI-powered capabilities provide deeper explanations and actionable insights. Existing customers can request early access through their account team.

The New Open 360 AI Experience

Experience the new Open 360 AI, built to help you explore, analyze, and act on your observability data in a smarter way. See how the AI Agent works directly inside dashboards to explain anomalies, summarize trends across your telemetry data, and guide you to root cause, without switching views or writing queries. Everything you know and love is still here, now enhanced with AI.

Make privacy compliance a competitive advantage with Cribl Guard

As Chief Legal Officer, I’ve personally navigated the complex, ever-shifting landscape where privacy compliance meets rapidly evolving technologies. Whether it’s the sweeping reach of a law protecting personal data in the EU, the specific demands of a law giving California residents more control over their personal information, or the critical protections of a law safeguarding sensitive patient health information in the U.S., one challenge remains.

Logs Are Your Data Platform: Dynamic, Queryable, S3-Backed

Modern systems move fast. Features ship daily, user behavior shifts hourly, and risks surface in minutes. In that reality, logs are not just a troubleshooting artifact. They are your most expressive data source. Logs capture the words developers write to their future selves. They carry the full story of requests, users, experiments, errors, feature flags, and revenue events.

Why Simplicity Beats Sprawl in Modern IT

In enterprise boardrooms today, what was once an arms race to adopt more tools and chase every new capability has now crystallized into a single mandate, “Make the platform work harder without spending more.” The industry has reached a saturation point. The buyers who once greenlit expansions now demand efficiency. And the ones who built the stack? They’re rethinking it entirely. It’s no wonder platformization is taking off.

Gobbling Up Insights: Graylog 7.0 Serves Up a Feast

A feast of new features. A cornucopia of new capabilities. A banquet of breakthroughs (and the T-day puns are just getting started). Graylog 7.0 brings a full plate of advancements that help security teams cut through noise, control cloud costs, and respond with confidence. We’re serving practical improvements across dashboards, automation, and AI support so analysts can focus on action instead of manual effort.

Embracing failure and chaos to improve system reliability and SRE team performance

In this interview with Alex Hidalgo, Field CTO at Nobl9 and author of Implementing Service Level Objectives (O’Reilly Media), we explore how traditional metrics like MTTR and MTTx can give a false sense of reliability. Alex shares how SRE teams can embrace failure, build psychological safety, and design systems that reflect the human factor behind uptime, outages, and real-world reliability.

AWS & Splunk: Accelerating Innovation Through Partnership

Discover how AWS and Splunk are pushing the boundaries of innovation to empower your security, observability, and cloud transformation journey. This video highlights our joint commitment to driving digital resilience through unified visibility, faster threat detection, and seamless integration across AWS services.