Sponsored Post

8 Challenges of Microservices and Serverless Log Management

As organizations increasingly adopt serverless architectures and embrace the benefits of microservices, managing logs in this dynamic environment presents unique challenges. In this blog, we're taking a closer look at the differences between serverless and traditional log management, as well as 8 challenges associated with log management for serverless microservices.

Tech Talk - Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Find out how Splunk Attack Analyzer can help you quickly and efficiently investigate potential malware and phishing incidents by automatically tracking each stage of complex attack chains and expediting your response efforts. Hear directly from Product Manager Aditya Raj as he demonstrates how to combine Splunk Attack Analyzer with Splunk Enterprise Security and Splunk SOAR for even greater threat detection and response power.

Store and search logs at petabyte scale in your own infrastructure with Datadog BYOC Logs

As AI workloads and cloud-native applications expand, organizations are generating more log data than ever. Each service, container, and model inference produces continuous telemetry that must be stored, secured, and analyzed. As telemetry grows more complex, teams must balance full visibility with new retention and residency needs.

From KubeCon EU to KubeCon NA: Bindplane's OpenTelemetry Contributions and Highlights (Mar-Oct 2025)

Bindplane engineers have stayed deeply involved in the OpenTelemetry community this summer. With KubeCon+CloudNativeCon North America in Atlanta coming up, I wanted to dive into all the work that has been done and give the engineers a well-deserved shoutout. Here’s what we built, fixed, and contributed since KubeCon+CloudNativeCon Europe in London this March.

Tech Talk - Observability Unlocked: Kubernetes Monitoring with Splunk Observability Cloud

In this Tech Talk, discover how they’re leveraging Splunk Infrastructure Monitoring (IM) to supercharge their Kubernetes operations, detect issues within minutes, and resolve them 90% faster — all while optimizing and scaling like pros.

Energy-Efficient Computing: How To Cut Costs and Scale Sustainably in 2026

With AI the centerpiece of technology and innovation today, energy-efficient computing is quietly becoming one of the most urgent challenges. In this article, we will discuss what makes energy-efficient computing relevant for your organization, especially when modern resource-intensive AI workloads play an important role in driving your business operations and services.

Deploying Loki on Kubernetes via Helm (Loki Community Call - October 2025)

This Loki Community Call is about deploying Loki on Kubernetes via Helm charts. We talk about why you might want to use Helm to deploy on Kubernetes, best practices for deployment, and which Helm chart you should use! We are Jay Clifford and Nicole van der Hoeven, Developer Advocates at Grafana Labs, and we have invited Grafana Champion and Loki Helm Maintainer Jan-Otto Kröpke, Principal Cloud Architect at QualityOperations GmbH, to talk about the state of the Loki Helm Chart.

Sumo Logic Dojo AI overview

Stop the firefighting and get instant answers with Sumo Logic Dojo AI. When a security incident hits, you risk losing money and time as you wait for investigations and troubleshooting. Discover how Dojo AI agents simplify investigations by surfacing potential threats, providing actionable insights, and guiding you to the root cause faster using natural language.

Sumo Logic Academy - Training and Certification Overview

In 2025, Sumo Logic revamped its education and certification program, introducing industry-aligned assessments, digital badges, and many free training offerings, including industry-leading free instructor-led classrooms and interactive hands-on labs. This video walks through all Sumo Logic Academy program offerings.

Introducing the Splunk Technology Add-on for Ollama: Illuminating Shadow AI Deployments

Without strong visibility and governance, local LLMs risk replicating the fragmented, unsupervised sprawl once seen in shadow IT, complicating security postures and making it difficult for organizations to ensure proper oversight and compliance as these powerful AI tools become embedded in daily workflows. To address this challenge, the Splunk Threat Research Team has released the Splunk Technology Add-on for Ollama, which provides comprehensive monitoring and observability capabilities specifically designed for local LLM deployments.

Sliding Through Log-Time Space

This post kicks off a new series written by the Graylog Development Team. In these updates, we’ll highlight the features and fixes that make daily work in Graylog smoother. We want to show the work we care so much about and present the challenges we faced and overcame. Today, we’re starting with one of those minor but functional enhancements: Graylog time-range stepping.

How Generative AI is shaping the future of enterprise applications

The next golden age of artificial intelligence has arrived, but the path forward is far from certain. Technology leaders are presented with a tremendous opportunity to revolutionize their business — that is, if they can find a way to tap into the full potential of their organization's data. In Episode 4 of Elastic's new limited series, Generation AI, Elastic's Sr. Director, Enterprise Applications, Jay Shah, shares how he believes generative AI will shape the future of enterprise applications.

Artificial Intelligence as a Service (AIaaS): What is Cloud AI & How Does it Work?

Today, organizations looking to build AI products and services using large language models (LLMs), agentic AI, and generative AI often start by investing in artificial intelligence as a service (AIaaS), also known as cloud AI. AIaaS provides a scalable, flexible, and cost-effective way for businesses of all sizes to access advanced AI technologies without the need for extensive in-house expertise or infrastructure.

Transform and Migrate Logs with Datadog Custom Processor

See how Datadog’s new Custom Processor in Observability Pipelines helps you transform and migrate logs from platforms like Splunk and Sumo Logic with precision and control. This demo walks through real examples of using VRL (Vector Remap Language) to enrich log data, rewrite timestamps, apply quotas, and securely process archives.

Streams: Elastic's New AI That Turns Log Chaos into Clarity

Elastic just made every SRE’s life easier. With the new Elastic Streams, AI automatically organizes, structures, and analyzes billions of logs, helping you find issues, detect anomalies, and fix problems in minutes, not hours. See how Elastic’s deep generative AI core turns chaos into clarity for Site Reliability Engineers and developers worldwide.

Don't count integrations, count dashboards and alerts

Vendors often compete by saying how many extensions or quick start packs they have. The implicit promise is: more integrations equals better observability. But that misses the point. What really matters is the quality and coverage of dashboards and alerts that you actually use to maintain system health, prevent outages and improve user experience. At Coralogix we believe that what you do with integrations is far more important than how many you have.

Meet Olly - The Coralogix AI Observability Agent (Demo)

Olly is Coralogix’s AI-native observability agent that makes observability data fast, accessible, and actionable—for everyone. Traditionally, teams have spent valuable time piecing together dashboards and writing queries to troubleshoot issues. Olly changes that by letting you ask real questions in natural language and delivering instant, intelligent answers from across your logs, metrics, and traces.

Why Your APM Needs Observability - Metrics, Logs, and Traces Explained

Modern software applications are increasingly complex. Microservices, cloud infrastructure, and distributed architectures make it challenging for developers, DevOps engineers, and SREs to maintain high performance and a seamless user experience. Traditional Application Performance Monitoring (APM) provides critical insights into how applications perform, but alone, it often leaves blind spots when it comes to diagnosing issues or understanding the full system behavior.

Clarity in the Dojo: The power of the Summary Agent

In the dojo, not every role is about throwing punches. Some roles are about awareness, the unmistakable voice that tells the fighter when to move, where the strike is coming from, and why the opponent matters. That’s the role of the Summary Agent in Sumo Logic Dojo AI. Unlike a traditional agent, it doesn’t launch queries or carry out actions on its own. Its purpose is to narrate, not act. In doing so, it becomes the foundation for every other decision in the dojo.

5 Log Management Best Practices for Your Organization

At Logz.io, we speak with hundreds of companies every month. One thing is consistent across the board: everyone ships logs. But the challenges are equally common: What are the best practices for logging? How do we reduce noise? How should we architect our logs to make them truly useful? The reality is that logs are noisy for everyone. The best time to standardize your logging practices is when you write your first line of code—though that rarely happens. The second-best time is now.

Elastic recognized as a finalist for Innovation in Customer Portals in 2025 TSIA STAR Awards

We are proud to announce that Elastic has been named a finalist by the Technology & Service Industry Association (TSIA) in the 2025 STAR Awards program for Innovation in Customer Portals that Improve Digital Customer Experience. This award recognizes Elastic’s ability to embrace AI innovations to enhance our digital customer experience.

Bridging partners in pursuit of agentic AI - Part 2: How leaders can position themselves for the future

From ecosystem foundations to future advantage. In Part 1: Why partnerships matter for enterprise intelligence, we explored how enterprises are moving from experimentation to scalable impact with agentic AI and how ecosystems make that possible. But naturally, the next question is: Where do we go from here?

RED Metrics & Monitoring: Using Rate, Errors, and Duration

The RED method is a streamlined approach for monitoring microservices and other request-driven applications, focusing on three critical metrics: Rate, Errors, and Duration. Originating from the principles established by Google's "Four Golden Signals," the RED monitoring framework offers a pragmatic and user-centric perspective on service assurance and service performance.
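As a rough illustration of the three RED metrics named above, the following minimal sketch computes them from a batch of request records. The `Request` type, its field names, the choice of HTTP 5xx as the error condition, and the nearest-rank 95th percentile as the duration summary are all assumptions made for this example; they are not taken from the linked article, and real deployments typically derive these values from a metrics backend rather than from raw records.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one handled request (illustrative only)."""
    duration_ms: float  # how long the request took to serve
    status: int         # HTTP status code returned


def red_metrics(requests: Sequence[Request], window_seconds: float) -> dict:
    """Summarize Rate, Errors, and Duration for one observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}

    # Rate: requests handled per second over the window.
    rate = total / window_seconds

    # Errors: share of requests that failed (assumed here to mean 5xx).
    errors = sum(1 for r in requests if r.status >= 500)

    # Duration: nearest-rank 95th percentile of request latency.
    durations = sorted(r.duration_ms for r in requests)
    rank = math.ceil(0.95 * total)  # 1-based rank
    p95 = durations[rank - 1]

    return {"rate_per_s": rate, "error_ratio": errors / total, "p95_ms": p95}


if __name__ == "__main__":
    sample = [
        Request(duration_ms=10, status=200),
        Request(duration_ms=20, status=200),
        Request(duration_ms=30, status=500),
        Request(duration_ms=400, status=200),
    ]
    print(red_metrics(sample, window_seconds=2))
```

A percentile is used for Duration instead of a mean because a single slow request (the 400 ms one in the sample) is easy to miss in an average; other summaries, such as histograms, would serve equally well.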

Bridging partners in pursuit of agentic AI - Part 1: Why partnerships matter for enterprise intelligence

The pace of change in AI development has been dizzying. In just a few years, we’ve moved from experimenting with AI, machine learning (ML), retrieval augmented generation (RAG), and agents to asking how these innovations can solve real business problems. Enterprises are no longer impressed by the novelty and possibilities; instead, they expect outcomes.

Making logs work smarter: Evolving your observability strategy

When you start building an observability stack, it’s natural to reach for logs first. They’re familiar, easy to generate, and often already part of a developer’s workflow. And sending logs to a centralized system feels like a quick win, too. Simply add a log shipper, and voila, your application is observable.

10 Best Log Monitoring Tools

Log monitoring stands as the backbone of resilient, secure, and high-performing digital operations. Every digital service, application, cloud platform, and network device leaves behind a trail of log files, containing raw, unstructured data that chronicles system events, user actions, errors, security activities, and business transactions. For organizations striving to achieve operational excellence, these logs are more than archives; they're the heartbeat of every mission-critical system.

From court to code: Build an agentic RAG assistant with Elasticsearch

Want to see what it really takes to build a smart AI assistant? How about one that can help you make the right fantasy basketball picks? In this live session, we’ll demonstrate how to instantly activate and ground a high-performance AI agent using the Elastic Agent Builder, and we’ll show how it powers real-world use cases like smarter player picks. Join JD Armada, developer advocate, for a 20-minute live coding session to learn about.

Troubleshoot Faster with the New Log Search and Filtering in Qovery Observe

Following the launch of Qovery Observe, we’re progressively adding new capabilities to help you better monitor, debug, and understand your applications. Today, we’re excited to announce a major improvement to the Logs experience: you can now search and filter directly within your application logs.

ISP Monitoring Explained: How to Measure, Manage, and Improve Internet Performance

Reliable internet connectivity isn’t a convenience. It’s mission-critical infrastructure for modern organizations. Every organization today depends on high-speed, reliable internet access for daily operations—from cloud collaboration and data transfer to streaming, remote work, and customer engagement. As digital transformation accelerates, the rise of AI, large language models (LLMs), IoT, and device sprawl has massively increased bandwidth demand and network complexity.

25 Sumo Logic updates to better monitor and secure your Azure environments

If you manage workloads across multiple clouds, you know how easy it is for critical alerts or performance issues to get lost in the noise. Switching between consoles, correlating logs, and tracking metrics across platforms can slow down troubleshooting, delaying incident resolution and increasing risk of missing critical alerts.

CriblCon 25 Keynote Livestream

IT and security data professionals stand at a crossroads. The practices and technologies that have served you for the last ten years are at their breaking point, facing an onslaught of data growth and complexity that will only accelerate as AI goes mainstream. You have a choice. Stay earthbound or take your telemetry to the stratosphere and beyond.

Monitor logs from Amazon EKS on Fargate with Datadog

Amazon EKS on Fargate is a managed service that reduces the operational overhead of maintaining a Kubernetes cluster by abstracting away the underlying infrastructure. In a serverless Fargate environment, each pod is assigned its own isolated compute resources; there is no direct host-level access.

AI-First: Agentic AI needs a new architecture

At Cribl, we’ve talked a lot about epochs. A moment in time when there was a before and after. AI, and specifically agentic AI, is an epoch. The way we work is going to forever change. There have been many such events in our lifetimes: the PC, the Internet, and the smartphone. AI will change how we work forever. Prior to the PC, there were people whose jobs were literally titled “computer”.

Introducing Cribl Notebooks: One Tab For Your Entire Investigation

Investigations move fast. Data is messy. And today’s analysts are expected to connect the dots across massive datasets and various tools—while documenting every step and sharing results with stakeholders. What does that look like? A security investigation may involve 10 or more queries—each one filtering, transforming, and analyzing data from a different angle—duplicated across multiple browser tabs so nothing gets lost.

Introducing Cribl Insights: A central hub for monitoring and alerts

What happens when your data pipelines slow down, drop volume, or quietly change shape? Most monitoring tools won’t catch those shifts until it’s too late—when downstream systems are already impacted, dashboards are broken, or critical information is missing. That’s why we’re excited to introduce Cribl Insights, to give you real-time visibility into every part of your Cribl environment: data flows, operations, processing, user activity, configuration changes, and more.

Introducing Cribl Notebooks: Investigate, Visualize, and Share - All in One Tab

Run every part of an investigation in one workspace with Cribl Search’s new Notebooks feature. Bring queries, visualizations, and annotations together to make sharing and collaboration easier. Speed up investigations and turn complex workflows into narratives anyone can follow.

From Idea to Deployment: How To Build a Practical AI Roadmap

AI is being adopted at a faster rate than ever across the business world. According to Stanford, 78% of organizations had implemented AI in some form by 2024. And if that’s not convincing enough, 92% of companies plan to expand their AI investment over the next three years. Practically everyone, including your competitors, is already using AI to gain a competitive edge. If you don’t act soon, there's a real risk of falling behind.

Application Observability Done Right: Best Practices & Tips

Companies invest millions of dollars in observability platforms, yet they often still struggle to get application monitoring right. This is because most organizations focus on the technology while neglecting the business. In this article, we’ll show you how to combine business requirements with technological needs. These recommendations are based on my experience, as the CTO of Logz.io, working with global companies on their application observability needs.

Big Week at Logz.io: Major Product Announcements Signal New Era of AI-First Observability

Four months ago, we announced our vision of AI-first observability. Today, we’re not just talking about the future, we’re shipping it. This week marks a significant milestone with several major product announcements that demonstrate our continued momentum as the industry’s leading AI-first observability platform.

Micro Lesson: Sumo Logic Dojo AI Summary Agent

In this video, we'll introduce the new AI-powered Summary Agent, which helps security teams using Cloud SIEM understand and prioritize cybersecurity insights faster and more efficiently. The Summary Agent provides AI-generated summaries of the component signals within an insight, giving analysts a clear view of the underlying evidence without having to spend time reviewing raw logs or multiple events individually. The Summary Agent is part of Sumo Logic's new Dojo AI platform, featuring a number of useful AI agents across all Sumo Logic products and services.

Mobile session replay - now live in Coralogix

Coralogix Real User Monitoring (RUM) already gives teams a complete view of how users experience their websites. Now, that same visibility comes to mobile. With Session Replay for iOS and Android, you can watch real sessions unfold and understand exactly what users saw and did, without relying on vague support tickets or incomplete crash logs. Session replay captures exactly how users interact with your mobile app: taps, swipes, scrolls, and screen transitions.

Top 9 LLM Observability Tools in 2025

Organizations are adding GenAI to their current and future architectures and product roadmaps, requiring Ops teams to ensure LLMs are accurate, fast, secure, and cost-efficient. LLM observability tools directly address these needs, helping identify and prevent common LLM errors and issues. LLM observability provides the telemetry data for this analysis. LLM observability tools trace requests end-to-end, evaluate outputs, and correlate quality with latency, cost, prompts, tools, and data sources.

3 real-world generative AI strategies for executives

Everyone is excited about AI, but few companies have successfully implemented it. While enthusiasm for generative AI (GenAI) has helped accelerate AI adoption across enterprises, the promises of artificial intelligence have yet to translate into measurable impact on most organizations’ bottom lines. The trouble isn’t the tech — it’s a lack of executive ownership.

Agentic AI Explained: How Autonomous Systems Are Changing Cybersecurity

Discover how agentic AI enhances cybersecurity by augmenting security teams’ existing security tools and workflows. See how Retrieval-Augmented Generation (RAG) enables faster threat detection, streamlined investigations, and smarter incident response — empowering SOC teams to work more effectively. Join cybersecurity experts Lisa Jones-Huff and Mohammed Anas Khatri to discover how agentic AI can help your security team multiply its impact.

Redis Performance Monitoring: Combine Logs and Metrics for Complete Visibility

Redis earns its place in modern stacks because it’s an in-memory data store with microsecond latency and rich data structures, making it perfect for things like caching, sessions, and rate limiting. Since it often sits on the request path, small issues (connection churn, blocked commands, memory pressure) can quickly ripple into user-visible incidents.

Ep 13: Everyone is winging it: Hope for an AI future

In this episode, we welcome Naomi Buckwalter, Sr. Director of Product Security at Contrast Security, to chat about the evolving landscape of security threats and the dual role of AI in both facilitating and combating these challenges. We explore the increasing sophistication of modern phishing attacks and discuss how security teams must rapidly adapt to stay ahead of emerging threats. We debate the transformative impact of AI on the future job market, where personal qualities and soft skills may increasingly take precedence over traditional technical competencies.

Datadog vs Splunk: A Side-by-Side Comparison [2025]

Datadog and Splunk are both leading tools for monitoring and observability. Each offers a range of features designed to help you understand and manage your data. Datadog provides tools for tracking application performance and analyzing logs in real-time. Splunk, meanwhile, is known for its powerful log analysis and search capabilities. In this post, we will compare Datadog and Splunk on important aspects like APM, log management, search capabilities, and more.

LLM Observability Explained: Prevent Hallucinations, Manage Drift, Control Costs

Large Language Models (LLMs) are transforming how businesses interact with users, automate workflows, and deliver insights in real time. But as powerful as these models are, running them at scale comes with unique challenges, from hallucinations and latency spikes to cost overruns and user trust issues.

Pastries with SREs: Leveling up observability and donut dunkability

In this episode of Pastries with SREs, we explore what it really means to shift left with observability, moving from reactive firefighting to proactive performance. And yes, it starts with donuts. We unpack how SREs and IT Ops teams are often stuck reacting to incidents, battling alert fatigue and swivel-chair triaging. But what if you could pull in developers earlier, and give everyone a unified view of observability data?

Elastic named a Leader in The Forrester Wave: Cognitive Search Platforms, Q4 2025

Today, we’re excited to share that Elastic has been named a Leader in The Forrester Wave: Cognitive Search Platforms, Q4 2025. We believe this recognizes our continued innovation in AI-powered search and the momentum of the Elasticsearch Platform.

How to know your data with Cribl's Ed Bailey and VisiCore Technology's Paul Stout

Classifying and tagging data is the key to automating pipelines and improving visibility across the enterprise. We’ll share both the technical and business impact of truly knowing your data, and why Cribl makes it possible. Plus, we’ll talk CriblCon and why we’re excited to see you there.