Sponsored Post

5 Multi-cloud Data Management Best Practices You Should Follow

A multi-cloud approach helps organizations avoid vendor lock-in, leverage the best available technologies, and reduce costs - but it can also result in added complexity when it comes to centralizing, securing, and analyzing data from cloud applications and services. This blog highlights 5 multi-cloud data management best practices that can help you make the most of your data in multi-cloud environments.

OpenTelemetry Distributed Tracing Implementation Guide

Distributed tracing has become essential for understanding the performance and behavior of modern microservices architectures. As applications become more complex with multiple services communicating across different environments, traditional logging and metrics alone are insufficient for debugging performance issues and understanding request flows.

Vector Databases Explained: What they are & Why they Matter [Quick Question Ep. 2]

Ever wondered what a vector database is and why it’s becoming so important in AI search? In this quick video, I’ll break down what a vector database is, how it works, and what you should consider when choosing one.

Confessions of a CTO: How we Tamed our Cloud Costs

If you’ve ever found yourself staring at a cloud bill that could buy a small island or at least a very nice car, you're not alone. Believe me, at Cribl, we've had our share of those "molotov cocktail" bills that make our CFO, Zach, look like he's about to spontaneously combust. And yeah, a few F-bombs might have dropped from various senior leaders (myself included, I won't lie).

Azure native integration elevates Elastic Cloud Serverless experience

We're thrilled to announce a significant leap forward in making Elastic Cloud Serverless even more accessible and powerful for Azure users. With the general availability (GA) of Elastic Cloud Serverless on Azure, we've just released the Azure native integration for Elastic Cloud Serverless. This builds upon our existing Azure native integration for Elastic Cloud Hosted, allowing users to seamlessly discover and manage Elastic Cloud in a way that feels inherently part of the Azure ecosystem.

RUM Versions: one click deployment tracking

Deployments should drive your product forward, not slow you down. Yet too often, teams spend hours digging through logs, dashboards, and error reports just to answer a simple question: did the release go smoothly? Coralogix’s new Versions feature answers this in a single click, letting teams spend more time building and less time investigating.

How we're killing YAML fatigue with our new K8s integration process

Kubernetes has rapidly grown in adoption, with more than 84% of surveyed users evaluating or actively using Kubernetes in some way. It has become the go-to choice for container orchestration. As we grow the Coralogix platform, we continuously go back and improve flows that we believe will have a high impact on our user base.

Splunk Expands Data Management Capabilities To Include Ingest Monitoring

Managing data ingestion at scale is no easy task. As organizations onboard hundreds or even thousands of data sources into the Splunk platform for security, observability, and other business-critical use cases, it becomes increasingly complex to ensure data is consistently available and onboarded efficiently.

From Alert to Answer in Seconds: Accelerating Incident Response in Dynatrace

It is 12 PM and you have just started eating lunch when your phone starts buzzing. A storm of monitoring and system-level alerts starts stacking up on your phone and in Slack. The incident response "war room" opens and downtime communications are being drafted to customers. Your team is under pressure to find the root cause, but you are immediately hit with roadblocks.

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.

Coralogix secures 188 badges in G2 Summer 2025 Reports

As we cruise through 2025 with momentum from our recent $115M Series E raise, the launch of Olly (our AI agent for observability), and our recognition as a Visionary in Gartner’s Magic Quadrant for Observability Platforms, we’re excited to celebrate another major milestone – earning 188 badges in the G2 Summer 2025 reports! At the heart of every G2 badge we earn is the voice of our customers, and their continued trust is what drives us forward.

13 Best Log Analysis Tools of 2025. Top Paid, Free & Open-Source Log Analyzers Reviewed

Log analysis and management tools have become essential in troubleshooting. With log analyzers you can extract meaningful data from logs to pinpoint the root cause of any app or system error, and find trends and patterns to help guide your business decisions, investigations, and security. If you’re not already using such a tool, now is the time to start looking for one.

How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Let’s bring that back. Today you’ll learn how to configure high availability for the OpenTelemetry Collector so you don’t lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes samples with hands-on demos of configs. But first, let’s lay some groundwork.

Taming Your Dynatrace Bill: How to Cut Observability Costs, Not Visibility

Dynatrace is a powerhouse for application performance monitoring and business analytics. But for many organizations, its power comes with a significant challenge: as applications scale across complex hybrid environments and diverse tech stacks, the sheer volume and variety of logs, metrics, and traces sent to the platform can explode, leading to staggering and unpredictable costs.

AI-Driven Alert Correlation with EventiQ in Splunk ITSI

In this video, we introduce EventiQ in Splunk ITSI, a powerful AI-driven solution designed to cut through the noise and help you find the root cause of issues faster. We’ll show you how EventiQ automatically analyzes and groups related alerts into actionable episodes, significantly reducing alert volume. We’ll cover how to enable EventiQ for a Notable Event Aggregation Policy and review the resulting episodes that it creates.

How to build an advanced semantic search engine with hybrid search | Elasticsearch Coding Sessions

Get ready to say 'Hasta la vista, baby' to outdated search methods as we take a closer look at semantic search, using a data set of some all-time favorite sci-fi and horror movies! Join Ugo Sangiorgi, principal product marketing engineer, for a 20-minute coding session. If you’re looking to add AI-driven search to your app, product, or website, this session is for you. Engage with us in the chat, share your thoughts, and feel free to ask questions. Let's dive into the world of hybrid search with Elasticsearch!

Zero instrumentation distributed tracing is here: Meet OBI on Open Telemetry

Modern systems generate enormous amounts of telemetry. The hurdle is collecting clean, connected traces without rewriting code or babysitting a fleet of language agents. That’s why Coralogix backed eBPF from the start. eBPF (extended Berkeley Packet Filter) executes sandboxed programs inside the Linux kernel, without modifying kernel source code. This method allows probes to see every request at runtime, with no instrumentation and near-zero per-request overhead.

How to Create Playwright Scripts for Website Monitoring with Chrome, ChatGPT & Sematext

Let’s say you want to make sure your website works as expected. You do not want to check if it just loads. You also want to check if important buttons or features are there and working. Oh, and you don’t want to just do it once. You want to keep an eye on this pretty much all the time. And, of course, you don’t want to keep checking manually if anything broke – you want to be notified, alerted when (not if) things break. You can do this by creating a Browser Monitor.

Bringing GitLab Logs into Focus with Graylog

GitLab’s audit logs offer a goldmine of insights into user activity, project changes, and security events. Getting that data into Graylog for centralized analysis is easier than you might think—especially with the flexibility of our Raw HTTP input and Illuminate’s GitLab Spotlight Pack. In this two-part guide, we’ll walk you through how to get it done, from wiring up GitLab’s Audit Event Streaming to visualizing enriched events in a purpose-built dashboard.

Architecting for Value: A Playbook for Sustainable Observability

You’ve built something amazing. Your services are scaling, your users are happy, and your team is shipping code like never before. Then the cloud bill arrives, and one line item makes your eyes water: observability. That Datadog invoice feels less like a utility bill and more like a ransom note. It’s a modern engineering paradox. The tools that give you sight into your complex systems are the same ones that can blind you with runaway costs.

What Is AI Search? How It Improves Your Search Results [Quick Question Ep. 1]

In this brief episode, I explore how AI is revolutionizing the way we search. From understanding user intent to personalizing results, ranking, summarizing, and even multimodal search, AI is transforming every aspect of search relevance and performance. If you’ve ever wondered whether you really need all this context in search, the short answer is: absolutely. Watch to learn how AI search works and why it matters more than ever in the age of artificial intelligence.

How to Cut Observability Costs with Synthetic Monitoring and Responsive Pipelines

Platform teams are struggling with observability noise, bloated storage costs, and lack of clarity during incidents. Most teams capture everything all the time, leading to expensive, overwhelming, and often unnecessary data volumes. In Telemetry for Modern Apps, Mezmo teamed up with Checkly to demonstrate how synthetic monitoring triggers and responsive telemetry pipelines can help reduce costs while maintaining the context needed during incidents.

Six platform updates giving you time back in your day

Ever look at your to-do list at the end of the day and realize it’s grown longer, not shorter? We get it—there’s always more to do and never enough time. But if you’re a Sumo Logic user, reading this blog will be a win for your day because we’re giving you six ways to slash the time you spend on tasks in your platform.

From Sequential Bottlenecks to Concurrent Performance: Optimizing Log Processing at Scale

We optimized our log processing pipeline by moving from sequential to concurrent processing at the entry level, achieving 30% higher throughput and better resource utilization without increasing infrastructure costs. When customers start sending millions of logs per minute, you quickly discover whether your processing pipeline can actually scale vertically.
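As a rough illustration of the sequential-versus-concurrent idea, here is a minimal Python sketch. It is not the pipeline from the article: `process_entry`, the JSON log shape, and the thread pool are all assumptions made for the example, and a thread pool mainly pays off when per-entry work is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
import json


def process_entry(raw: str) -> dict:
    """Hypothetical per-entry step: parse one JSON log line and normalize its level."""
    record = json.loads(raw)
    record["level"] = record.get("level", "info").upper()
    return record


def process_sequential(batch):
    """Baseline: handle entries one after another."""
    return [process_entry(raw) for raw in batch]


def process_concurrent(batch, workers=4):
    """Fan entries out to a pool of worker threads.

    Executor.map returns results in input order, so output ordering
    matches the sequential version.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_entry, batch))


sample_batch = ['{"msg": "disk full", "level": "warn"}', '{"msg": "started"}']
print(process_concurrent(sample_batch))
```

Both functions return the same records in the same order; only the scheduling of per-entry work differs.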

IT Service Performance Monitoring: Key Metrics, Best Practices, and Future Trends

As organizations rely more on complex IT systems and cloud-based services, keeping everything running smoothly — and reliably — has become a top priority. That’s where IT service performance monitoring comes in, giving teams the visibility they need to make sure systems stay healthy and responsive. By tracking a range of technical and user-focused metrics, businesses can quickly identify and address issues before they impact operations or end users.

The AI Monitoring crisis that no one's talking about

When I spoke at AWS London earlier this year, I had the chance to discuss something that more and more teams are starting to feel: traditional observability doesn’t cut it for AI systems. In AI, “Is it running?” is no longer enough. We have to ask, “Is it right?” When I delivered that line, I saw the heads nodding. Everyone’s excited to build with LLMs, but when it comes to actually monitoring them in production? That’s where things fall apart.

Unlock Deeper Insights: Introducing GitLab Event Integration with Mezmo

Following the popularity of our existing GitHub integration, we’ve extended similar capabilities to GitLab users. You can now ingest GitLab events directly into Mezmo Telemetry Pipelines and route them to any destination. This provides a powerful new way to monitor, alert, and react to activity within your GitLab repositories.

Query and Analyze Logs Visually, Without Writing LogQL

It’s 2 AM. An incident’s in progress. Error rates are climbing. You jump into the logs, filter by service, adjust the time window… and now you need a LogQL query. You write one. It errors out. You fix the syntax, try again, only to realize you need a different filter or a new aggregation. Back to rewriting. By the time you’ve got the query right, you’ve already lost 10–15 minutes. The system is still broken, and you still don’t know why.

Elasticsearch is a recommended vector database in the NVIDIA Enterprise AI Factory validated design

Elastic now integrates with the NVIDIA Enterprise AI Factory validated design to provide users with a recommended vector database for their on-premises AI Factories. The validated design provides enterprises with a framework for building and deploying AI Factories on-premises.

MCP Server on Splunk Cloud Platform Demo

Discover the future of data interaction! This video introduces the Model Context Protocol (MCP) server on Splunk Cloud Platform, a groundbreaking capability that seamlessly connects your Splunk data with advanced AI models (LLMs). Learn how to leverage natural language to query, analyze, and manage your Splunk environment without complex SPL. In this comprehensive setup and configuration guide, we'll walk you through.

How Payconiq Centralized Monitoring and Enabled Real-Time Insights with Elastic

Yannick Boulleys, Head of Platform at Payconiq, shares how Elastic helped the company consolidate fragmented monitoring tools into a single platform. With real-time user monitoring, built-in anomaly detection, and GenAI-powered root cause analysis, Elastic has transformed how Payconiq manages system visibility, consumer behavior, and cost efficiency, without requiring deep technical expertise.

Kibana Logs: Advanced Query Patterns and Visualization Techniques

Kibana gives you a structured way to explore log data indexed in Elasticsearch. With the right queries and visualizations, you can identify anomalies, debug issues more quickly, and track trends across services. This blog covers practical ways to query logs using Kibana’s Lucene and KQL syntax, build visualizations that surface meaningful signals, and set up dashboards for ongoing log-based monitoring.

Build Log Automation with Last9's Query API

Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value. You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill. It’s a tedious cycle, and it doesn’t scale. The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports.

How to Troubleshoot Outages Faster Using Elastic Observability [2 Min Live Demo]

In this video, I’ll show you how Elastic Observability helps you reduce downtime, accelerate root cause analysis, and unify logs, metrics, and traces in one powerful dashboard. With native OpenTelemetry support, AI-powered troubleshooting, and built-in anomaly detection, you can streamline your workflows and boost service reliability.

Splunk Named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

We are proud to announce that Splunk has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms for the third year in a row. In our opinion, our recognition in the Observability category comes on the heels of Splunk being recognized for a tenth consecutive time as a Leader in the 2024 Gartner Magic Quadrant for Security Information and Event Management (SIEM). Splunk was the only vendor named a Leader in both SIEM and Observability for the Gartner Magic Quadrant three times.

Introducing Coralogix's MCP Server: Helping customers build smarter AI agents

Now available: Secure, real-time access to your observability data via Coralogix’s Model Context Protocol (MCP) Server. AI agents are only as powerful as the context they’re given. Today, we’re excited to announce the launch of the Coralogix MCP Server, which enables third-party AI agents to connect directly to your observability data across production, staging, and other environments.

Observability as Code: Why You Should Use OaC

In the fast-moving world of CI/CD pipelines, microservice architectures, and container orchestration, software changes rapidly. What exists in a codebase today might be gone next week. At this scale and speed, it’s impossible for development teams to manually track every line of code and every new piece of functionality.

Elastic named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Observability has an investigation problem, and dashboards and alerts aren’t enough for solving problems in today’s complex systems. AI-driven capabilities, powerful analytics, and the ability to scale are essential to drive real-time investigations while keeping costs low. We think this is why Elastic has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms for the second time.

Cloud Log Management: A Developer's Guide to Scalable Observability

As systems move to microservices, serverless, and multi-cloud setups, debugging gets harder. You’re no longer dealing with a single log file; you’re looking at logs from dozens of services, running across different environments. Traditional debugging methods like SSH-ing into servers or adding print statements don’t scale in these environments. Cloud log management tools help by collecting logs from all your services into one place.

The Inconvenient Truth About AI Ethics in Observability

Let's be honest: most conversations about AI ethics sound like they're happening in a boardroom, not an ops room. But here's the thing, when you're using AI to make sense of your telemetry data, ethics isn't some abstract concept. It's the difference between insights you can trust and algorithmic noise that leads you down the wrong path. The uncomfortable reality? Your AI is only as ethical as the messiest, most biased piece of telemetry data you feed it. And if you think your data is clean, well...

Coralogix | Magic Quadrant 2025

Today marks an exciting moment for all of us at Coralogix. We’re proud to share that Gartner has named us a Visionary in the 2025 Magic Quadrant for Observability Platforms. This recognition, we believe, reflects what we’ve been building toward for years: an observability platform that delivers scale, cost-efficiency, AI-powered insights, and tangible customer success.

How to turn logs into metrics with Grafana Loki (Loki Community Call July 2025)

Cyril Tovena shows us how to turn logs into metrics with Grafana Loki using metric queries in LogQL. What do you do when all you have are logs, but you want to count them, aggregate them, or parse them for numbers you want to graph? Well, there's a query for that! Cyril is joined by Jay Clifford and Nicole van der Hoeven to discuss everything you need to know about metric queries and how to use them to get numbers out of Loki.
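To make the idea of getting numbers out of logs concrete, here is a small Python sketch that counts matching log lines per fixed time window. It is a simplification, not Loki's implementation: LogQL range aggregations such as `count_over_time` evaluate a range at each query step, while this example uses plain non-overlapping buckets, and the `(timestamp, message)` tuples are made-up sample data.

```python
from collections import Counter


def count_per_window(timestamps, window_seconds=60):
    """Group timestamps (in seconds) into fixed windows and count entries in each."""
    counts = Counter((int(ts) // window_seconds) * window_seconds for ts in timestamps)
    return dict(sorted(counts.items()))


# Hypothetical log lines as (unix_seconds, message) pairs.
log_lines = [(0, "error: timeout"), (30, "ok"), (61, "error: timeout"),
             (62, "error: refused"), (185, "error: timeout")]

# Filter first (like a line filter), then aggregate into a per-minute series.
error_times = [ts for ts, msg in log_lines if "error" in msg]
error_counts = count_per_window(error_times)
print(error_counts)
```

The result maps each window's start time to the number of matching lines, which is the kind of series you could graph or alert on.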

Observability's Moneyball Moment: How AI Is Changing the Game (Not Ending It)

We're not witnessing the end of observability, we're witnessing its evolution into something far more powerful. The observability industry is having its Moneyball moment. Just like Billy Beane revolutionized baseball by using data analytics to compete with teams that had vastly larger budgets, observability is undergoing a fundamental transformation.

How to Get Logs from Docker Containers

When a container misbehaves, logs are the first place to look. Whether you're debugging a crash, tracking API errors, or verifying app behavior—docker logs gives you direct access to what's happening inside. This blog covers the full workflow: how to retrieve logs, filter them by time or service, and set up logging for production environments.

See System Logs Alongside your Metrics Using Loki, Grafana, and Graphite

In this quick demo, we show how you can transform logs collected by Grafana Loki into actionable Graphite metrics using MetricFire. Watch as we convert structured logs into performance insights. Perfect for teams looking to bridge the gap between logging and monitoring. This workflow helps you move beyond basic log storage and turn raw logs into meaningful metrics for alerts, dashboards, and capacity planning.

How They Handle 44 Million Searches a Day...Without Breaking! | Rightmove and Elastic

Rightmove, the UK's number one property search, buying, and selling platform, has trusted Elastic for more than 11 years. Hear Andrei Nicusan, Principal Engineer at Rightmove, on why Elastic has been Rightmove's number one Search and Observability solution for more than a decade. And now with the move to Elastic Cloud and Google Cloud Platform, you can find out how Rightmove are taking advantage of reductions in their infrastructure overheads too!

Introducing MetricFire Logging: Visualize Logs Alongside Metrics

As modern infrastructure grows more dynamic and distributed, collecting logs alongside metrics becomes a critical part of any observability strategy. To make this easy and powerful, MetricFire now supports a direct logging pipeline using Grafana Loki. This allows you to forward system logs from your servers to Hosted Graphite's Loki backend and visualize them in your Hosted Grafana dashboards with full control over queries, filtering, and alerting.

Coralogix Expands AWS Partnership to Deliver AI-Driven Observability and Edge Threat Detection

Coralogix is proud to announce a new phase in its partnership with AWS through a Strategic Collaboration Agreement (SCA) focused on bringing AI-powered observability and security to the enterprise. At the heart of this collaboration is Amazon Bedrock, AWS’s managed service for foundation models.

APM best practices: Dos and don'ts guide for practitioners

Application performance management (APM) is the practice of regularly tracking, measuring, and analyzing the performance and availability of software applications. APM helps you get visibility into complex microservices environments, which can overwhelm site reliability engineering (SRE) teams. The generated insights help create an optimal user experience and achieve desired business outcomes.

Introducing the Coralogix Operator for Kubernetes

As organizations begin to scale their observability strategy, point and click methods of management become increasingly unworkable. This is why Coralogix has now fully released the Coralogix Operator for Kubernetes. Kubernetes operators are control loops that allow users to declare their desired state in their Kubernetes clusters, and the operator is responsible for resolving this state.
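The control-loop idea can be sketched in a few lines of Python. This is a toy model, not the Coralogix Operator or the Kubernetes API: the `Cluster` class, the `reconcile` function, and the resource names are all hypothetical, and a real operator watches API objects rather than plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Stand-in for live state: resource name -> spec."""
    resources: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Run one pass of the loop, moving actual state toward desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in cluster.resources:
            cluster.resources[name] = spec
            actions.append(("create", name))
        elif cluster.resources[name] != spec:
            cluster.resources[name] = spec
            actions.append(("update", name))
    # Anything present but no longer declared gets removed.
    for name in list(cluster.resources):
        if name not in desired:
            del cluster.resources[name]
            actions.append(("delete", name))
    return actions


desired = {"alert-high-errors": {"threshold": 5}, "dashboard-api": {"panels": 3}}
cluster = Cluster(resources={"alert-high-errors": {"threshold": 10}, "stale-alert": {}})

first_pass = reconcile(desired, cluster)   # applies the changes
second_pass = reconcile(desired, cluster)  # nothing left to do
print(first_pass, second_pass)
```

Running the loop a second time produces no actions, which is the property that makes declarative management practical: the user states the goal once and the loop keeps converging on it.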

Coralogix launches OpenAPI endpoints

Observability is about much more than dashboards and alerts. Extensible platforms that integrate into the user’s tech stack are fundamental parts of a great developer experience. This is why Coralogix has supported gRPC APIs for account management, data ingress & query, alert definition, dashboard creation, permissions management and more. Today, Coralogix adds a new integration, with the launch of OpenAPI endpoints for all existing functionality.

Logging in Docker Swarm: Visibility Across Distributed Services

Docker Swarm's logging model shifts from individual container logs to service-level aggregation. The docker service logs command batch-retrieves logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.

Enhanced monitoring of Amazon EKS with Elastic add-on capabilities

Easily enable the Elastic add-on within the Amazon EKS Console for streamlined monitoring and quick data onboarding. Amazon Elastic Kubernetes Service (EKS) makes running Kubernetes on AWS simple and scalable. But as your workloads grow, so does the need for robust monitoring and observability. Enter Elastic Agent, a powerful, unified way to collect logs, metrics, and security data from your EKS clusters, all managed through Elastic Fleet.