Sponsored Post

What Do DevOps Professionals Really Mean When They Talk About Kubernetes (K8s)?

In the world of DevOps, Kubernetes (K8s) is more than just a tool for managing containers; it's the backbone of modern infrastructure. When DevOps teams mention Kubernetes, they're referencing its vast capabilities, which extend far beyond basic container orchestration. They're talking about its ability to manage scaling, automation, networking, and security across complex, distributed systems. In this article, we'll explore what DevOps pros really mean when they discuss Kubernetes, highlighting the core features that make it a cornerstone of the DevOps ecosystem.

Observability to Generative AI: Journey in Evolving IT Operations

For those of us managing the ever-evolving IT infrastructure, the days of simple cause-and-effect relationships are long gone. A performance dip in one application might affect microservices, destabilizing the systems. Alerts flood in, logs pile up, and even the most sophisticated monitoring dashboards often leave us asking: Where do we even begin?

Ingesting JSON Logs From Containers With the OpenTelemetry Collector

It’s very popular to push logs, in a formatted way, to the console output of an application (sometimes referred to as stdout). Although using a push-based approach like OTLP over gRPC/HTTP is preferred and has more benefits, there are many legacy systems that still use this approach. These systems typically use a JSON output for their logs. So, how do we get these JSON logs into a backend analysis system like Honeycomb that primarily accepts OTLP data?
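As a rough illustration of the problem described above, the sketch below parses one JSON-formatted console line into a flat record loosely modelled on the timestamp, severity text, body, and attributes fields of the OpenTelemetry log data model. It is not the Collector's configuration or the article's method; the key names (`level`, `msg`, `timestamp`, and their alternatives) and the sample line are assumptions, since applications name these fields differently.

```python
import json

# Hypothetical key names: real applications vary in how they label these fields.
SEVERITY_KEYS = ("level", "severity")
BODY_KEYS = ("message", "msg")
TIME_KEYS = ("timestamp", "time", "ts")


def _pop_first(record, keys):
    """Remove and return the value of the first key present, else None."""
    for key in keys:
        if key in record:
            return record.pop(key)
    return None


def parse_json_log_line(line):
    """Turn one stdout line into a flat, OTLP-like log record (a plain dict).

    Lines that are not JSON objects are kept as an unstructured body
    so that nothing is silently dropped.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        record = None
    if not isinstance(record, dict):
        return {"timestamp": None, "severity_text": None,
                "body": line.rstrip("\n"), "attributes": {}}
    severity = _pop_first(record, SEVERITY_KEYS)
    return {
        "timestamp": _pop_first(record, TIME_KEYS),
        "severity_text": str(severity).upper() if severity is not None else None,
        "body": _pop_first(record, BODY_KEYS),
        "attributes": record,  # whatever keys remain become attributes
    }


# Made-up sample line, for illustration only.
example = ('{"timestamp": "2024-11-20T10:15:00Z", "level": "info", '
           '"msg": "order placed", "order_id": 42}')
parsed = parse_json_log_line(example)
print(parsed)
```

In practice this kind of parsing would normally be done by the collector's own pipeline rather than by hand-written code; the sketch only shows the shape of the mapping.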

Black Friday Without the Developer Nightmares: A Survival Guide

Black Friday, the traditional kickoff to the holiday shopping season, is set to make waves in 2024, with projected sales reaching $10.8 billion, a 9.9% increase from last year, according to Statistics.blackfriday analysts. The same team expects Cyber Monday sales in 2024 to reach $13.2 billion, a 6.1% increase from 2023. Together, the two events are expected to generate $24 billion in sales.

OneFootball Scores an Observability Goal with Honeycomb

For football fans worldwide, staying connected to their favorite teams, players, and matches is a passion—and OneFootball delivers exactly that. The platform is a one-stop shop for football fans to follow their teams, get up-to-date information, and immerse themselves in global football culture. With over 100 million users spanning multiple continents, OneFootball is an essential companion for fans to track live scores, player stats, breaking news, and more.

OpenSearch vs Elasticsearch: Complete Platform Comparison [2024]

Choosing between OpenSearch and Elasticsearch in 2024 represents a critical decision for organizations seeking robust search and analytics solutions. Both platforms offer comprehensive capabilities, but their approaches differ significantly. This in-depth comparison will help you make an informed decision based on your specific needs.

Pioneering the Future of Observability with AI

In September, Lumigo announced we were exploring how AI can help shape the next generation of observability. Since then, we’ve unveiled the beta of Lumigo Copilot, which we believe will be the most intelligent AI in observability. Today, we’re providing an update on our progress and inviting our customers to participate in the beta.

Unlock Unmatched Insights: Introducing the deepest Hybrid Infrastructure Observability Platform

If you want to ensure that your infrastructure is resilient by modern standards, you need to have a deep understanding of how processes and technology impact your business. Learn how Virtana unlocks unmatched insights into your infrastructure through our Virtana AI Platform for deep hybrid infrastructure observability, to give you the deep understanding you need for a resilient infrastructure.

How to Attain Deep Network Device Coverage with SolarWinds Observability SaaS

Welcome to the first in a series of blog posts that will walk you through the key network monitoring and observability capabilities of the SolarWinds Observability SaaS option. Simplicity has always been at the heart of our product ethos, and our recent decision to bring our self-hosted and SaaS observability options under the single umbrella of SolarWinds Observability embodies this ethos.

Beyond Monitoring: A Guide to Cloud Observability

Many businesses rely on cloud infrastructure to power their software solutions. The cloud today makes it easier than ever to create services and components, increasing the complexity of software. With more and often smaller processes, cloud-native architectures have driven the need for better insights into our software: a way to look into how these processes fit together.

Unlocking Peak Performance with Kentik's Azure Network Observability Tools

In today’s multi-cloud landscape, maintaining smooth and reliable connectivity requires complete visibility into cloud networks. With Kentik, network and cloud engineers gain the tools to monitor, visualize, and optimize Azure traffic flows, from ExpressRoute circuits to application performance, ensuring efficient and proactive operations.

Early Observability in Platform Engineering: Challenges and Solutions

Since the emergence of the cloud, the DevOps movement, and the rise of microservices, developers have been increasingly responsible for the operation of their software. “You build it, you run it” (YBYR) and “You build it, you operate it” (YBYO) have become common mantras in the software engineering industry. However, there’s a misunderstanding in this statement. Developers should remain focused on building software.

What is Single Pane of Glass? A Complete Guide to Unified IT Management

Ever felt overwhelmed juggling multiple monitoring tools and dashboards? You're not alone. Today's IT environments are more complex than ever, and keeping track of everything can feel like watching a dozen TV screens simultaneously. That's where Single Pane of Glass comes in – it's like having a universal remote for your entire IT infrastructure.

Leveling up your observability practice - Part 2

Lessons from the front lines: Challenges in your observability maturity journey. In our previous blog, we explored the observability maturity spectrum, revealing that while only 7% of organizations consider themselves experts, the majority (43%) are actively working to improve their practices. We saw how mature organizations achieve better outcomes, from faster root cause analysis to reduced user-reported incidents.

Adding AI to Observability 2.0 for Dynamic Observability

The original premise of observability was to ensure system health, identify issues, and resolve those issues efficiently. As I recently outlined, the legacy approach (sometimes called Observability 1.0 now) relied heavily on metrics and tracing because logs were seen as too noisy or challenging. But, as most forward thinkers have identified now, logs are exactly the telemetry type that we need the most.

An easier way to manage your observability collectors | Grafana

Managing observability collectors at scale is often overwhelming, but it doesn’t have to be. Grafana Fleet Management offers a better way to monitor, configure, and control your collectors—all from a centralized platform. With remote configuration and detailed health insights, you can quickly resolve issues, save time, and reduce manual effort.

Emergency Observability with Coroot

If you’re an experienced engineer, you likely have comprehensive observability and monitoring set up for your production systems. So if issues arise, you’re empowered to resolve them quickly. Yet, there are way too many systems out there, especially smaller and simpler ones, which are running with only rudimentary observability systems, or no observability at all. This means when an application goes down or starts to perform poorly, it may be very hard to pinpoint and resolve the issue.

Leveling up your observability practice - Part 1

Lessons from the front lines: Moving to observability maturity. What separates the observability experts from the novices? It's a question that's been on my mind lately, especially after diving into our recent 2024 State of Observability Survey of over 500 practitioners. In my past roles as a DevOps engineer and a site reliability engineer (SRE), I've seen firsthand how a mature observability practice can be the difference between sleepless nights and smooth sailing.

Easily control observability collectors at scale with Fleet Management in Grafana Cloud

Managing observability workloads can quickly overwhelm even the most experienced admin. Maybe you’re dealing with multiple departments, each needing its own collector configurations and pipelines. Every time you have to run a test or roll out a change, the process is cumbersome and introduces risk. Or perhaps you’re responsible for tracking hundreds of collectors across different environments and regions. In a scenario like this, troubleshooting individual issues feels nearly impossible.

Splunk's Path Towards Achieving FedRAMP Moderate Authorization for Splunk Observability

Splunk continues to partner with government agencies on their digital transformation journeys to help deliver their missions and provide faster and more intelligent services. We are committed to the success and support of the security requirements of our public sector customers, and I am thrilled to share the latest strategic investments Splunk is making to expand our FedRAMP program to include Splunk Observability Cloud for government customers.

The new era of observability - why logs are the key to success

The promise of observability has always been clear: ensure system health, and identify and resolve issues quickly and efficiently. However, traditional observability, broken into metrics, logs, and traces, is cumbersome and fragmented, leading to higher costs and developer burnout.

The Schrödinger's Cat Challenge of Observing Cloud-Native Applications

The Schrödinger's Cat thought experiment highlights the paradox of determining a system's state without direct observation—an apt analogy for the challenges of observing cloud-native applications. These systems' complex, ephemeral, and distributed nature often makes them appear as black boxes. Coupled with the operational complexities of multi-cloud and hybrid environments, gaining a clear picture feels impossible.

There Is Only One Key Difference Between Observability 1.0 and 2.0

We’ve been talking about observability 2.0 a lot lately: what it means for telemetry and instrumentation, its practices and sociotechnical implications, and the dramatically different shape of its cost model. With all of these details swimming about, I’m afraid we’re already starting to lose sight of what matters.

Why Deep Observability is the Key to Infrastructure Success in 2024 and Beyond

In today’s digital economy, infrastructure has evolved from your organization’s technical foundation to a strategic asset that can make or break your business outcomes. Yet, as companies embrace hybrid environments, many find themselves struggling with a critical challenge: how to maintain control and visibility across increasingly complex infrastructure landscapes and AI workloads.

Observability for Modern IT: eG Enterprise

Discover how eG Enterprise provides enterprise-class observability to enhance IT operations and ensure optimal digital experiences. With end-to-end monitoring, diagnosis, reporting, and analytics across physical, virtual, cloud, and hybrid environments, eG Enterprise proactively detects and resolves performance issues to keep applications running smoothly. By ensuring uptime and delivering actionable insights, it helps organizations maximize the value of their IT investments, achieve superior ROI, and deliver reliable, high-performing services.

New Relic vs Datadog: Complete Platform Comparison [2024]

Choosing between Datadog and New Relic in 2024 represents a critical decision for organizations seeking robust monitoring and observability solutions. Both platforms offer comprehensive capabilities, but their approaches differ significantly. This in-depth comparison will help you make an informed decision based on your specific needs.

Scaling Observability for Dexory's Global Fleet of Autonomous Robots with Grafana Cloud

Join Dexory's VP of Software, Matt MacLeod, as he explains how Dexory, a Grafana Labs customer, uses Grafana Cloud to monitor and manage a global fleet of autonomous robots for warehouse inventory. Hear about Dexory's journey from prototype to scalable observability solution, the challenges of high-frequency data collection in harsh environments, and how advanced monitoring enables proactive issue resolution. Discover Dexory’s insights on improving customer satisfaction and lowering operational costs through effective observability practices.

Application Experience - Amplifying Observability for Today's Experiential World

Nexthink’s industry-leading Experience 24 Events in Boston and London brought together over 1000 IT professionals dedicated to accelerating the value of strategic adoption of DEX. A hot topic was the growing recognition that traditional Observability solutions (APMs and other high-cardinality technology monitoring solutions), while necessary, are insufficient to provide full, end-to-end visibility into how employees experience the totality of their web applications.

Integrating Gremlin with your observability tools

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. To get the most value out of Chaos Engineering and reliability testing, you need a way to observe your service’s behavior. Observability tools offer insight into how your systems are performing, but observability on its own isn’t enough. You need a way to monitor your systems while testing their reliability so you can determine whether your service passed or failed a test.

Close Your Hybrid IT Observability Gap SolarWinds Observability SaaS or Self Hosted Solutions

As technology evolves, organizations face the challenge of modernizing their infrastructure to improve efficiency, reduce costs, and meet customer demands. But with hybrid IT setups—combining on-premises data centers, multiple cloud instances, and SaaS applications—most observability tools force compromises. The result? Gaps that impact performance, slow down issue resolution, hurt customer satisfaction, and reduce ROI.

Get complete Kubernetes observability by monitoring your CRDs with Datadog Container Monitoring

Custom resources are critical components in Kubernetes production environments. They enable users to tailor Kubernetes resources to their specific applications or infrastructure needs, automate processes through operators, simplify the management of complex applications, and integrate with non-native applications such as Kafka and Elasticsearch.

How Generative AI Can Prevent Downtime with AI-Powered Observability

Generative AI (GenAI) is still in its infancy, but its impact is already being felt across industries. Over the past year, production applications leveraging GenAI have gone from proof-of-concept to delivering real-world value. According to the World Economic Forum, 75% of surveyed companies plan to adopt AI technologies by 2027. Leading cloud providers like AWS are making significant investments.

A Journey Towards Observability-led AIOps

Bring composite visibility with logs, metrics, events, and traces to find and fix issues faster. Observability is no longer a choice but an essential component of future IT infrastructure. Enterprise IT operations are transforming from a traditional, siloed, and people-first approach to a technology-first approach, leveraging AI/ML and automation. Download this paper and find out how you can have better visibility and insights into distributed application systems.

140x cheaper than Datadog: why storing observability data on-prem makes sense

I’ve heard this story many times from production engineers: ‘We use tools like Datadog and New Relic, but to keep costs from skyrocketing, we’re only monitoring our most critical services. We’re storing just 10% of our logs and traces and only the metrics we consider essential. It’s a frustrating situation. Engineers want full visibility across their systems, but cloud storage costs make it impossible to monitor everything.

Application Performance Monitoring (APM) Guide for DevOps Teams in 2024

In today's rapidly evolving technology landscape, Application Performance Monitoring (APM) has become a critical component for DevOps teams striving to maintain high-performing, reliable applications. This comprehensive guide explores everything modern DevOps teams need to know about implementing and optimizing their APM strategy.

Kentik Named a Value Leader in EMA's 2024 Radar Report for Network Operations Observability

We are excited to share that Kentik has been named a Value Leader in EMA’s 2024 Radar Report for Network Operations Observability. This recognition highlights our continued commitment to building an AI-powered, end-to-end observability platform for modern networks, helping network and cloud teams optimize their infrastructures for availability, performance, cost-efficiency, and security.

Observability 2.0: Don't repeat sins of the past

If you move in observability circles, chances are that you have heard the phrase “Observability 2.0,” which refers to how we need a new approach to observability. I am incredibly excited about the energy and discussion around a shift to “Observability 2.0,” as we now have a second chance to develop observability the way it was originally envisioned.

A Taste of Observability - Embrace the Cloud With OpenTelemetry

Join Splunk Observability expert Kirk O'Quinn and Monster CICD Lead Graham Bucknell for a conversation on OpenTelemetry (OTel), a powerful open-source project that is transforming how we monitor and trace applications. In this informative session, we will delve into the world of OTel, exploring its history and its roadmap, and we will discuss the lessons, successes, and failures of companies' journeys to OpenTelemetry.

Tracing the Line: Understanding Logs vs. Traces

In the software space, we spend a lot of time defining the terminology that describes our roles, implementations, and ways of working. These terms help us share fundamental concepts that improve our software and let us better manage our software solutions. To optimize your software solutions and help you implement system observability, this blog post will share the key differences between logs and traces.

Top 12 SolarWinds Competitors and Alternatives In 2024

Organizations exploring SolarWinds alternatives often face a critical decision when choosing the right network and infrastructure monitoring solution. While SolarWinds has established itself as a reliable industry standard, companies are increasingly seeking alternatives that offer better alignment with their monitoring needs, budget constraints, and security requirements.

AI Observability with Grafana with Ishan Jain (Grafana Office Hours #29)

In this Grafana Office Hours, Ishan Jain talks about AI Observability with Grafana: what it entails, factors to consider when monitoring and observing LLMs, and how to do it all with Grafana. He is joined by Senior Developer Advocate Nicole van der Hoeven.

LLM Monitoring and Observability

The demand for LLMs is rapidly increasing: it’s estimated that there will be 750 million apps using LLMs by 2025. As a result, the need for LLM observability and monitoring tools is also rising. In this blog, we’ll dive into what LLM monitoring and observability are, why they’re both crucial, and how we can track various metrics to ensure our model isn’t just working but thriving.

SolarWinds Observability Self Hosted 2024.4 Expanded Device Support and Enhanced Wireless Monitoring

Discover the latest features in SolarWinds version 2024.4! This update brings support for a variety of new network devices, including Fortinet SD-WAN, Ruckus, Juniper, Arista, and Extreme Networks wireless access points, plus Meraki switch support via API integration. Join Crystal Taylor, SolarWinds Evangelist, as she takes you through the new wireless monitoring capabilities and shows how your network management just got easier. Watch now to optimize your network oversight and stay ahead with these powerful enhancements!

SolarWinds Observability Self Hosted 2024.4: New Cloud Monitoring for Azure and AWS Databases!

Explore the powerful new features in SolarWinds version 2024.4, now supporting expanded cloud monitoring capabilities! Crystal Taylor, SolarWinds Evangelist, walks you through the latest updates, including Azure Managed Instance, Azure MySQL, Azure PostgreSQL, and Amazon RDS for SQL Server. See firsthand how PostgreSQL and RDS instances are monitored, showcasing detailed charts and metrics like Log IOs, physical data reads, and memory usage. Upgrade now to take full advantage of these new insights and optimize your cloud database performance.

Against Incident Severities and in Favor of Incident Types

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the usual severity-based approach (usually using a SEV scale), but decided to adopt an approach based on types, aiming for them to work better as quick, shared definitions across multiple departments. This post is a short report on our experience doing it.

Observability as a superpower

With every job I have, I come across a new observability tool that I can’t live without. It’s also something that’s a superpower for us at incident.io: we often detect bugs faster than our customers can report them to us. A couple of jobs ago, that was Prometheus. In my previous job, it was the fact that we retained all of our logs for 30 days, and had them available to search using the Elastic stack (back then, the ELK stack: Elasticsearch, Logstash, and Kibana).

Network Observability: Mastering Infrastructure Data for Smarter IT

If you want to know exactly what’s on your network and how it’s all connected in real time, then network observability is the answer. Network observability pulls data from sources across your network infrastructure to model a detailed view of your systems and how they interact. This lets you understand exactly what’s happening on your network at any given moment so you can optimize performance.

Booking.com's Observability Overhaul: Unified Metrics, Logs, and User Insights | Grafana & OTel

Join Murugesan and Ahmadali from Booking.com's Observability Team as they dive into the journey of modernizing observability. Discover how they transformed fragmented systems into a centralized, scalable platform using OpenTelemetry and Grafana solutions. They share insights on their three-year strategy, the importance of unified metrics and logs, and overcoming challenges, from technology transitions to fostering teamwork.