
April 2024

This Month in Datadog: Bits AI for Incident Management, KSPM, New Observability Pipelines, and more

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit Datadog's Cloud Monitoring as a Service page. This month, we put the spotlight on Bits AI for Incident Management.

Simulation Theory, Observability, and Modern Software Practices

Jean Baudrillard's 1981 book Simulacra and Simulation is widely read and cited in academic circles, but it also permeates popular culture, influencing films, literature, and art. His theories notably influenced the Wachowski siblings' The Matrix series, bringing some of his ideas into mainstream awareness.

How Dell ISG Consolidated Observability Tooling Without Losing Functionality | Grafana Customer

In this recorded session, Brian Murphy, Technical Staff SRE at Dell Technologies, shares how his team at Dell ISG consolidated observability tooling without losing functionality using Grafana Cloud. The ISG team owns “Northstar Tooling,” which consists of Artifactory, GitHub Enterprise, Jenkins, and more. They also manage the Internal Cloud and k8s clusters, as well as all the hardware and networking that goes with it.

Insights of an Observability Advocate: The Challenges and Rewards

At a recent SRE Meetup in Bangalore, we had the pleasure of meeting Akshay Deshpande, who manages a Performance/Observability Engineering team at Smarsh. During our conversation, Akshay discussed his passion for observability and his constant drive to improve the field. Smarsh helps companies gain valuable insights from their communication data, enabling them to proactively identify potential regulatory and reputational risks before they escalate.

The journey to observability delivers benefits for the entire IT department

Across all industries, IT departments are moving from traditional application monitoring approaches towards full-stack observability. Rapid adoption of cloud native technologies has led to spiraling complexity and exposed the limitations of the Application Performance Management (APM) tools being deployed by IT teams.

A Product Manager's Insights from KubeCon + CloudNativeCon Europe 2024

I recently had the privilege of attending KubeCon + CloudNativeCon Europe 2024 in Paris. The conference, hosted by the Linux Foundation, marked the 10th anniversary of Kubernetes. Here are the key takeaways and highlights from the conference.

An overview of ManageEngine Site24x7: AI-powered observability platform for DevOps and IT operations

Here is a quick overview of ManageEngine Site24x7. The cloud-based platform’s broad capabilities help predict, analyze, and troubleshoot problems with end-user experience, applications, microservices, servers, containers, multi-cloud, and network infrastructure, all from a single console.

Observability for Everyone

What do you need to achieve observability? Who you ask and the role they hold will influence the answer, but the answer likely follows this pattern: “What you need depends on how you define observability.” I cannot disagree with this logic. A specific use case may only need a specific type of telemetry. Experience and expertise allow engineers to quickly answer questions about a system without expanding into adjacent data types.

Hybrid Observability for health and life sciences: Top 6 challenges and how monitoring can help

As the healthcare industry has introduced more complex IT infrastructure, it now faces many challenges in its effort to deliver high-quality services to patients. From adapting to remote work and telemedicine to resource constraints, healthcare organizations must continually adapt to new technologies. Nascent technologies such as remote patient triage, telemedicine, and IoT have all seen accelerated innovation as the industry pivots to caring for patients remotely.

How Lack of Knowledge Among Teams Impacts Observability

Without a doubt, you’ve heard about the persistent talent gap that has troubled the technology sector in recent years. It’s a problem that isn’t going away, plaguing everyone from engineering teams to IT security pros, and if you work in the industry today you’ve likely experienced it somewhere within your own teams. Despite major changes in the tech landscape, it is clear that organizations are still having significant difficulty keeping their technical talent in-house.

Mastering Observability with OpenSearch: A Comprehensive Guide

Observability is the ability to understand the internal workings of a system by measuring and tracking its external outputs. In technical terms, it entails collecting and examining data from numerous sources within a system to gain insights into its behavior, performance, and health. Most organizations now recognize how essential observability is to ensuring optimal performance and availability of their IT infrastructure.

Avoid Observability Failure

The public Internet is now a core component of every company’s digital architecture. Given its nature as a shared resource, the Internet is also the biggest variable in digital experience today. Therefore, application performance management solutions, which typically monitor application transactions and the cloud infrastructure that applications reside upon, can only offer IT operations teams a partial view of the overall health and performance of digital services. IT organizations must modernize their observability toolsets with Internet Performance Monitoring solutions.

Introducing Relational Fields

We’re excited to bring you relational fields, a new feature that allows you to query spans based on their relationship to each other within a trace. Previously, queries considered spans in isolation: You could ask about field values on spans and aggregate them based on matching criteria, but you couldn’t use any qualifying relationships about where and how the spans appear in a trace.
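To make the difference between isolated and relational queries concrete, here is a minimal Python sketch over an in-memory trace. It is not Honeycomb's query syntax; the `Span` fields and the `root_of` and `spans_with_root` helpers are hypothetical, chosen only to show a filter that depends on where a span sits in its trace rather than on its own field values alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span model: only the fields this sketch needs.
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float


def root_of(span: Span, by_id: dict[str, Span]) -> Span:
    """Walk parent links until reaching the span that has no parent."""
    current = span
    while current.parent_id is not None:
        current = by_id[current.parent_id]
    return current


def spans_with_root(trace: list[Span], root_name: str, min_ms: float) -> list[Span]:
    """Relational filter: slow spans whose trace root has a given name.

    A span-in-isolation query could only test name/duration_ms on each span;
    this one also qualifies each span by its relationship to the trace root.
    """
    by_id = {s.span_id: s for s in trace}
    return [
        s for s in trace
        if s.duration_ms >= min_ms and root_of(s, by_id).name == root_name
    ]


trace = [
    Span("a", None, "GET /checkout", 900.0),
    Span("b", "a", "db.query", 450.0),
    Span("c", "a", "cache.get", 2.0),
    Span("d", "b", "db.connect", 300.0),
]

slow = spans_with_root(trace, "GET /checkout", min_ms=100.0)
print([s.name for s in slow])
```

The same shape extends to other relationships (parent, child, sibling) by swapping `root_of` for a different walk over the parent links.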

Control your log volumes with Datadog Observability Pipelines

Modern organizations face a challenge in handling the massive volumes of log data—often scaling to terabytes—that they generate across their environments every day. Teams rely on this data to help them identify, diagnose, and resolve issues more quickly, but how and where should they store logs to best suit this purpose? For many organizations, the immediate answer is to consolidate all logs remotely in higher-cost indexed storage to ready them for searching and analysis.

Aggregate, process, and route logs easily with Datadog Observability Pipelines

The volume of logs generated from modern environments can overwhelm teams, making it difficult to manage, process, and derive measurable value from them. As organizations seek to manage this influx of data with log management systems, SIEM providers, or storage solutions, they can inadvertently become locked into vendor ecosystems, face substantial network costs and processing fees, and run the risk of sensitive data leakage.
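A pipeline of this kind usually applies processors to each log before routing it onward. The Python sketch below is a generic illustration under stated assumptions, not Datadog Observability Pipelines configuration: it drops debug logs, redacts anything that looks like an email address, and routes the result to one of two hypothetical destinations (`siem`, `archive`) based on severity.

```python
import re
from typing import Optional

# Assumption: a simple email pattern stands in for "sensitive data";
# production pipelines typically use broader, configurable scanning rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical destination names used only for this sketch.
DESTINATIONS = ("siem", "archive")


def process(log: dict) -> Optional[dict]:
    """Filter and redact one log record; return None to drop it."""
    if log.get("level") == "debug":
        return None  # filter step: drop low-value, high-volume logs
    redacted = dict(log)  # shallow copy so the input record is untouched
    redacted["message"] = EMAIL.sub("[REDACTED]", log.get("message", ""))
    return redacted


def route(log: dict) -> str:
    """Route step: high-severity logs go to the SIEM, the rest to the archive."""
    return "siem" if log.get("level") in ("error", "critical") else "archive"


def run_pipeline(logs: list[dict]) -> dict[str, list[dict]]:
    """Process each incoming log and group the survivors by destination."""
    out: dict[str, list[dict]] = {name: [] for name in DESTINATIONS}
    for raw in logs:
        processed = process(raw)
        if processed is not None:
            out[route(processed)].append(processed)
    return out


logs = [
    {"level": "debug", "message": "cache warmed"},
    {"level": "info", "message": "signup from ana@example.com"},
    {"level": "error", "message": "payment failed for bob@example.org"},
]
result = run_pipeline(logs)
print(result)
```

Redacting before routing means no destination, whichever vendor it belongs to, ever receives the raw value.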

Dual ship logs with Datadog Observability Pipelines

Organizations often adjust their logging strategy to meet their changing observability needs for use cases such as security, auditing, log management, and long-term storage. This process involves trialing and eventually migrating to new solutions without disrupting existing workflows. However, configuring and maintaining multiple log pipelines can be complex. Enabling new solutions across your infrastructure and migrating everyone to a shared platform requires significant time and engineering effort.
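Dual shipping means sending the same log stream to two destinations at once, so a new solution can be trialed while the existing one keeps working. Here is a minimal Python sketch of the idea; the `Sink` type and the `dual_ship` helper are hypothetical stand-ins for real destinations, and the sketch assumes the existing destination must never be disrupted by a failure in the trial one.

```python
from typing import Callable

# Hypothetical sink type: anything that accepts one log record.
Sink = Callable[[dict], None]


def dual_ship(log: dict, primary: Sink, secondary: Sink) -> bool:
    """Send one log to both sinks; return True if the secondary succeeded.

    The primary (existing) destination is called first and its errors
    propagate; a failing secondary (the trial) is contained and reported.
    """
    primary(dict(log))  # shallow copy so neither sink can mutate the other's record
    try:
        secondary(dict(log))
        return True
    except Exception:
        return False


existing: list[dict] = []
trial: list[dict] = []


def broken_trial(log: dict) -> None:
    raise ConnectionError("trial endpoint down")


ok_first = dual_ship({"msg": "hello"}, existing.append, trial.append)
ok_second = dual_ship({"msg": "retry"}, existing.append, broken_trial)
print(ok_first, ok_second, len(existing), len(trial))
```

Returning a success flag instead of raising lets the caller count trial-side failures during a migration without affecting existing workflows.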

Top 10 Ways to Reduce IT Cost Through Observability

Today, many businesses use cloud-native technologies and practices to grow and increase revenue. No doubt, the modern cloud computing approach offers several opportunities for growth, but it is also creating a new set of challenges. According to one report, SaaS companies spend almost 19% of their total revenue on IT. If these challenges are not handled properly, they will erode the anticipated advantages. In fact, many businesses are under high pressure to reduce their IT costs.

Real User Monitoring With a Splash of OpenTelemetry

You're probably familiar with the concept of real user monitoring (RUM) and how it's used to monitor websites or mobile applications. If not, here's the short version: RUM requires telemetry data, which is generated by an SDK that you import into your web or mobile application. These SDKs then hook into the JS runtime, the browser itself, or various system APIs in order to measure performance.

Optimizing cloud resource costs with Elastic Observability and Tines

In today's cloud-centric landscape, managing and optimizing cloud resources efficiently is paramount for cloud engineers striving to balance performance and cost-effectiveness. By leveraging solutions like Tines and Elastic, cloud engineering teams can streamline operations and drive significant cost savings while maintaining optimal performance.

Driving SaaS Excellence Through Observability

For SaaS platforms, observability is crucial: these companies need a deep understanding of their users' experience and of the root cause of any issues. Observability involves putting the appropriate tools and processes in place to effectively track, examine, and troubleshoot the performance and behavior of a system, even if you can't directly see what's happening inside it.

What is Network Observability (vs. Network Monitoring)?

Today, we embark on an exciting journey into the realm of network observability, where the world of network monitoring gets a vibrant makeover. While network monitoring has long been the stalwart guardian of network operations, network observability swoops in like a caped hero, adding a whole new level of awesomeness to the game.

Observable systems with wide events

Oh, I didn't see you there. Hi, I'm Kevin, a developer here at Honeybadger. I've worked for the last year or so developing Honeybadger Insights, our new logging and observability platform. Let's peek into some of the design decisions and philosophy behind the product. In modern software development, the hunt for observable systems has traditionally revolved around the holy trinity of logs, metrics, and traces.
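A wide event is commonly described as one structured record per unit of work (such as a request) that carries many fields at once. As a hedged, generic illustration, not Honeybadger Insights' API, the Python sketch below accumulates context while a request runs and emits a single JSON line at the end; the `wide_event` helper and its field names are hypothetical.

```python
import json
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def wide_event(name: str, emit: Callable[[str], None] = print) -> Iterator[dict]:
    """Collect fields for one unit of work and emit them as a single event."""
    event: dict = {"name": name}
    start = time.perf_counter()
    try:
        yield event  # the caller adds as many fields as are useful
        event["outcome"] = "ok"
    except Exception as exc:
        event["outcome"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        # Timing and outcome land in the same record as the business context.
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(json.dumps(event, sort_keys=True))


lines: list[str] = []
with wide_event("GET /cart", emit=lines.append) as ev:
    ev["user_id"] = 42
    ev["cart_items"] = 3
    ev["cache_hit"] = False
print(lines[0])
```

Because every field for the request lives in one record, questions like "are cache misses slower for users with large carts?" become a single query instead of a join across logs, metrics, and traces.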

Open source observability explained - the Grafana Labs stack

Wish you could have open source observability explained to you? Senior Developer Advocate Nicole van der Hoeven explains how all the OSS projects from the Grafana Labs stack work together and how the picture they're all building towards is continuous reliability. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more. We also have plans for every use case.

Cisco Full-Stack Observability named a Leader in GigaOm Radar for Cloud Observability

The results are in! Learn why Cisco Full-Stack Observability was recognized as a Leader in GigaOm Radar for Cloud Observability in 2024. I’m pleased to announce that Cisco Full-Stack Observability was recently recognized as a Leader in the 2024 GigaOm Radar for Cloud Observability.

Mastering Live Debugging Techniques: A Must-Have Guide for Developers

Software debugging has undergone many transformative shifts. These shifts are as fascinating as the transition from the biological origins of the term ‘debugging’ to its computer science incarnation. The moth that caused the first computer bug has led to a metamorphosis of the debugging scope to cover a much broader role in software development over the years. Live debugging is the latest manifestation of this evolution.

Transforming to an Engineering Culture of Curiosity With a Modern Observability 2.0 Solution

Relying on their traditional observability 1.0 tool, Pax8 faced hurdles in fostering a culture of ownership and curiosity due to user-based pricing limitations and an impending steep price increase. Pax8’s platform engineering team was keen on modernizing the company’s cloud commerce platform, but they were hitting obstacles with their traditional observability 1.0 tool, which relied on the three pillars of logs, metrics, and traces.

Discover Splunk - the unparalleled, most comprehensive full-stack observability solution

How do you become digitally resilient as an organisation? Hear from Maria Nyström, Regional Sales Manager at Splunk Sweden, about how Splunk is helping enterprises get full traceability in their environment. Splunk customers can trace any issue for any user and follow that to the application backend, the specific microservice and the infrastructure it runs on.

Introduction to Observability

These days, systems and applications evolve at a rapid pace. This makes analyzing the internal performance of applications complex. Observability emerges as a path to efficient and effective operational insights. Imagine a team of doctors monitoring a patient’s vitals—heart rate, temperature, blood pressure. These readings, combined with observation of symptoms, paint a picture of the patient’s health. This allows doctors to diagnose issues and provide care.

Google Cloud Welcomes Full-Stack Observability with StackState

When Google Cloud welcomed StackState to offer our full-stack observability solution to their network of customers, we were thrilled. Our excitement only grew when Google invited us to join other partners this week at Google Cloud Next ’24 at the Mandalay Bay Convention Center in Las Vegas.

Why companies choose Grafana Cloud for their hosted observability platform

Three different businesses, one shared problem: SailPoint, Kushki, and Flexcity were all looking for a hosted solution to help them optimize their telemetry storage, gain more insights from their observability strategy, and keep costs manageable. But what they gained from migrating to Grafana Cloud and working with Grafana Labs was much more. “The engineering team is super sharp. They’re experts. This is the best of the best,” said Omar Lopez, head of the observability team at SailPoint.

Observability Vs. Monitoring: The Complete Comparison

Many people wonder, “Is there a difference between observability and monitoring?” The thing is, as IT environments have become more complex, monitoring alone has become less and less effective. That’s because while monitoring is crucial, it isn’t well suited to tracking unforeseen or unexpected events. That’s what observability is meant for. This guide will clarify what observability and monitoring are, and how they differ.

Honeycomb + Google Gemini

Today at Google Next, Charity Majors demonstrated how to use Honeycomb to find unexpected problems in our generative AI integration. Software components that integrate with AI products like Google’s Gemini are powerful in their ability to surprise us. Nondeterministic behavior means there is no such thing as “fully tested.” Never has there been more of a need for testing in production!

Setting Up the Latest AWS Observability Solution

The tutorial demonstrates how easy it is to deploy the AWS Observability Solution with a CloudFormation template, using the new quick method. The template sets up automated collection of logs and metrics from AWS to the Sumo Logic service.

Why You Need Observability With the Splunk Platform

Splunk’s extensible and scalable data platform has been instrumental in helping ITOps teams fully understand their tech environments and tackle any IT use case with data streaming, dashboarding, federated search, AI/ML, and more. But, with the explosion of telemetry and the growing complexity of digital systems, ITOps practitioners who rely solely on a logging solution are missing out on critical insights from their digital systems.

5 reasons why observability and security work well together

Site reliability engineers (SREs) and security analysts — despite having very different roles — share a lot of the same goals. They both employ proactive monitoring and incident response strategies to identify and address potential issues before they become service impacting. They also both prioritize organizational stability and resilience, aiming to minimize downtime and disruptions.

Instrumenting a Demo App With OpenTelemetry and Honeycomb

A few days ago, I was in a meeting with a prospect who was just starting to try out OpenTelemetry. To do so, they had created an observability demo project containing an HTTP reverse proxy, a web frontend, three microservices, a database, and a message queue. Their motivation was to try out OpenTelemetry and see how much effort it took for them to instrument their system.

Observability benefits of Cisco Catalyst Center integration

LogicMonitor’s agentless collection has long provided customers with many benefits for collecting telemetry data directly from network devices. Recently, LogicMonitor added another feature, enabling the discovery of devices/sites and the collection of telemetry data from the Cisco Catalyst Center. Retaining options is essential due to the pros and cons associated with each approach.

360° Observability: Enhancing Reliability Across the Board

As a manager, figuring out how to talk to your engineering teams about building a strong observability strategy can feel overwhelming. But don't worry! This post will help you navigate the challenges to unlock the full power of observability in your IT environment. Drawing on insights from over 40 discussions with larger enterprises, we've put together a strategy assessment that examines three key focus areas — what we’re calling aspects — each encompassing three actionable steps.

Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI

Recently, Elastic announced that the AI Assistant for Observability is now generally available for all Elastic users. The AI Assistant adds a new tool to Elastic Observability, providing large language model (LLM) connected chat and contextual insights that explain errors and suggest remediation.

Observing Core Web Vitals with OpenTelemetry: Part Two

Core Web Vitals (CWV) are Google's preferred metrics for measuring the quality of the user experience for browser web apps. Currently, Core Web Vitals measure loading performance, interactivity, and visual stability, which are the main indicators of the experience a user will have on a web page. Note: As of March 12, 2024, Interaction to Next Paint (INP) has become a stable Core Web Vital, replacing First Input Delay (FID).
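Each Core Web Vital is usually assessed at the 75th percentile of page loads against Google's published thresholds: Largest Contentful Paint (LCP) is good at or below 2.5 s and poor above 4 s; INP is good at or below 200 ms and poor above 500 ms; Cumulative Layout Shift (CLS) is good at or below 0.1 and poor above 0.25. The Python sketch below applies those thresholds to raw samples, such as values you might export from browser telemetry; the nearest-rank percentile and the `p75`/`rate` helpers are illustrative choices, not part of any OpenTelemetry API.

```python
import math

# Google's published CWV thresholds: (good upper bound, poor lower bound).
# LCP and INP are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "LCP": (2500.0, 4000.0),
    "INP": (200.0, 500.0),
    "CLS": (0.1, 0.25),
}


def p75(samples: list[float]) -> float:
    """75th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric: str, samples: list[float]) -> str:
    """Bucket a metric's p75 value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return "good"
    if value > poor:
        return "poor"
    return "needs improvement"


print(rate("INP", [120, 180, 240, 650]))
print(rate("CLS", [0.01, 0.02, 0.05, 0.3]))
print(rate("LCP", [3000, 4200, 4500, 5000]))
```

Rating the 75th percentile rather than the mean keeps a page from passing just because most visits are fast while a sizable minority are slow.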

How an APM Alternative Helps You Do Observability Right

Every software-driven business strives for optimum performance and user experience. Observability, which allows engineering and IT Ops teams to understand the internal state of their cloud applications and infrastructure based on available telemetry data, has emerged as a crucial practice in support of that goal. For years, application performance monitoring (APM) was the de facto practice and tooling that organizations used to keep tabs on their critical systems.

Optimizing Operations: A Look At Observability For Manufacturers

As the automation of processes and deployment becomes more prevalent in the manufacturing industry, the need for IT services grows. The use of complex systems and technologies, such as AI and robotics, has become the new normal for manufacturing organizations.

Beyond the trace: Pinpointing performance culprits with continuous profiling and distributed tracing correlation

Observability goes beyond monitoring; it's about truly understanding your system. To achieve this comprehensive view, practitioners need a unified observability solution that natively combines insights from metrics, logs, traces, and crucially, continuous profiling. While metrics, logs, and traces offer valuable insights, they can't answer the all-important "why." Continuous profiling signals act as a magnifying glass, providing granular code visibility into the system's hidden complexities.

Do We Still Need to "Observe"? The Future of AI & O11y

AI has had a massive impact on every part of our lives, most notably on how easily we can consume large data sets. The observability world is based on collecting enormous amounts of data and consuming it by observing dashboards built on monitoring tools. Most o11y tasks, like writing complex queries, creating dashboards & defining alerts, have been done in much the same way for the last decade, & AI models are well-positioned to disrupt this modus operandi.