Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Elastic Observability 8.15: AI Assistant, OTel, and log quality enhancements

Elastic Observability 8.15 announces several key capabilities: New and enhanced native OpenTelemetry capabilities: Elastic AI Assistant enhancements: Large language model (LLM) observability for Azure OpenAI: Elastic Observability now provides deep visibility on the usage of the Azure OpenAI Service. The integration includes an out-of-the-box dashboard that summarizes the most relevant aspects of the service usage, including request and error rates, token usage, and chat completion latency.

Monitor your Anthropic applications with Datadog LLM Observability

Anthropic is an AI research and development company focused on building reliable and safe artificial intelligence systems. Their flagship product is Claude, an advanced language model and conversational AI assistant known for its strong capabilities in natural language processing, reasoning, and task completion. Anthropic places a particular emphasis on AI safety and ethics, and its models and APIs are used by organizations across various industries to build powerful, safe, and performant AI applications.

Event Logs Explained: Your Guide to System Health

Event logs contain critical information and the analysis of these logs will support organizations in the detection of many security incidents, from auditing user access to observing malicious traffic and even isolating monitor rule changes on a firewall. By collecting event logs systematically and analyzing them, organizations can obtain insights into their IT environment for maintaining operational efficiency and security.

Understanding the Deficiencies of AWS CloudWatch for Cloud Visibility

While CloudWatch offers basic monitoring and log aggregation, it lacks the contextual depth, multi-cloud integration, and cost efficiency required by modern IT operations. In this post, learn how Kentik delivers more detailed insights, faster queries, and more cost-effective coverage across various cloud and on-premises resources.

Is the Internet ready for L4S?

Today, Catchpoint is pleased to be sharing the results of our Global Explicit Congestion Notification (ECN) Bleaching Rates measurement campaign, covering the state of ECN bleaching worldwide, according to Catchpoint’s perspective. ISPs, telecoms and streaming services, among others (this information should be of interest to anyone with ISP dependencies), will be able to draw on this information to determine if your network or an upstream network is experiencing ECN bleaching.

Observe deleted Kubernetes components in Grafana Cloud to boost troubleshooting and resource management

As a site reliability engineer, you need constant vigilance and a keen eye for detail if you want to manage your Kubernetes infrastructure effectively. As part of that effort, you need to see the historical data from your pods, nodes, and clusters — even after they’ve been deleted or recreated. Many SREs rely on kubectl for this, and while it’s indispensable for real-time Kubernetes management, it presents some significant challenges with historical data.

The CoPE and Other Teams, Part 2: Custom Instrumentation and Telemetry Pipelines

The previous post laid out the basic idea of instrumentation and how OpenTelemetry’s auto-instrumentation can get teams started. However, you can’t rely only on auto-instrumentation. This post will discuss the limitations in more detail and how a CoPE can help teams overcome them.