The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Telemetry Now Teaser: "Turning Network Telemetry Into Financial Insight"

Network operators treat cost, performance, security, and reliability as their core needs. But how do they get the economic data they need to make trade-offs when one of these suffers? Tune in to the latest Telemetry Now with special guest Lauren Basile to learn how Kentik Traffic Costs provides data-backed answers to these questions.

How We Built VictoriaLogs Cluster: A CTO's

Go behind the scenes with the VictoriaMetrics team! In this special talk, Marc Sherwood is joined by our CTO, Alexander Marshalov, to explore VictoriaLogs, our powerful, open-source logging solution. This isn't just a feature showcase; it's a deep dive into the engineering mindset that drives our development. Alexander shares firsthand insights into why we built VictoriaLogs Cluster, the technical challenges of building a distributed logging system, and the core principles of simplicity and efficiency that guide our architecture.

How GenAI Is Empowering Elastic's Workforce

With over 10,000 questions answered and a 99% satisfaction rate in just 90 days, ElasticGPT, our internal generative AI assistant built on Elastic’s Search AI Platform, is transforming how our teams find information, make decisions, and complete day-to-day tasks. Matt Minetola, CIO, explains how ElasticGPT helps employees access company knowledge faster using natural-language queries. Learn how we’re using retrieval-augmented generation (RAG) and a secure, scalable architecture to deliver trusted, real-time AI experiences across the organization.
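To make the RAG idea concrete: the model answers from passages retrieved for each query rather than from its training data alone. The toy TypeScript sketch below shows only the retrieve-then-prompt step. Scoring by shared words is a deliberate simplification standing in for the BM25 or vector-similarity ranking a real search platform would use, and `Doc`, `retrieve`, and `buildPrompt` are illustrative names, not Elastic APIs.

```typescript
// Toy sketch of the retrieval step in retrieval-augmented generation (RAG).
// Hypothetical names throughout; ranking is naive word overlap, not BM25.

interface Doc {
  id: string;
  text: string;
}

// Lowercase and split on non-word characters, dropping empty tokens.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Return the k documents sharing the most words with the query.
function retrieve(query: string, docs: Doc[], k: number): Doc[] {
  const queryTerms = new Set(tokenize(query));
  return docs
    .map((doc) => ({
      doc,
      score: tokenize(doc.text).filter((t) => queryTerms.has(t)).length,
    }))
    .filter((scored) => scored.score > 0) // ignore documents with no overlap
    .sort((a, b) => b.score - a.score) // highest overlap first
    .slice(0, k)
    .map((scored) => scored.doc);
}

// Ground the model: put the retrieved passages into the prompt with their ids
// so the answer can cite its sources.
function buildPrompt(query: string, context: Doc[]): string {
  const sources = context.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only these sources:\n${sources}\n\nQuestion: ${query}`;
}
```

The prompt returned by `buildPrompt` would then be sent to a language model; keeping retrieval separate from generation is what lets answers draw on current company knowledge without retraining the model.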

Model your architecture with custom entities in the Datadog Software Catalog

Every software organization has its own unique architecture and workflows. Beyond services and APIs, teams rely on internal libraries, CI/CD jobs, data pipelines, AI agents, and more to keep systems running smoothly. But as architectures grow more complex and interconnected, it can become difficult to keep track of all the structural dependencies and interactions in one place.

Why Does Your Node.js App Crash in Production and How Can You Fix It?

Node.js has become one of the most popular platforms for building scalable, high-performance web applications. Its event-driven, non-blocking I/O model allows developers to handle thousands of concurrent connections efficiently with minimal overhead. However, many businesses still face a critical challenge: Node.js applications often crash unexpectedly in production, causing downtime, lost revenue, and damage to brand reputation.
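One common crash path is a rejected promise that nothing catches: since Node.js 15, an unhandled rejection terminates the process by default. The TypeScript sketch below wraps an async handler so a failure becomes an error response instead of escaping. `withErrorBoundary`, `Handler`, `ErrorResponse`, and `getUser` are hypothetical names for illustration, not part of Node.js or any web framework.

```typescript
// Minimal sketch (hypothetical names): catch failures at the handler boundary
// so a rejected promise is turned into an error response rather than
// surfacing as an unhandled rejection that stops the process.

type Handler<Req, Res> = (req: Req) => Promise<Res>;

interface ErrorResponse {
  status: number;
  message: string;
}

function withErrorBoundary<Req, Res>(
  handler: Handler<Req, Res>,
  log: (err: unknown) => void = (err) => console.error("handler failed:", err),
): Handler<Req, Res | ErrorResponse> {
  return async (req: Req) => {
    try {
      // Awaiting inside try/catch is what makes the rejection catchable here.
      return await handler(req);
    } catch (err) {
      log(err);
      // Return a generic 500 instead of leaking internal details to callers.
      return { status: 500, message: "Internal Server Error" };
    }
  };
}

// Usage: a handler that throws for invalid input.
const getUser = withErrorBoundary(async (id: number) => {
  if (id < 0) throw new Error(`invalid id ${id}`);
  return { status: 200, name: `user-${id}` };
});
```

Catching at the handler boundary complements, rather than replaces, process-level hooks: Node.js emits `uncaughtException` and `unhandledRejection` events on `process`, which are usually used to log the failure and then exit so a supervisor such as systemd, Kubernetes, or PM2 can restart a clean process.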

The telemetry time bomb, and what to do about it

Telemetry data is growing at an average of 29% a year — doubling costs every 18 months. That’s putting pressure on ITOps budgets, observability platforms, SecOps teams, and SIEM deployments alike. In this post, we’ll explore how unchecked data volumes, siloed tools, and aging architectures are creating a telemetry cost crunch that limits visibility, slows both troubleshooting and threat detection, and impacts business outcomes.
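The growth and cost figures can be related with the standard doubling-time formula for a quantity compounding at a constant annual rate r:

```latex
T_{2} = \frac{\ln 2}{\ln(1 + r)}, \qquad
r = 0.29 \;\Rightarrow\; T_{2} = \frac{0.693}{0.255} \approx 2.7\ \text{years} \approx 33\ \text{months}
```

By the same formula, costs that double every 18 months are compounding at about 2^(12/18) - 1 ≈ 59% a year, faster than the 29% annual growth in data volume.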

What is AI-Native Monitoring? The Complete Guide for Developers

Before we talk about AI-native monitoring, let’s take a quick step back to make sure everyone is on the same page. In software engineering, monitoring is the continuous collection and analysis of data about a system’s health, performance, and behavior. Tools like Scout Monitoring, Datadog, and New Relic traditionally track server uptime, request latency, error rates, and database performance.
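At its simplest, that collection and analysis means reducing raw request data to a few health indicators. The sketch below computes two of the signals named above, error rate and tail latency, over a window of samples. The `RequestSample` shape and the nearest-rank 95th percentile are assumptions for illustration, not how Scout Monitoring, Datadog, or New Relic compute them.

```typescript
// Minimal sketch: summarize a window of request samples into health signals.
// RequestSample is a hypothetical shape; real agents collect far richer data.

interface RequestSample {
  latencyMs: number;
  ok: boolean; // false when the request ended in an error
}

interface Summary {
  count: number;
  errorRate: number; // fraction of failed requests, 0..1
  p95Ms: number; // 95th-percentile latency, nearest-rank method
}

function summarize(samples: RequestSample[]): Summary {
  if (samples.length === 0) return { count: 0, errorRate: 0, p95Ms: 0 };

  const errors = samples.filter((s) => !s.ok).length;
  const sorted = samples.map((s) => s.latencyMs).sort((a, b) => a - b);

  // Nearest-rank percentile: the value at 1-indexed rank ceil(0.95 * n).
  // Integer arithmetic avoids floating-point error in 0.95 * n.
  const rank = Math.ceil((95 * sorted.length) / 100);

  return {
    count: samples.length,
    errorRate: errors / samples.length,
    p95Ms: sorted[rank - 1],
  };
}
```

Tracking a high percentile rather than the mean is the usual choice because a handful of very slow requests can hide behind a healthy-looking average.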