
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Sentry AI code review, now in beta: break production less

This could’ve been prevented. This should have been prevented. This too. We all hate getting tagged in PRs. The time it takes, the blame when you inevitably miss something, and the constant “I wouldn’t have written it that way” feeling are hard to shake. LLMs promised this would get easier. Promised they would do it for us. As we’ve seen, though, we’re not there yet. But this is what Sentry does for a living: we catch bugs… in prod.

Key APM Metrics You Must Track

Application Performance Monitoring (APM) helps you understand how your software runs in production. When you track the right metrics, you see how requests move through your system, where slowdowns happen, and how resources are being used. With this knowledge, you can spot issues early and keep your applications reliable for your users. In this post, we discuss the key APM metrics to monitor, grouped into categories, and why each one matters for performance and user experience.
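To make a few of these metrics concrete, here is a minimal sketch that turns one time window of request samples into throughput, error rate, and latency percentiles. The `RequestSample` type, the nearest-rank percentile method, and the rule that only 5xx responses count as errors are illustrative assumptions, not taken from any particular APM product.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    duration_ms: float
    status: int  # HTTP status code


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[RequestSample], window_s: float) -> dict[str, float]:
    """Compute core APM metrics for one window of requests (hypothetical helper)."""
    if not samples:
        raise ValueError("need at least one request sample")
    durations = [s.duration_ms for s in samples]
    # Assumption: only server errors (5xx) count toward the error rate.
    errors = sum(1 for s in samples if s.status >= 500)
    return {
        "throughput_rps": len(samples) / window_s,
        "error_rate": errors / len(samples),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```

Tail percentiles such as p95 and p99 are reported alongside the median because an average can look healthy while a meaningful share of users still sees slow responses.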

Memory stall: the agony before OOM

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.
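One way to see this slowdown before the OOM killer acts is the kernel's pressure stall information (PSI). The sketch below assumes cgroup v2 and a container that sees its own cgroup at `/sys/fs/cgroup`, where `memory.current`, `memory.max`, and `memory.pressure` live. The `is_stalling` heuristic and its thresholds are hypothetical starting points, not values from the original post.

```python
from pathlib import Path


def parse_pressure(text: str) -> dict[str, dict[str, float]]:
    """Parse a cgroup v2 PSI file such as memory.pressure.

    Each line looks like: "some avg10=1.23 avg60=0.50 avg300=0.10 total=123456".
    "some" = at least one task stalled on memory; "full" = all non-idle tasks stalled.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *pairs = line.split()
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_memory_state(cgroup_dir: str = "/sys/fs/cgroup") -> dict:
    """Read usage, limit, and memory pressure for one cgroup v2 directory."""
    base = Path(cgroup_dir)
    current = int((base / "memory.current").read_text())
    raw_limit = (base / "memory.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)  # "max" means no limit
    pressure = parse_pressure((base / "memory.pressure").read_text())
    return {"current": current, "limit": limit, "pressure": pressure}


def is_stalling(state: dict, usage_ratio: float = 0.9, full_avg10: float = 5.0) -> bool:
    """Hypothetical heuristic: close to the limit AND tasks fully stalled on memory."""
    if state["limit"] is None:
        return False
    near_limit = state["current"] / state["limit"] >= usage_ratio
    # avg10 is the percentage of the last 10 seconds spent fully stalled.
    stalled = state["pressure"].get("full", {}).get("avg10", 0.0) >= full_avg10
    return near_limit and stalled
```

Inside a container, `is_stalling(read_memory_state())` could be polled periodically and exported as a metric, so a crawling app shows up in dashboards well before Kubernetes records an OOM kill.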

Building Real-Time Data Pipelines with Kafka, Telegraf, and InfluxDB 3

When milliseconds matter and data never stops flowing, you need a pipeline that can handle high-velocity streaming data with reliability and scale. The modern streaming stack of Kafka, Telegraf, and InfluxDB 3 Core delivers exactly that. To give you a concrete example, this post works through a fictitious use case: “Papa Giuseppe’s Pizzeria.” Every oven, prep station, and order in this pizza restaurant generates data. Our workflow looks like this.
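As a hedged illustration of the producer side of such a pipeline, the sketch below formats one oven reading as InfluxDB line protocol (`measurement,tag=value field=value timestamp`). That text could be published to a Kafka topic for Telegraf's `kafka_consumer` input to parse with `data_format = "influx"`. The measurement, tag, and field names (`oven`, `station`, `temp_c`, and so on) are hypothetical, and the original post may use JSON or a different schema instead.

```python
def _escape(text: str, specials: str) -> str:
    """Backslash-escape each special character (commas, spaces, equals signs)."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def _field_value(value) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # string fields are double-quoted


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Build one line-protocol record; tag keys are sorted for deterministic output."""
    head = [_escape(measurement, ", ")]
    for key in sorted(tags):
        head.append(f"{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}")
    field_str = ",".join(f"{_escape(k, ',= ')}={_field_value(v)}" for k, v in fields.items())
    line = ",".join(head) + " " + field_str
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # nanosecond precision by default
    return line
```

For example, `to_line_protocol("oven", {"station": "oven 1"}, {"temp_c": 231.5}, 1700000000000000000)` yields a single record with the space in `oven 1` escaped as `oven\ 1`.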

Beyond Automation: The Rise of Agentic Networks

Agentic AI is the next evolution in network management, moving beyond simple automation to intelligent systems that can reason, plan, and act autonomously. Justin Ryburn, Kentik Field CTO, highlights how this shift automates expertise, enables proactive problem-solving, and empowers human engineers for strategic innovation.

10 Best Practices for Proactive Database Performance Monitoring to Prevent Downtime

Databases are the core of modern applications, whether the application is an e-commerce platform, a banking system, or a social media app. Slow database performance or unexpected downtime can cause serious problems, from lost revenue to poor customer experience. Proactive database performance monitoring helps teams identify issues before they escalate. Unlike reactive monitoring, which only addresses problems after they occur, proactive monitoring helps keep your database fast, stable, and reliable.
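The difference between proactive and reactive monitoring can be sketched in two small checks: alerting on how close a resource is to its hard limit (for example, active connections versus the server's `max_connections`), and extrapolating a growth trend (for example, disk usage) to estimate when it will run out. The helper names and the 80%/95% thresholds below are hypothetical defaults for illustration, not recommendations from the original article.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def check_ratio(used: float, capacity: float, warn: float = 0.80, crit: float = 0.95) -> Level:
    """Alert while a resource is approaching its limit, not only once it is exhausted."""
    ratio = used / capacity
    if ratio >= crit:
        return Level.CRITICAL
    if ratio >= warn:
        return Level.WARNING
    return Level.OK


def hours_until_full(samples: list[tuple[float, float]], capacity: float) -> Optional[float]:
    """Fit a least-squares line to (hour, used) samples and extrapolate to capacity."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var
    if slope <= 0:
        return None  # usage is flat or shrinking: no projected exhaustion
    latest_used = max(samples)[1]  # usage at the most recent timestamp
    return (capacity - latest_used) / slope
```

A linear fit is the simplest possible forecast; real growth is often bursty, so a projection like this is best treated as an early-warning signal rather than a precise deadline.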

Node.js Event Loop: Why Monitoring Matters

Node.js has become a cornerstone of modern application development because of its non-blocking, asynchronous architecture. According to the Stack Overflow Developer Survey, Node.js remains among the most widely used technologies for web applications, powering millions of services globally. This event-driven model provides scalability and efficiency, but it also introduces challenges: because JavaScript code runs on a single main thread, one long-running synchronous task can block the event loop and delay every other request waiting on it.
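A common way to monitor this is to measure event-loop lag: schedule a short timer repeatedly and record how late it fires. Node.js offers a built-in histogram for this (`monitorEventLoopDelay` in `perf_hooks`). To keep this page's examples in one language, the sketch below shows the same timer-drift technique on Python's `asyncio`, which is also a single-threaded event loop. Treat it as an analogy for the Node.js behavior, not Node.js code. The `measure_loop_lag` and `demo` helpers, along with the interval and sample count, are hypothetical.

```python
import asyncio
import time


async def measure_loop_lag(interval: float = 0.01, samples: int = 20) -> list[float]:
    """Sleep for `interval` seconds repeatedly and record how late each wake-up was."""
    loop = asyncio.get_running_loop()
    lags = []
    for _ in range(samples):
        start = loop.time()
        await asyncio.sleep(interval)
        lags.append(max(0.0, loop.time() - start - interval))
    return lags


async def demo() -> tuple[float, float]:
    # Baseline: nothing else is competing for the loop.
    idle = max(await measure_loop_lag())

    async def blocker() -> None:
        await asyncio.sleep(0.05)
        time.sleep(0.2)  # synchronous call: freezes the whole event loop for ~200 ms

    task = asyncio.create_task(blocker())
    busy = max(await measure_loop_lag())  # one wake-up is delayed by the blocking call
    await task
    return idle, busy


if __name__ == "__main__":
    idle_lag, busy_lag = asyncio.run(demo())
    print(f"max lag idle: {idle_lag * 1000:.1f} ms, with blocking call: {busy_lag * 1000:.1f} ms")
```

The idle measurement typically stays near zero, while the run with the blocking call shows a lag close to the full 200 ms. This is the same symptom a Node.js service exhibits when CPU-heavy work runs on the main thread.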

InfluxDB 3 Enterprise: Deploy Your Way, Scale on Demand

InfluxDB 3 Enterprise is engineered for performance and designed for flexibility, delivering high-scale, production-ready time series data management with operational simplicity. It is built on a cloud-native, diskless architecture that removes the limits of traditional storage. It’s easy to deploy, scales on demand, and eliminates the complexity of managing clusters, so you can deploy your way and meet the unique demands of your environment.