Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Application Performance Monitoring and related technologies.

Sponsored Post

How to improve your Crash Free Users score in minutes

If you're reading this blog, you likely already know the importance of quality software. But with so many metrics available to monitor and improve, development teams struggle to decide which ones to prioritize for the greatest impact. The Crash Free Users score in Raygun is a good place for development teams who care about software quality to focus their efforts. It tells you what percentage of users didn't encounter a crash or error while using your software, which makes it a useful north star for gauging overall quality.
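As a rough illustration of the arithmetic behind a score like this, the sketch below computes the percentage of users who did not hit a crash or error. The function name and inputs are hypothetical, and this is not necessarily how Raygun computes its own score; it only shows the definition given above.

```python
def crash_free_users_pct(total_users: int, users_with_crash: int) -> float:
    """Return the percentage of users who saw no crash or error.

    Hypothetical helper: both counts are assumed to cover the same
    time window and to count each user at most once.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= users_with_crash <= total_users:
        raise ValueError("users_with_crash must be between 0 and total_users")
    return 100.0 * (total_users - users_with_crash) / total_users


# Example: 30 of 2,000 users hit at least one crash in the window.
score = crash_free_users_pct(total_users=2000, users_with_crash=30)
print(f"{score:.2f}%")  # 98.50%
```

Counting affected users rather than raw crash events means one user who crashes ten times lowers the score no more than a user who crashes once.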

How Okta keeps 99.99 percent uptime with #datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.
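To put a 99.99 percent target in perspective, the short sketch below converts an availability percentage into the downtime it allows over a period. The function name is hypothetical and the calculation is plain arithmetic, not anything specific to Okta or Datadog.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime allowed by an availability target.

    Hypothetical helper: treats the period as period_days * 24 hours
    of wall-clock time with no maintenance exclusions.
    """
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


# 99.99 percent over a 365-day year and over a 30-day month.
print(round(downtime_budget_minutes(99.99, 365), 2))  # 52.56
print(round(downtime_budget_minutes(99.99, 30), 2))   # 4.32
```

Under these assumptions, "four nines" leaves under an hour of total downtime per year.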

Web Performance Metrics: Why INP Is Your Most Practical UX Performance KPI

Every developer has seen this scene: a user clicks a button, nothing happens, they click again—still nothing—and by the third frustrated tap, three overlapping modals explode onto the screen. The page wasn’t slow to load. It was slow to respond. This highlights the importance of perceived performance—how fast and responsive a website feels to users—which can shape user satisfaction regardless of actual load times.

Top 15 Application Performance Metrics for Developers and SREs in 2026

Every application tells a story of user intent, system behavior, and business impact. To truly understand how your application performs, you need to go beyond logs and errors. You need metrics that provide actionable visibility across your stack. Application performance metrics are the foundation for delivering high-quality digital experiences, and they empower DevOps teams, developers, engineers, and site reliability engineers (SREs) to respond faster, scale smarter, and continuously improve.

Redefining Application Management Services - the AIOps Way

For years, Application Management/Maintenance Services (AMS) have been the go-to solution for IT leaders trying to keep their business applications stable and running. The AMS pitch was simple: Hand over your apps to us, and we’ll manage and maintain them for you! And for a long time, that model has delivered promising results. It allows internal teams to focus on innovation while service providers handle the operational heavy lifting.

Debugging AI Agents in Production Without Losing Your Mind

AI agents are powerful, but debugging them in production is hard. Non-deterministic behavior, LLM latency, and token costs create observability challenges that traditional monitoring tools don't address. In this webinar, engineers from Inkeep and SigNoz walk through how Inkeep monitors its AI agent framework in production using OpenTelemetry-native observability.

Easily Map Logs to OCSF with Datadog Observability Pipelines

Normalizing security logs into the Open Cybersecurity Schema Framework (OCSF) is often complex, manual, and time-consuming. With Datadog Observability Pipelines, you can transform logs into OCSF format, right in your own environment, before routing them to destinations like Splunk, CrowdStrike, and AWS Security Lake. This video shows how security teams can use Observability Pipelines to collect, process, and transform logs into OCSF format automatically.

Taking Server Monitoring to the Next Level

For many years, uptime and availability have been the standard measures of server health. But if a server is up and responding to a ping or HTTP request, does that really mean that all is well? In reality, uptime and availability alone often provide a false sense of security. A server can be technically “up” while being seconds away from a crash, running out of memory, operating with an expired license, or silently failing critical updates.

Beyond the Blue Link: UX Patterns for Google's AI Overviews, AI Mode & Answer Engines

The blue link is dying—but not in the way we expected. When Google’s AI Overviews began appearing at the top of the search results page, the SEO community panicked. Publishers watched click-through rates plummet. The Pew Research Center confirmed their fears: searchers who encounter an AI summary are half as likely to click on traditional search results (8% vs. 15%).