Heartbeat Behind the Metrics | Hemachand on what visibility really means

What happens when observability grows faster than infrastructure? In this episode of Heartbeat Behind the Metrics, Hemachand Munagapati, Product Manager at Site24x7, reflects on over 15 years with the product and how the idea of a single pane of monitoring has shaped everything that followed.

How Dartmouth avoided vendor lock-in and implemented LBaaS with HAProxy One

History is everywhere at Dartmouth College, and while the campus is steeped in tradition, its IT infrastructure can’t afford to get stuck in the past. In an institution where world-class research and undergraduate studies intersect, technology must be fast, invisible, and – above all – reliable. That reliability was put to the test when Dartmouth’s load balancing vendor was acquired twice in five years, as Avi Networks moved to VMware and VMware moved to Broadcom.

How Okta keeps 99.99 percent uptime with Datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.

Universal Mesh in action: how PayPal solved multi-cloud complexity with HAProxy

The hardest part of modern infrastructure isn’t choosing your deployment environments — it’s bridging communication between them. Large enterprises are constantly facing the challenge of keeping everything connected, secure, and fast when their infrastructures are spread across different clouds and on-premises systems.

How LinkedIn modernized its massive traffic stack with HAProxy

Connecting nearly a billion professionals is no small feat. It requires an infrastructure that puts the user experience above everything else. At LinkedIn, this principle created a massive engineering challenge: delivering a fast, consistent experience across various use cases, from the social feed to real-time messaging and enterprise tools.

Spotify's performance & control across large monitoring environments with VictoriaMetrics

When you have billions of active time series and the data points you need to monitor run into the tens of trillions, you need a high-performance observability solution with operational simplicity. Streaming behemoth Spotify is one such case. Their observability team chose VictoriaMetrics as the fastest monitoring and observability solution on the market.

How Inkeep Monitors Their AI Agent Framework with SigNoz

AI agents are fundamentally different beasts to monitor compared to traditional applications. A single user request can trigger a cascade of 10+ internal operations: sub-agent transfers, tool executions, LLM calls, API requests, each with unpredictable latency and failure modes. When something goes wrong (and with LLMs, things go wrong in creative ways), you need to see the entire execution flow to debug effectively.