Alerting

Making the Most of PagerDuty + Datadog

For your team to effectively respond to incidents, you need a shared, unambiguous incident definition so you can recognize when an incident has occurred and assign the appropriate severity. Definitions of an incident differ across teams, but whatever definition you use, identifying and monitoring key service level indicators (SLIs) can help you understand when your service is operating normally—and when its performance has degraded to the point where you need to trigger an incident.

A single-person on-call "rotation" is a critical vulnerability

One of the most common complaints we hear from operations and site reliability engineers concerns the quality-of-life impact of their on-call responsibilities and the stress that results. Most of us are already aware that a proper on-call rotation is critical to our engineering organization’s health in terms of both immediate incident response and long-term sustainable growth.

OnPage Mentioned in Two 2019 Gartner Hype Cycle Reports

Gartner’s Hype Cycles for Business Continuity and IT Performance Analysis are trusted reports that identify solutions for strengthening an organization’s business continuity. The OnPage team is pleased to announce that we’ve been included in two of Gartner’s Hype Cycle reports, which list OnPage’s incident alert management solution as a trusted tool for today’s support teams.

Vodafone Utilizes PagerDuty to Better Understand Their Real-Time Operations

Vodafone is a telecommunications company providing 4G network coverage for 18 million customers and 99% of the United Kingdom’s population. Ben Connolly, Head of Digital Engineering at Vodafone, details the challenges that his engineering teams were facing and why PagerDuty was the perfect fix. PagerDuty helps Vodafone deliver a better customer experience by allowing their teams to see the impact that they're having in real time.

ChatOps: The future of collaboration

ChatOps is the use of chatbots to unify communication and collaboration. Through ChatOps, every member of a team is aware of what the other members are working on. It is the logical next step in the evolution of team communication after email and IM. Today's projects are developed at a global scale with millions of people as potential users, which means that teams are larger and often work in shifts or even remotely.

Don't Treat Your Business Metrics Like Other Metrics

Many companies today try to feed business metrics into APM or IT monitoring systems. Splunk, Datadog and others track your business in real time, based on log or application data, which would seem to make sense. In practice, however, this approach fails to produce accurate and effective monitoring or to reduce the time to detection of revenue-impacting issues. Why? Because monitoring machines and monitoring business KPIs are completely different tasks.