Operations | Monitoring | ITSM | DevOps | Cloud

Introducing The PagerDuty Postmortem Guide

Your team had been fighting this major incident for hours, but your investigation was hitting one dead end after another. Finally, you managed to isolate the problem and your graphs started to improve. When all systems went back to normal, everyone let out a collective sigh of relief, shut down the response call, and went back to bed, never to think of this incident again. Or so you thought.

A How-To Guide To SLAs (Service Level Agreements), Best Practices, And Why They Are So Important To Customers

You’ve heard it so many times: Transparent communication is the key to any successful relationship. The banking industry learned this lesson when cyber attacks began to plague its customers, and the official line for many financial institutions was to deny there was a problem. That is, until the hacks became so profound and so persistent that it became impossible to cover them up any longer.

OpsRamp Winter Release, January 2019: Be the First to Know and Take Action Faster with Context and Insight

The OpsRamp winter release delivers greater service-centricity and context for hybrid infrastructure management with intelligent incident management and cloud native monitoring. The January 2019 release features innovations such as a new UI for service maps, enhanced AIOps capabilities and cloud native monitoring features.

OpsRamp Delivers Greater Service Centricity, Expanded AIOps and Cloud Native Monitoring

OpsRamp, the service-centric AIOps software-as-a-service (SaaS) platform for the hybrid enterprise, today announced new topology maps, enhanced artificial intelligence for IT operations (AIOps) features and new monitoring capabilities for cloud native workloads.

Escalations and Maintenance Windows Are Critical to Downtime Response

Uptime.com includes several advanced check options to provide the flexibility organizations need in creating a response plan to downtime. Maintenance and planned downtime for patches and updates don’t typically create severe downtime events. With escalations, teams have an automated alert system that contacts designated senior-level personnel with relevant technical data.

Introducing the OpsRamp Winter Release, January 2019

OpsRamp helps digital operations teams drive resilient and responsive IT services by discovering topological relationships between resources at multiple levels in the increasingly hybrid and multi-cloud IT stack. In this webinar you’ll get an overview of the Winter Release, including demonstrations of features to drive greater efficiency within modern IT operational environments.

Video AMA: Ana Medina

Ana is currently working as a Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. She last worked at Uber where she was an engineer on the SRE and Infrastructure teams specifically focusing on chaos engineering and cloud computing. Catch her tweeting at @Ana_M_Medina mostly about traveling, diversity in tech, and mental health.

Improving MSP Incident Alert Management

As the big game approaches this Sunday, I’ve been thinking about the NFL’s introduction of instant replay and how it makes the league much more enjoyable! Whether you’re rooting for the Patriots led by Tom Brady … or the Rams, you can’t deny that instant replay makes every Super Bowl much more efficient and adds more clarity to the game.

Why ITOps still suffers from alert fatigue

It takes a lot of time, effort and money to configure centralized monitoring, which makes it all the more frustrating that those carefully crafted alerts will probably just end up being ignored. So why has the whole of ITOps collectively decided to banish your monitoring alerts to their junk folders? The simple answer: alert fatigue.