The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Today’s customers see availability as a given. What do they really want? Bigger, better technology with new features and faster platforms. But, according to our recently released Moogsoft State of Availability Report, teams burn their time, money and energy on incident management. In fact, engineers overwhelmingly report that incident management takes up most of their time.
The FEMA Incident Command System responds to wide-area disasters such as earthquakes, fires, floods, hurricanes, and tornadoes, while ITIL is used for digital services and applications. Large organizations have both a facilities team and a data center team. FEMA is associated with the facilities team and ITIL with the smaller data center team. What characteristics do the two share, and what are the main differences?
It’s that time of the year! PagerDuty is coming back to Sin City for AWS re:Invent 2022! The global conference brings together organizations of all sizes and is set to explore themes of modernization, automation, and resiliency in the cloud. With current economic conditions, enterprises are looking to scale operations and optimize costs while delivering always-on digital experiences to their customers. Automation plays a key role in supporting operational and cost efficiency.
In a perfect world, technology stays on and runs flawlessly. But we all know this isn't the case. Like any organization, xMatters sometimes experiences unplanned incidents. What we can control is how we respond to them. To resolve incidents quickly, it's important to coordinate an organized response.
Knowing who is in charge helps teams avoid confusion about whom to turn to during a crisis, allowing them to focus their efforts where needed. When the pressure is on, an incident commander should have an established response plan, backed by actionable insights, so that responders act quickly and coordinate efficiently.
I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was especially excited about was the opportunity to see Honeycomb’s incident retrospective process from the inside. I wasn’t disappointed! The first retrospective I took part in was for our ingestion delays incident on September 8th.