Alerting

On Call Schedule

An on-call schedule defines who is available to respond to incidents around the clock. How your enterprise builds and manages that schedule can affect departments and stakeholders across the organization, so it pays to plan it as thoroughly as you can. With the right processes and tools, you can implement and manage an on-call schedule effectively.
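As a rough illustration of what "defining who is available" can look like in practice, here is a minimal sketch of a round-robin rotation. The function name `on_call_for`, the responder names, the start date, and the seven-day shift length are all assumptions made for this example; they do not come from any particular scheduling tool.

```python
from datetime import date


def on_call_for(day, responders, rotation_start, shift_days=7):
    """Return the responder on call on `day` in a simple round-robin rotation.

    Hypothetical helper: each responder covers `shift_days` consecutive days,
    starting from `rotation_start`, and the list then repeats.
    """
    elapsed = (day - rotation_start).days
    if elapsed < 0:
        raise ValueError("day is before the rotation start")
    shift_index = elapsed // shift_days
    return responders[shift_index % len(responders)]


# Illustrative data only.
responders = ["alice", "bob", "carol"]
rotation_start = date(2024, 1, 1)

first_week = on_call_for(date(2024, 1, 3), responders, rotation_start)
second_week = on_call_for(date(2024, 1, 8), responders, rotation_start)
```

A real schedule usually also needs overrides, time zones, and escalation layers, which this sketch deliberately leaves out.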

Incident Ready: How to Chaos Engineer Your Incident Response Process

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO Robert Ross shares how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use chaos engineering philosophies to stress test 3 critical parts of a great process.

Microsoft's 3 major incidents in 10 days, where did they go wrong?

In case you haven’t heard, last week Microsoft experienced a major outage that prevented users from accessing Office 365, its cloud-based subscription service, which serves 200 million monthly active users. It was the third outage in ten days, and it prompted a deluge of complaints from customers who saw a 'something went wrong' message when they tried to access their accounts.

October 2020 Update: Mute overwrite for iPhone (Critical Alerts), undo and more

Our October update brings the long-awaited mute-overwrite on iPhone (‘critical alerts’). We also introduce an undo action for Signl acknowledgements and closures. And in the web app you can now batch-acknowledge and close multiple Signls at once. All new features are introduced below – enjoy.

What is IT On-call?

An “on-call” worker is available to provide support at their employer’s request. Your enterprise may have on-call employees across various departments, and these workers can help your business if problems arise, even outside normal operating hours. How you manage your on-call teams can have significant ramifications for your enterprise and its stakeholders.

Anomaly detection 101

What is anomaly detection? Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, such as a change in consumer behavior. Machine learning is increasingly being used to automate anomaly detection.
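To make the idea of "deviating from normal behavior" concrete, here is a minimal sketch of one simple statistical approach: the modified z-score, which measures how far each point sits from the median in units of the median absolute deviation (MAD). This is an illustrative technique chosen for this example, not the method of any product mentioned here. The function name `find_anomalies`, the sample latencies, and the 3.5 cutoff (a commonly cited rule of thumb) are assumptions.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration. The median and MAD are used
    instead of the mean and standard deviation because a single extreme
    value distorts them far less.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; the score is undefined,
        # so this sketch reports nothing rather than dividing by zero.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Illustrative response times in milliseconds, with one obvious spike.
latencies_ms = [102, 98, 101, 97, 103, 99, 100, 480, 101, 98]
anomalies = find_anomalies(latencies_ms)
print(anomalies)  # [(7, 480)]
```

Machine learning based detectors follow the same pattern of modeling normal behavior and flagging departures from it, but they learn the baseline from data instead of relying on a fixed formula.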