The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
SIGNL4 supports the remote execution of automated tasks or workflows in IT or IoT systems using Remote Actions. These remote actions offer a wide range of applications. You can execute remote actions in response to an alert to trigger some kind of remediation action. But there are many more possible use cases. This article provides some examples and ideas about what is possible.
People make mistakes, technology breaks down, and processes aren’t infallible. But, when incidents happen, what can we do about it? What can we learn? As with all things, learning isn’t a binary action, it’s a process. And, when an incident occurs, organizations typically conduct a post-mortem analysis and generate a post-incident review to uncover what went wrong and why.
Imagine being part of an overactive group chat that causes your phone to buzz every few minutes. In the beginning, you open every message but soon realize that most of them aren't important-or at least are not relevant to you. So, what do you do next? Maybe you let the messages pile up and check them later. Or perhaps, you mute the group chat and ignore the incoming messages altogether. You can blame this tendency to ignore or avoid incoming messages or notifications on one culprit: alert fatigue.
An incident has been declared and your runbook has fired. Everyone is gathered in your Slack channel, the tickets are opened, and roles are assigned. Now what? This is when most teams manually update status pages and kickoff investigation streams using a patchwork of tribal knowledge and supporting playbook documents.
Reliability and chaos might seem like opposite ideas. But, as Netflix learned in 2010, introducing a bit of chaos—and carefully measuring the results of that chaos—can be a great recipe for reliability. Although most software is created in a tightly controlled environment and carefully tested before release, the production environment is harsher and much less controlled.
Episode 5, Mooving to… Practical Postmortems covers how to leverage postmortems to effectively learn from failure. Postmortems are a commonplace reference and are now considered a best practice in most modern engineering teams. However, there’s still a lot of confusion on what postmortems should be – and more importantly, what they should NOT be. Thom Duran, Senior Manager of Productivity from Panther walks us through all that and more in the latest Mooving To.. episode!