Operations | Monitoring | ITSM | DevOps | Cloud

Alert fatigue, part 3: automating triage & remediation with check hooks & handlers

In many cases — as you’re monitoring a particular state of a system — you probably know some steps to triage or in some cases automatically fix the situation. Let’s take a look at how we can automate this using check hooks and handlers.

Simplifying security auditing, part 6: Compliance and the cloud

In part 5, we looked at auditing your network device logs. A decade ago, security professionals were primarily concerned about network perimeter and endpoint security. While those concerns are still valid, technological advancements have created new scenarios that need to be addressed.

Incident Management (class SRE implements DevOps)

In the previous video, Liz and Seth discussed how to make systems observable and how observability helps us diagnose failing systems, but didn't cover what to do when an incident grows beyond the ability of one person to do it all. In this video, you learn about the most important part of the incident management process – humans.