Incident Response

Kubernetes Incident Response: 5 Metrics to Watch

Kubernetes is a central part of modern IT infrastructure and, like any critical system, it is becoming a valuable target for attackers. To identify and respond to security threats, teams need metrics that reveal anomalous activity and point to a direction for investigation.

How to Introduce Automation to Incident Response with Slack and PagerDuty

Major-incident war rooms are synonymous with stress. Pressure from executives, digging for a needle in a haystack, too much noise: it all weighs on your hardworking technical teams. Incident responders clearly need a more effective way to collaborate across technical teams, one that minimizes interruptions and keeps stakeholders up to date while ensuring everyone has the right level of context to do their job.

Incident response: how to keep tech problems from becoming people problems

When one of your IT services is on fire there’s no time to waste. Especially if that fire is blocking your users from getting stuff done. Rapid resolution tends to eclipse all else during an incident, often causing your team to ignore or forget pieces of the incident response process – like keeping people in the loop.

Make your Onboarding Experience Better with a Murder Mystery Game

Onboarding a new tool can be boring. Or stressful. Or both. When onboarding an incident response tool, it can be difficult to make sure that your team is getting the most from the experience. Do you opt for a run-of-the-mill meeting, or try to learn while in an incident? Neither option is ideal. That’s why Petal’s DevOps Engineer Michael Cole found a new way to get his team using Blameless for their incident response process.

Webinar (UK) - Silence the Noise: Simplify Your Crisis Response

Silence the Noise: Simplify Your Crisis Response aims to show you how to simplify the complexities of managing information during an incident. Since COVID, all organisations have experienced the cumbersome processes of managing long-term, ongoing incidents. This webinar will address how to simplify information management and apply these practices to a real-life scenario.

Incident Response Alert Routing

You have identified a data breach. Now what? Your Incident Response Playbook is up to date. You have drilled for this: you know who the key players on your team are, and you have their home phone numbers, mobile phone numbers, and email addresses. So you get to work. It is seven o’clock in the evening, so you are sure everyone is available and ready to respond. You begin typing “that” email and making phone calls, one at a time.

Pragmatic Incident Response: Lessons learned from failures, by Robert Ross (Failover Conf 2021)

Incident response is overwhelming. So where do you start? There's a lot of advice out there, but it's mostly theory that doesn't take reality into account. So how do you get a process in place that actually works and scales? In this session, FireHydrant CEO and Co-Founder, Robert Ross, will share quick stories from his experience as an SRE and the tips he’s learned along the way.

Digital Transformation in Banking: Transforming Financial Services With Incident Management

Financial services institutions have been facing pressure to modernize their operations for years. But legacy architecture and processes—along with compliance regulations—have made rapid innovation difficult to achieve. Adding to this pressure are new, digital-first competitors who accelerate the need for financial services to deliver better digital customer experiences both more consistently and at scale.