Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Trending: Automation in I&O Optimization according to the Gartner 2023 Hype Cycle

In this blog, we take you through the latest trends in I&O optimization, as Gartner’s Hype Cycle for I&O Automation, 2023 report predicts widespread adoption of automated tools that support IT infrastructure. The blog focuses on tools, such as OnPage’s incident alert management solution, that are likely to become a standard for I&O optimization in the near future.

The Unplanned Show, Episode 7: Death of the Single Security Pane of Glass with Heather Hinton

In this episode, Heather Hinton describes how security teams can move away from spending cycles on “silly little jobs” and scouring multiple sources to identify the unplanned interrupt work that needs to be dealt with urgently. Instead, they can complete projects faster and take on more, because on-call rotations are spent getting work done (with the occasional interruption) rather than “seeking” out interrupt work. We also discuss how this fits in with encouraging employees more broadly to participate in security hygiene practices.

How to Maximize Time Savings and Reduce Toil During Incident Response

Incidents are a costly burden on businesses. Even when the right people and teams are assembled, manual work, tool setup, and prolonged tasks can negatively impact customer experience. The need for adaptable processes to address diverse incident types further complicates the situation. This is where the PagerDuty Operations Cloud steps in: it streamlines and automates the manual steps in the incident response process.

Sponsored Post

Kubernetes Monitoring Best Practices

Kubernetes can be installed using different tools, whether open-source, from a third-party vendor, or in a public cloud. In most cases, default installations have limited monitoring capabilities. Therefore, once a Kubernetes cluster is running, administrators must implement monitoring solutions to meet their requirements. Effective Kubernetes monitoring requires a mix of tools, strategy, and technical expertise. To help you get it right, this article will explore seven essential Kubernetes monitoring best practices in detail.

Failure Fridays at PagerDuty

Rich Lafferty, Staff SRE at PagerDuty, and Stevenson Jean-Pierre, Senior Manager, Software Engineering at PagerDuty, join Mandi Walls to talk about PagerDuty’s Failure Friday and Failure Any Day practices. PagerDuty has been using failure injection and chaos engineering methods to maintain the reliability of production services. Rich and SJP joined the PagerDuty live stream to talk about how the process works, how it has evolved, and how failure helps improve PagerDuty’s services.

The DevSecOps Toolchain: Vulnerability Scanning, Security as Code, DAST & More

DevSecOps is a philosophy that integrates security practices within the DevOps process. DevSecOps involves creating a ‘security as code’ culture with ongoing, flexible collaboration between release engineers and security teams. The main aim of DevSecOps is to make everyone accountable for security in the process of delivering high-quality, secure applications. This culture promotes shorter, more controlled iterations, making it easier to spot code defects and tackle security issues.

The Medium is the Message: How to Master the Most Essential Incident Communication Channels

We’ve all seen it: a company experiences a major incident and goes radio silent, leaving its customers to wonder, “Are they doing something about this?!” If you’ve ever been on the inside of something like this, you know the answer is most likely yes: there are people working hard to put out the fire as quickly as possible. But when it comes to incidents, perception is reality for customers.