Modern Agile practices and DevOps methodologies are enabling faster feature releases, even as systems become more complex. With high velocity comes more change, and more change leads to more alerts and incidents in applications and infrastructure. So, for DevOps and IT teams, building reliable services depends on proactive testing and an efficient on-call incident response plan.
Continuous improvement is one of the fundamental tenets of Agile methodology, and PagerDuty’s product development teams emphasize it. This already works fairly well at the individual team level through retrospective meetings and postmortems, but sometimes we don’t notice larger or systemic issues that are outside the control of a single team. This post shares the process we use at PagerDuty to uncover those issues, the outcomes we have seen, and how we have evolved that process.
In the traditional IT Infrastructure Library (ITIL) approach to IT service management (ITSM) and IT operations, root cause analysis is required for effective incident management. But over time, DevOps and IT teams are learning that there’s rarely a single root cause. Sure, one action (e.g., a new deployment) can result in one short-lived incident. But what about all the other actions leading up to that action?
Most technical incident response plans account for stakeholder communications with both internal teams and external customers. But at PagerDuty, what we’ve learned from our customers is that there’s still a painful and expensive gap in alignment between IT and business teams. To close that gap, we need to focus on what incident response means for business teams.
Managed service providers (MSPs) simplify IT for organizations of all sizes around the world, and cloud services present new opportunities for these providers to grow their revenues. Now, let’s take a closer look at MSPs and how they can capitalize on cloud services.
The current business environment requires organizations to implement cybersecurity safeguards to avert disasters associated with breaches, data loss, and hefty fines. Simply implementing a cybersecurity plan isn’t enough; it’s also important to incorporate the right solutions and workflows to prevent a disaster. This post discusses the current state of cybersecurity and highlights what organizations should be mindful of to defend successfully against malicious parties.
Collaborative help desks and service desks are essential to both IT and customer support. Together, they give teams a way to respond to internal and external incidents and work cross-functionally to support reliable services for end users. Whether incidents are detected through monitoring tools or through technical support help desks, the business needs a cohesive incident management plan to maintain uptime and keep customers happy.