Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Humanizing a DevOps Transformation

Anyone who’s ever played the game of chess knows there’s more than one way to reach a desired outcome. There are 400 possible setups after the first turn; 197,742 after the second; and just north of 120 million after the third—all of which are marching toward the same desired outcome. “So, what does any of this have to do with DevOps?” you ask? Fair question.

Effective Communication Between Healthcare Professionals - Best Practices

Effective communication between healthcare professionals is critical for timely, effective operations. In a modern healthcare environment, communication technologies connect healthcare professionals with other caretakers and healthcare entities, helping ensure that patients receive the best, most immediate care.

Choosing the Right SRE Tools

Implementing SRE practices and culture can be challenging. Fortunately, there are a variety of tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more. In this blog post, we’ll talk about what to look for in SRE tools and how they’ll help you on your journey to reliability excellence.

I Have An SLO. Now What? - Alex Hidalgo

It’s 2020: There is a plethora of data available about measuring SLIs and setting SLO targets. But now that you have this data, what are you actually supposed to do with it? The classic example of “Ship features when you have error budget; focus on reliability when you don’t” is antiquated, too simple, and ignores all of the amazing discussions and decisions you can have with your SLO data. Let’s talk about how you can use SLOs to actually make people happier, from your customers to your engineers to your business.

Look Upstream to Solve your Team's Reliability Issues

In “Upstream” by Dan Heath, we explore a variety of problems ranging from homelessness, to high school graduation rates, to the state of sidewalks in different neighborhoods within the same city. In each of these examples, Dan discusses how upstream thinking decreased downstream work. Upstream thinking is characterized as proactive, collective action to improve outcomes, rather than reaction after an issue has already occurred.

Keeping your teams and customers in the loop during downtime

Making your organization more transparent is not always an easy process. In our latest blog post, Adam Hammond shares some tips and tools that can help you get started when it comes to keeping your teams and customers in the loop during downtime. The core message is that you need to make communication a cultural pillar of your organization.

ChaosSearch Announces New Integration With Opsgenie

ChaosSearch is excited to announce its new integration with Opsgenie — Atlassian’s alerting and incident management platform. Using this integration, your teams can leverage the industry’s most powerful and comprehensive data monitoring and analytics capabilities channeled into a unified workflow through Opsgenie’s easy-to-use interface.