January 2022

No capes: the perils of being a hero-engineer

Jan 31, 2022 By Isaac Seymour In Incident.io

When I first started out as an engineer I really leant in to the idea of what’s often called “being a hero”; I would get to the office a bit early to make sure I could fix anything that had gone wrong overnight. I loved the camaraderie of someone outside engineering bringing their laptop over with a critical process broken for me to fix (even if I’d been the one to break it!). Being a hero feels really good for a while, but over time, it loses its shine.

Read Post

Event Message - xMatters Support

Jan 31, 2022 By xMatters In xMatters

View Video

Getting Started with Playbooks

Jan 31, 2022 By Elli Ludwigson In Mattermost

It’s 2022: You’re good at your job, you’re maintaining modern systems, now you want to level up your team based on a solid foundation of their collective expertise. You want to standardize and centralize process documentation and make execution as easy and effective as possible so that everything runs smoothly, every time.

Read Post

What's New: Updates to Event Intelligence, On-Call Management, Automation, Mobile, and More!

Jan 31, 2022 By Vera Chan In PagerDuty

We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. Recent updates from the product team range from On-Call Management, Event Intelligence, and Mobile Products to PagerDuty Community & Advocacy Events.

Read Post

Reliability Through Automation for Your Infrastructure and Applications at Scale

Jan 28, 2022 By xMatters In xMatters

As technology becomes more SaaS-based and organizations deploy applications in multiple clouds, there are requirements for more visibility into the cloud environment and better incident response and resolution automation capabilities. The two elements required to achieve this are integrations and workflows in an incident response software solution and effective experimentation, research, and testing in the cloud and on-premise.

View Video

Intelligent Service Design

Jan 28, 2022 By Quintessence Anx In PagerDuty

Hello and welcome to the fourth post in our EI Architecture series focusing on Intelligent Alert Grouping. Previously we have talked about how to train Intelligent Alert Grouping using incident merges (here) and how to configure your alert titles to improve default matching. In this post, we’re going to cover how service design can also impact your experience with Intelligent Alert Grouping as well as the PagerDuty app in general.

Read Post

DevOps Tools (All of the Tools Your Team Needs)

Jan 27, 2022 By Emily Arnott In Blameless

Wondering about DevOps Tools? We explain the best tools for every step of the DevOps development process. What are DevOps Tools used for? DevOps relies on effective tools to help teams manage the entire software development lifecycle. These tools can automate tasks, monitor applications, and facilitate sharing of information between teams.

Read Post

Training Intelligent Alert Grouping

Jan 27, 2022 By Quintessence Anx In PagerDuty

We’re continuing on with our third piece about how to utilize and improve your Intelligent Alert Grouping (IAG)! In case you missed it, the first two blog posts describe the feature (here) and explain how it uses merging to group alerts (here). We alluded to today’s post at the end of the last one: today we’ll be discussing how to use alert titles to improve IAG matches.

Read Post

What Your System Outage Notifications Need To Say

Jan 26, 2022 By Mario Guisado In xMatters

System outages happen to the best of us. Communicating with your customers and other stakeholders effectively during downtimes is vital to maintaining a solid relationship with them. When a system outage occurs, technical teams are tasked with swiftly locating the cause and resolving the issue, while communications teams are tasked with notifying stakeholders and customers about the outage to maintain transparency.

Read Post

Tracking Report - xMatters Support

Jan 26, 2022 By xMatters In xMatters

View Video

Using Event Orchestration to reduce noise and trigger next best action

Jan 26, 2022 By Vivian Chan In PagerDuty

We often hear from customers that they’re dealing with unmanageable levels of noise and complexity, which makes it harder to pinpoint root cause and get to resolution quickly. All this effort spent on sifting through noise, processing events, and gathering context results in a lot of wasted time. That’s why we’ve launched Event Orchestration, which became generally available to our Event Intelligence and Digital Operations customers on Monday.

Read Post

Announcing our newest integration: Confluence

Jan 26, 2022 By Dylan Nielsen In FireHydrant

Using FireHydrant’s Runbooks, incident and retro data can be automatically sent to Confluence at any point in the incident lifecycle. For example, the moment you’ve resolved an incident FireHydrant can create a fresh Confluence page with all of the critical incident information stored in FireHydrant. When utilizing Runbook conditions, you can choose the perfect moment to send your FireHydrant retro to a Confluence workspace.

Read Post

Five Ways Developers Can Help SREs

Jan 25, 2022 By Mayank Gupta In Squadcast

Reliability is a team game. The more Developers and SREs collaborate, the greater the success of the product. In this blog, we have listed five best practices that developers can adopt to make the SRE's life easier. It is not easy to be a site reliability engineer. Monitoring system infrastructure and aligning it with the key reliability metrics is quite a daunting task, whereas a software engineer's job is to deliver high-quality software.

Read Post

Introducing CommsFlow for Context-Rich and Timely Updates to All Stakeholders

Jan 25, 2022 By Emily Arnott In Blameless

We’re so excited to announce our latest platform feature, CommsFlow™! This addition to the core Blameless product offering allows teams to keep stakeholders updated as the reliability of services and applications change. With our new automated and customizable communication flows, on-call, engineering, and business teams feel a sense of accomplishment and, of course, stay informed.

Read Post

Get Paid to Write About Mattermost Playbooks

Jan 25, 2022 By Ben Lloyd Pearson In Mattermost

Mattermost Playbooks help software engineering teams orchestrate their work across all tools and teams to plan projects and hit milestones by uniting your tech stack through a single point of collaboration. We want to see how our community is leveraging Playbooks in their own tech stack and share your creations with everyone so the whole community benefits. We’re doing this by launching a new effort to commission original blog articles that show Playbooks in action.

Read Post

February 2022 Everbridge Partner Newsletter: Channel

Jan 25, 2022 By Everbridge In Everbridge

There is a lot of buzz in the Channel Community regarding Everbridge. Listen to Jasmina Muller and Michael Antoniou share their insights and discuss how we can help our Everbridge Channel partners.

View Video

How to get support?

Jan 25, 2022 By Matt In SIGNL4

This post provides an overview of the options for contacting us if you are seeking technical or other support from the SIGNL4 team. Although SIGNL4 is designed as a self-service platform and provides an abundance of resources and help, we humans provide help through our agents whenever needed.

Read Post

Episode 2: Mooving to Remix: Code You Will be Happy With

Jan 25, 2022 By Moogsoft Team In Moogsoft

Episode 2 of Mooving to… dives into a new tool called Remix, a framework to help create front-end code you’ll love. This episode focuses on a new web framework that helps streamline your processes and eliminate downtime to the best of your ability. Thom Duran and Andrew Leonard of Moogsoft are joined by Kent C. Dodds, Director of Developer Experience at Remix.

Read Post

Accelerate Your Cloud Migration for Financial Services with Morgan Stanley and Vista

Jan 24, 2022 By PagerDuty In PagerDuty

Learn how Morgan Stanley and Vista have streamlined incident response, moved to the cloud, and adopted service ownership to enable faster innovation and drive better user experiences for their customers.

View Video

Respond to incidents faster than ever with the New Mobile Incident Details Redesign

Jan 24, 2022 By PagerDuty In PagerDuty

We’re working from anywhere, are you? With the PagerDuty mobile app, you’re always just a tap away from all the incident response tools you need. The new mobile Incident Details screen provides you with a more compelling visual experience and easier access to all your favorite features during incident response. Run a play, add a priority or note, post a status update, and more with the new carousel.

View Video

Event Orchestration Demo: Reduce Noise & Manage Event Routing with PagerDuty

Jan 24, 2022 By PagerDuty In PagerDuty

Say hello to the next generation of event rules and cut down on manual event processing. With Event Orchestration, you can create custom logic with nested rules to enrich, modify, and control routing or trigger automation actions based on event conditions at scale. (This feature is only available to Event Intelligence and Digital Operations plans).

View Video

AWS Re:Invent 2021 - Accelerate Your Cloud Migration for Financial Services

Jan 24, 2022 By PagerDuty In PagerDuty

Cloud migration and modernization projects for financial services are very complex initiatives with added challenges of visibility and incident response. Here’s how we can help accelerate cloud adoption while reducing customer impact and streamlining and automating incident response.

View Video

Communicating to Users During Incidents

Jan 23, 2022 By Max Rozen In OnlineOrNot

Imagine you're having a regular day at work, opening up your browser, double checking something for a client in that web app your team built for them, when suddenly, you see this screen: You hit refresh a few times, just to be sure. Nope. Still down. What happens next depends on how well your team has planned for incidents like this (some folks call it unplanned downtime).

Read Post

Improving your team's on-call experience

Jan 22, 2022 By Max Rozen In OnlineOrNot

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call, and deals with recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Read Post

FireHydrant Slack Incident Management Demo

Jan 22, 2022 By FireHydrant In FireHydrant

In this demo we'll look at how FireHydrant can solve the pains of quickly declaring and managing an incident, all from Slack.

View Video

A Primer on the History and Evolution of Incident Management to Today

Jan 21, 2022 By JJ Tang In Rootly

Many of the concepts SREs take for granted about incident management originated with efforts to fight fires in California in the 1970s.

Read Post

Getting over on-call anxiety

Jan 21, 2022 By Max Rozen In OnlineOrNot

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together, how are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

Read Post

Get Started with Playbooks Permissions

Jan 19, 2022 By Stephen Van Hemmen In Mattermost

The goal of Mattermost Playbooks is to help teams consistently orchestrate any and all recurring workflows. A Playbook is a prescribed, repeatable process that a team has agreed on and formalized as a collaborative checklist saved on their Mattermost server. We at Mattermost use Playbooks for incident collaboration, customer onboarding, and product releases, along with many other complex processes.

Read Post

A single pane of glass for automatic incident response for Bridgeport Public School District

Jan 19, 2022 By Derdack In Derdack

“I have been doing this for 20+ years and have been using literally every product out there. Derdack is unique at how issues are addressed and communicated out because of the seamless integration, maturity and flexibility of the platform. Working with Derdack has been a game changer for us and helped us to do more with less.” Jeff Postolowski, Director Information Technology Services, Bridgeport Public School District

Read Post

What is Incident Response?

Jan 18, 2022 By Peter Kosa In xMatters

When a service is down, a system is failing, or a security issue is in the midst of occurring, organizations need a solid incident response process to get up and running again. Incident response isn't just for high severity, lights out incidents either; if you've rebooted your computer to fix a problem, you've been an incident responder yourself! Incidents happen, and any successful organization knows that instead of pretending that one day nothing will ever go wrong, it's far more useful to develop a comprehensive operational response plan. And to do so, you need to know what incident response is! Let's get into it.

Read Post

Why SRE Benefits Your Organization's Teams & Your Customers

Jan 18, 2022 By Emily Arnott In Blameless

Wondering why you should choose SRE for your organization? We will explain what it is and all the benefits it can bring to your organization. What are the benefits of SRE?

Read Post

Improve Incident Response by Getting Control of Your (Unintelligent) Swarm

Jan 18, 2022 By Mandi Walls In PagerDuty

Incidents happen. Things go wrong. Systems fail. Sometimes they fail in unexpected and dramatic ways that create Major Incidents. PagerDuty makes a very specific distinction between an incident and an Incident. Your organization may also make such a distinction. Determining if an incident is major or not can come down to a number of factors, or a specific combination of factors, like the number of services affected, the customer impact, and the duration of the incident.

Read Post

Achieving Maximum Patient Satisfaction Through Effective Clinical Communications

Jan 18, 2022 By OnPage Corporation In OnPage

Judit Sharon, CEO and founder of OnPage Corporation, sits down with Healthcare Innovation to discuss how advanced, effective clinical communication systems help teams achieve ultimate patient satisfaction. How has the landscape around time-sensitive communications between and among clinicians and others in patient care delivery evolved in the past few years?

Read Post

The SIGNL4 mobile App

Jan 17, 2022 By SIGNL4 In SIGNL4

Brief overview of the functions of the SIGNL4 mobile App

View Video

All Events Report - xMatters Support

Jan 17, 2022 By xMatters In xMatters

View Video

Benefits of Enterprise Alert's Mobile App

Jan 17, 2022 By Derdack In Derdack

Being in touch with your customers is key to any business. We at Derdack pride ourselves on being customer first when it comes to not only product enhancements and features but also support and building that customer/vendor relationship that lasts for years. We recently took a trip to Texas to visit several customers and the feedback was invaluable! We received a lot more information from face-to-face meetings; it just would not have been the same if done virtually, such as over Teams.

Read Post

Communicating to Users During Incidents

Jan 14, 2022 By Max Rozen In OnlineOrNot

Read Post

Top 5 Incidents and Outages of 2021

Jan 14, 2022 By Quentin Rousseau In Rootly

An overview of major IT incidents and outages in 2021

Read Post

Event Log - xMatters Support

Jan 14, 2022 By xMatters In xMatters

View Video

The SIGNL4 mobile app

Jan 14, 2022 By SIGNL4 In SIGNL4

Brief overview of the functions of the SIGNL4 mobile app

View Video

Presenting Role-Based Access Control for Squadcast users

Jan 13, 2022 By Vardhan NS In Squadcast

Role-Based Access Control is an effective means to enable authentication and ensure only the authorized personnel have access to sensitive data within the platform. This blog explains how to implement RBAC in your organization's Squadcast account to achieve maximum security and confidentiality during Incident Management. We recently released this new functionality into Squadcast (called RBAC) that helps organizations fine-grain the access control provided to users within our platform.

Read Post

Canary Deployments | The Benefits of an Iterative Approach

Jan 13, 2022 By Emily Arnott In Blameless

At Blameless, we want to embrace all the benefits of the SRE best practices we preach. We’re proud to announce that we’ve started using a new system of feature flagging with canaried and iterative rollouts. This is a system where new releases are broken down and flagged based on the features each part of the release implements. Then, an increasing subset of users are given access to an increasing number of features.

Read Post

Want to accelerate your organization's digital innovation in 2022? Here's three ways to do it.

Jan 13, 2022 By Julian Dunn In PagerDuty

After two years of sky-high spending on cloud and related technologies, 2022 is the crunch point for corporate IT and digital leaders. Investments in technology helped facilitate the rapid shift to mass hybrid working and supported businesses to embrace the digital-first models of the new normal. But beyond merely investments to support new working styles, leaders also must ensure their organization continues to innovate.

Read Post

What is a Workflow?

Jan 12, 2022 By xMatters In xMatters

Workflows are no stranger in the DevOps world. But where did this term come from, and what does it really mean? Perhaps it’s no surprise that workflows originated from the industrial revolution, which brought powerful machinery for mobilizing huge workforces unlike ever before. To maximize the potential of these new industrial tools, people had to first figure out the best way to use them to get work done as efficiently as possible.

Read Post

Properties Report - xMatters Support

Jan 12, 2022 By xMatters In xMatters

View Video

Incident Response in less than 2 minutes by PagerDuty

Jan 12, 2022 By PagerDuty In PagerDuty

Orchestrate the right response for every incident. See how PagerDuty can help your team respond to incidents better in under 120 seconds with features like Rundeck Actions, postmortems, analytics, and key integrations with collaboration and ITSM tools.

View Video

Effective Incident Management: How to Improve Collaborative Software Development

Jan 12, 2022 By JFrog In JFrog

Are you using Azure DevOps as the starting point of your delivery process on the Azure cloud? Join this webinar to learn advanced tips and tricks for simplifying and accelerating your CI/CD pipelines with Azure DevOps and the JFrog Platform. Sharing a detailed demo of a real-world release pipeline triggered from Azure DevOps, we’ll review best practices and hard-won lessons for how you can streamline your end-to-end process and ensure it meets the security and quality requirements of large-scale enterprise delivery.

View Video

PagerDuty Named a Leader in the Latest G2 Grid for AIOps Platforms

Jan 12, 2022 By Heath Newburn In PagerDuty

At PagerDuty, we are committed to championing the customer — it’s a core company value. Our product has to provide great value, we have to provide excellent service, and we need to make it simple to do business with us. The Winter 2022 G2 Grid for AIOps Platforms Relationship Index showcases these values and highlights PagerDuty as a leading player in the AIOps space.

Read Post

It's a great day to be a Panda.

Jan 12, 2022 By Assaf Resnick In BigPanda

I am excited to announce today that BigPanda has secured $190 million in financing at a $1.2 billion valuation. This financing was led by Advent International and Insight Partners, together with our other existing investors. BigPanda is now officially a unicorn, and the clear leader in the rapidly growing AIOps market!

Read Post

xMatters Ninja Release Updates - xMatters Demo

Jan 11, 2022 By xMatters In xMatters

Join Belinda Joseph, Sr. Director of Marketing Events, and Corey Blakeborough, Solutions Architect, as they highlight and walk through some of the fantastic new features that rolled out during the xMatters Ninja release. Some of these great new features include the service dependencies map, the automation of digital and business response, and brand new unified alert reports!

View Video

Equitably distribute on-call responsibility and streamline incident response with Round Robin Scheduling

Jan 11, 2022 By Hannah Culver In PagerDuty

PagerDuty is excited to introduce Round Robin Scheduling. Round Robin Scheduling allows teams to equitably distribute on-call shift responsibilities amongst team members. Automatically assigning new incidents across different users or on-call schedules on an escalation level ensures that teams are resolving incidents as efficiently as possible. And, by balancing the workload across multiple users, there’s less risk of burnout.

Read Post

What exactly is Digital Operations?

Jan 11, 2022 By AlertOps In AlertOps

IT modernization (for example, cloud computing), digital optimization, and the creation of new digital business models are all examples of digital transformation. The concept of combining company processes with agility, intelligence, and automation to build operational models that delight consumers while also improving performance is known as digital operations.

Read Post

Call Routing for your 24/7 hotline

Jan 11, 2022 By SIGNL4 In SIGNL4

How SIGNL4 provides intelligent call routing and a voice mailbox with alerts for your 24/7 service hotline

View Video

Equitably distribute on-call shifts with Round Robin Scheduling

Jan 11, 2022 By PagerDuty In PagerDuty

See how Round Robin Scheduling allows teams to equitably distribute on-call shift responsibilities amongst team members. Round Robin is generally available for Business and Digital Operations plans.

View Video

iOS Sending Messages - xMatters Support

Jan 10, 2022 By xMatters In xMatters

View Video

Intelligent Swarming vs. Tiered Support: How Customer Service Teams can use PagerDuty to Swarm Critical Issues

Jan 10, 2022 By Nancy Lee In PagerDuty

Most support organizations today adopt some form of the traditional tiered support model. It is one that is based on a process of escalations and customer handoffs. Under this model, customer issues get escalated through multiple levels of a support hierarchy, with three tiers being a common workflow.

Read Post

Sending messages on Android - xMatters Support

Jan 7, 2022 By xMatters In xMatters

View Video

Learn how PagerDuty can help address critical work across all departments

Jan 7, 2022 By Hannah Culver In PagerDuty

PagerDuty’s Operations Cloud helps organizations with critical work across the entire business, from IT teams to customer service to human resources, marketing, sales, and more. With PagerDuty, organizations can prioritize accurately, respond efficiently, and reduce operational overhead. In this blog post, we’ll share examples of how PagerDuty can be used for critical work in all departments, not just IT, using our new Solution Guides for Business.

Read Post

SRE and the Practice of Practice

Jan 6, 2022 By Matt Davis In Blameless

Part of the trepidation of being on-call is encountering unfamiliar emergency scenarios where we are surprised by suddenly not knowing how to do our jobs. We feel lost and alone, complicated by the world around us, powerless to resolve or even mitigate the problem. On-call need not be a solo affair full of fear and anxiety. There are ways we can employ practice and open collaboration outside of incidents to prepare us better.

Read Post

What the Ideal Incident Lifecycle Should Be

Jan 5, 2022 By Mark Henzi In xMatters

Today’s organizations are managing increasingly complex IT ecosystems and are pressured to deliver on innovation, all while trying to maintain service performance and reliability to keep up with the always-on digital economy. With IT complexity growing exponentially, incidents have become a common, if not day-to-day, struggle for many businesses. Incident management is the process or method that modern organizations use to prepare for and respond to service disruptions.

Read Post

The Universal Language: Reliability for Non-Engineering Teams

Jan 5, 2022 By Emily Arnott In Blameless

We talk about reliability a lot from the context of software engineering. We ask questions about service availability, or how important it is for specific users. But when organizations face outages, it becomes immediately obvious that the reliability of an online service or application is something that impacts the entire business with significant costs. A mindset of putting reliability first is a business imperative that all teams should share.

Read Post

Building an SRE Team with Specialization

Jan 5, 2022 By Emily Arnott In Blameless

As organizations progress in their reliability journey, they may build a dedicated team of site reliability engineers. This team can be structured in two major ways: a distributed model, where SREs are embedded in each project team, providing guidance and support for that team; and a centralized model, where one team provides infrastructure and processes for the entire organization.

Read Post

iOS Events Report - xMatters Support

Jan 5, 2022 By xMatters In xMatters

View Video

The Human Side of Being On-call: 5 Lessons for Managing Stress, Anxiety, and Life While Being On-call

Jan 5, 2022 By Derek Ralston In PagerDuty

Within DevOps, we talk a lot about the on-call process—but what about the human side of being on-call? For example, what are effective ways of managing stress and anxiety during a shift? How can one manage life situations that make being on-call difficult—such as being responsible for watching the kids during an on-call rotation? And how can an empathic team culture help prevent burnout and turnover?

Read Post

How to Effectively Lead High-Performing Engineering Teams

Jan 5, 2022 By Harrison Calato In Honeycomb

The overall theme is high-performing engineering teams are generally the ones that humanize the process. Whether you’re trying to increase productivity or release better-quality code, the biggest piece of advice is to lead with empathy.

Read Post

Stakeholder Notifications

Jan 4, 2022 By AlertOps In AlertOps

With the AlertOps ServiceNow integration, you can automatically send updates to stakeholders. Set each update to use the notification channel you choose (email, voice, SMS, mobile app, and chat). Set triggers to send alerts on any condition, such as SLA breaches, status changes or any custom field change. Automatically send updates at time points that you set. AlertOps also logs all activities in ServiceNow so you can track everything in one place.

Read Post

Major Incident Notifications

Jan 4, 2022 By AlertOps In AlertOps

With the AlertOps ServiceNow integration, during a major incident, you can automatically send notifications to targeted groups of users (managers, stakeholders, customer service). Each group can have its own unique status update fields, so you can send contextual information with dynamic updates to each group at regular intervals, and a final message when the incident is resolved. Set each notification to use the notification channel you choose (email, voice, SMS, mobile app, and chat).

Read Post

Squadcast + Amazon EventBridge: Routing Alerts Made Easy

Jan 4, 2022 By Vishal Padghan In Squadcast

Amazon EventBridge is an AWS serverless event bus service making it easier to build event-driven applications. It uses events generated from your applications, integrated Software-as-a-Service (SaaS) applications, and other AWS services. It delivers a stream of real-time data from event sources to target services like AWS Lambda. You can also set up routing rules to determine the destination where you wish to send the data and build decoupled application architectures.

Read Post

Fairwinds: Kubernetes Guardrails and Governance to Enable Developers and Reduce Risk

Jan 4, 2022 By PagerDuty In PagerDuty

Customers of both PagerDuty and Fairwinds Insights can generate and customize PagerDuty incidents for critical issues in their Kubernetes clusters. This capability includes over 100 checks that have been built into Fairwinds Insights for things like container vulnerabilities, insecure workload configurations, runtime security events, and resource usage, as well as custom user-defined policies for compliance and internal requirements.

View Video

Android Events Report - xMatters Support

Jan 3, 2022 By xMatters In xMatters

View Video

7 Incident Management Best Practices for Long Term Success

Jan 3, 2022 By Mike Bennett In xMatters

Incidents can have a massive impact on your operations, negatively affecting customers, employees, and stakeholders. Preparing in advance is the best way to restore normal service operations as quickly as possible.

Read Post

Enterprise Alert 9.2 Update Brings Great Flood Protection Enhancements

Jan 3, 2022 By Derdack In Derdack

We have released another update for Enterprise Alert 9 (version 9.2) which enhances the flood protection mechanism. This will help you to set up scenarios where you do not want the flood protection to be active for every notification channel. Read all details in this article.

Read Post