December 2021

Looking back at our journey through 2021!

Dec 30, 2021 By Squadcast Community In Squadcast

As we step into another year, it’s time to reflect on our most memorable moments & milestones that tell the story of Squadcast in 2021. 😇 The last 12 months have been nothing short of a spectacular journey for us as a company. We raised funding (Yaay!🙌), launched an open-source tool called SLO Tracker, helped organizations globally improve their reliability and made on-call shifts in general less stressful. Here’s how our year went by.

Read Post

Squadcast

Outage Alert: Top 10 Downtime Incidents of 2021

Dec 29, 2021 By Maddie Welsh In uptime

2021 has been an eye-opening year for both businesses and consumers who use popular websites and applications. We have all seen notable increases in the frequency and severity of outages as dependency on internet infrastructure grows – with no signs of slowing down. With our reliance on automation and connectivity expected to increase in 2022 – let’s review some of the top internet outages and website downtime incidents of 2021.

Read Post

uptime

What to Expect From xMatters in 2022

Dec 29, 2021 By xMatters In xMatters

With only a few days left of 2021, we all know what that means: making New Year’s resolutions. While some love the tradition of laying out their goals for the coming 12 months, others loathe it with a passion. And with approximately 80% of people failing to achieve their resolutions, it’s easy to see why there’s so much resentment towards this common habit. At xMatters, we plan to—and often do—beat those odds.

Read Post

xMatters

SRE Predictions 2022 | Blameless SRE

Dec 28, 2021 By Emily Arnott In Blameless

As the new year approaches, we at Blameless like to ponder the future of Reliability Engineering. For 2021, we predicted that the practice of site reliability engineering (SRE) would continue to grow in terms of adoption, we would see adoption increase faster among smaller organizations, and SRE practices would get more attention to drive adoption compared to hiring. We’re sure you’ll agree that these trends have indeed strengthened in the last year.

Read Post

Blameless

On-Call Escalations

Dec 28, 2021 By AlertOps In AlertOps

With the AlertOps ServiceNow integration, you can use automatic escalations for on-call schedules and create custom escalations. Automatically escalate to a level 2 or level 3 team and notify management and stakeholders. Set each escalation to use the notification channel you choose (email, voice, SMS, mobile app, and chat). Set your escalations to trigger reminders when a response SLA or a resolution SLA has been breached or is approaching the deadline.

Read Post

AlertOps

Service Catalog - xMatters Support

Dec 27, 2021 By xMatters In xMatters

View Video

xMatters

Incident Management

Signl4 Who's On Duty Feature

Dec 23, 2021 By SIGNL4 In SIGNL4

Introducing Signl4's new feature, the Who's on Duty page

View Video

SIGNL4

Tips & Tricks: Keeping Track of Event-Processing Delays

Dec 23, 2021 By Derdack In Derdack

A couple of weeks ago our partner Rok Ponikvar from S&T contacted me about an issue one of his customers faced. His customer complained that Enterprise Alert is not alerting on current issues and even if he creates a test ticket in his OBM system no alert goes out. After a little back and forth we concluded that Enterprise Alert is still processing historic data from an Event Storm in OBM earlier that day.

Read Post

Derdack

Common Security related Questions and Answers

Dec 23, 2021 By Derdack In Derdack

In light of the recent news about yet another reported Zero-Day Exploit and the accompanying discussions about security, let’s touch on the topic of security audits and how Enterprise Alert can be configured to avoid or at least minimize potential security impact. First, let’s establish what we mean by security audit.

Read Post

Derdack

Creating Dynamic Teams - xMatters Support

Dec 22, 2021 By xMatters In xMatters

Dynamic teams in xMatters are groups that don’t have fixed team members but are created at the time of an event based on predefined criteria. You can add dynamic teams as members of a group, or target them directly with notifications.

View Video

xMatters

Incident Management

How to Measure Uptime SLOs Using Pingdom and Nobl9

Dec 22, 2021 By Pingdom In SolarWinds

Do you find yourself asking, “What should our first service-level objective (SLO) be?” The simplest way to get started if you have a website is to measure uptime SLOs. The SLO will measure your uptime and how your site compares to your reliability goals. By following the steps outlined here, you can get up and running with your first SLO in minutes. To get started, you’ll need to set up an account on SolarWinds® Pingdom®.

Read Post

SolarWinds

Oracle's Cerner Acquisition Will Drive Smarter Care Decisions

Dec 22, 2021 By Ritika Bramhe In OnPage

Oracle is gearing up to execute the largest deal in its entire history – the company has agreed to buy Cerner, a leading electronic health records vendor, for $28.3 billion. The Cerner acquisition is slated to be an all-cash deal of $95/share and is expected to complete early next year. Cerner is a healthcare technology firm that streamlines health information and facilitates its accessibility for modern clinical teams.

Read Post

OnPage

Enhanced Enterprise Alert Reporting with Power BI

Dec 22, 2021 By Derdack In Derdack

The benefits of using the correct reporting, analytics and information delivery capabilities can transform an organization. Having access to timely data, reporting, and analytic capabilities helps to ensure the ability to get the right data to the right users at the right time. Having the ability to pull any information that your business needs at any given time allows for the flexibility to get the information for your business when and where it is needed.

Read Post

Derdack

12 Days of Tip-Mas with xMatters

Dec 22, 2021 By Megan Lo In xMatters

With Christmas only a few days away, we’d like to do a round-up of something extra festive that we’ve been sharing on social media: The 12 days of Tip-Mas with xMatters! Each day offers the “gift” of a top tip, a resource, or fun fact about xMatters. So go ahead — sing along to get into the holiday spirit!

Read Post

xMatters

Incident Management Software | The Best Tools for Your Team

Dec 21, 2021 By Noor-ul-Anam Ruqayya In Blameless

Wondering about Incident Management Software? We explain the best incident management software tools and how they work.

Read Post

Blameless

Using context.Context to mock API clients

Dec 21, 2021 By Lawrence Jones In Incident.io

We've found a pattern to mock external client libraries while keeping code simple, reducing the number of injection spots and ensuring all the code down a callstack uses the same mock client. Establishing patterns like these is what makes test suites great, and improves developer productivity when writing tests. Here's how it works.

Read Post

Incident.io

Use Microservices to Modernize IT Operations

Dec 20, 2021 By xMatters In xMatters

Many organizations are experiencing the need to modernize their IT systems to keep pace in an increasingly digital world. Adopting DevOps helps companies implement and initialize the modernization processes. At xMatters, our path to IT modernization has included implementing DevOps, but we have done it a little differently to ensure we are using agile processes.

Read Post

xMatters

Why leading healthcare organizations recommend OnPage

Dec 20, 2021 By OnPage In OnPage

Adrienne, a Family and Nurse Practitioner at a leading healthcare organization, recommends OnPage to anyone looking to adopt a clinical communication and collaboration solution. Keep watching to learn how her organization adopts OnPage to enhance their after-hours call paging workflows.

View Video

OnPage

Who's on Call Report - xMatters Support

Dec 20, 2021 By xMatters In xMatters

The ‘Who’s on Call?’ report in xMatters gives you an at-a-glance view into the on-call status across the groups in your organization.

View Video

xMatters

We've successfully completed our SOC 2 audit

Dec 20, 2021 By Chris Evans In Incident.io

We're very pleased to announce that incident.io is now SOC 2 compliant, having successfully completed our Type I audit. Put simply, this means an external auditor has looked at how the company is operating, and how our software is managed and operated, and confirmed that we meet a set of high security standards.

Read Post

Incident.io

A Site Reliability Engineer's Guide to the Holiday Season

Dec 17, 2021 By JJ Tang In Rootly

SREs face special challenges during the holidays. Here’s how to manage them.

Read Post

Rootly

Group Performance Report - xMatters Support

Dec 17, 2021 By xMatters In xMatters

The Group Performance Report in xMatters displays a group's event response statistics, letting organizations compare how groups are handling the events assigned to them.

View Video

xMatters

Incident Management

OnPage Redefines On-Call Management With Digital Fail-Safe Scheduling

Dec 16, 2021 By OnPage In OnPage

OnPage Corporation Provides a Flexible, Error-Free Way to Democratize On-Call Schedule Creation for Response Teams.

Read Post

OnPage

PowerBI and Enterprise Alert

Dec 16, 2021 By Derdack In Derdack

Enhanced Enterprise Alert reporting using PowerBI

View Video

Derdack

Respond like superheroes with Derdack & SKyPRO

Dec 16, 2021 By Derdack In Derdack

Joint webinar session including live demo with Derdack & SKyPRO

View Video

Derdack

Why SKyPRO Partners with Derdack

Dec 16, 2021 By Derdack In Derdack

Joint webinar session including live demo with Derdack & SKyPRO

View Video

Derdack

Managed IT Service Provider, BDNet Corporate Networking Recommends OnPage

Dec 16, 2021 By OnPage In OnPage

In this video, Brian Domschke, CEO of BD Net Corporate Networking, recommends OnPage for on-call management. Keep watching to learn how his organization leverages OnPage's digital fail-safe scheduling capabilities and alerting system to notify on-call staff after hours. OnPage continues to empower Managed Service Providers of all sizes to accelerate incident remediation for clients and provide exceptional IT services.

View Video

OnPage

On-call by default

Dec 16, 2021 By Chris Evans In Incident.io

Like many SaaS businesses, we have an on-call rota to enable us to provide 24x7 cover if there are problems with incident.io. We have a 'pager' which will alert the relevant person if something unexpected happens in our app, so that they can investigate and fix it if needed. Note: This was adapted from an internal document we wrote about how we think about on-call at incident.io.

Read Post

Incident.io

The Exec's Guide to Embracing Availability: From Tick Mark to SLAs

Dec 15, 2021 By Phil Tee In Moogsoft

5 suggestions to mark the embrace of availability for executives during a time of digital transformation.

Read Post

Moogsoft

What does a DevOps Engineer do? We analyzed 29 job postings to find out.

Dec 15, 2021 By Pruthvi In Spike

As all companies become software-driven, DevOps is becoming an important practice in enterprises and startups across the world. DevOps is about bringing velocity to delivering tech products and services, so you can delight customers and meet business goals. To achieve this velocity, development (dev) and operations (ops) teams work closely together across the software lifecycle - from planning to release. And this has led to a new role in engineering teams - DevOps Engineer.

Read Post

Spike

PagerDuty for Facilities and Crisis Response

Dec 15, 2021 By PagerDuty In PagerDuty

Jason Flint, Senior Manager of Facilities and Crisis Response at PagerDuty, joins the stream to chat about how PagerDuty the company uses PagerDuty the platform to meet the needs of an increasingly distributed workforce. His team keeps track of everything from extreme weather events to political unrest that might impact PagerDuty employees.

View Video

PagerDuty

Incident Management

The Best Tools for System Monitoring

Dec 15, 2021 By Holly Pontifex In xMatters

It takes a lot to run a modern business. From websites to technical solutions and everything in between, it’s no surprise we need better monitoring systems to make sure everything is operational. With multiple gears turning at once on any given platform, incidents are inevitable—especially for companies that are constantly growing and innovating. And the impact of incidents can affect user services, operations, and even business reputation.

Read Post

xMatters

Group Roster Tab - xMatters Support

Dec 15, 2021 By xMatters In xMatters

The Roster tab in xMatters shows a list of all members in the group and the shifts they belong to. Users can easily add and remove group members and their shifts, and identify group members who may have too many or few shifts.

View Video

xMatters

Incident Management

How Disaster Ready are Your Backup Systems, Really?

Dec 14, 2021 By Emily Arnott In Blameless

In SRE, we believe that some failure is inevitable. Complex systems receiving updates will eventually experience incidents that you can’t anticipate. What you can do is be ready to mitigate the damage of these incidents as much as possible. One facet of disaster readiness is incident response - setting up procedures to solve the incident and restore service as quickly as possible. Another strategy involves reducing the chances for failure with tactics like reducing single points of failure.

Read Post

Blameless

Breaking down complex projects into smaller, shippable increments

Dec 14, 2021 By Lisa Karlin Curtis In Incident.io

Building a complex new product can be scary. What if no-one gets value from it? What if it doesn't work? What if it's hard to change? One way to mitigate these risks is to break down the product into smaller shippable increments, allowing you to capture feedback early and confirming the most important assumptions before fully committing to a solution.

Read Post

Incident.io

Automating Work in Real Time Through the PagerDuty Operations Cloud

Dec 14, 2021 By Greg Chase In PagerDuty

Earlier this fall, we announced a significant evolution in the IT process automation portfolio at PagerDuty—the general availability of PagerDuty Rundeck Actions and early access for Rundeck Cloud. These new offerings reflect our vision to enable companies to take real-time actions by democratizing access to automation. In other words, to quickly and safely delegate automated IT processes to the IT users (and APIs) that need them to get work done.

Read Post

PagerDuty

The Principles of DevSecOps

Dec 13, 2021 By Stephen Walters In xMatters

As a Solution Architect here at xMatters, an Everbridge Company, and through my 30-year career in the IT industry, I've seen many frameworks offering bold new ideas. CMMI, ITIL, Prince 2, Agile, Scrum, and most recently, DevOps. These frameworks come and go, offering huge improvements in the way we deliver and manage our IT capabilities, but never lasting long enough to act on those promises. That's not to say they haven't made a marked difference in the IT space, or that they haven't been hugely impactful for organizations around the globe. They become launching off points for a new framework, and now there's a new term that's appeared, DevSecOps.

Read Post

xMatters

What Does ROI Really Mean?

Dec 13, 2021 By xMatters In xMatters

ROI might be one of the most popular business acronyms in recent memory, and business to business, the definition remains the same: return on investment. No matter the industry, leaders are concerned with ROI and ensuring that every dollar spent is used in the best interest of the organization. But in practice, what does ROI really mean? Let’s discuss!

Read Post

xMatters

Everbridge Resident Connection

Dec 13, 2021 By Everbridge In Everbridge

Extend your Community Lifelines by maximizing the Whole Community approach to emergency communications. With Resident Connection, public alerting authorities can expand beyond their current reach to build a more informed and aware community during emergencies.

View Video

Everbridge

Incident Management

Take the Lead: Jennifer Tejada & Ebony Beckwith

Dec 13, 2021 By PagerDuty In PagerDuty

Listen in on our latest Take the Lead with CEO Jennifer Tejada and Ebony Beckwith of Salesforce. They talk about volunteering amidst a global pandemic and the importance of keeping employees engaged in Social Impact efforts.

View Video

PagerDuty

Incident Management

Derdack Saves Christmas

Dec 13, 2021 By Derdack In Derdack

The tale of how Derdack Solutions help Santa Claus save Christmas

View Video

Derdack

Managing Shifts - xMatters Support

Dec 13, 2021 By xMatters In xMatters

The Shifts tab in xMatters allows you to create on-call schedules, define escalations, and create shift rotations for your groups.

View Video

xMatters

Incident Management

Shhh... we have Private Incidents

Dec 13, 2021 By Vinessa Wan In FireHydrant

We’re excited to announce that private incidents are now available on FireHydrant. For the first time, incidents can have their visibility limited to permissioned users only. This is a great solution for security and compliance teams who need to collaborate with their engineering counterparts to resolve incidents. The nature of the incidents these teams work on differs dramatically from operational incidents.

Read Post

FireHydrant

Uncovering the Importance of Mean Time Between Failures

Dec 10, 2021 By Christopher Gonzalez In OnPage

In the IT world, application service providers (ASPs) build customer trust by ensuring the continuous, uninterrupted availability of their services and software. Service availability allows customers to operate normally and generate revenue without being directly impacted by their providers’ system failures. Though providers work to ensure system uptime, they are often challenged by unexpected technical issues that impact customer-facing systems.

Read Post

OnPage

Monthly Moo Update | December 2021

Dec 10, 2021 By Adam Frank In Moogsoft

What a year 2021 has been for us all. We are extremely proud of the continuous innovation and delivery of new features and functionality we have provided throughout the year, all while maintaining enterprise scale and uptime that could win awards. We’ve heard success story after success story from our brilliant customers, each unique in their own way. We couldn’t have had the successful year we’ve had without you, and it’s been our honor to be part of your success.

Read Post

Moogsoft

Practical Guide to SRE: Infrastructure-as-Code (IaC)

Dec 10, 2021 By Quentin Rousseau In Rootly

An overview of how SREs can benefit from Infrastructure-as-Code.

Read Post

Rootly

Calendar View - xMatters Support

Dec 10, 2021 By xMatters In xMatters

The Calendar tab in xMatters displays a visual representation of a group’s shifts, color-coded by occurrence, making it easy to see how shifts work together in a daily, weekly, or monthly view.

View Video

xMatters

Incident Management

BigPanda's ServiceNow integration just got better

Dec 9, 2021 By Bhushan Jadhav In BigPanda

ServiceNow is widely used across Fortune 1000 and Global 5000 enterprises, so it’s no wonder that the majority of BigPanda customers use ServiceNow and integrate with it to streamline their ticketing requests. BigPanda’s AIOps Event Correlation and Automation Platform provides context-rich incidents to IT Ops teams relying on ServiceNow and helps them gain end-to-end real-time visibility into their operations.

Read Post

BigPanda

What we learned from AWS's us-east-1 outage

Dec 8, 2021 By Max Rozen In OnlineOrNot

In case you missed it, for several hours on December 7, 2021, AWS's us-east-1 region had an outage impacting multiple AWS APIs, taking out various websites across the internet. According to our own monitoring at OnlineOrNot, the outage started at 2021-12-07 15:32 UTC and began to recover well at 2021-12-07 22:48 UTC (with minor signs of life for a few minutes around 2021-12-07 20:08 UTC). Had we relied solely on AWS to update their status page before reacting, we would have been waiting a while.

Read Post

OnlineOrNot

Groups Overview Tab - xMatters Support

Dec 8, 2021 By xMatters In xMatters

The Group Overview tab in xMatters gives you an at-a-glance view of a group’s details. You can use this page to view and edit a group’s key information.

View Video

xMatters

Incident Management

Modernize Your Operations with Automated Incident Response

Dec 8, 2021 By PagerDuty In PagerDuty

PagerDuty helps developers and IT professionals adopt full service ownership to ensure that those who go on call are 1) only interrupted by an alert when necessary, and 2) equipped with tools to remove the toil from managing incident response. Automating incident response increases developer and IT staff productivity, improves customer experience from service interruptions and unplanned downtime, and improves responder morale. Learn from PagerDuty customer Guidewire how Automated Incident Response can do all this for your teams.

View Video

PagerDuty

SRE Incident Management: Overview, Techniques, and Tools

Dec 8, 2021 By Jacob Hall In Dotcom-Monitor

In the world of a site reliability engineer (SRE), failure is not only an option, but also expected. Systems, web applications, servers, devices, etc., are all prone to performance issues and unexpected outages at some point. It is an unavoidable fact. These unexpected failures can lead to huge revenue losses, loss of customer trust and, depending on the industry, fines. Fortunately, SRE incident management is one of the core practices used to limit the disruption caused by unexpected issues.

Read Post

Dotcom-Monitor

Incident Review - AWS Outages Crash Major Online Services - Including Amazon

Dec 7, 2021 By Karthik Suresh, Carol Hildebrand In Catchpoint

The following is an analysis of the Amazon Web Services incident on 12/07/2021. Millions of users were affected by an Amazon Web Services outage that took down major online services such as Amazon, Amazon Prime, Amazon Alexa, Venmo, Disney+, Instacart, Roku, Kindle, and multiple online gaming sites. The outage, which originated in the US-EAST-1 region on Dec. 7, 2021, is still ongoing at the time of blog publication.

Read Post

Catchpoint

Space Made Simple: How PagerDuty Enabled Loft Orbital to Achieve Incident Response Lift Off

Dec 7, 2021 By PagerDuty In PagerDuty

The next great space race is on. Today, there are multiple companies competing to earn their slice of a global space industry set to be worth more than $1 trillion by 2040. However, launching a satellite into space still isn’t an option for most organizations due to the prohibitive costs and complex engineering required.

Read Post

PagerDuty

Why automation is the incident response 'easy button' MSPs & IR firms have been waiting for

Dec 7, 2021 By Noam Morginstin In Exigence

The managed security services market is booming. Coming in at $22.8 billion in 2021, it is projected to nearly double in just five years and grow to $43.7 billion by 2026. Moreover, cloud-based managed security services are poised to be the major growth driver for the broader MSP market, coming in at $219.59 billion in 2021, and expected to reach $557.10 billion by 2028. As we can see, providing robust security services is a key competitive differentiator for the lucrative MSP market.

Read Post

Exigence

The Cultural Shift to Modern IT Operations

Dec 6, 2021 By xMatters In xMatters

In the world of always-on services, many organizations have taken the path to modernize their IT operations to provide greater agility, lower cost, and most importantly, to deliver frictionless digital customer experiences. Is your DevOps team deploying more frequently than operations can support? Are you struggling to keep up with the maintenance issues associated with aging software? Modernizing your IT operations can be the key to overcoming these complexities.

Read Post

xMatters

Managing Groups - xMatters Support

Dec 6, 2021 By xMatters In xMatters

Groups in xMatters enable you to notify a set of users, devices, dynamic teams, and other groups as a single recipient. Organizing users into groups allows you to create on-call schedules, define escalations, and set shift rotations to notify only members actively on duty.

View Video

xMatters

What's New: Updates to Runbook Automation, Event Intelligence, Partner Integrations, and More!

Dec 6, 2021 By Vera Chan In PagerDuty

We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. The product team has been hard at work making updates from Event Intelligence, Runbook Automation, and Applications with Monitoring Tools, to PagerDuty and PagerDuty Community Events.

Read Post

PagerDuty

Reimagining Retail Incident Response for the Holidays

Dec 6, 2021 By Ritika Bramhe In OnPage

The holiday season is here, and global retailers are prepared for the biggest retail event of the year. The decrease in new COVID-19 cases, coupled with a rise in vaccination rates, provides a glimmer of hope for shoppers looking to spend for friends and family. Holiday spending is expected to break previous records this year, growing up to 10.5 percent over 2020.

Read Post

OnPage

Best Practices to implement in Incident Management

Dec 3, 2021 By Neil Haran In OneUptime

An incident has roughly five stages:

1. Assess impact
2. Inform customers (status page)
3. Identify the issue
4. Mitigate the issue
5. Resolve the incident

Then there’s follow-up and further work. It is also important to note that (2) should be ongoing as you progress. The status page should be updated at reasonable intervals, e.g. every 15-20 minutes unless you specify otherwise.

Read Post

OneUptime

What can SREs do to make holiday season's peak traffic less chaotic?

Dec 3, 2021 By Vardhan NS In Squadcast

Holiday season's peak traffic is the most challenging period for SREs and on-call engineers. In this blog, we have highlighted the things that SREs can do to make the holiday season less chaotic. The recently concluded Black Friday weekend could have potentially been the most challenging shift for on-call engineers working in the Retail or E-Commerce sector. Since such peak-traffic events push the system to the limits, engineering teams are engulfed in a lot of tension preparing for it.

Read Post

Squadcast

Dashboard Fridays: Sample PagerDuty Alerting dashboard

Dec 3, 2021 By Squared Up In Squared Up

Adam Kinniburgh is back with another Dashboard Fridays episode, this time joined by Ashley Thompson as they showcase this example PagerDuty Alerting dashboard. This dashboard gives an overview of alerting sent to PagerDuty from any source, even external sources like Pingdom.

View Video

Squared Up

Who Needs Site Reliability Engineers (SREs)?

Dec 3, 2021 By JJ Tang In Rootly

Although every company can benefit from SREs, some need SREs more than others.

Read Post

Rootly

Nixle Support Center Introduction

Dec 3, 2021 By Everbridge In Everbridge

View Video

Everbridge

Incident Management

User Performance Reports in xMatters - xMatters Support

Dec 3, 2021 By xMatters In xMatters

The user performance report in xMatters gives you detailed statistics on how users have responded to notifications. The report can be used to gain insights into how well xMatters is being adopted in an organization.

View Video

xMatters

Incident Management

DevOps Workflow | A Complete Guide & Best Practices

Dec 2, 2021 By Myra Nizami In Blameless

Curious about DevOps Workflow? We explain the DevOps process, how automation relates to workflow, and best practices for workflow design. DevOps is a methodology that involves Development and Operations working together during the development process. Workflow is the sequence in which tasks occur. DevOps workflow relies heavily on automation. Using DevOps, teams can increase collaboration and improve processes to create more stable and manageable processes.

Read Post

Blameless

Site Reliability Engineering, Observability, and the Tradeoffs of Modern Software

Dec 2, 2021 By Jason Bloomberg In Moogsoft

This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.

Read Post

Moogsoft

December 2021 Update - On-duty board, Manual Signls and Azure Sentinel update

Dec 2, 2021 By René In SIGNL4

Our December update brings a ‘Who is on duty’ board displaying current team members on duty with contact information. In addition, we have simplified the manual sending of Signls and improved the integration with Azure Sentinel. As always, you can find all the details in this article.

Read Post

SIGNL4

User Roles in xMatters - xMatters Support

Dec 1, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he explains roles in xMatters. Roles control your xMatters privileges, including the features you can access and how you can engage with other rules. They also control whether you can send notifications, manage other users, configure xMatters and more.

View Video

xMatters

Incident Management

Workflows: your process, automated

Dec 1, 2021 By Chris Evans In Incident.io

After many weeks of work, we're delighted to announce the latest feature of the incident.io platform: Workflows. Configure your processes once, and we'll make sure you follow them, every time ✨ A little while ago, I was asked the question: “what makes a good incident response?”. Whilst there’s infinite nuance in the answer, mine was pretty straightforward. The best incidents are founded on principles of communication, coordination, and clear roles and responsibilities.

Read Post

Incident.io

How to Reduce Noise, Resolve Faster, and Automate More Often with PagerDuty

Dec 1, 2021 By Vivian Chan In PagerDuty

When we asked how technology leaders are feeling about increased pressure on digital services, they reported that, unsurprisingly, their investments in digital have grown. In fact, 72% are ramping up digital transformation efforts. Yet while the C-suite is interested in AIOps and automation to help their teams, it’s not always clear what their approach should be and how this technology can be applied to solve problems for their teams today.

Read Post

PagerDuty
