February 2022

xMatters Out Run Release Recap: Service-centric Automations, Callable Flows, and More!

Feb 28, 2022 By xMatters In xMatters

What’s one of the fundamental principles of DevOps? Automation. There are many ways to leverage automation to facilitate DevOps practices for enabling consistency, reliability, and efficiency within the organization. That’s why we’re taking serious strides to ensure that xMatters can allow full automation and coordination of the many tools we use to make incident management easier and more efficient for front-line responders.

Read Post

xMatters

Read more about xMatters Out Run Release Recap: Service-centric Automations, Callable Flows, and More!

Workflow Forms - xMatters Support

Feb 28, 2022 By xMatters In xMatters

In xMatters, forms define what information will be contained in a message, how it will look and interact with external systems, and what response options are available to users.

View Video

xMatters

Incident Management

Read more about Workflow Forms - xMatters Support

Creating Subscription Forms - xMatters Support

Feb 25, 2022 By xMatters In xMatters

In xMatters, you can use subscriptions to ensure that you are always informed about certain events. These subscriptions will send you notifications whenever an event occurs that matches your pre-determined criteria, even if you are not directly targeted to receive a notification for that event.

View Video

xMatters

Incident Management

Read more about Creating Subscription Forms - xMatters Support

Traditional vs Modern Incident Response

Feb 24, 2022 By Kristijan Mitevski In Squadcast

An incident is an event (network outage, system failure, data breach, etc.) that can lead to loss of, or disruption to, an organization's operations, services or functions. Incident response is an organization’s effort to detect, analyze and correct the hazards caused by an incident. In most cases, when incident response is mentioned, it relates to security incidents. Sometimes incident response and incident management are used more or less interchangeably.

Read Post

Squadcast

Read more about Traditional vs Modern Incident Response

SIGNL4 in IT Security

Feb 24, 2022 By SIGNL4 In SIGNL4

How SIGNL4 complements IT SecOps operations and improves data and IT security through rapid incident response

View Video

SIGNL4

Read more about SIGNL4 in IT Security

Finding a pricing model that's just right

Feb 24, 2022 By Katie Hewitt In Incident.io

Getting your pricing right is critical to the success of any SaaS company, but finding a model that works can be tough. Price too high, you won’t close enough deals - your business will fail. Price too low, your business model will be unsustainable - your business will fail. To add to the complication, when you’re a new startup your goals are evolving.

Read Post

Incident.io

Read more about Finding a pricing model that's just right

SRE Tools (All of the Tools Your Team Needs)

Feb 24, 2022 By Myra Nizami In Blameless

Wondering about SRE Tools? We explain the best tools for every step of the SRE development process.

Read Post

Blameless

Read more about SRE Tools (All of the Tools Your Team Needs)

Putting the "Action" in Actionable Intelligence

Feb 24, 2022 By Heath Newburn In PagerDuty

AIOps combines machine learning and people to deliver technical outcomes in IT operations. The promise of this capability continues to drive new contenders to the market. AIOps has become a core messaging component for all the major event management players. Many have just rebranded their products to specifically highlight AIOps features. Emerging event management players have arrived and tried to also claim the AIOps space.

Read Post

PagerDuty

Read more about Putting the "Action" in Actionable Intelligence

Can Endpoint Protection Keep up With Modern Threats?

Feb 24, 2022 By OnPage Corporation In OnPage

Endpoint protection is a security approach that focuses on monitoring and securing endpoints, such as desktops, mobile devices, laptops, and tablets. It involves deploying security solutions on endpoints to monitor and protect these devices against cyber threats. The goal is to establish protection regardless of the endpoint’s location, inside or outside the network.

Read Post

OnPage

Read more about Can Endpoint Protection Keep up With Modern Threats?

Episode 1: Mooving to SaaS from On-Prem: Lift and Shift -Then We're Done, Right?

Feb 24, 2022 By Minami (Coirin) Rojas In Moogsoft

Dave Mangot, author, speaker, and consultant, sits down with Moogsoft to discuss moving to SaaS from On-Prem in the premiere episode of “Mooving To…”

Read Post

Moogsoft

Read more about Episode 1: Mooving to SaaS from On-Prem: Lift and Shift -Then We're Done, Right?

Incident severity and priority 101

Feb 24, 2022 By Robert Ross In FireHydrant

Severity and priority can be challenging for a company to nail. When an incident is declared, it's essential to have a system to define the impact and how urgently it should be handled. Incident severity and priority are the two knobs teams can leverage to define scope and urgency, and eventually, the appropriate process to take action. But how should we define them, and what are the differences?

Read Post

FireHydrant

Read more about Incident severity and priority 101

What Is a DevOps Toolchain and How Does It Work?

Feb 23, 2022 By Hollie Whitehead In xMatters

Picture yourself trying to resolve a code error when you notice an additional issue outside your realm of expertise that's making matters worse. Your instinct is to get in touch with the right contact as quickly as possible to resolve the issue so that there's no further impact on the system's uptime. But what if you can't get in touch with them immediately, or don't know who to contact? Instead of trying to solve the problem without support, a DevOps toolchain could have mitigated this chain reaction from the start.

Read Post

xMatters

Read more about What Is a DevOps Toolchain and How Does It Work?

Integration Builder - xMatters Support

Feb 23, 2022 By xMatters In xMatters

The Integration Builder allows you to create integrations between xMatters and other applications.

View Video

xMatters

Incident Management

Read more about Integration Builder - xMatters Support

Major IT Outage 2021 Recap

Feb 23, 2022 By AlertOps In AlertOps

We saw that no one is immune from major IT outages in 2021, not even mega titans like Google, Facebook, and Amazon AWS. The following is a recap of some of the major IT outages with widespread impact for 2021. Amazon Web Services’ (AWS) historic outage occurred on December 7, 2021 and lasted roughly 6 and a half hours. The breadth of Amazon and its reach caused not only their warehouse and delivery operations to stop.

Read Post

AlertOps

Read more about Major IT Outage 2021 Recap

Slack outage

Feb 23, 2022 By AlertOps In AlertOps

Slack, a popular enterprise communications platform, faced a 5-hour system outage yesterday, February 22, 2022, from 9:25 AM to 2:24 PM EST. Slack services affected included: messaging, search, link previews, apps/integrations/APIs, posts/files, workspace/org administration, login/SSO, notifications, connections, and calls. AlertOps was NOT affected by this outage.

Read Post

AlertOps

Read more about Slack outage

Cloud Incident Management Guide

Feb 23, 2022 By Ritika Bramhe In OnPage

It is a well-established fact that companies looking to grow in the digital age can facilitate this mission by adopting the cloud. When pursued with the right intent and implementation strategy, cloud adoption acts as a powerful force multiplier, yielding a cutting-edge IT powerhouse for businesses and helping them grow and innovate at an accelerated pace. Organizations that adopt a cloud-first strategy must safeguard themselves from critical, service-disrupting incidents.

Read Post

OnPage

Read more about Cloud Incident Management Guide

Incident Management Metrics | Choosing KPIs that Matter

Feb 22, 2022 By Noor-ul-Anam Ruqayya In Blameless

Wondering about incident management metrics? We explain what incident management metrics are, how to track them, and what to do with the information.

Read Post

Blameless

Read more about Incident Management Metrics | Choosing KPIs that Matter

New Features: Incident Comms, iLert Developer Platform, Complex Schedules, WhatsApp Notifications, New Integrations

Feb 22, 2022 By iLert In iLert

This post highlights some of the features and improvements that we have released in the last 6 months.

Read Post

iLert

Read more about New Features: Incident Comms, iLert Developer Platform, Complex Schedules, WhatsApp Notifications, New Integrations

PagerDuty Receives Financial Services Competency From AWS

Feb 22, 2022 By Inga Weizman In PagerDuty

We are excited to announce that PagerDuty is now an approved AWS Financial Services Competency Partner. We’re looking forward to expanding our global reach and helping financial services organizations accelerate their cloud migration and digital acceleration journeys. This will allow us to further streamline and automate financial service companies’ digital operations while helping them reduce risk and manage compliance requirements.

Read Post

PagerDuty

Read more about PagerDuty Receives Financial Services Competency From AWS

Episode 3: Mooving to... Stability: The Role of Catastrophic Failure in Software Design

Feb 22, 2022 By Moogsoft Team In Moogsoft

In this episode of Mooving to… Stability: The Role of Catastrophic Failure in Software Design, we had the opportunity to chat with Jeff Atwood (yes, that Jeff Atwood) of Coding Horror, Stack Overflow, and Discourse (Chief Happiness Officer). Jeff started out writing 911 software in Boulder, Colorado for a small company, which was a crash course in writing code for software that has real consequences. With this unique and deep perspective, B.J.

Read Post

Moogsoft

Read more about Episode 3: Mooving to... Stability: The Role of Catastrophic Failure in Software Design

Starting projects at incident.io

Feb 21, 2022 By Lisa Karlin Curtis In Incident.io

We’re a small startup (10 people at time of writing) with big ambitions, particularly when it comes to our product. With so many things we want to do, it’s important for us to be structured in the way we approach our work, without being so process-driven that we lose all the benefits of being small and nimble. As we’re still new, and the team is growing all the time, very little is set in stone.

Read Post

Incident.io

Read more about Starting projects at incident.io

Everything you need to know about Squadcast and Microsoft Teams Integration

Feb 21, 2022 By Vishal Padghan In Squadcast

Microsoft Teams is one of the most versatile tools in terms of providing collaboration and chat solutions to numerous enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization. Hence, we bring you this blog on Squadcast-Microsoft Teams integration that will tell you how this integration can help in improved incident management, effective collaboration and a lot more.

Read Post

Squadcast

Read more about Everything you need to know about Squadcast and Microsoft Teams Integration

Integrated Properties - xMatters Support

Feb 21, 2022 By xMatters In xMatters

Integrated properties in xMatters allow message senders to automatically pull data from external sources into their message content.

View Video

xMatters

Incident Management

Read more about Integrated Properties - xMatters Support

Sprint planning - How to prioritize urgent production issues?

Feb 20, 2022 By Aman Swami In Zenduty

Members of small engineering teams wear a lot of hats while working on a product. It becomes hard to prioritize and deal with issues that arise in production when a sprint is already planned and in place. This not only makes sprints harder to plan but also reduces accountability. How do you tackle this problem and make sure your engineering team does not burn out at the same time? Let’s list a couple of characteristics of such engineering teams that are quite common across the board.

Read Post

Zenduty

Read more about Sprint planning - How to prioritize urgent production issues?

[Webinar] The Convergence of Digital and Physical Security

Feb 18, 2022 By Everbridge In Everbridge

View Video

Everbridge

Read more about [Webinar] The Convergence of Digital and Physical Security

Workflow Properties - xMatters Support

Feb 18, 2022 By xMatters In xMatters

Properties are elements of xMatters workflows that help define and structure message content. Properties might include information such as the severity or location of an incident in an integrated system.

View Video

xMatters

Incident Management

Read more about Workflow Properties - xMatters Support

Designing your incident severity levels

Feb 17, 2022 By Stephen Whitworth In Incident.io

We wrote this article in response to a question asked in our Slack Community. Click here to join hundreds of technology leaders discussing best practices for incident response! ✨ We know a thing or two about incident response. As such, we're often asked to advise when companies are designing their incident response processes. A common question is "How do you design your incident severity levels?". It's a great question given how central they are to incident response!

Read Post

Incident.io

Read more about Designing your incident severity levels

Incident Response Team | Roles & Responsibilities Defined

Feb 17, 2022 By Myra Nizami In Blameless

We discuss what an incident response team does, how it is structured, and how to form the best one for your organization.

Read Post

Blameless

Read more about Incident Response Team | Roles & Responsibilities Defined

Why and How SREs Can Benefit from Feature Flags

Feb 17, 2022 By Weihan Li In Rootly

When you think of who uses feature flags, your mind most likely goes to developers. In general, feature flags are closely associated with software engineering. But Site Reliability Engineers, too, can benefit from feature flags. SREs may not be the ones to create feature flags, but they should work closely with developers to ensure that the applications their teams support include feature flags.

Read Post

Rootly

Read more about Why and How SREs Can Benefit from Feature Flags

Prepare Your Organization for a Hurricane

Feb 17, 2022 By Everbridge In Everbridge

Hurricanes pose immense risk to the safety of an organization’s people, the continuity of operations, and the connectivity of communications systems. During a hurricane, critical event managers must be able to communicate crucial safety information to the people for whom they are responsible. In addition to hurricane preparedness, critical event managers should ready their business for any severe weather event.

Read Post

Everbridge

Read more about Prepare Your Organization for a Hurricane

An easier way to create runbooks

Feb 17, 2022 By Vinessa Wan In FireHydrant

Runbooks have been a game changer for many incident response teams, and we just made it easier for you to get up and running with them. Runbooks reduce toil for responders and ensure consistency in your incident management processes. In the thick of trying to resolve an issue, remembering things like emailing customers is likely the last thing on responders' minds, yet forgetting to do so can be detrimental.

Read Post

FireHydrant

Read more about An easier way to create runbooks

February 2022 Update - Centralized and time-based notification patterns

Feb 16, 2022 By René In SIGNL4

With our February update, it is now possible to centrally configure how Signls should be notified. And of course, each team can have a different configuration of their notification preferences. This also includes response and escalation settings. In addition, it is now possible to set different notification patterns per day and time of the day, e.g. to notify via different channels at night than during office hours.

Read Post

SIGNL4

Read more about February 2022 Update - Centralized and time-based notification patterns

A Day in the Life of a DevOps Engineer

Feb 16, 2022 By Kalen Wessel In xMatters

In the past five years, DevOps adoption has almost doubled. In fact, 74 percent of companies now use DevOps in some form. As a growing number of organizations seek to implement DevOps practices, the need for qualified DevOps engineers is soaring. But what exactly does a DevOps engineer do, and what skills are required to succeed in this in-demand role?

Read Post

xMatters

Read more about A Day in the Life of a DevOps Engineer

Workflows - xMatters Support

Feb 16, 2022 By xMatters In xMatters

The xMatters Workflows tab allows you to create, delete, enable, and manage access permissions to your workflows. Workflows and their associated flows are key components of xMatters' powerful messaging functionality.

View Video

xMatters

Incident Management

Read more about Workflows - xMatters Support

Customer Service Ops - New Features Release

Feb 16, 2022 By Hadijah Creary In PagerDuty

Over the last few years, our world has become increasingly digital, from streaming and shopping to work and health care. Customers want these digital experiences to be seamless. This has become a key priority for all businesses as well, as they depend on happy customers to drive sales and brand reputation. To ensure these seamless digital experiences, technology teams have doubled down on reliability, user experience, and building new features.

Read Post

PagerDuty

Read more about Customer Service Ops - New Features Release

Cloud Complexity - Bringing Resources together in Multi-cloud Environments

Feb 15, 2022 By Caleb Munyasya In Squadcast

The world is still getting used to operating within the cloud. Moving to the cloud is challenging for many organizations. So why do we see a rise in the adoption of multicloud strategies? In this blog, we will explore why this trend is worth considering for your organization, as well as look at the challenges that it brings.

Read Post

Squadcast

Read more about Cloud Complexity - Bringing Resources together in Multi-cloud Environments

Customer Success at an early-stage B2B SaaS company

Feb 15, 2022 By Esther Delignat In Incident.io

Based on our newfound data feet, we’ve started consistently tracking the adoption rate of our latest features. As it happens, we’ve been impressed with the results! For example, we were delighted to see that our new tutorial flow was completed end-to-end by 35% of our users (against an industry average of less than a quarter for 6-step product tours like ours). I know, I know: being at such an early stage means it is arguably easier to hit customer needs on the head.

Read Post

Incident.io

Read more about Customer Success at an early-stage B2B SaaS company

How We Define SRE Work

Feb 15, 2022 By Fred Hebert In Honeycomb

At the time of writing this post, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I had shared my initial experiences and impressions in this post and thought it would make sense to check back in now that I’ve had the opportunity to spend time learning about the team, the culture, and the code base more in depth.

Read Post

Honeycomb

Read more about How We Define SRE Work

Exploring the Importance of Change Management in Healthcare

Feb 15, 2022 By Christopher Gonzalez In OnPage

Change management is an organized, structured approach with methods that enable healthcare organizations to transform workflows seamlessly. Organizational change management requires the collective involvement of C-level executives and stakeholders to successfully implement changes within a care facility. Change is required when individuals, processes, teams, and tools cannot keep pace with the ever-changing needs and expectations of the organization.

Read Post

OnPage

Read more about Exploring the Importance of Change Management in Healthcare

Improved routing for Jira Cloud and Jira Server tickets with multi-project support

Feb 15, 2022 By Dylan Nielsen In FireHydrant

If you love Jira then you probably love customization, and we’ve made your integration with Jira Cloud and Jira Server even better with multi-project support! You can now route your incident tickets and follow-up work to remediation teams' Jira projects directly from FireHydrant, saving you valuable time and clean-up work. Let’s take a look at what has changed and some additional use cases unlocked with this integration.

Read Post

FireHydrant

Read more about Improved routing for Jira Cloud and Jira Server tickets with multi-project support

New Native Slack functionality from PagerDuty - Available Now

Feb 14, 2022 By Jorge Villamariona In PagerDuty

At PagerDuty we invest a significant part of our time listening to our customers. Based on what we have learned from those conversations, we are adding a new set of features to our Slack Integration. These features will make leveraging PagerDuty from Slack even more seamless and allow Incident Responders to conduct their work without switching context, expediting response times and ultimately maintaining high customer satisfaction.

Read Post

PagerDuty

Read more about New Native Slack functionality from PagerDuty - Available Now

Conference Tab - xMatters Support

Feb 14, 2022 By xMatters In xMatters

View Video

xMatters

Incident Management

Read more about Conference Tab - xMatters Support

The three pillars of great incident response

Feb 11, 2022 By Lisa Karlin Curtis In Incident.io

There’s no one-size-fits-all incident response process. Depending on your organisation’s shape and size, you’ll have different requirements and priorities. But the same three pillars form the core of any good process, whether it’s for the largest e-commerce giant or a scrappy SaaS startup.

Read Post

Incident.io

Read more about The three pillars of great incident response

What is a Runbook And How Can It Help My Team

Feb 11, 2022 By Myra Nizami In Blameless

Wondering what a runbook is? We explain what a runbook is, common tasks a runbook can help with, and how to create one.

Read Post

Blameless

Read more about What is a Runbook And How Can It Help My Team

Getting Started with Playbooks

Feb 11, 2022 By Mattermost In Mattermost

Playbooks are collaborative checklists for prescribed, repeatable processes, integrated with channels and automations in the Mattermost platform. You'll create a free account on the Mattermost Community server and get a walk-through of some basic navigation and usage.

View Video

Mattermost

Read more about Getting Started with Playbooks

It's not ready for production until it has an Operational Readiness Checklist

Feb 11, 2022 By Ally McKnight In FireHydrant

Maintaining the reliability of complex services just got easier with Operational Readiness Checklists. Service owners and engineering leaders can now evaluate and maintain the production readiness of the services their users rely on every day: spot risks in your service dependencies before they cause incidents, and respond quickly if they do. Before you put a new service into production, readiness checklists help you dot your i's and cross your t's.

Read Post

FireHydrant

Read more about It's not ready for production until it has an Operational Readiness Checklist

Messaging Page - xMatters Support

Feb 11, 2022 By xMatters In xMatters

View Video

xMatters

Incident Management

Read more about Messaging Page - xMatters Support

Integration Options with SIGNL4

Feb 10, 2022 By Ronald In SIGNL4

SIGNL4 integrates with various backend systems like IT monitoring, service management, IoT systems, sensors, etc. to automatically alert users and teams about certain incidents. A list of selected tools along with integration descriptions is available in our integrations section. How can you integrate SIGNL4 with your own tools? In the following we list some options offering different levels of sophistication.

Read Post

SIGNL4

Read more about Integration Options with SIGNL4

12 ways to ace customer communications during a system outage

Feb 10, 2022 By Radhika Narayanan In Freshservice

System outages are the worst nightmares for IT support teams, but they also provide an opportunity to stand out. During a major service outage, customers are often impacted a lot more because they have much less information about what is happening. Some of the biggest outages that affected users all over the world last year include those of Slack, PlayStation, Airbnb, FedEx, and Amazon.

Read Post

Freshservice

Read more about 12 ways to ace customer communications during a system outage

The Math & Fun Behind Nesting Event Rules with Event Orchestration

Feb 10, 2022 By PagerDuty In PagerDuty

PagerDuty Senior Product Manager Frank Emery joins us on Twitch to talk about Event Orchestration, a new feature in the PagerDuty Platform. We found in our data that 20% of incidents are resolved - by human responders - in under 5 minutes. Why are team members being interrupted for these alerts? Automation is a better answer. Event Orchestration utilizes powerful, flexible rules to turn alerts into automated activities so your team can keep working and avoid unnecessary interruptions!

View Video

PagerDuty

Read more about The Math & Fun Behind Nesting Event Rules with Event Orchestration

SauceLabs & PagerDuty Notifications Channel for API Tests & Monitors

Feb 10, 2022 By PagerDuty In PagerDuty

APIs are the backbone of the apps and web services that run the world, yet most companies don’t have a true understanding of their functional uptime and reliability. Sauce Labs collects those insights by leveraging functional and integration tests as monitors. This provides a single source of truth for uptime and detailed reporting for when problems occur with functionality or performance. With PagerDuty, Sauce Labs' users gain granular control over notifications to ensure compliance with company policies while centralizing test and incident response processes among developers, testers, and product owners.

View Video

PagerDuty

Read more about SauceLabs & PagerDuty Notifications Channel for API Tests & Monitors

Top 9 Skills for SREs from ex-Instacart SRE

Feb 10, 2022 By Quentin Rousseau In Rootly

A list of the top nine SRE skills, from incident management, to cloud computing, to networking and beyond.

Read Post

Rootly

Read more about Top 9 Skills for SREs from ex-Instacart SRE

Squadcast Earns a Spot on G2's Top 50 Best Software Awards for IT Management Products 2022

Feb 9, 2022 By Squadcast Community In Squadcast

We are thrilled to announce that G2 has recognized Squadcast as a High Performer in the Incident Management space and rated us as one of the Best Software for IT Management Products. Over the last three years, G2 has acknowledged our impact in the IT Incident Management space, which led to us being recognized as a Momentum Leader in the Incident Management and IT Alerting categories. Thanks to our learnings from customer feedback, we have been able to shape our product vision and grow further.

Read Post

Squadcast

Read more about Squadcast Earns a Spot on G2's Top 50 Best Software Awards for IT Management Products 2022

Three Common Incident Response Process Examples

Feb 9, 2022 By Hollie Whitehead In xMatters

What makes an engineering team? Communication, collaboration, process, order, and common goals. Otherwise, they would just be a bunch of engineers. The same is true of their tools. Connectivity and process turn a bunch of tools into a DevOps toolchain. If you need a DevOps toolchain, you can use it to easily build an incident response process.

Read Post

xMatters

Read more about Three Common Incident Response Process Examples

Scheduling Managers on Duty for Alert Escalation

Feb 9, 2022 By Matt In SIGNL4

A question we are hearing often is related to manager escalations, a heavily utilized feature in SIGNL4. Users ask us if those managers can be scheduled. The short answer is ‘yes’, but you need to use a different feature in SIGNL4 and do a little re-configuration.

Read Post

SIGNL4

Read more about Scheduling Managers on Duty for Alert Escalation

Slash MTTR, avoid costly downtime with improved cross-team Collaboration

Feb 9, 2022 By David Arrowsmith In Interlink

Every second counts when IT teams are called upon to resolve business impacting issues. In modern enterprises, poor communication, fragmented toolchains and spiralling IT complexity can conspire to slow down incident response, putting service availability and ultimately customer satisfaction in peril.

Read Post

Interlink

Read more about Slash MTTR, avoid costly downtime with improved cross-team Collaboration

Starting a Conference Bridge - xMatters Support

Feb 9, 2022 By xMatters In xMatters

View Video

xMatters

Incident Management

Read more about Starting a Conference Bridge - xMatters Support

Use your words: the importance of clear writing in product development

Feb 8, 2022 By Sophie Koonin In Incident.io

The role of an engineer at a startup is a tangled web: as well as writing code, you have to be your own product manager, QA tester, customer support and designer. But there’s another hat that you have to wear which you might not have thought about: copywriter. All products have copy, from welcome messages to text on a submit button. At incident.io, we have to put on our copywriting hats every time we add a new feature.

Read Post

Incident.io

Read more about Use your words: the importance of clear writing in product development

6 Software Reliability Metrics That Matter

Feb 7, 2022 By Myra Nizami In Blameless

Wondering about software reliability metrics? We explain the important metrics you need to track.

Read Post

Blameless

Read more about 6 Software Reliability Metrics That Matter

Enterprise Alert Update 9.3 brings great improvements for the OPC connector

Feb 7, 2022 By Derdack In Derdack

We have released an update for Enterprise Alert 9 (version 9.3) that revolutionizes our OPC connector and also includes some bug fixes. Read all the details in this article.

Read Post

Derdack

Read more about Enterprise Alert Update 9.3 brings great improvements for the OPC connector

Sending Alerts - xMatters Support

Feb 7, 2022 By xMatters In xMatters

View Video

xMatters

Incident Management

Read more about Sending Alerts - xMatters Support

Importance of Good Incident Communication

Feb 4, 2022 By Michael Marchese - SRE Manager In Rootly

From the initial alert, through the incident itself, to the post-incident phase, great communication is the key to effective incident response.

Read Post

Rootly

Read more about Importance of Good Incident Communication

What is MTTR? Resolve incidents faster through ops, alerting and documentation

Feb 3, 2022 By Joel Hans In Raygun

When downtime strikes any distributed software deployment or platform, it's all hands on deck until the lights are green and service is restored. This process, from the recognition of a problem to a deployed solution, has most commonly been defined as MTTR, or mean time to resolution. In just the last few years, DevOps and site reliability (SRE) professionals have developed sophisticated new models for how they work and audit their successes. In 2022, MTTR is one of the most widely used software performance success metrics.

Read Post

Raygun

Read more about What is MTTR? Resolve incidents faster through ops, alerting and documentation

Now You can Invoke PagerDuty Rundeck Actions Within the PagerDuty Slack Integration

Feb 3, 2022 By Joseph Mandros In PagerDuty

Last year, we released PagerDuty Rundeck Actions, a PagerDuty add-on product that connects responders to automated diagnostics and remediation for common problems directly in the PagerDuty incident response workflow. After working with our customers and listening to the community, we are excited to announce that PagerDuty Rundeck Actions now integrates with PagerDuty’s Slack integration.

Read Post

PagerDuty

The startup guide to sensible incident management

Feb 3, 2022 By Chris Evans In Incident.io

If you’re working at an early stage startup and looking to get some good incident management foundations in place without investing excessive time and effort, this guide is quite literally for you. There’s an enormous amount of content available for organisations looking to import ‘gold standard’ incident management best practices – things like the PagerDuty Response site, the Atlassian incident management best practices, and the Google SRE book.

Read Post

Incident.io

Announcing Grafana Incident, smart incident management for your teams

Feb 2, 2022 By Mat Ryer In Grafana

A huge challenge when dealing with incidents is the coordination and communication needed to put things right. What’s happened so far? Who has tried what query? Did we remember to keep stakeholders informed? What is the severity of the incident? Does this affect customers? Figuring this out requires a lot of back and forth as new team members join the incident.

Read Post

Grafana

Grafana Incident: First look at the smart incident management tool

Feb 2, 2022 By Grafana In Grafana

Announcing Grafana Incident, the smart incident management tool for your teams. Grafana Incident allows teams to start collaborating immediately by automatically setting up all the essential spaces and resources needed for incident response, from Zoom meetings and Slack channels to a tracker for important tasks and TODO items. A chatbot offers a command-line interface for managing incidents, and provides the ability to instantly embed Grafana queries, dashboards, and metadata, GitHub issues and pull requests, and more. Grafana Incident is available in preview for Grafana Cloud users.

View Video

Grafana

Grafana OnCall is now generally available on Grafana Cloud, with a generous free tier

Feb 2, 2022 By Matvey Kukuy In Grafana

Today we’re announcing the general availability of Grafana OnCall on Grafana Cloud for all paid and free plans. A big part of delivering great software is ensuring the right people get the right information when the inevitable incidents occur. We want to help you do that with Grafana OnCall, an easy-to-use, developer-first on-call management tool that’s built on top of the Grafana stack you know and love.

Read Post

Grafana

SRE Best Practices For Successful Teams

Feb 1, 2022 By Myra Nizami In Blameless

Wondering about SRE best practices? If you are trying to improve and streamline your current process, we explain best practices and tips for implementing them. What are SRE best practices?

Read Post

Blameless

Top tips to make Round Robin Scheduling successful for your team

Feb 1, 2022 By Hannah Culver In PagerDuty

You may have heard of Round Robin Scheduling before and thought to yourself, is this right for my team? Understanding how Round Robin Scheduling can be used and what teams it works best for is important when considering this method of on-call. Additionally, it comes with some pitfalls you’ll want to avoid, as well as best practices to adopt. In this blog post, we’ll share everything you need to know about Round Robin Scheduling within PagerDuty and how to get started.

Read Post

PagerDuty

What is Crisis Management?

Feb 1, 2022 By Everbridge In Everbridge

Crisis Management is an organization’s process- and strategy-based approach for identifying and responding to a threat, an unanticipated event, or any negative disruption with the potential to harm people, property, or business processes. Being prepared for any event to become a crisis requires a crisis management plan.

Read Post

Everbridge

How Can a Digital Operations Platform Support Operational Resilience?

Feb 1, 2022 By Everbridge In Everbridge

More organizations are requiring solutions that can automate and streamline digital operations across teams and toolsets, enabling enterprises to deliver continuous service uptime and enhance customer satisfaction.

Read Post

Everbridge
