December 2023

Runbook vs Playbook: What's the difference?

Dec 29, 2023 By Chitra Bisht In Squadcast

What's the difference between a Runbook and a Playbook? We'll end this confusion once and for all today. If you find yourself worrying about forgetting the detailed process of the incident your team just resolved, you're not alone. This is where documentation like Runbooks and Playbooks comes into play. Runbooks and playbooks serve as organizational guides, providing essential information and instructions for teams to navigate tasks and processes effectively. They not only help your team help themselves but also free up your time for your ever-growing to-do list.

Read Post

Squadcast

2023 Rewind: Squadcast Year-End Review

Dec 29, 2023 By Chitra Bisht In Squadcast

Hold the confetti, everyone, because it's time to POP the champagne! 2023 was a year where Squadcast truly leveled up. We dropped some remarkable features that got our hearts racing (and alerts under control!), snagged some fantastic recognition for our impact, and even gave our website a stunning makeover. And we couldn't have done it without you! Buckle up for a rewind of everything. Let's get started.

Read Post

Squadcast

Public Safety - Everbridge

Dec 28, 2023 By Everbridge In Everbridge

For over 20 years, Everbridge has been a trusted partner to governments worldwide. From fires and floods to terrorist attacks, we’ve monitored potential hazards, prepared for and responded to incidents, and provided the right people with the right information. Be it a country-wide emergency or a neighborhood outage, communities rely on Everbridge to keep them informed and safe.

View Video

Everbridge

Introducing Squadcast's Service Graph | Incident Management | Squadcast

Dec 28, 2023 By Squadcast In Squadcast

This video introduces you to Squadcast's Service Graph feature.

View Video

Squadcast

G2 Winter Report 2023: Squadcast Maintains Leadership in IT Alerting and Incident Management

Dec 27, 2023 By Sanjog Sandhu In Squadcast

2023 has been a year of significant growth for Squadcast, with an expanding presence in both Mid-Market and Enterprise segments across IT Alerting and Incident Management categories. And with the release of the G2 Winter Report '23, it's an opportune moment to share some of our key achievements.

Read Post

Squadcast

On-Call Software Engineer Roles and Responsibilities

Dec 27, 2023 By Zoe Collins In OnPage

Most software engineers know that they are typically tasked with on-call shifts, but new software engineers entering the field may be asking themselves – What do I even do if I get scheduled for an on-call shift? This is a common question that often doesn’t get answered until that first on-call shift, and unfortunately that can be overwhelming for a young professional who is nervous about their first on-call shift, let alone their first incident.

Read Post

OnPage

A Little Resilience Goes A Long Way

Dec 27, 2023 By Emily Arnott In Blameless

Let’s call this the mother of all understatements. If you’re reading this blog, there’s a good chance that you: a.) Agree wholeheartedly with this sentiment and think it should go without saying, AND… b.) Are surrounded by folks who pay lip service to this idea while not taking it as seriously as they should.

Read Post

Blameless

Reflecting on a momentous 2023 at incident.io

Dec 23, 2023 By Luis Gonzalez In Incident.io

2023 at incident.io was a year to remember. While it's easy to be cynical about proclaiming that every year was better than the last, a few things stand out that made 2023 truly a year for the books. TL;DR, a lot happened! Especially when you consider that a lot of things didn't make the list above. So as we turn the page to 2024, we wanted to take a moment to reflect on the transformative year that was 2023, not only for us but for our customers as well.

Read Post

Incident.io

Beacons Webinar

Dec 21, 2023 By StatusCast In StatusCast

Discover how StatusCast Beacons can transform your IT Incident Management process in our upcoming webinar. This webinar covers the power of Beacon Automations providing.

View Video

StatusCast

The Debrief: Incident management for data teams

Dec 21, 2023 By Incident.io In Incident.io

If you're on a data team, have you ever considered using an incident management tool to respond to pipeline issues? If the answer is no, then you might want to check out this episode. Here, we chat with Jack, Data Analyst at incident.io, to better understand why data teams can—and should—look to incident management tools like incident.io to manage issues. We chat about.

View Video

Incident.io

Incident Management

The Debrief: A year in review-2023 at incident.io

Dec 21, 2023 By Incident.io In Incident.io

What a year 2023 was at incident.io! While it's hard to summarize 365 days, a few things stand out: So as we close the curtain on 2023, we sat down with the three co-founders of incident.io to do a bit of reflection on the wild ride that was this year. In this episode you'll hear them discuss challenges, big wins, moments of growth, what's next for us, and most importantly, what the three co-founders like most about one another.

View Video

Incident.io

Incident Management

How To View Previous Incidents To Gain Helpful Context During Incident Triage?

Dec 20, 2023 By Chitra Bisht In Squadcast

Picture this: you're knee-deep in resolving a P1/P0 incident, urgently seeking answers. What if you could tap into past incidents to get important insights and streamline your troubleshooting process? In this blog, we dig into the practical aspects of leveraging Squadcast's Past Incidents feature to help you enhance your Incident Management process.

Read Post

Squadcast

Setting the foundations for on-call that's fair, balanced, and human-focused

Dec 20, 2023 By incident.io In Incident.io

Whenever you're providing a service to businesses or individuals that they rely on, it's important to make sure that it's up and running as much as possible without disruptions. But the reality is that, despite your best efforts, downtime does happen. Regardless of when incidents strike, whether it’s 2 PM in the middle of the working day or 2 AM, it's important to have people available to diagnose and resolve issues as soon as possible.

Read Post

Incident.io

SRE Essentials: Building a Team and Culture

Dec 20, 2023 By Anjali Udasi In Zenduty

What differentiates tech companies that weather digital storms with unwavering resilience? In many cases, the answer lies in a deeply ingrained SRE culture, which fosters proactive approaches to system reliability. Site Reliability Engineering (SRE) culture extends beyond mere tech tools and automated scripts. It emphasizes proactive care, shared responsibility, and continuous improvement, leveraging incident management software as a vital component in fostering these core values of SRE.

Read Post

Zenduty

On-call 3.0 Overview

Dec 20, 2023 By Spike In Spike

Here is On-call 3.0, the result of your feedback, deep examination, and fundamental improvements that make on-call a better day-to-day experience.

View Video

Spike

Process Automation Release Notes v5.0

Dec 20, 2023 By PagerDuty In PagerDuty

Chat with the PagerDuty Process Automation product management team. Join us to learn more about what's new in the new major release - 5.0 - and what's coming for automation!

View Video

PagerDuty

Tracking developer build times to decide if the M3 MacBook is worth upgrading

Dec 19, 2023 By Lawrence Jones In Incident.io

All incident.io developers are given a MacBook which they use for their development work. That meant when Apple released the M3 MacBook Pros in October, people naturally started asking questions like “wow, how much more productive might I be if my laptop looked that good?” and “perhaps we’d be more secure if our machines were Space Black 🤔” Pete’s (our CTO) response to this was “if you can prove it’s worthwhile, we’ll do it”

Read Post

Incident.io

BigPanda's latest Unified Console features unveiled

Dec 19, 2023 By C. Beers In BigPanda

In the fast-paced realm of incident management and response, the need to stay ahead is more vital than ever. In recognition of this, BigPanda has significantly enhanced the Unified Console, introducing a suite of new features designed to revolutionize incident handling. Let’s explore these transformative updates and how they can redefine your approach to incident management.

Read Post

BigPanda

Year in Review: Key Trends in Critical Event Management

Dec 19, 2023 By Eric Boger In Everbridge

As we approach the end of 2023, it’s vital to reflect on the transformative year in the field of critical event management. Throughout the year, we’ve witnessed escalating geopolitical tensions, a surge in security threats encompassing both physical and cyber domains, and growing concerns over the intensifying impacts of climate change-induced severe weather events.

Read Post

Everbridge

What is a multi-cloud management platform?

Dec 19, 2023 By Amy Brennen In BigPanda

As an IT leader, you’re acutely aware of the struggles of juggling multiple cloud environments, from integration headaches to holistic incident management to monitoring multiple clouds at once. Seeking a more efficient multi-cloud management solution is crucial to alleviate these pressures and streamline your cloud operations.

Read Post

BigPanda

Episode 23: Zero-Downtime Updates with Todd Whitney

Dec 19, 2023 By PagerDuty In PagerDuty

With limited error budgets and low user tolerance for maintenance windows, the ability to execute routine updates without a maintenance window is an increasingly important socio-technical capability. Hear from Todd Whitney, who recently spoke at HashiConf about how PagerDuty performs updates while upholding its promise to customers of taking zero maintenance windows.

View Video

PagerDuty

Incident Management

How MSPs and MSSPs can reduce risk and liability for their clients

Dec 19, 2023 By Noam Morginstin In Exigence

For 83% of companies, a cyber incident is just a matter of time (IBM). And when it does happen, it will cost the organization millions, coming in at a global average of $4.35 million per breach. Add to that stringent data protection laws and the growing frequency and reach of ransomware and other sophisticated attacks.

Read Post

Exigence

All I want for Christmas... from Slack

Dec 18, 2023 By Kelsey Mills In Incident.io

When declaring and responding to an incident with incident.io, most of your interactions with our product will go via Slack. You might configure your forms in our web dashboard, but the responder using them to declare an incident is most likely doing so from a Slack modal, and the incident announcement will be posted as a Slack message. This means a lot of our product design falls within the constraints of what we can build using Slack’s Block Kit.

Read Post

Incident.io

Impressions from Gartner IOCS 2023

Dec 18, 2023 By Conor Castronovo In BigPanda

Gartner’s IT Infrastructure, Operations & Cloud Strategies Conference (IOCS) is an annual event that attracts ITOps, SRE, and DevOps leaders from around the world. As Gartner explains, IOCS “brings the world’s technology leaders together to hear top trends, find objective answers, and explore topic coverage in addition to best practices. Gain the insights and guidance to create an effective pathway to the future and network with your peers.”

Read Post

BigPanda

Why monitoring your application is important

Dec 18, 2023 By Sam Osborn In BigPanda

Effective monitoring and observability tools are critical for modern enterprises. Daily operations, digital transformation, moving to a cloud-native architecture, and an ever-evolving tech stack all require ITOps, DevOps, and SRE teams to monitor increasingly complex systems. So what happens if your applications suddenly cease to function? Every moment of downtime translates to lost income, decreased customer satisfaction, and harm to your company’s reputation.

Read Post

BigPanda

APAC Retrospective: Learnings from a Year of Tech Turbulence

Dec 18, 2023 By David Ridge In PagerDuty

Throughout 2023, one thing has become abundantly clear: regardless of an organization’s size or industry, incidents are inevitable. Recently across the APAC region, we’ve seen numerous regulatory bodies clamp down on large companies who are failing to provide acceptable service, with some handing out quite severe penalties. For many, the cost of an incident is no longer just lost revenue and customer trust, but financial penalties and business restrictions.

Read Post

PagerDuty

The Debrief: A year in review-2023 at incident.io

Dec 18, 2023 By incident.io In Incident.io

What a year 2023 was at incident.io! While it's hard to summarize 365 days into just a few sentences, a handful of moments stood out from this transformative year: So as we close the curtain on a momentous 2023, we sat down with the three co-founders of incident.io—Chris, Stephen, and Pete—to do a bit of reflection on the wild ride that was this year.

Read Post

Incident.io

#TwitchRecap 2023: PagerDuty Community streamers

Dec 16, 2023 By PagerDuty In PagerDuty

'Tis the season to reflect back on the year! Join the PagerDuty Advocacy team for a look back at a busy year of live streaming.

View Video

PagerDuty

Incident Management

Understanding ServiceNow Incident Management: A comprehensive guide

Dec 15, 2023 By Amy Brennen In BigPanda

You’re focused on swiftly identifying, analyzing, and resolving disruptions in IT services. And you know all too well that correctly deploying and adopting incident management holds the key to delivering a more reliable and responsive IT environment for your applications and services. That’s why you’re using or are considering using ServiceNow’s incident management to ensure a structured and efficient approach to handling your IT service incidents.

Read Post

BigPanda

Mute Notifications in ilert

Dec 15, 2023 By iLert In iLert

Activate the Mute Notifications feature to silence all notifications from ilert channels during periods when you're unavailable, such as vacations.

View Video

iLert

Better Incidents Winter Bonfire: Inside On-Call

Dec 14, 2023 By FireHydrant In FireHydrant

Engineers are bombarded with pages left and right. There's uncertainty about how to escalate. A constant blur exists between what's urgent and what can wait. This never-ending ping-pong game takes a toll. Burnout creeps in, and your engineering culture has taken a nose dive before you know it.

View Video

FireHydrant

Automated incident response in ITOps: Here's everything you need to know

Dec 14, 2023 By Amy Brennen In BigPanda

If you’re like most IT leaders, you realize that automating repetitive, low-level incident response actions is key to unlocking enhanced workforce productivity, improved IT services, minimized downtime, better user experiences, cost savings, and the freedom to focus on innovation. Yet you don’t know where to start – or maybe aren’t sure of the best approach.

Read Post

BigPanda

BookMyShow's Cinematic Product Journey - Incidentally Reliable Podcast with Viraj Patel

Dec 14, 2023 By Zenduty In Zenduty

Grab some popcorn and catch Viraj talk about his experiences and BookMyShow's journey from its inception in the early 2000s to the entertainment behemoth it is today, their stints innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability engineering in the colourful, ever-changing yet challenging world of movies and online ticketing. Exclusively on The Incidentally Reliable podcast — made by SREs for SREs, hosted by Zenduty.

View Video

Zenduty

LLM Monitoring and Observability

Dec 14, 2023 By Ritika Bramhe In OnPage

Large Language Models (LLMs) are advanced artificial intelligence models designed to comprehend and generate human-like language. With millions or even billions of parameters, these models, like GPT-3, excel in natural language processing, understanding context, and generating coherent and contextually relevant text across various applications.

Read Post

OnPage

incident.io - now available in AWS Marketplace

Dec 14, 2023 By Chris Evans In Incident.io

We're pleased to announce incident.io can now be purchased through the AWS Marketplace!

Read Post

Incident.io

The Everbridge Risk Intelligence Monitoring Center (RIMC) real-time alerting

Dec 14, 2023 By Everbridge In Everbridge

The Everbridge Risk Intelligence Monitoring Center (RIMC) analyzes thousands of trustworthy, vetted, and hyper-local data sources – across over 100 risk categories – using machine-learning and AI technology, complemented by an experienced team of global risk analysts. The RIMC team’s real-time alerting streamlines your organization’s ability to monitor and analyze worldwide incidents and events, dramatically increasing your ability to respond to risks that threaten your people, organization, supply chain, and more.

View Video

Everbridge

Everbridge Signal - Open Source Threat Intelligence to Keep People Safe and Operations Running

Dec 13, 2023 By Everbridge In Everbridge

There are billions of people online right now. Among that noise is information that could be vital to your organization’s safety and security. Everbridge Signal will help you find relevant information using Artificial Intelligence and Machine Learning. Detect incidents in real-time by gathering data from public sources including the dark web, deep web and social media. Whether your issues are cyber or physical, Signal can help.

View Video

Everbridge

Everbridge Flow Designer - Overview

Dec 13, 2023 By Everbridge In Everbridge

Flow Designer is a stunningly simple, visual workflow builder that’s as easy as drag, drop, and done. Built-in steps make it easy to create virtually any workflow connecting your applications. Just drop in the steps you need to launch a critical event management process, post progress updates to a public page, and create spaces for personnel to collaborate.

View Video

Everbridge

SIGNL4 Integration with TOPdesk for Mobile Alerting and Incident Response

Dec 13, 2023 By SIGNL4 In SIGNL4

Enhance your TOPdesk experience with reliable mobile alerting via push, SMS, and voice calls, including features like escalation, duty scheduling, team collaboration, and ticket updates.

View Video

SIGNL4

Lessons in Incident Response I Learned While Waiting Tables

Dec 13, 2023 By Ashley Sawatsky In Rootly

Before I stumbled into the tech industry (a story for another day), I spent several years in the customer service world as a server and front-of-house manager in restaurants. It was in these jobs that I first honed some critical skills that would later lead me on the path to incident response.

Read Post

Rootly

Getting started with IT operations automation

Dec 13, 2023 By Amy Brennen In BigPanda

Tech companies face a daunting challenge: a staggering 90% of their IT teams are stuck doing mundane, repetitive tasks, leaving only 10% to focus on strategic innovation. Companies know that automation is the solution to these repetitive, low-level incident response actions; however, many need support to begin automating.

Read Post

BigPanda

The ultimate guide to incident management KPIs and metrics

Dec 13, 2023 By BigPanda In BigPanda

IT incident management aims to swiftly identify, address, and resolve IT disruptions to restore normal service operations. Tracking IT incident management key performance indicators (KPIs) is a vital step toward minimizing disruptions for customers and users. But there are several different KPI and metrics choices, and it’s not easy to identify the right ones that can drive meaningful improvements in incident management.

Read Post

BigPanda

Insights - xMatters Support

Dec 13, 2023 By xMatters In xMatters

Insights provide real-time, actionable suggestions to incident commanders and resolution teams to help them mitigate and resolve incidents faster, and also provide additional context during post-incident review. In this video, I’m going to show you how insights can provide helpful and informative suggestions to improve your resolution processes.

View Video

xMatters

Incident Management

Adobe Experience Cloud Outage: The Impact of Relying on Third-party Services

Dec 12, 2023 By Shreedhar Shirgurkar In Catchpoint

On December 8, 2023, Adobe's extensive customer base was impacted by a series of outages in the Adobe Experience Cloud, starting from 8:00 AM EST and continuing until 1:45 AM EST on December 9. We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.

Read Post

Catchpoint

The Debrief: Incident management for data teams

Dec 12, 2023 By incident.io In Incident.io

Read Post

Incident.io

How BookMyShow Empowered SREs - Incidentally Reliable Podcast #incidentmanagement #devops #shorts

Dec 12, 2023 By Zenduty In Zenduty

Incidentally Reliable Episode 4 dropping this Thursday the 14th, chatting about BookMyShow's journey from inception to the entertainment behemoth it is today, their experience innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability in the colourful yet challenging world of movies. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

View Video

Zenduty

What is Mean Time to Resolution - and why does it matter?

Dec 12, 2023 By Amy Brennen In BigPanda

Mean Time to Resolution (MTTR) is a key performance indicator (KPI) that measures the average duration needed to restore normal operation for an application, service, or infrastructure component. Your MTTR directly impacts customer satisfaction, so you must have a keen understanding of how it influences the reliability and availability of your services and applications to make informed decisions, enable operational efficiency, and ensure a seamless customer experience.

Read Post

BigPanda

Incident vs Bug: Understanding the Key Differences

Dec 12, 2023 By Anjali Udasi In Zenduty

Incidents and bugs are two common occurrences that can disrupt the smooth operation of systems and applications. While these terms may seem similar, they represent distinct concepts with different implications. Understanding the nuances between incidents and bugs is crucial for effective incident management and proactive problem resolution.

Read Post

Zenduty

SIGNL4 Schedule Export

Dec 12, 2023 By SIGNL4 In SIGNL4

A short demo video on how to Export your SIGNL4 Schedule and import it into your personal calendar to help keep track of your scheduled shifts.

View Video

SIGNL4

What is Mean Time to Detect (MTTD) - and why does it matter for ITOps?

Dec 11, 2023 By Amy Brennen In BigPanda

Have you ever wondered about your IT team’s efficiency in detecting incidents? Your Mean Time to Detect (MTTD) is an incident management Key Performance Indicator (KPI) that reveals your productivity during the first stage of incident resolution and enables investigation into opportunities for improvement. ITOps and DevOps teams that can lower their MTTD can more quickly identify issues, minimize potential downtime, and maintain system reliability too.

Read Post

BigPanda

Understanding IT event analytics: From basics to AIOps

Dec 11, 2023 By Nathan Bao In BigPanda

A wise person once said, “What’s measured is what matters.” This couldn’t be more true than in the high-stakes world of IT operations, where the ability to swiftly measure, analyze, and respond to events is crucial for improving IT operational performance. This blog delves into defining IT event analytics, guiding you on getting started, showcasing real-world examples, and introducing essential methods for transforming your incident response strategy.

Read Post

BigPanda

Where Intention Meets Sweet Innovation.

Dec 11, 2023 By StatusCast In StatusCast

Welcome to our latest video on system layering – where intention meets sweet innovation! Discover the delectable world of technology architecture as we unveil the secrets behind system layering, likened to the art of crafting a perfect cake. Just like each layer contributes to the overall masterpiece, each system layer plays a crucial role in creating a robust and efficient IT infrastructure.

View Video

StatusCast

ilert ChatOps: Check on-call status on Microsoft Teams

Dec 11, 2023 By iLert In iLert

There are multiple methods to configure the ilert on-call lookup for Microsoft Teams. In this video, we demonstrate how to designate the lookup specifically for a particular team and that team's chat.

View Video

iLert

Winter safety tips for employees in private and public sectors

Dec 11, 2023 By Everbridge In Everbridge

Winter storms can significantly impact both private and public sectors, affecting their people, operations, and critical infrastructure. The NOAA stated that, in 2022 alone, the total cost of winter storms in the United States was 8.7 billion dollars.

Read Post

Everbridge

Demo Roundup! From Alert to ServiceNow with the PagerDuty Operations Cloud

Dec 9, 2023 By PagerDuty In PagerDuty

In the December edition of What's New in the PagerDuty Operations Cloud Demo Roundup, we'll see a flurry of new capabilities in action that accelerate and automate an unplanned incident.

View Video

PagerDuty

Comparing Uptime Monitoring, Heartbeat Monitoring, and Synthetic Monitoring

Dec 8, 2023 By Chitra Bisht In Squadcast

In the quest for a high-velocity development environment, one fundamental question looms large: "How can you ensure an exceptional end-user experience when an array of engineers continually push and deploy code?" The unequivocal answer to this pivotal inquiry lies in the establishment of robust, straightforward, and well-defined monitoring practices.

Read Post

Squadcast

Incident tracking: How it works and why it matters for IT operations

Dec 8, 2023 By Amy Brennen In BigPanda

Constantly juggling IT incidents can be exhausting as you try to track and resolve them before they escalate into disruptions. With each incident demanding prompt and precise attention, keeping up takes significant work. However, you can manage these challenges more efficiently and with less stress and less risk by optimizing your incident-tracking process.

Read Post

BigPanda

Fault Tolerance: What It Is & How To Build It

Dec 8, 2023 By Muhammad Raza In Splunk

Fault incidents are inevitable. They occur in any large-scale enterprise IT environment, especially when: In fact, research indicates that more than half (50%) of the leaders in tech and business organizations consider the complexity of their data architecture a significant pain point. From an end-user perspective, businesses must overcome complex architecture in order to ensure service delivery and continuity.

Read Post

Splunk

Now in beta: alerting for modern DevOps teams

Dec 8, 2023 By Robert Ross In FireHydrant

Although FireHydrant has spent five years focused on what happens after your team (erg, I mean service 🙄) gets paged, the topic of alerting often comes up in discussions with our community. People are tired of paying big bucks for software that’s expensive, bloated, and hasn’t seen much innovation. Clearly, there’s a problem here – and we’re tackling it head on.

Read Post

FireHydrant

Autocorrelate Alerts With Squadcast's Key-Based Deduplication

Dec 7, 2023 By Chitra Bisht In Squadcast

With the increasing complexity of technology stacks and monitoring tools, managing incidents can become overwhelming, leading to alert noise, alert fatigue, and delayed responses. This is where Key-Based Deduplication comes to the rescue, streamlining incident handling and enhancing the effectiveness of your Incident Management platform.

Read Post

Squadcast

Read more about Autocorrelate Alerts With Squadcast's Key-Based Deduplication

How to monitor resources in OneUptime?

Dec 7, 2023 By OneUptime In OneUptime

OneUptime can be used to monitor a variety of resources, such as APIs, websites, IP addresses, ports, and more. This video talks about how all of this works and gives you a sneak peek.

View Video

OneUptime

Read more about How to monitor resources in OneUptime?

How to create an on-call policy and rotation in OneUptime?

Dec 7, 2023 By OneUptime In OneUptime

In this tutorial video, we walk you through the process of creating an on-call policy and rotation in OneUptime. We start by explaining what an on-call policy is and why it’s crucial for your organization. We then guide you step-by-step on how to set up a policy, including defining the policy name, setting the escalation rules, and adding users to the policy. Next, we delve into creating a rotation for the policy. We explain how to set the rotation length, start time, and participants. We also show you how to handle holidays and time-off requests within the rotation.

View Video

OneUptime

Read more about How to create an on-call policy and rotation in OneUptime?

How to build workflows in OneUptime and integrate OneUptime with anything?

Dec 7, 2023 By OneUptime In OneUptime

OneUptime is a complete open-source observability platform. It allows you to create workflows and integrate with over 5000 different services and products without writing any code. This integration capability allows OneUptime to connect with the rest of your software stack. Building workflows in OneUptime likely involves defining the sequence of operations that should occur based on certain triggers or conditions. These workflows can help automate processes, such as incident management, alerting the right people at the right time, and more.

View Video

OneUptime

Read more about How to build workflows in OneUptime and integrate OneUptime with anything?

How Zenduty Helps You Address Incidents - in 60 seconds.

Dec 7, 2023 By Zenduty In Zenduty

Zenduty is an end-to-end incident management platform that gives you greater control and automation over the incident management lifecycle.

View Video

Zenduty

Read more about How Zenduty Helps You Address Incidents - in 60 seconds.

When More Incident Commanders are Better

Dec 6, 2023 By Strong Liang In Rootly

It has been lightly revised and reposted with his permission from the original article on Medium. Leading major incident responses can be extremely stressful. You have to quickly gather an ad-hoc team, figure out what went wrong, identify a fix and make sure this doesn't make things worse, all the while with senior leadership breathing down your neck. Are we having fun yet? Many people think having a dedicated incident commander role will solve the problem.

Read Post

Rootly

Read more about When More Incident Commanders are Better

OnPage and Slack for Healthcare

Dec 6, 2023 By OnPage In OnPage

View Video

OnPage

Read more about OnPage and Slack for Healthcare

Captain's Log: Diving into our scheduling design

Dec 5, 2023 By Robert Ross In FireHydrant

On-call scheduling is tricky. Like, really tricky. It was one of the scariest parts when we decided to build a modern alerting system earlier this year. We knew we couldn't cut any corners on Day One of our release because it needed to be a fully loaded feature for someone to realistically use our product (and replace an incumbent). This meant including windowed restrictions, coverage requests, and simple to complex rotations.

Read Post

FireHydrant

Read more about Captain's Log: Diving into our scheduling design

SIGNL4 Onboarding: Call Routing

Dec 5, 2023 By SIGNL4 In SIGNL4

The SIGNL4 Onboarding series walks users through the processes of SIGNL4, from signup to alerts to settings. Today's video focuses on routing incoming calls to your on-duty personnel. This video is packed with helpful tips to help you get the most out of your account.

View Video

SIGNL4

Read more about SIGNL4 Onboarding: Call Routing

On-Call Management Models

Dec 5, 2023 By Sirine Karray In iLert

In today's fast-paced digital landscape, incident management is crucial for maintaining operational excellence. During this process, on-call management models play a critical role in promptly addressing and resolving incidents. On-call management involves the organization of teams to ensure prompt response and resolution of incidents and is necessary to streamline incident resolution, ensure 24/7 availability, and allow for fair and transparent on-call rotations.

Read Post

iLert

Read more about On-Call Management Models

The Unplanned Show, Ep. 22: CSOps at PagerDuty with Arturo Suarez Martin

Dec 5, 2023 By PagerDuty In PagerDuty

Even with the best monitoring in the world, some customer-impacting issues still go undetected and are ultimately reported by customers. In this episode, we'll hear from PagerDuty's Senior Director of Global Support, Arturo Suarez Martin, about the journey that PagerDuty has been on to tighten feedback loops between Customer Support and Engineering and mitigate the risk of poor customer experiences.

View Video

PagerDuty

Incident Management

Read more about The Unplanned Show, Ep. 22: CSOps at PagerDuty with Arturo Suarez Martin

New line issue with markdowns

Dec 4, 2023 By OneUptime In OneUptime

View Video

OneUptime

Read more about New line issue with markdowns

Ping Command: A Comprehensive Guide to Network Connectivity Tests

Dec 4, 2023 By PagerTree In PagerTree

The ping network test, a core utility since the 80s, plays a crucial role in confirming connectivity between IP-networked devices. In this guide, we'll delve into what the ping command is, how to run a ping network test, common IP addresses to ping, interpreting results, and troubleshooting errors.

Read Post

PagerTree

Read more about Ping Command: A Comprehensive Guide to Network Connectivity Tests

OnPage-Slack Integration walkthrough

Dec 4, 2023 By OnPage In OnPage

Extend OnPage's incident alert management to Slack.

View Video

OnPage

Read more about OnPage-Slack Integration walkthrough

Events vs. Alerts vs. Incidents

Dec 4, 2023 By Meeta Lalwani In Virtana

Event. Alert. Incident. These terms are bandied about, often interchangeably, in IT operations management. Broadly speaking, they all refer to situations where something is potentially amiss and needs to be investigated and resolved. Each of these three words does, however, have a distinct definition. Because they are used in scenarios where clear communication and timeliness are critical, it’s important to understand the differences and use them appropriately.

Read Post

Virtana

Read more about Events vs. Alerts vs. Incidents

How "On Update" workflow works in OneUptime

Dec 4, 2023 By OneUptime In OneUptime

View Video

OneUptime

Read more about How "On Update" workflow works in OneUptime

Reducing the burden of incident response on your teams

Dec 1, 2023 By Incident.io In Incident.io

In this webinar, a panel of engineering leaders, including Chris Evans, CPO at incident.io, share how they reduce the burden of incident response for their teams. They advocate for a culture of shared responsibility across the board, offering practical strategies to educate the business about engineering practices during the chaos of an outage.

View Video

Incident.io

Read more about Reducing the burden of incident response on your teams

ilert ChatOps: Check on-call status on Slack

Dec 1, 2023 By iLert In iLert

The /il-oncall slash command allows you to find out who is on-call from any Slack channel. This video demonstrates how to set up on-call lookup for a specific team and chat. Learn more about this ilert feature and its setup options here.

View Video

iLert

Read more about ilert ChatOps: Check on-call status on Slack

4 SRE Golden Signals (What they are and why they matter)

Dec 1, 2023 By Blameless Community In Blameless

SRE’s Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.

Read Post

Blameless

Read more about 4 SRE Golden Signals (What they are and why they matter)

Learn the Incident Response Life Cycle - Best Practices and Strategies

Dec 1, 2023 By Emily Arnott In Blameless

No company plans for a security breach, major outage, or other cyber incident, but they happen. When an incident occurs, having a standardized, regulated method of managing the fallout is critical. This is where the incident response life cycle comes in.

Read Post

Blameless

Read more about Learn the Incident Response Life Cycle - Best Practices and Strategies
