December 2022

Sponsored Post

SLA Vs SLO: Tutorial & Examples

Dec 30, 2022 By Squadcast In Squadcast

Service level agreements (SLAs) and service level objectives (SLOs) are growing in popularity because modern applications depend on a complex web of sub-services, such as public cloud services and third-party APIs, which makes measuring service quality an operational necessity in a demanding market. This article covers the similarities and differences between SLAs and SLOs, explains the intricacies of implementing them, presents a case study, and recommends industry best practices for implementing them.

PagerTree Team Management

Dec 30, 2022 By PagerTree In PagerTree

This video shows the key information for PagerTree teams and how to edit a team. PagerTree's intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

PagerTree Alert Workflow

Dec 30, 2022 By PagerTree In PagerTree

This video shows how data is ingested from third-party systems, transformed, and moved through PagerTree to ultimately notify Account Users. PagerTree's intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

PagerTree OnCall Schedules

Dec 30, 2022 By PagerTree In PagerTree

This video shows how to schedule users on-call in PagerTree. PagerTree's intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

Looking back at our journey through 2022

Dec 30, 2022 By Squadcast Community In Squadcast

We are on the cusp of breaking into 2023 🗓️ with a bag full of interesting memories. Before we wrap up this year-end's celebrations, we'd like to look back and highlight some notable events that took place at Squadcast. Squadcast has grown by leaps and bounds over the past 12 months in our journey toward becoming an integrated Reliability Workflow platform. 😎

PagerTree Quick Start - Getting Started

Dec 30, 2022 By PagerTree In PagerTree

PagerTree's intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app. In this video, we'll cover the basics to get you started with PagerTree!

Critical System Alerts via SIGNL4

Dec 29, 2022 By Derdack In Derdack

I recently had a call with a long-term customer who had been using Enterprise Alert for years without any major incidents. But in light of a recent proactive monitoring project, he also revisited Enterprise Alert and reached out to me to ask for my opinion on how he could improve the monitoring of Enterprise Alert from within the solution.

Squadcast + Hund Integration: A Simplified Approach for effective Alert Routing

Dec 28, 2022 By Vishal Padghan In Squadcast

Hund is a versatile Service Monitoring & Communication tool. It helps monitor services and keeps your audience informed about any status changes automatically through a status page. If you use Hund for monitoring and management requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from Hund to the right users in Squadcast.

Getting Amazon GuardDuty alerts via SNS Endpoint

Dec 27, 2022 By Vishal Padghan In Squadcast

Monitoring your infrastructure and safeguarding it against threats is not easy. Setting up the infrastructure, then monitoring it and collecting and analyzing information for threat detection, is a cumbersome process. This is where a security monitoring service like Amazon GuardDuty can help. In this blog, we will explore the Amazon GuardDuty service and discuss how integrating it with Squadcast can help you route alerts to the right users for quick and efficient incident response.

Sponsored Post

Operations Management Is More Than Incident Management

Dec 26, 2022 By Amalya Shnaps In MoovingON

To many, incident management and operations management may seem similar, but they differ significantly. This difference, which lies in their end goals, also suggests that operations management is much more than incident management. To better understand why, it helps to look at the purpose of each one.

Season's Freezings: Automated Diagnostics with Jake Cohen

Dec 24, 2022 By PagerDuty In PagerDuty

Jake Cohen talks to Scott McAllister about automation in PagerDuty and the advancements in Automated Diagnostics.

Sponsored Post

Incident Management for Digital Service Providers

Dec 23, 2022 By xMatters In xMatters

Digital service providers (DSPs) are valued for their ability to provide on-demand access to digital content. A high-quality customer experience and instant access to digital services are consumers' greatest expectations and vital aspects of a successful DSP. Therefore, it's crucial that incidents, when they occur, don't disrupt your operations. With a robust incident management strategy, DSPs can give their teams tools for automating, coordinating, and quickly resolving issues with no service interruptions, or only minimal ones.

Webinar: 2023 ITOps budgeting to win: use new research-based outage cost data

Dec 23, 2022 By BigPanda In BigPanda

It’s no secret that the digital transformation essentially broke IT operations. With the rise in technology came a rise in outages capable of bringing organizations to a screeching halt. Those outages are expensive, and for years, the same number was thrown around as the authority on how much an outage cost (around $5,600 per minute). This number took off and was used in presentations, sales decks and other resources for years. But how could this number have stayed the same year over year?

Maximize efficiency with Terraformer: Manage Squadcast resources via IaC

Dec 23, 2022 By Vardhan NS In Squadcast

Ever since HashiCorp first launched Terraform, infrastructure teams have been quick to leverage its functionality, because deploying infrastructure via code became much easier and less error-prone. This is a great way to deploy new infrastructure with custom configurations, but what about managing cloud infrastructure that is already in place? Can Terraform be used to make changes to it? Or can it be used to deploy the same configurations to new environments?

Automation Seasons Freezings Wrap Up and New Year's Resolutions

Dec 22, 2022 By Madeline Stack In PagerDuty

It’s that time of year when you may feel pressured to pick your New Year’s resolutions. Well, we went ahead and tried to give you a head start. 2023 is the year we tame toil so we can focus on the fun stuff like engineering and innovation. Hopefully you have had the chance to follow along with us this December for Season's Freezings, the time of year when you are locked out of production and have time to explore new ideas like automation 🙂.

You really like us: customer trust wins FireHydrant 3 G2 awards

Dec 21, 2022 By Robert Ross In FireHydrant

FireHydrant received three G2 Winter 2023 awards — High Performer, a High Performer in the Enterprise category, and a High Performer in the United Kingdom. We are honored to be recognized by G2 because these awards are based on customer reviews.

Alarm optimization - what SIGNL4 has to offer

Dec 21, 2022 By emily In SIGNL4

Having all relevant information about a critical incident is vital for quickly identifying the issue and prioritizing it. SIGNL4 optimizes the perception, response, and handling of incidents through customizable alerts with enriched parameters, images, sound files, links to tickets or PDFs, and maps with geo-location information.

Best Practices for API Versioning

Dec 21, 2022 By xMatters In xMatters

As your experience and knowledge of a system grow, change becomes inevitable. Your application requirements change, your bug fixes require code changes, and your APIs evolve. A key challenge in the software ecosystem is managing changes—especially when they concern APIs. Because you’re likely using APIs in multiple applications, you must document all updates and changes made to your APIs. This is where API versioning becomes crucial.

Why AIOps is the Connector Between Monitoring, Observability and Incident Management

Dec 20, 2022 By Richard Whitehead In Moogsoft

Over the years, as companies have moved from monolith to cloud-native architectures, maintaining high availability has become more challenging. After all, today’s IT ecosystems are complex, distributed and ephemeral, making it increasingly difficult (and, in many cases, downright impossible) for DevOps practitioners and SREs to identify and fix issues manually.

Incident management vs. event management

Dec 20, 2022 By LogicMonitor In LogicMonitor

IT event management and IT incident management may look and even sound similar, but it’s essential to understand how they differ. Your IT management team needs to know what to look for in both an event and an incident so they can resolve any red-flag issues and return your system to normal. But why is it so important to recognize the difference?

Data Aggregation

Dec 20, 2022 By Aman Swami In Zenduty

TL;DR: Data aggregation is the process of collecting and organizing large sets of data from multiple sources in order to provide a comprehensive view of a particular situation or system. It allows organizations to better understand and make sense of the vast amount of data being generated in the modern, highly connected world.
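
As a rough illustration of the idea, the Python sketch below aggregates alert events from several monitoring sources into one summary per service. The `Event` fields, the source names, and the 1 (critical) to 5 (informational) severity scale are illustrative assumptions, not any particular product's data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Event:
    source: str    # illustrative: the monitoring tool that emitted the event
    service: str   # the service the event refers to
    severity: int  # assumed scale: 1 = critical ... 5 = informational


def aggregate(events: Iterable[Event]) -> Dict[str, dict]:
    """Combine events from all sources into one summary per service."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"count": 0, "sources": set(), "worst_severity": None}
    )
    for event in events:
        entry = summary[event.service]
        entry["count"] += 1
        entry["sources"].add(event.source)
        # On the assumed scale, a lower number means a more severe event.
        if entry["worst_severity"] is None or event.severity < entry["worst_severity"]:
            entry["worst_severity"] = event.severity
    return dict(summary)


events = [
    Event("prometheus", "checkout", 3),
    Event("datadog", "checkout", 1),
    Event("pingdom", "search", 4),
]
print(aggregate(events)["checkout"])
```

Grouping by service is only one possible key; the same pattern works for grouping by host, region, or time window.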

Goodbye, 2022. Hello, 2023 - reflecting on a year of change, progress and incidents

Dec 20, 2022 By Chris Evans In Incident.io

Let’s get one thing out of the way: we’re going into 2023 on a high note. We’ve closed deals with some of the most respected companies in both the UK and US, we’ve hired in the double digits, expanded into New York, and revenue is growing steadily. But we aren’t hanging up our football boots just yet. Yes, we can take some time to celebrate our wins, but we’re all hands on deck for 2023 planning.

The Critical Role of Intrusion Prevention Systems in Network Security

Dec 20, 2022 By Abdu Kibuuka In OnPage

An Intrusion Prevention System (IPS) is a network security and threat prevention tool. Its goal is to provide a proactive approach to cybersecurity, making it possible to identify potential threats and respond quickly. An IPS can inspect network traffic, detect malware, and prevent exploits. It is used to identify malicious activity, log and report detected threats, and take precautions to prevent threats from harming users.

11 unique insights into SLOs and reliability management

Dec 20, 2022 By Bashyam Anant In Sumo Logic

A quarter has passed since we launched our Reliability Management capabilities that help developers focus on defining, monitoring and managing Service Level Objectives (SLOs) to drive great digital experiences. Reducing alert fatigue and balancing innovation with reliability are common outcomes that customers expect from Reliability Management. If you are new to SLOs, these insights from our customers capture common practices among peer developers.

What is AIOps: Prevent and resolve IT Outages

Dec 20, 2022 By BigPanda In BigPanda

The definition of AIOps continues to evolve, but understanding the fundamentals of how it works can help you keep up and invest in the right AIOps platform, tools, and features. According to Gartner, AIOps “combines big data and machine learning to automate IT operations processes”. Specifically, Gartner explains that “AIOps platforms analyze telemetry and events, and identify meaningful patterns that provide insights to support proactive responses”.

Public Demo - How to respond to incidents faster with ilert

Dec 19, 2022 By iLert In iLert

In this public demo, you can get a first overview of how our incident response platform works. Our CEO, Birol, will show you how to manage on-call, respond to incidents and communicate them via status pages using a single application. Learn how ilert helps you to increase service uptime and become an uptime hero.

Sponsored Post

SRE Best Practices

Dec 16, 2022 By Squadcast In Squadcast

Site Reliability Engineering (SRE) is a practice that emerged at Google because of its need for highly reliable and scalable systems. SRE unifies operations and development teams and implements DevOps principles to ensure system reliability, scalability, and performance. There's plenty of documentation on tactics for adopting automation and implementing infrastructure as code, but practical ops-focused SRE best practices based on real-world experience are harder to find. This article will explore 6 SRE best practices based on feedback from SREs and technical subject matter experts.

Introduction to Kubernetes Imperative Commands

Dec 16, 2022 By Squadcast Community In Squadcast

Kubernetes was born out of the need to make complex applications highly available, scalable, and portable, and to deploy them independently as small microservices. It also eases the adoption of DevOps processes and helps you set up modern incident response strategies to enhance the reliability of your applications.

Tickets Make Operations Unnecessarily Miserable

Dec 16, 2022 By Damon Edwards In PagerDuty

IT Operations has always been difficult. There is always too much work to do and not enough time to do it. The frequent interruptions and high levels of toil certainly don’t help. Moreover, there is relentless pressure from executives who question why everything takes too long, breaks too often, and costs too much. In search of improvement, we have repeatedly bet on new tools to improve our work.

Schedules | On-Call Rotations | Set up On-Call Schedules

Dec 15, 2022 By Squadcast In Squadcast

With Squadcast's schedules, you can create as many on-call schedules as you need to support your current team and system structures, much like before. What’s new is that you can customize the color each schedule shows on the calendar.

Plesk 360 + Squadcast: Alert Routing Made Easy

Dec 15, 2022 By Vishal Padghan In Squadcast

Plesk is a popular web hosting platform that makes it easier for administrators to set up and manage websites. Its Plesk 360 offering helps users monitor and manage servers more effectively, and features like fully integrated site and server monitoring help users keep track of performance and prevent downtime.

Learn from 50,000 incidents with the first Incident Benchmark Report

Dec 15, 2022 By Robert Ross In FireHydrant

Using anonymized data from 50,000 incidents, the Incident Benchmark Report reveals insights into the when, what, who, and how behind incidents and highlights behaviors that correlate to faster response times.

A New Era for Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Our new brand refresh conveys trust and simplicity in a playful, energetic way — representing our team and product.

Tagging & Routing at Squadcast | Incident Management | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Event Tagging is a rule-based auto-tagging system that lets you define custom tags based on incident payloads; these tags are automatically assigned to incidents when they are triggered. Automatically add relevant information such as priority, severity, or alert type to make incoming incidents context-rich, and route alerts to the right responder(s) based on the tags they carry.
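
To illustrate the general idea of rule-based auto-tagging, here is a minimal Python sketch. It is not Squadcast's rule syntax; the rule shape, the payload fields `status_code` and `host`, and the tag names are hypothetical assumptions chosen for the example.

```python
from typing import Any, Callable, Dict, List, Tuple

Payload = Dict[str, Any]
# Hypothetical rule shape: a condition on the payload plus the tags to attach.
TagRule = Tuple[Callable[[Payload], bool], Dict[str, str]]

RULES: List[TagRule] = [
    # Server errors get a high severity and an alert type (field names assumed).
    (lambda p: p.get("status_code", 0) >= 500,
     {"severity": "high", "alert_type": "http_error"}),
    # Hosts whose name contains "db" are tagged as database components.
    (lambda p: "db" in p.get("host", ""), {"component": "database"}),
]


def tag_incident(payload: Payload, rules: List[TagRule] = RULES) -> Dict[str, str]:
    """Evaluate every rule in order and merge the tags of all matching rules."""
    tags: Dict[str, str] = {}
    for matches, rule_tags in rules:
        if matches(payload):
            tags.update(rule_tags)  # a later rule wins if two rules set the same key
    return tags


print(tag_incident({"status_code": 503, "host": "db-01"}))
# {'severity': 'high', 'alert_type': 'http_error', 'component': 'database'}
```

In this sketch all matching rules contribute tags, so one incident can pick up several pieces of context at once.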

Escalation Policy | Round Robin & Advanced Escalations | Incident Assignment Strategies | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

An escalation policy is a collection of rules that define how and when an incident should be escalated. In Squadcast, an incident escalation happens when a responder hands off the task/incident to another member, and this handoff is subject to specific rules. This video explains how to set up Escalation Policies and the Round Robin Incident Assignment Strategy in Squadcast.
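
A round robin assignment strategy can be sketched in a few lines of Python. This is a generic illustration of the technique, not Squadcast's implementation; the class name, incident IDs, and responder names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator


class RoundRobinAssigner:
    """Assigns each new incident to the next responder in a fixed rotation."""

    def __init__(self, responders: Iterable[str]) -> None:
        names = list(responders)
        if not names:
            raise ValueError("at least one responder is required")
        self._rotation: Iterator[str] = cycle(names)
        self.assignments: Dict[str, str] = {}  # incident id -> responder

    def assign(self, incident_id: str) -> str:
        responder = next(self._rotation)
        self.assignments[incident_id] = responder
        return responder


assigner = RoundRobinAssigner(["alice", "bob", "carol"])
print([assigner.assign(f"INC-{n}") for n in range(1, 5)])
# ['alice', 'bob', 'carol', 'alice']
```

Rotating through responders spreads the load evenly; a real escalation policy would additionally re-assign an incident if the chosen responder does not acknowledge it in time.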

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Organizations that use Microsoft Teams can now integrate it with Squadcast to easily acknowledge, resolve, and reassign incidents from within MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

Creating Routing Rules | Creating Incident Routing Flows | Alert Routing | Event Tags | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Alert Routing allows you to configure Routing Rules to ensure that alerts are routed to the right responder with the help of event tags attached to them. This video explains how you can utilise Routing rules to create various incident routing flows.
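
The sketch below shows one common way tag-based routing rules can be evaluated: the first rule whose conditions all match wins, otherwise a default applies. It is an assumption-based illustration, not Squadcast's routing engine; the rule shape and the escalation policy names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule shape: (tag conditions that must all match, target).
RoutingRule = Tuple[Dict[str, str], str]


def route(tags: Dict[str, str], rules: List[RoutingRule], default: str) -> str:
    """Return the target of the first rule whose conditions all match the tags."""
    for conditions, target in rules:
        if all(tags.get(key) == value for key, value in conditions.items()):
            return target
    return default  # no rule matched: fall back to a default target


rules: List[RoutingRule] = [
    ({"component": "database", "severity": "high"}, "dba-escalation-policy"),
    ({"severity": "high"}, "sre-escalation-policy"),
]
print(route({"component": "database", "severity": "high"}, rules, "default-policy"))
# dba-escalation-policy
```

Because the first match wins in this design, more specific rules should be listed before broader ones.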

Integrating Slack & Squadcast- Trigger, Acknowledge, Resolve & Reassign incidents from Slack channel

Dec 15, 2022 By Squadcast In Squadcast

You can integrate Squadcast and Slack to collaborate efficiently with your team while working on incidents. Squadcast sends a notification to the configured Slack Channel as soon as an incident is triggered.

Alert Suppression Rules in Squadcast to prevent Alert fatigue | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Alert suppression can help you avoid alert fatigue by suppressing notifications for non-actionable alerts. Squadcast will suppress the incidents that match any of the Suppression Rules you create for your Services. These incidents will go into the Suppressed state and you will not get any notifications for them.
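
The behavior described above (an incident matching any suppression rule moves to a Suppressed state and sends no notifications) can be sketched as follows. The `Incident` type, the state names, and the two example rules are hypothetical assumptions for illustration, not Squadcast's data model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    title: str
    tags: Dict[str, str] = field(default_factory=dict)
    state: str = "triggered"


SuppressionRule = Callable[[Incident], bool]

# Illustrative rules for non-actionable alerts (conditions are assumptions).
RULES: List[SuppressionRule] = [
    lambda inc: inc.tags.get("severity") == "info",
    lambda inc: inc.title.lower().startswith("[test]"),
]


def should_notify(incident: Incident, rules: List[SuppressionRule] = RULES) -> bool:
    """Suppress the incident if ANY rule matches; only unsuppressed incidents notify."""
    if any(rule(incident) for rule in rules):
        incident.state = "suppressed"
        return False
    return True


noisy = Incident("Disk usage at 71%", {"severity": "info"})
print(should_notify(noisy), noisy.state)  # False suppressed
```

Suppressed incidents are kept rather than dropped, so they remain available for later review even though nobody is paged.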

Using StatusPage at squadcast | SRE Best practices | Squadcast

Dec 15, 2022 By Squadcast In Squadcast

Let your customers know how your Services are doing without them having to ask you. One of the core principles of SRE is transparency, and Status Pages help you communicate the status of your Services to your customers at all times, rather than you learning about it through support tickets logged by your customers.

APImetrics + Squadcast: Routing Alerts Made Easy

Dec 14, 2022 By Vishal Padghan In Squadcast

APImetrics is an API Compliance, Monitoring and Security solution that lets you make and run API calls or sequences of API calls (workflows) from external, remote cloud locations using exactly the same security configurations as a typical end user would use. If you use APImetrics for API calling requirements, you can integrate it with Squadcast, an end-to-end incident response tool, to route detailed alerts from APImetrics to the right users in Squadcast.

SRE Maturity Model: How Do You Assess Your Team?

Dec 14, 2022 By Myra Nizami In Blameless

How do you evaluate your SRE team’s progress in implementing SRE? We discuss the key SRE indicators for evaluating your team’s progress in the SRE maturity model. What is the SRE maturity model? The SRE maturity model is a way of judging how far you are in implementing SRE principles. It is a method used by teams to understand where they ought to implement more SRE best practices to reach greater SRE maturity.

"Just get on with it!" - The Horrors of Task Prioritization

Dec 14, 2022 By Aman Swami In Zenduty

Learn how to prioritize tasks, get things moving by performing non-blocker tasks first, create effective postmortems, perform RCAs faster, and avoid an overburdened high-priority (P0) dashboard. This article should help you plan your product/feature launch faster without compromising the reliability of existing services.

Doing More with Less: Building Greater Operational Efficiency with PagerDuty

Dec 14, 2022 By Nancy Lee In PagerDuty

How many of us can say with confidence that we know a tool inside and out? If you’re like most, you probably use just a small fraction of a product’s features. When it comes to feature-rich software like Microsoft Word or Excel, it’s a safe bet that most users are aware of less than half of the features and use even fewer on a regular basis. And the longer we’ve been using a piece of software, the more likely we are to fall into this trap of feature underutilization.

How to design an effective incident on-call program

Dec 13, 2022 By Blameless In Blameless

If anyone on your team has paged a colleague in the middle of the night, your DevOps team has an incident on-call program. Whether that team member knew whom to page, and felt comfortable sending the page, indicates how effective your on-call program is. Join Thai Wood, founder of Resilience Roundup, and Matt Davis, SRE Advocate at Blameless, as they discuss how to design an effective incident on-call program. This webinar was recorded live on December 13, 2022.

Season's Freezings: Change Freezes with Rich Lafferty

Dec 13, 2022 By PagerDuty In PagerDuty

PagerDuty Staff SRE Rich Lafferty joins Scott McAllister and Mandi Walls for a session on Change Freezes, why PagerDuty does them, and how we manage change during times when a majority of folks are out of the office.

What is an Incident Commander in ITSM?

Dec 13, 2022 By iLert In iLert

Incident Commanders play a crucial role in the successful operation of IT service management (ITSM) teams. By applying best practices, they can ensure that incidents are handled quickly and efficiently, so that downtime for end users is kept to a minimum. This article provides an overview of the requirements for an effective Incident Commander in ITSM. It discusses the skills and competencies needed for effective incident management, and highlights some best practices for this role.

Kubernetes Lens: Improving Operational Awareness of Kubernetes Clusters

Dec 13, 2022 By Ritika Bramhe In OnPage

Kubernetes Lens is an integrated development environment (IDE) that allows users to connect and manage multiple Kubernetes clusters on Mac, Windows, and Linux platforms. It is an intuitive graphical interface that allows users to deploy and manage clusters directly from the console. It provides dashboards that display key metrics and insights into everything running on a cluster, including deployments, configurations, networking, storage, and access control.

A New Era for Squadcast

Dec 12, 2022 By Anusuya Kannabiran In Squadcast

Our new brand design conveys trust and simplicity in a playful, energetic way - representing our team and product. Get a behind-the-scenes look at our makeover and what it means to our customers' experiences.

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

Dec 12, 2022 By Squadcast In Squadcast

With Squadcast, you can define and monitor Service Level Objectives for your services. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA), the agreement between two parties regarding the delivery of a given service. SLOs represent customer happiness and guide the development team’s velocity.
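
A minimal sketch of the arithmetic behind SLOs and error budgets, assuming a request-based availability SLI (good events divided by total events). The function names are hypothetical and this is not how Squadcast's SLO Tracker computes its values; it only illustrates the general relationship: a 99.9% SLO leaves an error budget of 0.1% of events, or about 43.2 minutes of downtime over a 30-day window.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Request-based SLI: the fraction of events that were 'good'."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Share of the error budget left; a negative value means the SLO is breached."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo) * total_events  # the error budget, counted in events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def allowed_downtime_minutes(slo: float, window_days: int) -> float:
    """Time-based view of the same budget: minutes of downtime the SLO tolerates."""
    return (1.0 - slo) * window_days * 24 * 60


print(availability_sli(999_400, 1_000_000))                          # 0.9994
print(round(error_budget_remaining(0.999, 999_400, 1_000_000), 3))   # 0.4
print(round(allowed_downtime_minutes(0.999, 30), 1))                 # 43.2
```

Here 600 failed requests out of 1,000,000 consume 60% of a 1,000-request budget, leaving 40% for the rest of the window.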

Introduction to Service Catalog | Service Ownership | Service Classification | Squadcast

Dec 12, 2022 By Squadcast In Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.

Squadcast Product Demo

Dec 11, 2022 By Squadcast In Squadcast

Squadcast is the only integrated platform that unites on-call alerting and incident management with Site Reliability Engineering (SRE) workflows under one roof and, in turn, efficiently automates human tasks.

The founder's story: a trip down memory lane - incident.fm

Dec 9, 2022 By Incident.io In Incident.io

In this podcast, the three incident.io co-founders, Stephen, Chris, and Pete, take a trip down memory lane, revisiting the story of how they came to found incident.io and the major milestones of their first 12 months in business.

The founder's story: a trip down memory lane

Dec 9, 2022 By Chris Evans In Incident.io

Recapping this year's AWS re:Invent 2022

Dec 9, 2022 By Ritika Bramhe In OnPage

Amazon recently concluded its five-day conference, AWS re:Invent 2022. This year’s conference was hybrid, with the company streaming a significant portion of the in-person conference for free. For ten years now, the event has seen attendees across the cloud continuum come together to learn, share, and get inspired. This year was no different, as we saw some of the biggest names in cloud computing make their presence felt at the conference in Las Vegas.

Taking incident management to the next level with an internal developer portal

Dec 8, 2022 By Cortex In Cortex

There is no denying that incident management is one of the most crucial processes concerning the service and business aspects of software deployment. Not having a robust system in place to address and remedy unfortunate incidents can lead to user dissatisfaction, which can ultimately take a toll on your business metrics. A suboptimal management system can also have adverse impacts internally if it prioritizes efficiency and speed of recovery to the point of neglecting employee well-being.

Tag You're It: Organized, Configurable Tagging is a Must-do for Great Incident Analytics.

Dec 8, 2022 By Aaron Lober In Blameless

Wouldn’t it be nice to learn which parts of your service see the most incidents, or why one service experiences more Sev1 incidents than the others? It’s not always easy to see the full disruptive impact of an engineering incident, and it’s even harder to see trends across incidents and over time. Developing incident insights that you can use to help guide and shape the way your team designs and operates your product takes time, careful consideration, team engagement, and the right tooling.

PagerDuty App for ServiceNow: Extend ITSM with Real-Time Digital Operations

Dec 8, 2022 By PagerDuty In PagerDuty

Watch this demo to learn about extending your ITSM solution with Real-Time Digital Operations via the PagerDuty App for ServiceNow. You'll learn about what you will get out of the box and will see the integration in action.

AIOps for Managed Service Providers

Dec 8, 2022 By Interlink In Interlink

AIOps for Managed Service Providers: modernize and monetize your monitoring offering. For Managed Service Providers, a highly competitive market and the rapidly changing digital landscape can present a threat or an opportunity.

Let's talk bugs versus incidents

Dec 7, 2022 By Jouhné Scott In FireHydrant

In this post, we’ll dig into the difference between a bug and an incident, why alignment on how they are defined matters, and how to ensure you’re still learning from the issue, even if it’s “just a bug.”

Swimlane Frameworks and Diagrams for Structured Incident Resolution

Dec 7, 2022 By Blameless Community In Blameless

Orchestrate incident resolution with swimlane software that offers customizable frameworks, unifying your team's diagnostic efforts.

Initiating An Incident - xMatters Support

Dec 7, 2022 By xMatters In xMatters

When things go sideways, you need to rally your team as quickly as possible, and give them the information they need to resolve the problem. In this video, learn how to manually initiate an incident in the UI using the basic built-in form, and update details as the incident progresses.

Sponsored Post

Outages ITOps professionals are thankful to avoid

Dec 6, 2022 By meshIQ In meshIQ

As we settle into the time of year when we reflect on what we're thankful for, we tend to focus on important basics such as health, family and friends. But on a professional level, IT operations (ITOps) practitioners are thankful to avoid disastrous outages that can cause confusion, frustration, lost revenue and damaged reputations. The very last thing ITOps, network operations center (NOC) or site reliability engineering (SRE) teams want while eating their turkey and enjoying time with family is to get paged about an outage. These outages can be extremely costly, at $12,913 per minute and up to $1.5 million per hour for larger organizations.

How to choose an incident management software

Dec 6, 2022 By Simran Achpal In Freshservice

The ITIL definition of an incident is “an unplanned interruption to or a quality reduction of an IT service”. In your IT ecosystem, an incident may be caused by a malfunctioning asset or a network failure. Common incidents include issues with printers, Wi-Fi connectivity, application locks, email service, laptops, file sharing, unresponsive servers, or even authentication errors.

Best practices for on-call scheduling and management

Dec 6, 2022 By Cortex In Cortex

An on-call schedule forms the backbone of your incident response system in the event of an outage or when an issue is raised. A well-run schedule keeps end users from waiting and helps maintain the reliability and availability of your software. However, on-call management practices often induce worry and anxiety in team members. In extreme cases, on-call duty can even be a contributing factor in employee burnout.

5 tips for a more modern and efficient on-call management

Dec 6, 2022 By iLert In iLert

On-call management is one of the most important aspects of seamless IT service. Its aim is to ensure that the right person is notified in the case of an incident, so that they can react accordingly as quickly as possible. In certain cases, many people have to be notified. To achieve this as efficiently as possible, it is vital to have an up-to-date and smoothly functioning system.

ITIL and CI/CD

Dec 6, 2022 By BigPanda In BigPanda

In the world of IT, there are two main approaches to managing changes: the information technology infrastructure library (ITIL) and continuous integration and continuous delivery/deployment (CI/CD). Both have their own benefits and drawbacks, so it’s important to understand the difference between them before deciding which one is right for your organization. In this article, learn about the difference between CI/CD and ITIL, and find out which approach is best for your needs.

Toil: Still Plaguing Engineering Teams

Dec 6, 2022 By Damon Edwards In PagerDuty

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying, then containing, the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Cyber, incident, downtime: Three words that chill the board, and how to tame them

Dec 2, 2022 By PagerDuty In PagerDuty

There are three words that every member around a boardroom table fears hearing strung together: "Cyber... incident... downtime". They are never the precursor to a good meeting! Technology incidents can leave the business in the dark and bring the wheels of industry grinding to a halt. According to a Gartner report, companies left without operational systems can lose up to half a million dollars per hour from severe incidents, based on losses and remediation.

DERDACK SIGNL4 for Microsoft Sentinel, Defender for Cloud and more

Dec 2, 2022 By SIGNL4 In SIGNL4

Doreen talks us through the value-add of SIGNL4 for MSPs and enterprise customers of Microsoft Security products, and how SIGNL4 facilitates an automated and seamless 24/7 on-call management experience. Derdack SIGNL4 is a member of the Microsoft Intelligent Security Alliance (MISA).

PagerDuty Operations Cloud Delivers Process Automation on AWS, Delivering Rapid Return on Investment and Better Customer Experience

Dec 1, 2022 By PagerDuty In PagerDuty

Automated Diagnostics for AWS Customers Reduces Manual Work, Improves Resiliency, Enables Consolidation on PagerDuty.

PagerDuty Incident Workflows for Automated Incident Response Demo

Dec 1, 2022 By PagerDuty In PagerDuty

Leverage Incident Workflows to automate your incident response process. This demo walks through a use case showing how to standardize major incident workflows across all P1 and P2 incidents.
