Monthly Archive

Using SLOs to Increase Software Reliability

Aug 31, 2022 By Sanjana Gagrani In SolarWinds

The partnership between Nobl9 and SolarWinds® Pingdom® is the bridge between business and technology everyone’s been waiting for.

Using incidents to level up your teams

Aug 31, 2022 By Lisa Karlin Curtis In Incident.io

I joined GoCardless as a junior engineer. It was one of my first coding jobs, and in my time there I progressed to senior much faster than I had expected. When I reflect on how this happened, one pattern stands out to me: the big step changes in my understanding, and my ability to solve larger and more complex engineering problems, came as a result of incidents.

What's New: Updates to PagerDuty Process Automation Software & PagerDuty Runbook Automation, Integrations, and More!

Aug 31, 2022 By Vera Chan In PagerDuty

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team include PagerDuty® Process Automation, our Partner Integrations and App Ecosystem, as well as Community & Advocacy Events updates. We continue to help customers automate everywhere to optimize cloud operations and reduce the number of issues escalated to other teams.

Creating a 2-way connection between SIGNL4 and Logic Monitor

Aug 31, 2022 By SIGNL4 In SIGNL4

A walkthrough of using the SIGNL4 connector to connect to Logic Monitor for two-way connectivity.

View Video

RESOLVE '22: Warp speed to digital innovation

Aug 31, 2022 By Evan Freedman In BigPanda

The pandemic accelerated digital transformation in the business world by forcing companies to double down on areas in which they’d already begun investing. The mass moves to video conferencing in healthcare and education are two examples. In other industries, companies were only able to survive by jumping into completely new areas: brick-and-mortar retailers diving feet-first into e-commerce after lockdowns and health concerns kept shoppers indoors, for example.

RESOLVE '22: Bit by bit

Aug 30, 2022 By Ronnel Vergara In BigPanda

It is difficult to define a single, solid maturity model for IT Operations. As moderator Jason Walker, BigPanda’s COO, said in our RESOLVE ’22 event Bit by bit, maturity models in “almost every other domain of IT” have not turned into a workable set of guideposts and indicators in the Ops domain. We welcomed Insurity’s Lead Cloud Operations Performance & Monitoring Admin, Ronnel Vergara, to take the stage and talk over this high-level topic at our event.

Round Robin Escalation: An Efficient Way to Distribute On-Call Responsibilities

Aug 30, 2022 By Vishal Padghan In Squadcast

Nowadays, organizations address a high volume of incidents every day. With so much happening, responders can be overwhelmed by the volume of incidents and may end up de-prioritizing certain important incidents. Hence, it is important to have an efficient on-call scheduling and escalation process in place. In this blog, we will explore how Round Robin Escalations can help distribute on-call load and set up efficient on-call schedules.
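
To illustrate the general idea of round-robin distribution, here is a minimal sketch in Python. It is not Squadcast's implementation; the function name, incident IDs, and responder names are all hypothetical and chosen only for illustration.

```python
from itertools import cycle


def assign_round_robin(incidents, responders):
    """Pair each incident with the next responder in a repeating rotation."""
    if not responders:
        raise ValueError("at least one responder is required")
    rotation = cycle(responders)  # endlessly repeats the responder list in order
    return [(incident, next(rotation)) for incident in incidents]


# Hypothetical incidents and responders, for illustration only.
assignments = assign_round_robin(
    ["INC-1", "INC-2", "INC-3", "INC-4"],
    ["alice", "bob", "carol"],
)
for incident, responder in assignments:
    print(f"{incident} -> {responder}")
```

Because the rotation wraps around, the fourth incident goes back to the first responder, so no single person absorbs a burst of incidents.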

Bridging the gap between Engineering and Customer Support during incidents

Aug 30, 2022 By incident.io In Incident.io

Customer trust and satisfaction are the most important currency your business can own. No matter how brilliant your product, without happy customers your business will struggle. When everything is running smoothly, it’s easy to feel that heady dose of customer love. It’s when things break during an incident that these relationships are really put to the test.

The Five Main Components of a Fully Developed EHR System

Aug 30, 2022 By OnPage Corporation In OnPage

The adoption of electronic health record (EHR) systems has seen tremendous growth across geographies, especially in the US. According to American Hospital Association data shared by the Office of the National Coordinator for Health Information Technology, over 93% of American hospitals are enabled by some form of EHR in their organization. Implementing an EHR system in your clinic or hospital is a big decision.

Get started with Grafana OnCall and Terraform

Aug 29, 2022 By Innokentil Konstantinov In Grafana

Managing on-call schedules and escalation chains, especially across many teams, can get cumbersome and error-prone. This can be especially difficult without as-code workflows. Here on the Grafana OnCall team, we’re focused on making Grafana OnCall as easy to use as possible. We want to make it easier to reduce errors with your on-call schedules, create schedule and escalation templates quickly, and fit on-call management into your existing as-code patterns.

Healthchecks + Squadcast Integration: Routing Alerts Made Easy

Aug 26, 2022 By Vishal Padghan In Squadcast

Healthchecks is a cron job monitoring service which listens for HTTP requests and email messages ("pings") from your cron jobs and scheduled tasks ("checks"). It lets you update your job to send an HTTP request to the ping URL every time the job runs. When your job does not ping Healthchecks.io on time, you will receive an alert! If you use Healthchecks for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Healthchecks to the right users in Squadcast.
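
The ping pattern described above can be sketched in a few lines of Python. This is a hedged illustration only: `run_and_ping`, the injectable `opener` parameter, and the placeholder URL are assumptions made for the example, not part of the Healthchecks or Squadcast documentation. The real ping URL comes from your own check's settings.

```python
import urllib.request


def run_and_ping(job, ping_url, opener=urllib.request.urlopen, timeout=10):
    """Run `job`; ping the check URL only if the job finished without raising."""
    result = job()  # an exception here skips the ping, so the check becomes overdue
    opener(ping_url, timeout=timeout)
    return result


# Demonstration with a fake opener so that no network request is made.
# The URL below is a placeholder, not a real ping endpoint.
calls = []
run_and_ping(
    lambda: "backup finished",
    "https://example.invalid/ping/my-check",
    opener=lambda url, timeout: calls.append(url),
)
print(calls)
```

In a real cron job you would leave `opener` at its default and pass the ping URL of your check. A job that crashes never reaches the ping, which is what lets the monitoring side notice the missed run.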

Introduction to Service Catalog | Service Ownership | Service Classification Squadcast

Aug 26, 2022 By Squadcast In Squadcast

To make service management a breeze, we bring to you our improved Service Catalog. The Service Catalog is designed to improve Service Classification and bring more transparency to Service Ownership within your org. This video explains how a consolidated summary of all active services from a single dashboard can help you better track your service health.

View Video

How To Reduce Incident Tickets

Aug 25, 2022 By xMatters In xMatters

In IT environments, incidents happen all the time and it's impossible to prevent all of them. Regardless of the available software solutions or the level of technical training of both users and developers, no organization is immune to incidents. The increased dependence on IT infrastructure to provide core services means that any disruption in IT services can cause any organization significant financial and reputational harm. For example, IT service providers need to resolve customer support tickets following the service-level agreements (SLAs), and failing to do so makes them liable for breaching such agreements.

What are Runbooks? And why are they needed?

Aug 25, 2022 By Vardhan NS In Squadcast

Imagine being an Ops engineer in a team just struck by tragedy. Alarms start ringing, and incident response is in full force. It may sound like the situation is in control. WRONG! There's panic everywhere. The on-call team is scrambling for the heavenly door to redemption. But the only thing that doesn't stop: stakeholder inquiries. This situation is bad. But it could be worse. Now imagine being a less-experienced Ops engineer in a relatively small on-call team struck by tragedy. If you don't have sufficient guidance, let alone moral support, you're toast.

RESOLVE '22: Expert predictions for AIOps 2022-2025

Aug 25, 2022 By Fred Koopmans In BigPanda

BigPanda’s RESOLVE ’22 conference hosted a number of luminaries in the AIOps and IT Ops world, so naturally we needed to get their thoughts on the future of the market and where they see AIOps going in the next few years. Our guests for the session titled Expert predictions for AIOps 2022-2025 were from the press, investor community, analyst community and vendor world.

Using StatusPage at squadcast | SRE Best practices | Squadcast

Aug 25, 2022 By Squadcast In Squadcast

Let your customers know how your Services are doing, without them having to ask you about it. One of the core principles of SRE is Transparency and Status Pages help you communicate the status of your Services to your customers at all times, as opposed to you getting to know the status of your Services through support tickets logged by your customers.

View Video

What are Canary Deployments and Why are they Important?

Aug 25, 2022 By Vishal Padghan In Squadcast

Every modification to software comes with the potential for production problems. Application failures often have serious consequences which can result in a loss of revenue and a poor customer experience. Additionally, organizations constantly try to improve their services for a better customer experience. How can you minimize the chance of error and update your application with confidence?

incident.io + Indent - on-demand system access

Aug 25, 2022 By incident.io In Incident.io

At incident.io, we empower teams to run incidents quickly and effectively from start to finish. One of the ways we help is by taking the manual admin out of your incidents. More often than not, folks are spending too much time thinking about the process, when the time would be better spent focusing on fixing. Our automated workflows, nudges and prompts help to embed best practices and unlock time for more impactful work.

Mattermost Playbooks How-to: OKR Management

Aug 25, 2022 By Elli Ludwigson In Mattermost

Creating, managing, and tracking high level goals can be incredibly burdensome and complex for organizations with numerous stakeholders and cross-functional collaboration. Team leads and executives manage multitudes of reporting tools and departments while contributors often have little visibility into the process of creating goals or the progress towards achieving those goals.

Performing Postmortems & Postmortem Templates at Squadcast | SRE Best practices | Squadcast

Aug 25, 2022 By Squadcast In Squadcast

Postmortems are a way to summarize the resolution for an incident once it is resolved. They are also a way for you to create a knowledge base of failures and fixes that can be shared across your team to help build a culture of shared learning and learning from failures.

View Video

Feeling zen, finding DORA, and the policy police

Aug 24, 2022 By incident.io In Incident.io

We’ve had a bumper month here at incident.io HQ. We’ve welcomed 3 new joiners, celebrated two 1 year incident.io anniversaries (congrats Lisa and Lawrence!), released a whole load of exciting new features and (for those of you wondering what’s been causing the recent heatwave) we’ve redesigned our website and it is on fire 🔥 😎 Here’s a round-up of some of this month's highlights…

Updating our data stack

Aug 24, 2022 By Jack Cook In Incident.io

It’s been over 6 months since Lawrence’s excellent blog post on our data stack here at incident.io, and we thought it was about time for an update. This post runs through the tweaks we’ve made to our setup over the past 2 months and challenges we’ve found as we’ve scaled from a company of 10 people to 30, now with a 2 person data team (soon to be 3 - we’re hiring)!

Defining a Strategy for Process Automation

Aug 24, 2022 By xMatters In xMatters

As business systems grow to encompass more locations, tools, and organizations, defining processes that keep pace with these changes can’t be left to a hodgepodge of disconnected programs—or worse, manual implementation of paper documentation. You need to automate. Automation within businesses first arose in the 1960s, alongside resource planning systems.

PagerDuty Service Standards helps organizations better configure services at scale

Aug 23, 2022 By Hannah Culver In PagerDuty

Service ownership, a DevOps best practice, is a method that many companies are pivoting towards. The benefits of service ownership are varied and include boons such as bringing development teams much closer to their customers, the business, and the value being delivered. The “build it, own it” model has tangible effects on customer experience, as developers are incentivized to innovate and drive customer-facing features that delight.

RESOLVE '22: AIOps: Not just a buzz phrase anymore

Aug 22, 2022 By Ken Serembus, Sean McDermott and Isaac Sacolick In BigPanda

Thinking back to the rapidly expanding tech world of the 2010s, it’s easy to list off a number of buzzwords and phrases that became IT Ops mainstays over time. “Internet of things,” “big data” and even ideas as simple as the cloud were all once considered little more than slick marketing talk.

Mean Time to Recovery (MTTR) explained

Aug 22, 2022 By Sleuth In Sleuth

It's Friday afternoon, and you have mail. Apparently, a user received a 500 error when attempting to sign in. She contacted Customer Service. They didn't know what to do, so they forwarded the email to your engineering team. A close look at the email thread reveals that Customer Service received it... on Tuesday. And they sat on it until today. Hopefully, it was just this one user. You open your browser, navigate to the web application, and attempt to sign in. You also get a 500 error.

What is on-call, and why is it important?

Aug 21, 2022 By isDown In isDown

Does your company have a product or service that needs to be up and running 24/7 or serving customers worldwide? Heads up, you might need an on-call team. In this article, we’ll start with the basics of what on-call is and why it is important.

PagerDuty and Arize: Integrations for ML Observability

Aug 19, 2022 By PagerDuty In PagerDuty

Arize is an ML Observability platform designed to detect, troubleshoot, and eliminate ML problems faster. Use Arize to monitor your production models and send alerts to PagerDuty when your models deviate from a certain threshold. Arize and PagerDuty help keep your teams in the loop, send more comprehensive metadata through alerts, and debug your models faster than ever before.

View Video

RESOLVE '22: Best in class

Aug 19, 2022 By Craig Ferrara In BigPanda

Our RESOLVE ’22 event Best in class, moderated by BigPanda Vice President of Value & Adoption Craig Ferrara, took a slightly different approach than most other panels during the event. Where most focused on a given topic and allowed our expert panelists to weigh in, this one was all about storytelling.

A new channel per incident - helpful or harmful?

Aug 18, 2022 By Chris Evans In Incident.io

I caught the tail-end of a Twitter thread the other day which centred around the use of Slack channels for incidents, and whether creating a new channel for each new incident is helpful or harmful. It turns out this is a much more evocative subject than I thought, and since I have opinions I thought I’d share them!

Uptime + Squadcast Integration: Routing Alerts Made Easy

Aug 18, 2022 By Vishal Padghan In Squadcast

Uptime is a site monitoring solution used to reach various endpoints and notify users via push notifications when downtime is detected. It collects and stores downtime and response time data, which are then made available as reports to the users. If you use Uptime for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Uptime to the right users in Squadcast. The steps below will help you set up the Uptime and Squadcast integration.

That Rogers Outage is Going to be More Expensive Than You Think

Aug 18, 2022 By Mark Towler In Catchpoint

On July 8 of 2022, the Canadian telecom company Rogers Communications suffered a major outage that impacted most of Canada for almost two days. This wasn’t completely unprecedented (they’d had an outage in 2021 that impacted their wireless servers for several hours) but the breadth and severity of this one is going to end up costing them far, far more than it seems at first glance.

See the big picture with the Service Dependency Graph

Aug 18, 2022 By Crystal Poenisch In FireHydrant

Understanding the impact and scope of an incident when degradation occurs is critical for returning your service online. This requires modeling the many downstream and upstream relationships between your services. Our new Service Dependency Graph provides a shortcut – a way to surface dependencies quickly, understand the relationship between services, and determine the scope or impact of an incident.
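
As a rough illustration of why modeling these relationships helps, the sketch below walks a dependency map to find everything affected when one service degrades. It is a generic breadth-first traversal, not FireHydrant's implementation; the function name and the example catalog are hypothetical.

```python
from collections import deque


def impacted_services(dependents, failed):
    """Return every service that directly or transitively depends on `failed`."""
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            # Skip services already recorded, and never report the failed one itself.
            if service not in seen and service != failed:
                seen.add(service)
                queue.append(service)
    return seen


# Hypothetical catalog: each key maps to the services that depend on it.
dependents = {
    "database": ["auth", "billing"],
    "auth": ["web"],
    "billing": ["web"],
}
print(sorted(impacted_services(dependents, "database")))
```

Tracking visited services keeps the traversal finite even if the catalog contains circular dependencies.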

Strategic venture arms of UBS and Wells Fargo fuel further growth of BigPanda

Aug 17, 2022 By BigPanda In BigPanda

Strategic venture arms of UBS and Wells Fargo fuel further growth of BigPanda's industry-leading AIOps platform

View Video

Wells Fargo invests in BigPanda to help eliminate costs

Aug 17, 2022 By BigPanda In BigPanda

Wells Fargo invests in BigPanda to help eliminate operational costs and complexity of digital transformation

View Video

UBS invests in BigPanda to help drive digital disruption and innovation in AIOps

Aug 17, 2022 By BigPanda In BigPanda

UBS is one of the leaders in the financial sector and one of the early adopters that are leveraging AI to do things better, cheaper and faster to bring their IT Operations in line with their cloud migration and digital transformation strategy. BigPanda is thrilled to have UBS as a customer and an investor to drive real transformation.

View Video

August 2022 Update - Change duty status of colleagues, configurable duty notifications and revised password change

Aug 17, 2022 By René In SIGNL4

Our August update now allows administrators and team administrators to change the service status of other users in the portal. We also made service settings more granular, for example by introducing the ability to turn off certain push messages when colleagues’ service statuses change. We have also revised the way you change your personal password or remote action PIN in the portal. All details are available in this article.

RESOLVE '22: The SOC and the NOC

Aug 17, 2022 By Kris Taylor, Roger Barranco and Craig Bowman In BigPanda

In our RESOLVE ’22 event The SOC and the NOC, moderator and 3 Tree Tech VP of Cybersecurity Kris Taylor welcomed two esteemed guests to the stage. As Kris noted at the top of the event, we brought our panelists together to talk about “the culture of the network operating center (NOC) and security operations center (SOC).” Along the way, they discussed different philosophical and practical takes on the high-level topics of networking and security.

PagerDuty Process Automation and Rundeck Open Source Release Notes 4.5.0

Aug 17, 2022 By PagerDuty In PagerDuty

Product Manager Forrest Evans talks about new features in the 4.5.0 release of Process Automation and Rundeck Open Source.

View Video

IHS Markit: Centralizing Incident Management With PagerDuty & ServiceNow

Aug 17, 2022 By Lisa Duckrow In PagerDuty

In today’s digital world, organizations are constantly undergoing change. They’re moving to the cloud and rolling out DevOps at scale—all in the name of driving innovation. But moving from a monolith to microservices can lead to applications becoming increasingly distributed. When problems arise, customers don’t care how many teams and services you have, or how complex your architecture is. They only care that your services work when they need them to.

Automate user provisioning with SCIM

Aug 16, 2022 By Max Tilka In FireHydrant

Many of our customers use an identity provider to provision new users to our app via SAML & SSO. We are further streamlining this user provisioning by integrating with the SCIM 2.0 protocol.

How DORA will impact incident management at financial entities

Aug 16, 2022 By Charlie Kingston In Incident.io

The Digital Finance Strategy is a European directive that aims to support and develop digital finance in Europe whilst maintaining financial stability and consumer protection. There are three main components to the package. In this blog post, we’ll attempt to summarise the 113-page DORA proposal, highlighting how it will apply to incident management at financial entities.

New Feature: StatusCast now integrates with Google Translate

Aug 15, 2022 By StatusCast In StatusCast

Here at StatusCast we understand the importance of a resourceful and communicative status page. A status page is the ambassador of your incident response management process, and like any good ambassador, it needs to speak the language. If your status page is hosted by StatusCast, it is now fully integrated with Google Translate, a powerful tool that allows your subscribers and even viewers to translate your page into the language most comfortable to them.

Minimizing Data Science Model Drift by Leveraging PagerDuty

Aug 15, 2022 By Thomas Pin In PagerDuty

PagerDuty has an Early Warning System (EWS) model which helps the Customer Success and Sales departments ascertain the wellness of existing PagerDuty customers based on product usage and external business factors. This Early Warning System model has become critical infrastructure and the first line of defense in identifying poor product usage that could result in account churn.

Fast track video series: Integrate ticketing and messaging tools with BigPanda

Aug 15, 2022 By BigPanda In BigPanda

BigPanda’s Agnostic Integrations provides powerful bi-directional integration for enterprise ticketing, service desk and collaboration tools such as chat and incident response, so operators can easily share BigPanda incidents with other users in their ticketing and collaboration tools of choice. With BigPanda, teams can easily automate ticket creation as well as notifications and war room creation in chat tools.

View Video

Connecting to incident.io with Zapier

Aug 11, 2022 By Charlie Kingston In Incident.io

At incident.io, we believe that incidents are for everyone. As part of enabling that mission, we think it’s essential to ensure that all users can create, configure, and maintain business processes related to an incident. Today, we have two approaches to support different people, products, and organisational structures. We’re excited to announce that we’re taking this further and adding Zapier to our growing list of options to automate your processes (and focus on fixing)!

Get to the Root (Cause Analysis) in 5 Easy Steps

Aug 10, 2022 By PagerDuty University In PagerDuty

What is one of the first things you should do when you are assigned an incident via PagerDuty? If you immediately thought “Acknowledge!” you are not wrong, but after that, it’s all about resolving the issue as quickly and painlessly as possible. The first step to resolution is to investigate what caused the incident in the first place so you can easily get a fix in place.

Understanding Cloud Services: IaaS, SaaS, and PaaS

Aug 10, 2022 By xMatters In xMatters

Cloud services have skyrocketed in popularity in the past few years, providing a vast array of resources as well as a cost-effective path for the migration from on-premises servers to the cloud. In fact, cloud services are handling all the computing needs of many businesses. It’s very likely you’re already using cloud services and will continue to use more as time goes on.

RESOLVE '22: Behind the scenes

Aug 10, 2022 By Alex Meyer In BigPanda

What do a sinking ship and an improperly equipped data center have in common? For Dell Senior Director of Global Network and Datacenter Services Paul Beninati, the two have a lot in common. At least, from the perspective of company proactivity and ITOps performance goals.

No Longer Haunted by the Ghost of AIOps Past

Aug 10, 2022 By Richard Whitehead In Moogsoft

How AIOps has evolved into an accessible and efficient solution.

PagerDuty Incident Response Demo (Extended)

Aug 9, 2022 By PagerDuty In PagerDuty

Enjoy this demo that showcases a day in the life of a team handling an incident with PagerDuty's Automated Incident Response solution. PagerDuty enables teams to orchestrate the right response for every incident. It also helps organizations protect revenue and improve customer experiences by resolving critical incidents faster and preventing future occurrences. Now you can bring major incident best practices to your organization with end-to-end response automation and friction-free postmortems.

View Video

Arize integration with PagerDuty

Aug 9, 2022 By PagerDuty In PagerDuty

Streamline model monitoring with integrated alerts. Arize is an ML Observability platform designed to detect, troubleshoot, and eliminate ML problems faster. Use Arize to monitor your production models and send alerts to PagerDuty when your models deviate from a certain threshold. Arize and PagerDuty help keep your teams in the loop, send more comprehensive metadata through alerts, and debug your models faster than ever before.

View Video

Communication Channels in Squadcast | Incident Management | Squadcast

Aug 9, 2022 By Squadcast In Squadcast

Communication Channels help you add Video Call links, ChatOps links, and other external links to an incident. Additionally, you can create a dedicated Slack Channel for an incident using the Communications Card.

View Video

Maintenance Mode in Squadcast - Create maintenance windows for Services | Squadcast

Aug 9, 2022 By Squadcast In Squadcast

Maintenance Mode enables you to reduce alert noise during a scheduled maintenance window, so alert notifications for false-positive incidents can be suppressed while the window is active.
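
The suppression logic behind a maintenance window can be sketched as a simple time-range check. This is an illustrative assumption about how such a filter might work, not Squadcast's implementation; `should_notify` and the example window are hypothetical.

```python
from datetime import datetime, timezone


def should_notify(alert_time, windows):
    """Return False when the alert falls inside any (start, end) maintenance window."""
    return not any(start <= alert_time < end for start, end in windows)


# Hypothetical one-hour maintenance window, for illustration only.
windows = [
    (
        datetime(2022, 8, 9, 2, 0, tzinfo=timezone.utc),
        datetime(2022, 8, 9, 3, 0, tzinfo=timezone.utc),
    )
]
during = datetime(2022, 8, 9, 2, 30, tzinfo=timezone.utc)
after = datetime(2022, 8, 9, 3, 30, tzinfo=timezone.utc)
print(should_notify(during, windows), should_notify(after, windows))
```

The window is treated as half-open, so an alert arriving exactly at the end time is delivered rather than suppressed.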

View Video

Using Squadcast's SLO Tracker | Error Budget | Setting up SLOs and configuring SLIs | Squadcast

Aug 9, 2022 By Squadcast In Squadcast

With Squadcast, you can define and monitor Service Level Objectives for your services. SLOs allow you to define and enforce an agreement between two parties regarding the delivery of a given service. A Service Level Objective (SLO) is a reliability target, measured by a Service Level Indicator (SLI), and sometimes serves as a safeguard for a Service Level Agreement (SLA). SLOs represent customer happiness and guide the development team’s velocity.
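
To make the relationship between an SLO target, its SLI, and the error budget concrete, here is a small worked sketch. It uses the common request-based definition of an error budget and is not taken from Squadcast's SLO Tracker; the function name and the sample numbers are assumptions.

```python
def error_budget_remaining(slo_target, total_events, bad_events):
    """Return the fraction of the error budget still unspent (negative if overspent)."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # The budget is the number of bad events the target tolerates over the period.
    allowed_bad = (1 - slo_target) * total_events
    return 1 - bad_events / allowed_bad


# Hypothetical month: 99.9% target, one million requests, 250 of them failed.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
sli = 1 - 250 / 1_000_000  # measured success ratio for the same period
print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.0%}")
```

With a 99.9% target, one million requests allow roughly 1,000 failures, so 250 failures spend about a quarter of the budget.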

View Video

Interrupts in software teams: using unplanned work to your advantage

Aug 9, 2022 By Alex Russell-Saw In Incident.io

Interrupts are often seen as a problem that eats away at your team’s productivity, and gets in the way of shipping important things for your customers. It’s often consciously accrued from the tech debt we accept to ship features sooner. However, when a team doesn’t have a good strategy for dealing with the consequences of those decisions, the pain is felt much more acutely and much sooner.

PagerDuty Debuts as a Leader in 2022 GigaOm Radar for AIOps Solutions

Aug 9, 2022 By Heath Newburn In PagerDuty

Every year there is a surprise in a Radar report. While it won’t be a surprise to our thousands of customers who are seeing tremendous benefits with us, PagerDuty is excited to be named a Leader in the 2022 GigaOm Radar for AIOps Solutions. GigaOm uses extensive criteria to evaluate vendors in their Radar.

PagerDuty Event Intelligence for AIOps (Extended)

Aug 9, 2022 By PagerDuty In PagerDuty

Enjoy this demo that showcases PagerDuty's Event Intelligence, a powerful AIOps solution that helps teams harness machine learning to reduce alert noise, create context for faster resolution, and remove toil by automating repetitive tasks.

View Video

PagerDuty

Read more about PagerDuty Event Intelligence for AIOps (Extended)

RESOLVE '22: How to get multi-cloud done right

Aug 8, 2022 By Anthony Evans In BigPanda

Multi-cloud is inevitable. With AIOps, struggling in its complexity doesn’t need to be. Business technology stacks don’t appear out of a vacuum. For the modern cloud-enabled, cloud-dependent company (that is to say, most of them), the view from the inside looks more like an ongoing evolution than a monolithic choice.

Read Post

BigPanda

Read more about RESOLVE '22: How to get multi-cloud done right

The Power of using Enterprise Alerts Remote Actions via Cloudbridge

Aug 8, 2022 By Derdack In Derdack

For over 20 years, Derdack has been developing products that meet the challenges of incident management. It is well documented how Enterprise Alert and SIGNL4 not only filter through the noise with advanced alert policies, but also target the right on-call engineer through sophisticated scheduling, ad-hoc collaboration from anywhere, and 2-way communication back to the originating event source.

Read Post

Derdack

Read more about The Power of using Enterprise Alerts Remote Actions via Cloudbridge

We've made it even easier to manage your FireHydrant configuration with Terraform

Aug 8, 2022 By Michelle Peot In FireHydrant

Many of our customers use FireHydrant’s verified Terraform provider to track configuration changes, ensure consistency, and automate repetitive configuration tasks. Back in March we streamlined our Terraform provider support for service catalog configuration. Today we are releasing extensive Terraform provider improvements for configuring runbooks, task lists, service dependencies, incident roles, and more.

Read Post

FireHydrant

Read more about We've made it even easier to manage your FireHydrant configuration with Terraform

Monitor 3rd-party outages in PagerDuty

Aug 8, 2022 By isDown In isDown

We’ve integrated IsDown with PagerDuty so you can manage alerts in the same place you manage all your other alerts. The PagerDuty integration is part of our strategy to make it easy to monitor all the business dependencies that companies nowadays have. We live in a world where SaaS rules the world, and companies prefer to buy vs. build. But with that comes the problem of monitoring all these dependencies, which are critical to daily operations.

Read Post

isDown

Read more about Monitor 3rd-party outages in PagerDuty

GigaOm Radar Report

Aug 5, 2022 By Richard Whitehead In Moogsoft

In June, the research firm GigaOm published the 2022 edition of their annual Radar for AIOps Solutions. Having had time to digest the contents, it seems a good time to summarize the key takeaways from the Moogsoft perspective. Firstly, in case you are not familiar with GigaOm, here’s a brief introduction.

Read Post

Moogsoft

Read more about GigaOm Radar Report

MTTJ - What is Mean Time to Join (MTTJ)?

Aug 5, 2022 By AlertOps In AlertOps

MTTJ – The time taken to join a meeting, and the delays caused in ensuring the right people are available, can be avoided using software automation and tools. This is not an often-talked-about topic, but we are sure everyone is directly affected by it. We discuss the what, the why, and how it can be avoided in detail here.

Read Post

AlertOps

Read more about MTTJ - What is Mean Time to Join (MTTJ)?

Driving a customer-focused incident response process

Aug 4, 2022 By Martha Lambert In Incident.io

Deep into an incident, Slack firing, up to your ears in decisions, not sure where to turn next? It’s easy for external communication with your customers to fall far down the list of priorities in these moments. However, these are the exact situations where comms are vital, and where underestimating their importance can have damaging and lasting effects on your organisation.

Read Post

Incident.io

Read more about Driving a customer-focused incident response process

The Do's and Don'ts of Blameless Incident Postmortems

Aug 3, 2022 By xMatters In xMatters

When an incident inevitably occurs, many organizations have a well-prepared incident management team that springs into action. Whether it’s a power outage or security breach, an incident can damage your company’s operations if not handled properly. A strong incident response team is critical to mitigating any negative impacts successfully. Furthermore, once your team resolves the problem, you should initiate a postmortem to detail the incident and record any lessons learned.

Read Post

xMatters

Read more about The Do's and Don'ts of Blameless Incident Postmortems

RESOLVE '22: Incident management automation

Aug 3, 2022 By Ryan Taylor In BigPanda

“Make life easier” isn’t a mantra for the lazy—it’s a way to drill down on important automation in the IT Ops room. When Ryan Taylor, VP of solutions engineering at Transposit, talks about his experience and outlook in the IT Ops chair, people tend to listen.

Read Post

BigPanda

Read more about RESOLVE '22: Incident management automation

Episode 6: Mooving to... Real release strategies with Jake Laverty

Aug 3, 2022 By Richard Whitehead In Moogsoft

Every product or application needs a release strategy. It’s how you can double check that everything in your deployment is appropriately tested, validated and verified. Having a standardized release strategy in place allows your team to follow a protocol and reduce the number of unknowns they must face in the product life cycle. However, there are a few considerations to make this critical process run smoothly.

Read Post

Moogsoft

Read more about Episode 6: Mooving to... Real release strategies with Jake Laverty

New! Common Automated Diagnostics for AWS Users

Aug 3, 2022 By Jake Cohen In PagerDuty

Today’s modern cloud architectures centered on AWS are typically a composite of ~250 AWS services and workflows implemented by over 25,000 SaaS services, house-developed services, and legacy systems. When incidents fire off in these environments—whether or not a company has built out a centralized cloud platform—distinct expertise is often a necessity.

Read Post

PagerDuty

Read more about New! Common Automated Diagnostics for AWS Users

Automate incident response workflows with Eventarc and Datadog

Aug 2, 2022 By Thomas Sobolik In Datadog

Eventarc is a Google Cloud offering that ingests and routes events between GCP products, such as Cloud Run, Cloud Functions, and Pub/Sub, making it easy to build automated, event-driven workflows in complex environments. By taking care of event ingestion, delivery, authorization, and error handling, Eventarc reduces the development overhead that is required to build and maintain these workflows and helps you improve application resilience.

Read Post

Datadog

Read more about Automate incident response workflows with Eventarc and Datadog

Tell the story of your incident with timeline curation

Aug 2, 2022 By Martha Lambert In Incident.io

It isn’t the first time you’ve heard us say this and it won’t be the last: getting your post-incident process right is a game-changer. Being able to run effective debriefs and create useful postmortems helps us learn from our mistakes, respond better to future incidents and identify how we can build resilience in our product and teams. In short, it’s the thing that shifts the dial from just “fixing” to actually improving.

Read Post

Incident.io

Read more about Tell the story of your incident with timeline curation

Anti-patterns in Incident Response that you should unlearn

Aug 2, 2022 By Vishal Padghan In Squadcast

It is important to invest time and effort in understanding why a system performs the way it does and how we can improve it. Companies continue with practices that yield successful results, but ignoring anti-patterns can be far worse than choosing rigid processes. In this blog, we will explore anti-patterns in incident response and why you should unlearn them.

Read Post

Squadcast

Read more about Anti-patterns in Incident Response that you should unlearn

What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today

Aug 2, 2022 By Vivian Chan In PagerDuty

Does your team deal with too much noise? Does your heart sink a bit when you think about how much your rulesets have sprawled in order to manage your event processing needs? That’s why we released Event Orchestration earlier this year to help teams reduce the amount of manual work that goes into event management. Event Orchestration is the next evolution of our Event Rules feature set, which helps to route, enrich, and modify events on ingest to remove noise and automate processes.

Read Post

PagerDuty

Read more about What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today

Dedicated Incident Channel Improvements for Slack on Webhooks V3 - Early Access

Aug 2, 2022 By Jorge Villamariona In PagerDuty

Today, we are excited to open Early Access for our improved Dedicated Incident Slack Channel. These improvements include: In order to take advantage of this feature you need to upgrade to Slack on WebHooks V3 and request Early Access from PagerDuty support. Once you are on the right version and have early access, there are two ways to create a dedicated incident channel.

Read Post

PagerDuty

Read more about Dedicated Incident Channel Improvements for Slack on Webhooks V3 - Early Access

To require or not require (fields): that is the question

Aug 1, 2022 By Dylan Nielsen In FireHydrant

Required fields have been a hot topic at FireHydrant. Choose too many (or the wrong ones), and you unnecessarily annoy your team during an incident or encourage sloppy data entry that someone has to come back and clean up manually. Don't use them at all and risk insufficient data to efficiently propel an incident toward resolution.

Read Post

FireHydrant

Read more about To require or not require (fields): that is the question

Overcome the integration bottleneck with self-service onboarding tools

Aug 1, 2022 By Tony Piunno In BigPanda

The amount of data volume and complexity within tech stacks is continuing to increase with no sign of slowing down. As a result, many organizations are facing significant challenges related to tool sprawl and the overwhelming amount of data that needs to be exchanged between all the different systems. The result is this new rapid pace of data which brings a need for faster flow and exchange of information.

Read Post

BigPanda

Read more about Overcome the integration bottleneck with self-service onboarding tools

Analytics in Squadcast | Incident Management | On-call | SRE | Squadcast

Aug 1, 2022 By Squadcast In Squadcast

Analyzing incident data plays a key role in doing SRE better. Squadcast's Analytics Dashboard helps you analyze the performance of your Organization/Team for a given time period. It also gives you more insight into past outages that affected your systems.

View Video

Squadcast

Read more about Analytics in Squadcast | Incident Management | On-call | SRE | Squadcast

Integrating Squadcast with Jira (Cloud & Server) - Create tickets & bidirectional sync | Squadcast

Aug 1, 2022 By Squadcast In Squadcast

You can use this integration guide to install and configure the Squadcast extension in Jira Cloud & Jira Server to create issues in Jira projects when there is an incident in Squadcast. Also learn to automatically or manually sync the status bidirectionally.

View Video

Squadcast

Read more about Integrating Squadcast with Jira (Cloud & Server) - Create tickets & bidirectional sync | Squadcast

Integrating Slack & Squadcast- Trigger, Acknowledge, Resolve & Reassign incidents from Slack channel

Aug 1, 2022 By Squadcast In Squadcast

You can integrate Squadcast and Slack to collaborate efficiently with your team while working on incidents. Squadcast sends a notification to the configured Slack Channel as soon as an incident is triggered.

View Video

Squadcast

Read more about Integrating Slack & Squadcast- Trigger, Acknowledge, Resolve & Reassign incidents from Slack channel

Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Aug 1, 2022 By Squadcast In Squadcast

Teams using MS Teams can now integrate with Squadcast and easily Acknowledge, Resolve & Reassign incidents using MS Teams. You can configure Squadcast to send a notification to the configured MS Teams channel as soon as an incident is triggered.

View Video

Squadcast

Read more about Integrating Microsoft Teams & Squadcast - Acknowledge, Resolve & Reassign Incidents | Squadcast

Tagging & Routing at Squadcast | Incident Management | Squadcast

Aug 1, 2022 By Squadcast In Squadcast

Event Tagging is a rule-based, auto-tagging system with which you can define customized tags based on incident payloads; these tags are automatically assigned to incidents when they are triggered. Auto-add relevant information like priority, severity or alert type to make incoming incidents context-rich. Route alerts to the right responder(s) based on the tags they carry.

View Video

Squadcast

Read more about Tagging & Routing at Squadcast | Incident Management | Squadcast

Alert Suppression Rules in Squadcast to prevent Alert fatigue | Squadcast

Aug 1, 2022 By Squadcast In Squadcast

Alert suppression can help you avoid alert fatigue by suppressing notifications for non-actionable alerts. Squadcast will suppress the incidents that match any of the Suppression Rules you create for your Services. These incidents will go into the Suppressed state and you will not get any notifications for them.

View Video

Squadcast

Read more about Alert Suppression Rules in Squadcast to prevent Alert fatigue | Squadcast

What's New: Automation Actions in the PagerDuty Application for Zendesk

Aug 1, 2022 By Carrie Lacina In PagerDuty

The past few years have led to a significant increase in customer demands, and customer service agents are feeling the pressure. According to a recent Zendesk CX Trends report, 68% of agents report feeling overwhelmed. Here at PagerDuty, we believe that happier customer service agents lead to more positive customer interactions and stronger relationships with your brand.

Read Post

PagerDuty

Read more about What's New: Automation Actions in the PagerDuty Application for Zendesk

Key considerations before signing up for cyber insurance

Aug 1, 2022 By Noam Morginstin In Exigence

With 2021 seeing 5.1 billion records breached and an annual increase in attacks of 11%, the risk of security incidents is only getting greater every year. And when an attack hits, the cost to recover, which includes fines, penalties, legal fees, and much more, is also great. To help minimize the scope of financial damage, many organizations turn to cyber insurance. Albeit a relatively new branch of insurance, demand is already huge and ever increasing.

Read Post

Exigence

Read more about Key considerations before signing up for cyber insurance
