November 2021

Keeping people safe and operations running faster in Middle East healthcare

Nov 30, 2021 By Everbridge In Everbridge

The COVID-19 pandemic has caused widespread concern across the interconnected international healthcare industry. This is especially true in the Middle East, where many of the world’s most renowned institutions have international operations. Healthcare leaders need to maintain a delicate operational equilibrium, balancing the safety of their people against getting critical treatment to patients. This becomes more complex with healthcare practitioners in foreign locations, who may not be familiar with local norms or laws and need informative, real-time communications when a critical event occurs.

View Video

Everbridge

Read more about Keeping people safe and operations running faster in Middle East healthcare

MTTR | Mean Time to Recovery Explained

Nov 30, 2021 By Noor-ul-Anam Ruqayya In Blameless

Curious about MTTR? We explain what the mean time to recovery is, why it matters to your development team, and how to reduce it.

Read Post

Blameless

Read more about MTTR | Mean Time to Recovery Explained
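As a rough illustration of the metric named in the MTTR entry above, mean time to recovery can be computed as the average of recovery durations across incidents. This is a minimal sketch, not the method from the linked post: the `mean_time_to_recovery` helper and the sample incident timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average (resolved - started) over a list of (started, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: (started, resolved) timestamps.
incidents = [
    (datetime(2021, 11, 1, 9, 0), datetime(2021, 11, 1, 9, 30)),    # 30 minutes
    (datetime(2021, 11, 8, 14, 0), datetime(2021, 11, 8, 15, 30)),  # 90 minutes
    (datetime(2021, 11, 20, 2, 0), datetime(2021, 11, 20, 3, 0)),   # 60 minutes
]

mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

Teams differ on when the clock starts (detection, first alert, or customer impact), so the `started` timestamp here is only one possible convention.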

Observability and SaaS Providers

Nov 30, 2021 By Helen Beal In Moogsoft

SaaS is exploding, and so it should; it takes commoditized work and infrastructure away from tech teams so that they can focus on differentiating features. But what happens when it goes wrong? How do SaaS platforms make sure they aren't letting their customers down and, in turn, letting their customers' customers down? Observability, bolstered with AI, gives all the partners the best chance to optimize availability and customer experience. Here's how.

Read Post

Moogsoft

Read more about Observability and SaaS Providers

What's new in Enterprise Alert 9.1 - Full Length Session

Nov 29, 2021 By Derdack In Derdack

Walking through all new features in Enterprise Alert 9.1. Video from the Derdack User Group Meeting 2021: Alert Policy Enhancements, Microsoft Teams 2-way Connector, SIGNL4 Cloud Bridge, and more...

View Video

Derdack

Read more about What's new in Enterprise Alert 9.1 - Full Length Session

SIGNL4 Cloud Bridge

Nov 29, 2021 By SIGNL4 In SIGNL4

How Enterprise Alert 9.1 can connect 2-way with SIGNL4 for hybrid cloud scenarios to get the best of both worlds.

View Video

SIGNL4

Read more about SIGNL4 Cloud Bridge

6 Signs Your Incident Response Steps Are Working

Nov 29, 2021 By Curtis St Pierre In xMatters

Although IT incidents have always been a concern, the increase in customer-facing technology adds the cost of a bad customer experience to the cost of responding to and remediating an incident. While in a perfect world, you’d be able to prevent incidents from happening in the first place, the reality is they do happen and more often than most of us would like to admit.

Read Post

xMatters

Read more about 6 Signs Your Incident Response Steps Are Working

Supervised Users in xMatters - xMatters Support

Nov 29, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he explains the Supervised Users tab in xMatters. Supervisors can modify and delete the passwords for users they supervise. If a supervisor creates or invites a new user to xMatters, they automatically become that user's supervisor.

View Video

xMatters

Incident Management

Read more about Supervised Users in xMatters - xMatters Support

PagerDuty at AWS re:Invent 2021-Deepening Our Collaboration with AWS

Nov 29, 2021 By Inga Weizman In PagerDuty

Across the globe, in-person technology events are beginning to emerge from their pandemic hibernation. For developers and DevOps teams, no event has been more anticipated than AWS re:Invent, which is back in Las Vegas, November 29th — December 3rd to help bring us all back together and slowly let us find our new normal. While handshakes may be replaced by elbow bumps or other newfound greeting rituals, we are excited to be back and see all of you in real life.

Read Post

PagerDuty

Read more about PagerDuty at AWS re:Invent 2021-Deepening Our Collaboration with AWS

All About Incident Communication: What it Is, How to Do It, and Why It's Crucial for Business

Nov 29, 2021 By Brittany Storniolo In Statuspal

No matter how much you try to avoid them, incidents are bound to happen. And while your first instinct is to resolve the issue, it shouldn’t be your only priority. By solely focusing on solving the problem and not communicating it to affected stakeholders, like team members and customers, you’re actively making the situation worse. In this article, we’ll discuss what incident communication is and how to create a strong incident communication plan.

Read Post

Statuspal

Read more about All About Incident Communication: What it Is, How to Do It, and Why It's Crucial for Business

3 Things to Know About AI/ML in the DevOps Toolchain

Nov 29, 2021 By Richard Whitehead In Moogsoft

At DevWeek Austin, we discussed how AI and ML have come to the DevOps toolchain and are a great fit! Here are the 3 main takeaways.

Read Post

Moogsoft

Read more about 3 Things to Know About AI/ML in the DevOps Toolchain

Enterprise Alert 9.1 Update brings Microsoft Teams and SIGNL4 connectivity

Nov 29, 2021 By Derdack In Derdack

As announced at the User Group Meeting 2021, we are now releasing Enterprise Alert 9.1. This version brings a set of new features extending the capabilities in some crucial areas. As always, you will find more details, release notes and downloadable installer files in the online user group. You can also watch the session from our UGM on YouTube.

Read Post

Derdack

Read more about Enterprise Alert 9.1 Update brings Microsoft Teams and SIGNL4 connectivity

Supervisors in xMatters - xMatters Support

Nov 26, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he outlines the privileges of supervisors in xMatters. Supervisors can modify the user profile of any user they supervise, and can also view their groups, change their login details, and sign them out of the mobile app if their account has been compromised.

View Video

xMatters

Incident Management

Read more about Supervisors in xMatters - xMatters Support

Resolve Actions - Automated Incident Resolution vs Manual Procedure

Nov 25, 2021 By Resolve In Resolve

Watch the stark differences in effort and timing between manually investigating a network link failure and automating this known process using Resolve Actions.

View Video

Resolve

Read more about Resolve Actions - Automated Incident Resolution vs Manual Procedure

6 Steps SREs Should Take to Prepare for Black Friday and Cyber Monday 2021

Nov 24, 2021 By Quentin Rousseau In Rootly

Six tips on how Site Reliability Engineers (SREs) can prepare for the reliability challenges of Black Friday and Cyber Monday 2021

Read Post

Rootly

Read more about 6 Steps SREs Should Take to Prepare for Black Friday and Cyber Monday 2021

Who Should Be On Your Incident Response Team?

Nov 24, 2021 By Chrissy Simpson In xMatters

When an incident strikes, an organization’s reputation and revenue, as well as customer trust, are at stake. Assembling an effective incident response team is critical to minimizing the incident’s impact. But what exactly is an incident response team? Who should be a part of the team and what are their responsibilities? Successful incident responses require a team with a diverse set of problem-solving and communication skills.

Read Post

xMatters

Read more about Who Should Be On Your Incident Response Team?

Managing Users in xMatters - xMatters Support

Nov 24, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he teaches you how to manage users in xMatters, including viewing and searching for users as well as inviting new users into your instance.

View Video

xMatters

Incident Management

Read more about Managing Users in xMatters - xMatters Support

How AWS & xMatters Drive Monitoring and Observability Forward - xMatters Demo

Nov 23, 2021 By xMatters In xMatters

Join Tiberiu Oprisiu, Solution Architect at AWS, Eric Maxwell, Solution Architect at xMatters, and Rutuja Rajwade, Partner Marketing Manager at xMatters, as they highlight and demo the benefits that come from pairing AWS with xMatters. Learn from Tiberiu which business imperatives drive observability, and what AWS offers to support it, why, and how. And stick around to see Eric dive deep into xMatters Flow Designer to see how these workflows can be set up with ease!

View Video

xMatters

Read more about How AWS & xMatters Drive Monitoring and Observability Forward - xMatters Demo

3 AIOps Trends in 2022

Nov 23, 2021 By Richard Whitehead In Moogsoft

The AIOps market is flourishing and the new year is coming up, so let’s take a look at the top 3 trends to watch out for in 2022.

Read Post

Moogsoft

Read more about 3 AIOps Trends in 2022

Recognizing Burnout, So You Don't Fallout

Nov 23, 2021 By Thom Duran In Moogsoft

Burnout from work is proven to have a tangible impact on your physical health and happiness. Learn how to recognize burnout in yourself and your employees, and build a happy developer culture!

Read Post

Moogsoft

Read more about Recognizing Burnout, So You Don't Fallout

A vital alerting solution

Nov 23, 2021 By Matt In SIGNL4

This article should give you a first idea of what SIGNL4 does. What do IT security, production monitoring and technical field service have in common? In all these scenarios, the right people need to get notified immediately – in case of technical malfunctions, urgent maintenance orders or emergencies, all in order to solve any incident quickly and efficiently.

Read Post

SIGNL4

Read more about A vital alerting solution

How We Deploy Product Releases at xMatters

Nov 22, 2021 By Doug Peete In xMatters

With Halloween behind us and the holiday shopping season fast approaching, engineering and product teams know what that means: code freezes! At xMatters, code freezes are a part of our product release process in anticipation of the busiest — and most important — time of the year for many of our customers. But code freezes are just one piece of the puzzle in how we ensure our customers have the most reliable experiences. The way our product releases are designed is much more than that.

Read Post

xMatters

Read more about How We Deploy Product Releases at xMatters

Your xMatters Schedule on Android Devices - xMatters Support

Nov 22, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he teaches you how to view and modify your schedule in the xMatters app on Android devices.

View Video

xMatters

Read more about Your xMatters Schedule on Android Devices - xMatters Support

Partner Integration - Dynatrace with PagerDuty and Rundeck

Nov 22, 2021 By PagerDuty In PagerDuty

Deliver perfect software experiences with real-time intelligence into customer satisfaction and behavior, your applications, and the performance of your hybrid multi-cloud. AI-powered root-cause analysis automatically identifies customer facing performance issues and pinpoints the root-cause within seconds. Open APIs allow ingestion of 3rd party metrics and enable complex system integrations. In this demo, Rob Jahn shares a sophisticated incident remediation workflow incorporating intelligence from Dynatrace, automation in Rundeck, and incidents in PagerDuty.

View Video

PagerDuty

Read more about Partner Integration - Dynatrace with PagerDuty and Rundeck

SLA vs. SLO (Differences Explained)

Nov 22, 2021 By Emily Arnott In Blameless

Wondering about SLAs and SLOs? We explain service level agreements and service level objectives, their differences, and the importance of each. What are the major differences between service level agreements (SLAs) and service level objectives? An SLA is a legal agreement between the business and the customer that includes a reliability target and the consequences of failing to meet it. An SLO is an internal target that measures how customers use the service.

Read Post

Blameless

Read more about SLA vs. SLO (Differences Explained)
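To make the distinction in the SLA vs. SLO entry above concrete, the sketch below compares one measured availability figure against both targets. It assumes, as is common but not stated in the source, that the internal SLO is set stricter than the contractual SLA; the function names, targets and request counts are all hypothetical.

```python
def availability(good_requests, total_requests):
    """Fraction of requests that succeeded in the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def classify(measured, slo_target, sla_target):
    """Compare measured availability against the internal and contractual targets."""
    if measured < sla_target:
        return "sla_breached"   # contractual consequences may apply
    if measured < slo_target:
        return "slo_missed"     # internal warning, contract still met
    return "ok"


# Hypothetical targets: SLO 99.9% (internal), SLA 99.5% (contractual).
SLO, SLA = 0.999, 0.995

print(classify(availability(99_950, 100_000), SLO, SLA))  # ok
print(classify(availability(99_800, 100_000), SLO, SLA))  # slo_missed
print(classify(availability(99_000, 100_000), SLO, SLA))  # sla_breached
```

Keeping a gap between the two targets gives a team room to react to an SLO miss before the SLA itself is at risk.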

Building safe-by-default tools in our Go web application

Nov 22, 2021 By Lisa Karlin Curtis In Incident.io

At incident.io, we're acutely aware that we handle incredibly sensitive data on behalf of our customers. Moving fast and breaking things is all well and good, but keeping our customer data safe isn't something we can compromise on. We run incident.io as a multi-tenant application, which means we have a single database (and a single application).

Read Post

Incident.io

Read more about Building safe-by-default tools in our Go web application

4 Ways To Ensure Reliability of Your Digital Services for GivingTuesday

Nov 22, 2021 By Jesse Maddex In PagerDuty

In today’s digital economy, seconds matter. For mission-driven organizations, seconds can be a matter of life and death, and service reliability can make or break access to suicide and safety hotlines, disaster relief, time-critical health care, food assistance, and more. That’s where real-time digital operations comes in.

Read Post

PagerDuty

Read more about 4 Ways To Ensure Reliability of Your Digital Services for GivingTuesday

History of SRE: Why Google Invented the SRE Role

Nov 19, 2021 By JJ Tang In Rootly

A history of Site Reliability Engineering from its origins at Google in 2003 to the present.

Read Post

Rootly

Read more about History of SRE: Why Google Invented the SRE Role

Your xMatters Schedule on iOS Devices - xMatters Support

Nov 19, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he teaches you how to view and modify your schedule in the xMatters app on iOS devices.

View Video

xMatters

Read more about Your xMatters Schedule on iOS Devices - xMatters Support

DevOps Benefits & How to Maximize Them for Your Team

Nov 18, 2021 By Myra Nizami In Blameless

Curious about DevOps benefits? Whether you are just adopting DevOps or improving your current process, we explain the top benefits and how to maximize them. What are DevOps benefits? In DevOps, the operations and development teams work closely together during the entire software development lifecycle. The collaborative approach in DevOps leads to many benefits.

Read Post

Blameless

Read more about DevOps Benefits & How to Maximize Them for Your Team

Using Predictive Analytics Capability to Resolve Critical Incidents

Nov 18, 2021 By Srinivas Miriyala In CloudFabrix

The CloudFabrix solution provides a holistic approach for enterprises to implement proactive operations, with the objective of eliminating or reducing critical incidents and improving customer satisfaction. The solution primarily relies on applying regression and forecasting models to any time-series data to detect and forecast anomalies. One of the unique features of the solution is the ability to convert unstructured data such as logs, incidents and alerts into time-series data to be used for running prediction models.

Read Post

CloudFabrix

Read more about Using Predictive Analytics Capability to Resolve Critical Incidents
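The general idea in the predictive analytics entry above (turn events into a time series, then fit a regression to forecast the next value) can be sketched in a few lines. This is an illustrative toy, not CloudFabrix's implementation: it buckets hypothetical event timestamps into per-minute counts and fits an ordinary least-squares line, and every name in it is an assumption.

```python
from collections import Counter


def to_time_series(event_minutes, n_buckets):
    """Count events per minute bucket, filling empty buckets with 0."""
    counts = Counter(event_minutes)
    return [counts.get(minute, 0) for minute in range(n_buckets)]


def linear_forecast(series, steps_ahead=1):
    """Fit y = intercept + slope * x by least squares and extrapolate."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical error-log events, given as the minute in which each occurred.
event_minutes = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
series = to_time_series(event_minutes, n_buckets=4)
forecast = linear_forecast(series)

print(series)    # [1, 2, 3, 4]
print(forecast)  # 5.0
```

A real system would compare the forecast against later observations and flag large deviations as anomalies; the threshold for "large" is a tuning choice left out here.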

Growing pains: the IT Ops maturity model

Nov 18, 2021 By BigPanda In BigPanda

Modern IT Ops environments have many moving parts that need to work together well, yet are evolving at different speeds. This gap in maturity creates many problems. In this CTO Perspective, Jason Walker, Chief Customer Officer at BigPanda, discusses why IT Ops teams should prioritize maintaining a common maturity across all their IT operations, and how best to do that.

View Video

BigPanda

Read more about Growing pains: the IT Ops maturity model

Deploying to production in <5m with our hosted container builder

Nov 18, 2021 By Lawrence Jones In Incident.io

Fast build times are great, which is why we aim for less than 5m between merging a PR and getting it into production. Not only is waiting on builds a waste of developer time — and an annoying concentration breaker — the speed at which you can deploy new changes has an impact on your shipping velocity. Put simply, you can ship faster and with more confidence when deploying a follow-up fix is a simple, quick change.

Read Post

Incident.io

Read more about Deploying to production in <5m with our hosted container builder

Training Intelligent Alert Grouping

Nov 18, 2021 By Quintessence Anx In PagerDuty

Complex incidents are both exhausting and commonplace. In this case, the incidents I am referring to as “complex” are those that involve multiple, disparate notifications in your alert management platform. Perhaps these incidents are logically separated because the underlying systems or services were seen as less coupled than they turned out to be in reality.

Read Post

PagerDuty

Read more about Training Intelligent Alert Grouping

How to Use Status Page to Deliver Bad News to Customers

Nov 17, 2021 By StatusHub In StatusHub

In this article, we’re exploring how status pages can help you deliver bad news to customers in a “good way,” starting with the psychology of news delivery and how you can use this knowledge for future incidents.

Read Post

StatusHub

Read more about How to Use Status Page to Deliver Bad News to Customers

Fail-Safe Digital Scheduler for On-Call Management

Nov 17, 2021 By OnPage In OnPage

In this video, we discuss how OnPage's advanced, fail-proof digital schedules enable organizations to distribute workload evenly among scheduled, On-Call team members. The OnPage scheduler starts out "FULL" and schedules are created on top of it. This guarantees that a notification is delivered reliably, even when a slot is left empty on the scheduler. The scheduler reverts to the default group order and the entire group is notified, ensuring continuous coverage across your organization.

View Video

OnPage

Read more about Fail-Safe Digital Scheduler for On-Call Management
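The fallback behaviour described in the scheduler entry above (an empty slot reverts to the default group, so someone is always notified) can be sketched as a lookup with a default. This is a simplified assumption of how such a scheduler might work, not OnPage's implementation; the names, shifts and group members are hypothetical.

```python
from datetime import datetime

# Hypothetical default group, in escalation order.
DEFAULT_GROUP = ["alice", "bob", "carol"]


def recipients_for(when, shifts, default_group):
    """Return the scheduled person, or the whole default group if no shift covers `when`."""
    for start, end, person in shifts:
        if start <= when < end:
            return [person]
    # Empty slot: fall back to notifying the entire default group.
    return list(default_group)


# Hypothetical shifts with a gap between 12:00 and 14:00.
shifts = [
    (datetime(2021, 11, 17, 8), datetime(2021, 11, 17, 12), "alice"),
    (datetime(2021, 11, 17, 14), datetime(2021, 11, 17, 18), "bob"),
]

print(recipients_for(datetime(2021, 11, 17, 10), shifts, DEFAULT_GROUP))  # ['alice']
print(recipients_for(datetime(2021, 11, 17, 13), shifts, DEFAULT_GROUP))  # ['alice', 'bob', 'carol']
```

Starting from full coverage and layering shifts on top means a scheduling mistake widens the audience for an alert instead of dropping it.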

Viewing Your Contacts on Android - xMatters Support

Nov 17, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he navigates you through the “My Contacts” section of the xMatters app for Android devices.

View Video

xMatters

Read more about Viewing Your Contacts on Android - xMatters Support

Tis The Season: Protect Your Availability During The Holidays

Nov 17, 2021 By Richard Whitehead In Moogsoft

Deck the halls! It's time for the annual holiday Code Freeze, that festive time of year when businesses impose a precautionary halt to code changes and Operations should be quiet. But before you kick up your feet, make sure that demand doesn’t lead to availability embarrassments. After all, retail experts suggest that we’re in for another online-heavy holiday shopping season, so businesses need to brace for increased digital traffic...with little tolerance for failure.

Read Post

Moogsoft

Read more about Tis The Season: Protect Your Availability During The Holidays

Partner Integration on Twitch: Lacework

Nov 16, 2021 By PagerDuty In PagerDuty

Lacework delivers complete security and compliance for the cloud. While the cloud enables enterprises to automatically scale workloads, deploy faster, and build freely, it also makes it increasingly difficult to maintain visibility, remain compliant, stay free from known vulnerabilities, and track activity in both host workloads and ephemeral infrastructure within their environments. Integrate Lacework with PagerDuty to route Lacework Events to responders on your team. Manage and resolve configuration issues, behavioral anomalies, and compliance requirements in a timely manner across your cloud infrastructure.

View Video

PagerDuty

Read more about Partner Integration on Twitch: Lacework

How to Write Meaningful Retrospectives

Nov 16, 2021 By Emily Arnott In Blameless

One of the foundations of incident management in SRE practice is the incident retrospective. It documents all the learnings from an incident and serves as a checklist for follow-up actions. If we step back, there are 7 main elements to a retrospective. When done right, these elements help you better understand an incident, what it reveals about the system as a whole, and how to build lasting solutions.

Read Post

Blameless

Read more about How to Write Meaningful Retrospectives

5 ways incidents made me a better engineer

Nov 16, 2021 By Lisa Karlin Curtis In Incident.io

Incidents are a great opportunity to gather both context and skill. They take people out of their day-to-day roles, and force ephemeral teams to solve unexpected and challenging problems. In my career, I've found incidents can be a great accelerator - for both myself and others around me. It was after leading my first incident at GoCardless that I started to feel really comfortable in the codebase and the team.

Read Post

Incident.io

Read more about 5 ways incidents made me a better engineer

Fall 2021 Launch: Automate Incident Response to Accelerate Critical Work

Nov 16, 2021 By PagerDuty In PagerDuty

Modern businesses are digital businesses—so managing your business means mastering your critical services and operations for your employees and customers. Today, you need to be able to understand every aspect of your company—as it unfolds—because in this world, seconds matter to your productivity, your revenue, and most importantly, your customers.

Read Post

PagerDuty

Read more about Fall 2021 Launch: Automate Incident Response to Accelerate Critical Work

Achieving Operational Resilience for Cellular Carriers

Nov 16, 2021 By Everbridge In Everbridge

The world is changing, and with great change comes an evolving threat landscape. Increases in physical and digital disruption, such as civil unrest, cyberattacks, severe weather events, and unplanned outages, have left many industries scrambling to secure a robust operational resilience strategy, including the cellular industry. Today’s evolving threat landscape poses a unique threat to cellular carriers, whose business is growing at a breakneck pace.

Read Post

Everbridge

Read more about Achieving Operational Resilience for Cellular Carriers

Viewing Your Contacts on iOS - xMatters Support

Nov 15, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he navigates you through the “My Contacts” section of the xMatters app for iOS devices.

View Video

xMatters

Read more about Viewing Your Contacts on iOS - xMatters Support

Mobile Service Dispatching for In Plant Transport Logistics at BASF Coatings

Nov 15, 2021 By Matt In SIGNL4

BASF is the largest chemical producer in the world with a revenue of EUR 59bn, 247 manufacturing sites and 110,000 employees. BASF’s Coatings division employs 11,000 people and develops, produces and markets innovative solutions for automotive OEM and automotive refinish coatings and industrial coatings as well as architectural coatings and related coating processes.

Read Post

SIGNL4

Read more about Mobile Service Dispatching for In Plant Transport Logistics at BASF Coatings

IT Failures are Inevitable

Nov 15, 2021 By xMatters In xMatters

As infrastructure stacks grow increasingly complex and involve an ever-growing number of services, system failures are becoming more and more common. There can be a variety of reasons why systems fail: software bugs, misconfiguration, interactions between services that cause unexpected behavior, network outages, and of course, those rare occasions where natural events render data centers inoperative.

Read Post

xMatters

Read more about IT Failures are Inevitable

Your Guide to Developing a Fail-Safe Incident Response Plan

Nov 12, 2021 By Jared Curtis In xMatters

Incidents happen. Every organization's technical team will face an incident sooner or later, whether planned or unplanned. An incident can be declared or initiated in response to an event or combination of events that affects the integrity or availability of a system or service in a way that impacts core business processes.

Read Post

xMatters

Read more about Your Guide to Developing a Fail-Safe Incident Response Plan

Minimize the impact of critical incidents with Freshservice On-Call Management

Nov 12, 2021 By Anusha Jha In Freshservice

“Service outage! Help!” These words (or their variations) have preceded notable losses of millions and billions of dollars in the 21st century. From large corporations to SMBs, no one is immune to the effects of downtime – whether planned or unplanned. However, the earlier an issue is noticed, the faster it is acted upon and resolved, resulting in little or no customer impact.

Read Post

Freshservice

Read more about Minimize the impact of critical incidents with Freshservice On-Call Management

SRE Complete Resume Writing Guide

Nov 12, 2021 By Quentin Rousseau In Rootly

Follow these steps to write a great SRE job resume.

Read Post

Rootly

Read more about SRE Complete Resume Writing Guide

Monitoring & Observability for Sales, Marketing and Business ops teams with StackMoxie and PagerDuty

Nov 12, 2021 By PagerDuty In PagerDuty

Before Stack Moxie, every business ops team needed PagerDuty, but finding and pushing errors was a manual process. With Stack Moxie + PagerDuty, every business ops professional can manage their sales, marketing, HR or customer success stack with the same quality engineers bring to code.

View Video

PagerDuty

Read more about Monitoring & Observability for Sales, Marketing and Business ops teams with StackMoxie and PagerDuty

OnPage's Clinical Communication and Collaboration Solution

Nov 12, 2021 By OnPage In OnPage

Modern healthcare teams require a modern solution to streamline clinical communications and medical workflows. In life and death situations, it’s critical that physicians receive immediate alerts and messages to provide patient care promptly. OnPage is the industry’s most trusted clinical communications platform. OnPage is more reliable and secure than traditional pagers. The system enables care teams to easily communicate and achieve maximum patient satisfaction.

View Video

OnPage

Read more about OnPage's Clinical Communication and Collaboration Solution

Viewing Your Devices on Android - xMatters Support

Nov 12, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he navigates you through the My Devices screen in the xMatters app for Android devices.

View Video

xMatters

Read more about Viewing Your Devices on Android - xMatters Support

4 IT Challenges Addressed by OnPage Automated Alerting

Nov 12, 2021 By Christopher Gonzalez In OnPage

IT organizations are challenged with delivering quick, effective resolution to customers’ database, hardware or software downtime issues. Contractually binding service-level agreements (SLAs) place further pressure on IT engineers to accelerate incident resolution time and minimize downtime. Though engineers are obligated to meet their SLAs, they are unable to do so without the help of an automated alerting system.

Read Post

OnPage

Read more about 4 IT Challenges Addressed by OnPage Automated Alerting

Self-Healing DevOps

Nov 12, 2021 By xMatters In xMatters

In a world where IT teams and DevOps groups are tasked with not only keeping the lights on but also driving business efficiencies and providing a stellar digital experience to the customer, automation is imperative. Even better is automation between multiple systems for ease of healing when an incident occurs.

Get EBook

xMatters

Read more about Self-Healing DevOps

Summary of Bridging the Gap: DevOps to SRE

Nov 11, 2021 By Emily Arnott In Blameless

SRE is important and becoming more essential as software becomes a ubiquitous part of our life. Business leaders recognize this but fear they lack the resources to properly adopt SRE. Even large companies with mature DevOps functions experience headwinds.

Read Post

Blameless

Read more about Summary of Bridging the Gap: DevOps to SRE

Everbridge at COP26 World Climate Summit 2021

Nov 11, 2021 By Everbridge In Everbridge

World Climate Summit 2021 / Introduction by Vernon Irvin / Program with Jessica Deckinger, John Maeda, Dominic Jones

View Video

Everbridge

Incident Management

Read more about Everbridge at COP26 World Climate Summit 2021

Reveille and PagerDuty Integration

Nov 11, 2021 By PagerDuty In PagerDuty

Real-time notification and visibility of events impacting availability, operation, and performance of business-critical ECM applications.

View Video

PagerDuty

Read more about Reveille and PagerDuty Integration

Logs and tracing: not just for production, local development too

Nov 11, 2021 By Lawrence Jones In Incident.io

We're a small team of engineers right now, but each engineer has experience working at companies that invested heavily in observability. While we can't afford months of time dedicated to our tooling, we want to come as close as possible to what we know is good while running as little as we can, ideally buying rather than building. Even with these constraints, we've been surprised at just how good we've managed to get our setup.

Read Post

Incident.io

Read more about Logs and tracing: not just for production, local development too

Avoid frostbite: Stop doing code freezes

Nov 11, 2021 By Robert Ross In FireHydrant

As the holiday season aggressively approaches, I want to make a public service announcement for everyone toying with the idea of a code freeze for the holidays: please don't. It’s getting cold outside and the season of peppermint mochas is upon us, which might get you thinking about putting a code freeze in place for the holidays. A word of warning: instituting a code freeze may have unintended consequences.

Read Post

FireHydrant

Read more about Avoid frostbite: Stop doing code freezes

Outage or Breach - Confront with Confidence (2021)

Nov 10, 2021 By AlertOps In AlertOps

A recent Dice article titled “Data Breach Costs: Calculating the Losses” referenced a 2021 IBM and Ponemon Institute study that looked at nearly 525 organizations in 17 countries and regions that sustained a breach last year, and found that the average cost of a data breach in 2020 stood at $3.86 million.

Read Post

AlertOps

Read more about Outage or Breach - Confront with Confidence (2021)

Reliable incident alerting for critical IT systems at German health insurance provider Debeka

Nov 10, 2021 By Derdack In Derdack

“Thanks to Enterprise Alert and the acknowledgement function, we can track the alerting and response digitally and have the certainty that our employees always take care of incidents in our critical IT infrastructure in a timely manner. IT alerting with Derdack, which has to be documented according to BaFin KRITIS, is highly reliable,” says Markus Reusch, Product Owner Monitoring, Debeka.

Read Post

Derdack

Read more about Reliable incident alerting for critical IT systems at German health insurance provider Debeka

How to improve your influence as an SRE

Nov 10, 2021 By Ricardo Castro In Squadcast

Improving your influence within the company will help you deliver high-quality work, as your goals will be closely aligned with those of the company. In this blog post, Ricardo explains how to improve your influence as an SRE. Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task.

Read Post

Squadcast

Read more about How to improve your influence as an SRE

Playbooks in Action: Creating Effective, Repeatable Incident Resolution Workflows

Nov 10, 2021 By Elli Ludwigson In Mattermost

While service incidents can be wildly dissimilar, they tend to have one thing in common: a need for quick resolution. Response teams need a robust, repeatable process to follow that ensures fast, mistake-free execution, especially for those 4 AM calls. Having a documented checklist saved where the entire team can access and use it at any time could make the difference between quick resolution and compounding the problem.

Read Post

Mattermost

Read more about Playbooks in Action: Creating Effective, Repeatable Incident Resolution Workflows

Microservice Architecture | What It Is & Why It Matters

Nov 10, 2021 By Noor-ul-Anam Ruqayya In Blameless

Curious about microservice architecture? We explain what microservice architecture is, and how it can be used to quickly produce reliable lightweight applications.

Read Post

Blameless

Read more about Microservice Architecture | What It Is & Why It Matters

Viewing Your Devices on iOS - xMatters Support

Nov 10, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he navigates you through the My Devices screen in the xMatters app for iOS devices.

View Video

xMatters

Read more about Viewing Your Devices on iOS - xMatters Support

4 Recommendations for Optimizing DevOps

Nov 10, 2021 By xMatters In xMatters

The concept and development of DevOps have significantly changed the way IT teams work in the last decade. Small and large teams alike can see the difference when they switch from traditional software development cycles to a DevOps cycle: accelerated innovation, improved collaboration, faster time to market. And the list of benefits continues to grow. Effectively embracing DevOps, however, is not an easy task. Thankfully, there are ways to navigate this challenging journey.

Read Post

xMatters

Read more about 4 Recommendations for Optimizing DevOps

Announcing Grafana OnCall, the easiest way to do on-call management

Nov 9, 2021 By Matvey Kukuy In Grafana

A critical part of managing modern software development is setting up and running an on-call rotation. But that often involves significant toil, in part because many of the existing tools are cumbersome and not developer-friendly. That’s why we’re excited to announce Grafana OnCall, an easy-to-use on-call management tool that will help reduce toil in on-call management through simpler workflows and interfaces tailored for devs.

Read Post

Grafana

Read more about Announcing Grafana OnCall, the easiest way to do on-call management

Q4 2021 Release

Nov 9, 2021 By BigPanda In BigPanda

Our latest Q4 2021 release includes new integrations and self-service APIs that foster deeper collaboration between agile and traditional Ops teams, help them scale together, and accelerate AIOps adoption across their enterprise.

View Video

BigPanda

Read more about Q4 2021 Release

Now you see me, now you don't: feature-flagging with LaunchDarkly at incident.io

Nov 9, 2021 By Sophie Koonin In Incident.io

At incident.io, we ship fast. We're talking multiple times a day, every day (yes, including Fridays). Once I merge a pull request (PR), my changes rocket their way into production without me lifting a finger. 💅 It's when we tackle larger projects that this becomes a bit more complicated. We recently launched Announcement Rules, which let you configure which channels incident announcements are posted in depending on criteria you define.

Read Post

Incident.io

Read more about Now you see me, now you don't: feature-flagging with LaunchDarkly at incident.io

Your Ops and DevOps teams need to work together, and fast. Who you gonna call?

Nov 9, 2021 By Adam Blau In BigPanda

The world is moving fast, led by an ever-accelerating IT landscape. In recent years, two distinct types of teams have emerged that assist in driving this business transformation: DevOps/SRE teams that are in charge of driving rapid innovation of products and services, and IT Ops/NOC teams that focus on preventing outages and maintaining the high level of quality, reliability and serviceability that modern, discerning customers expect.

Read Post

BigPanda

Read more about Your Ops and DevOps teams need to work together, and fast. Who you gonna call?

How Playbooks improve customer service delivery, agent productivity

Nov 9, 2021 By Abhi Rele In ServiceNow

We all know one bad experience can impact a customer’s perception of—and even willingness to deal with—an organization going forward. That’s why so many companies, in virtually every industry, have made investing in customer experience (CX) a top priority, according to ResearchAndMarkets.com. The problem is, for any given organization, there are a number of customer service processes along the entire life span of an interaction that need to be looked at and made great.

Read Post

ServiceNow

Read more about How Playbooks improve customer service delivery, agent productivity

New Apps for PagerDuty's Datadog Integration

Nov 8, 2021 By PagerDuty In PagerDuty

Status Dashboard by PagerDuty and Incidents by PagerDuty are new apps available now in Datadog. See a live, shared view of system health to improve awareness of operational issues with Status Dashboard by PagerDuty. Acknowledge, troubleshoot, and resolve incidents with PagerDuty actions embedded directly in the Datadog interface to limit context switching among tools. Julia Nasser and Hadijah Creary join the stream to show off this powerful enhanced integration.

View Video

PagerDuty

Read more about New Apps for PagerDuty's Datadog Integration

Make sense of complex systems with Dynamic Service Graph by PagerDuty

Nov 8, 2021 By PagerDuty In PagerDuty

The Dynamic Service Graph breaks down silos between teams and provides organizations with a living, breathing asset that displays technical and business services and their relationships at scale. It allows teams to quickly grasp the state of services, visually digest the full impact radius of an issue, zero in on likely cause, and seamlessly facilitate cross-team collaboration.

View Video

PagerDuty

Read more about Make sense of complex systems with Dynamic Service Graph by PagerDuty

Leaning on Technology in The New Noisy: Managing Cloud, Change and Risk

Nov 8, 2021 By PagerDuty In PagerDuty

Your company’s “digital transformation” will be driven by new application designs and methods, new technology stacks, and new processes. To master it, and to deliver next-generation services through it, massively complex sets of signals and data need to be leveraged, processed, and acted on. Developers need integrated data and insights through that noise, while being able to leverage their tools of choice. All of this must be managed, even in spite of massive rates of change and innovation.

View Video

PagerDuty

Read more about Leaning on Technology in The New Noisy: Managing Cloud, Change and Risk

Getting Started with Collaboration and Sharing

Nov 8, 2021 By BigPanda In BigPanda

In this video we will show how to set up the sharing of incidents from your BigPanda environments with ticketing, chat, notification and orchestration systems.

View Video

BigPanda

Read more about Getting Started with Collaboration and Sharing

Getting Started with Changes and Root Cause Changes

Nov 8, 2021 By BigPanda In BigPanda

In this video we will discuss how to integrate change data with BigPanda to drive change-related root cause analysis. We call this feature Root-Cause Changes.

View Video

BigPanda

Read more about Getting Started with Changes and Root Cause Changes

Visualize and manage all of your services in one place with Dynamic Service Graph

Nov 8, 2021 By Hannah Culver In PagerDuty

In this digital era, technology systems are becoming increasingly complex. No longer can a single SME (subject matter expert) understand every facet of the system they run. Instead, much of this knowledge is siloed and exists as tribal knowledge within certain teams. Additionally, the rate of change is faster than ever, with code deploying and new services shipping at a rate unimaginable a few years ago.

Read Post

PagerDuty

Read more about Visualize and manage all of your services in one place with Dynamic Service Graph

Your xMatters Android Inbox - xMatters Support

Nov 8, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he walks through your xMatters inbox in the xMatters app on Android devices and shows how to make the most of it.

View Video

xMatters

Read more about Your xMatters Android Inbox - xMatters Support

What Does Everbridge Do?

Nov 5, 2021 By Everbridge In Everbridge

Everbridge protects what matters most to you. Assess risk, locate impacted people and assets, act rapidly, and analyze outcomes to drive continued optimization for your organization.

View Video

Everbridge

Read more about What Does Everbridge Do?

SLA vs. SLO vs. SLI: Understanding the Similarities and Differences

Nov 5, 2021 By JJ Tang In Rootly

An explanation of the meaning of SLA, SLO and SLI, and how SREs should use each concept to manage reliability.

Read Post

Rootly

Read more about SLA vs. SLO vs. SLI: Understanding the Similarities and Differences

What's New in the PagerDuty Terraform Provider - PagerDuty Garage (Oct 29, 2021)

Nov 4, 2021 By PagerDuty In PagerDuty

The Terraform PagerDuty provider is a plugin for Terraform that allows for the management of PagerDuty resources using HCL (HashiCorp Configuration Language). Manage your PagerDuty account with Infrastructure as Code. #infrastructureascode For more info on the PagerDuty provider for #Terraform, see the documentation on the Terraform Registry.

View Video

PagerDuty

Read more about What's New in the PagerDuty Terraform Provider - PagerDuty Garage (Oct 29, 2021)

Moogsoft & Dynatrace

Nov 4, 2021 By Moogsoft In Moogsoft

See how you can integrate Moogsoft with the Dynatrace application.

View Video

Moogsoft

Read more about Moogsoft & Dynatrace

How they SRE: Insights from the Cloudflare SRE team

Nov 3, 2021 By Pruthvi In Spike

Cloudflare is a cloud services provider with locations around the world, from San Francisco, US to London, England to Sydney, Australia. Their mission, as stated front and center on their homepage, is to help build a better Internet. While that may read like hyperbole, their numbers are impressive: Cloudflare has over 126,000 paying customers, and 95% of Internet users in the developed world are within 50ms of their network.

Read Post

Spike

Read more about How they SRE: Insights from the Cloudflare SRE team

Enterprise Alert Remote Remediation and Escalation

Nov 3, 2021 By Derdack In Derdack

Clips from our interview with Jeffrey M. Postolowski, Sr. Director of Technology Services, Information Technology Services, Bridgeport Board of Education.

View Video

Derdack

Read more about Enterprise Alert Remote Remediation and Escalation

Top Four xMatters DevOps Case Studies

Nov 3, 2021 By xMatters In xMatters

xMatters is a crucial tool for DevOps teams, and no one knows that better than our customers. Over the years we’ve published countless DevOps case studies, but when it comes to the test of time, some have stood up and have continued to make an impact.

Read Post

xMatters

Read more about Top Four xMatters DevOps Case Studies

Installing xMatters on Android - xMatters Support

Nov 3, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he teaches you how to install the xMatters app on your Android device.

View Video

xMatters

Read more about Installing xMatters on Android - xMatters Support

OnPage Integrates With Single Sign-On Solutions to Improve Secure Authentication

Nov 3, 2021 By OnPage Corporation In OnPage

WALTHAM, Mass., Nov. 3, 2021 — OnPage Corporation, a Boston-based incident management company, today announced the availability of new integrations with leading single sign-on (SSO) solutions Okta and OneLogin. The latest integrations allow for a secure authentication process when users log in to the OnPage system using their SSO account credentials.

Read Post

OnPage

Read more about OnPage Integrates With Single Sign-On Solutions to Improve Secure Authentication

November 2021 Update - Improved incident response with team escalation and more

Nov 3, 2021 By René In SIGNL4

Our November update introduces new team settings and, along with them, entirely new options for escalating Signls. This will allow you to make your incident response even more reliable. One application is to create a ‘managers on duty’ team with full duty scheduling capabilities and escalate missed Signls to this second-level response team. As always, you can find all the details in this article.

Read Post

SIGNL4

Read more about November 2021 Update - Improved incident response with team escalation and more

How UK Healthcare Reduced Incident Response Times from Minutes to Seconds - xMatters Demo

Nov 2, 2021 By xMatters In xMatters

When there's a high severity incident at a hospital, it could be a life or death situation. So how do you get in contact with the right doctors and clinicians in such a busy environment when tensions are running high? Join Glenn Steketee, Technology Service Analyst at UK Healthcare, Sonu Sekhon, Customer Success Manager at xMatters, and Will Derksen, Product Advocate at xMatters, to discuss how xMatters reduced incident response times from minutes to seconds.

View Video

xMatters

Read more about How UK Healthcare Reduced Incident Response Times from Minutes to Seconds - xMatters Demo

Unlocking Climate Change Resilience Through Critical Event Management and Public Warning

Nov 2, 2021 By Everbridge In Everbridge

Across the globe, both public and private sectors are more concerned than ever about addressing climate change and its associated risks. “In the period 2000 to 2019, there were 7,348 major recorded disaster events claiming 1.23 million lives, affecting 4.2 billion people (many on more than one occasion) resulting in approximately US$2.97 trillion in global economic losses,” according to a report conducted by the UN Office for Disaster Risk Reduction (UNDRR).

Read Post

Everbridge

Read more about Unlocking Climate Change Resilience Through Critical Event Management and Public Warning

What's New in xMatters: The Ninja Release

Nov 1, 2021 By Sean Rousseau In xMatters

Get ready for something exciting coming your way! xMatters’ latest release, Ninja, is on the horizon and will be available in production next week. Named in honor of the classic video game Ninja Gaiden, this latest batch of xMatters updates is sure to pack a punch (pun definitely intended). This release rolls out exciting new features like an intelligent Service Dependencies map and integrations with the broader Everbridge platform, among many other things.

Read Post