August 2021

Situation Room: On-Call Team Faces Worst Case of Sunday Scaries

Aug 31, 2021 By Emily Arnott In Blameless

Picture this: it’s Sunday night. You’re relaxing in bed, in that sweet spot where you’re geared up for Monday, but the fun of the weekend hasn’t yet faded. As you idly scroll through content on your phone, you see a message preview pop up. It’s to your work email. That’s bad. It’s from the hosting company you contract. That’s really bad. They’re saying they accidentally deleted the production database. That’s “jump out of bed” bad.

What Does Everbridge Crisis Management Do for Your Organization?

Aug 31, 2021 By Everbridge In Everbridge

Everbridge Crisis Management provides organizations a single solution for business continuity, disaster recovery and emergency communication. In one application, crisis teams can coordinate all response activities, teams and resources to accelerate recovery times and maintain command and control when crises evolve into unanticipated scenarios.

How to Structure an IT Help Desk

Aug 31, 2021 By Christopher Gonzalez In OnPage

Managed service providers (MSPs) need an IT help desk to address and answer the technical questions of clients. In the modern MSP environment, the IT help desk is the primary source of contact between customers and knowledgeable, responsive support personnel. Successful help desks are customer oriented and encourage clients to report IT incidents when they occur.

Monthly Moo Update | September 2021

Aug 31, 2021 By Adam Frank In Moogsoft

This has been quite the summer to remember. We continue to witness our customers achieve remarkable efficiencies through automation, such as deep integrations with change pipelines that suppress alerts during maintenance windows, and alert correlation that creates incidents with dynamic, evolving descriptions that dramatically improve incident management processes.

Thank you for your fantastic reviews of our mobile alerting app!

Aug 31, 2021 By emily In SIGNL4

We would like to thank our loyal customers for the numerous reviews of SIGNL4! We are excited that you share your opinion on various rating platforms with other people and support us.

Has the firefighting stopped? The effect of COVID-19 on on-call engineers

Aug 30, 2021 By Joseph Mandros In PagerDuty

With digital becoming the primary channel for work, education, shopping, and entertainment in the last 18 months, it’s no surprise that workloads for technical teams and on-call engineers have increased. Data from PagerDuty’s inaugural platform insights report, The State of Digital Operations, highlights this reality. As of July 2021, the average number of events managed daily by PagerDuty is 37 million, with 61,000 of those being critical incidents.

The Value of Hyper-local Risk Intelligence

Aug 30, 2021 By Everbridge In Everbridge

Every enterprise has a unique risk profile. This is based on a wide range of factors including geographic disposition, sector, the scope of security and resiliency plans, organizational size and structure, supply chain, and much more. Without the right customized tools tailored for your organization in place, it’s challenging to stay ahead of threats and disruptions to your people, places, operations, and digital systems.

New feature: Templates for Incident Management

Aug 27, 2021 By Pruthvi In Spike

At Spike.sh, we are obsessed with making incident management more accessible to dev teams everywhere. With this goal in mind, we are always looking for ways to reduce friction when setting up the Spike.sh platform. When we saw customers asking for our advice on creating effective on-call schedules and escalations, we knew we had to do more than just good documentation - we needed a way to share best practices with our customers in the product itself.

3 Key Insights to Help You Build the Workplace for Today & Tomorrow

Aug 27, 2021 By Everbridge In Everbridge

Everbridge sat down with two leading experts to discuss how innovative technologies are improving worker safety and operational functionality, and how firms can keep up. With such demanding times for the business world, it’s easy for companies to become fixated on survival, rather than thriving. But businesses that use unprecedented circumstances as a time to innovate, invest in new technology, and rescope the use cases of their existing technologies will emerge stronger than ever.

Balancing Healthcare Resilience with the Patient Experience

Aug 27, 2021 By Everbridge In Everbridge

Healthcare systems build resilience for the future by adapting and responding to critical events and by factoring in circumstances that are often unique to the communities they serve, such as the patient population, the size of the hospital and/or community, and the scope of services.

You Do the Math: Reliability Issues Triggered by Math Errors

Aug 26, 2021 By Mateus Gurgel In Rootly

Even seemingly minor math bugs in software code can have outsize consequences.

Safety Experts Plan for Fall

Aug 26, 2021 By Everbridge In Everbridge

Everbridge recently hosted a Safety Experts Plan for Fall webinar, with an expert panel composed of Dr. Rashid Chotani (Chief Medical Director/Senior Scientist, IEM), Steven J. Healy (President and CEO, Margolis Healy), Marisa R. Randazzo, Ph.D. (CEO and Founder, SIGMA Threat Management Associates) and James Podlucky (Industry Solutions Manager, Everbridge). The panel was moderated by Dan Pascale, Executive VP, Margolis Healy.

How to Mitigate the Effects of Floods on Your Supply Chain

Aug 26, 2021 By Everbridge In Everbridge

Floods may now be an unfortunate counterweight to the wildfires that have come to characterize summers worldwide. In 2021 alone, floods wreaked havoc in Western Europe, China’s Henan province, and Tennessee and North Carolina in the United States. Hundreds of lives were lost, property damage ran in the billions, and global supply chains were thrown into disarray.

8 DevOps Best Practices for a High-Performance Team

Aug 26, 2021 By Noor-ul-Anam Ruqayya In Blameless

Wondering about DevOps best practices? If you are looking to improve and streamline your current process, we recommend these practices and explain how to implement them.

Ruby on Rails Polymorphic Select Dropdown

Aug 26, 2021 By Austin Miller In PagerTree

Today I want to show you how to build a polymorphic select box in Ruby on Rails. Seems trivial, but it’s not. Let me show you the way and save you some time.

MTBF Is an Integral Part of Business Operations - Here's Why

Aug 25, 2021 By Kalen Wessel In xMatters

In today’s fast-paced digital world, your customers expect your services to be available 24 hours a day, seven days a week. If your services are unreliable, these customers will likely take their business elsewhere — and spread the word. To retain their business, you must understand and optimize your service and system health to ensure your services are reliable. Gauging your service and system health requires much more than knowing whether they’re on or off.

What's new: Updates to Event Intelligence, mobile, and more!

Aug 25, 2021 By Vera Chan In PagerDuty

As we near the end of the Summer season, we’re excited to announce a new set of updates and enhancements to the PagerDuty platform. These updates will help our users and customers: Make sure to view the latest PagerDuty Pulse or learn more from our community team and developer advocates who have launched new programs to help you learn more about our latest products and best practices.

Call Handling - Relieve the burden of your service desk and on-call staff

Aug 25, 2021 By Derdack In Derdack

These days, I keep encountering inquiries from various customers on the topic of call handling. Due to the current transformation, triggered by the increased use of home offices, it is becoming more and more important to make on-call staff more accessible. Often the already overloaded service desk is used for this purpose. Of course, this leads to a) a deterioration in the quality of the service desk and b) delays between the receipt of the problem and the start of problem resolution.

Automate your LogDNA + PagerDuty Incident Workflow

Aug 25, 2021 By Albert Feng In Mezmo

LogDNA integrates with your PagerDuty instance to help trigger incidents based on log data coming in from your ingestion sources. This allows your teams to quickly understand when there are issues with your application, and where in the logs you can investigate to understand root cause. To help further accelerate your team’s ability to understand the state of your applications, we are introducing the ability to automatically resolve those PagerDuty Incidents directly from LogDNA.

Self-Compassion Instead of Self-Blame

Aug 24, 2021 By Emily Arnott In Blameless

The tech industry is competitive and not without challenges. People are always growing and improving by pushing their limits. Innovation comes in many forms. In order to foster a healthy culture while allowing people to flourish, organizations must carefully enact policies. Growth should be encouraged while discouraging competition and comparison. One of the core policies organizations implement to achieve these goals is blamelessness.

Best practices to help retailers make the grade for the holiday season

Aug 24, 2021 By Hannah Culver In PagerDuty

It’s hard to believe we’re already talking about the return to school, but it’s set to be a big one. In fact, this year promises to be the biggest in the last five years. The National Retail Federation expects back-to-school spending to reach $37.1B, up from $33.9B last year. Back-to-college spending is also expected to rise, reaching $71B this year. This increase is buoyed by parents and students gearing up for their first in-person classes after a year of virtual learning.

Introducing the Spike.sh Alert Reliability Engine

Aug 23, 2021 By Pruthvi In Spike

At Spike.sh, our mission is to help dev teams understand and resolve production issues faster. At the core of this is our Alert Reliability Engine, whose job is to make sure that a team member always gets an alert on their preferred channel. Currently, we support 7 channels - phone call, SMS, mobile push notifications, email, Slack, Microsoft Teams and Discord. We wanted to give you a peek into how we achieve high deliverability across these channels.

How MBTA modernized incident response to reduce alert fatigue and improve collaboration

Aug 23, 2021 By PagerDuty In PagerDuty

Citizens utilize mobile and consumer-facing applications in everyday life, so it’s no surprise that they demand seamless access and high availability of government services online. Whether it’s making payments or applying for benefits, citizens and constituents alike expect these services to be available around the clock.

Chapter Twelve: In Which Dinesh Starts an AI Community of Practice

Aug 23, 2021 By Helen Beal In Moogsoft

This is the twelfth chapter in The Observability Odyssey, a book exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this chapter, our SRE, Dinesh, brings together his colleagues with an interest in AI.

PagerDuty Summit 2021 Highlights

Aug 20, 2021 By PagerDuty In PagerDuty

Check out all of the excitement from Summit 2021. #PagerDutySummit2021 #Summit2021 #PagerDutySummit

Making Your On-call and Incident Management Program Stick

Aug 20, 2021 By David Caudill In Rootly

Maintenance of your incident management practice is as important as creation - find out what you can do to keep your engineering organization strong and consistent year over year.

Live Coding on the PagerDuty Terraform Provider with Scott McAllister

Aug 20, 2021 By PagerDuty In PagerDuty

In this stream, Developer Advocate Scott McAllister creates a new feature in the PagerDuty Terraform provider.

Working with Signl4 Multi-Teams

Aug 20, 2021 By SIGNL4 In SIGNL4

Creating and Using the new Multi-Teams features inside of Signl4

Have You Herd? | Episode 3: Observability from Bare Metal to Cloud

Aug 19, 2021 By Minami (Coirin) Rojas In Moogsoft

A summary of our third Moogsoft engineering Twitch Stream chatting about all things DevOps

How Squadcast Benefits On-call Engineers - Part 1

Aug 19, 2021 By Merlyn Shelley In Squadcast

It is difficult to stay completely reliable in an always-on world, so it's very important to choose the right incident management solution for your problems. In this blog, we highlight the benefits of Squadcast and why you should adopt it. “Being on-call sucks!” Incident response teams often use this phrase when talking about their on-call experiences. Despite using best practices for managing infrastructure, incidents do occur from time to time.

Dynatrace and xMatters Make Seamless Efficiency Possible - xMatters Demo

Aug 19, 2021 By xMatters In xMatters

How can organizations integrate their tools into a platform that maximizes uptime and simplifies operations? Is it possible for the tools you already rely on to be more efficient? With Dynatrace and xMatters in tandem, the answer is yes! Join Rob Jahn, Technical Partner Manager at Dynatrace, Eric Maxwell, Solution Architect at xMatters, and Rutuja Rajwade, Partner Marketing Manager at xMatters, as they discuss how Dynatrace and xMatters can work together to make incident management and development processes more efficient.

How the technology you choose influences CloudOps maturity

Aug 19, 2021 By Inga Weizman In PagerDuty

As the world becomes increasingly digital-first, it’s more important than ever for organizations to keep services always-on, innovate quickly, and deliver great customer experiences. Uptime is money, so it’s no surprise that many have made the shift to cloud in recent years in order to make use of its flexibility and scale—while controlling costs. And while 2020 wasn’t easy for any organization, those that are thriving have embraced the digital mindset.

DevOps & SRE Words Matter: How Our Language has Evolved

Aug 18, 2021 By Emily Arnott In Blameless

As the tech world changes, language changes with it. New technologies will always introduce new terms and descriptions to provide clear understanding. For example, the emergence of the cloud introduced language to describe the changing relationship between servers and clients. Then, of course, product providers will also dictate how their products are to be described, i.e. describing services as “cloud-native”.

WIRES and xMatters: Efficient Collaboration On a National Scale

Aug 18, 2021 By Rachel Scholefield In xMatters

An update on how the xMatters service reliability platform is improving animal rescue response times through WIRES in Australia. We are extremely grateful for xMatters’ support and are excited to share this update with the xMatters community. We have made so much progress with our wildlife rescue response systems since the devastating bushfires of 2019 and 2020, despite the continuing challenges of COVID-19.

Managed Service Provider - How AlertOps Helps MSP Scale Digital Transformation Initiatives.

Aug 18, 2021 By AlertOps In AlertOps

In an era where speed, productivity, and user experiences matter most, what incident management capabilities do managed service providers need most to grow, transform, and mature their digital operations and processes, and to serve more organizations faster and more efficiently? Many of today’s enterprises still have operations that are largely manual and reactive, and they lack the in-house resources and expertise to undertake a digital transformation initiative.

What's New: Introducing Delay Notifications to Control Alert Fatigue

Aug 18, 2021 By Ritika Bramhe In OnPage

The OnPage team is pleased to announce a new feature to the enterprise web console: Delay Notifications. With this new addition, organizations have the option to queue messages for specific time periods, delivering messages at the end of the Delay Notification schedule. The latest feature is designed to alleviate alert fatigue and improve work-life balance for incident respondents.

The Top 4 Key Levers to Build Towards Long-Lasting Digital Operations Maturity

Aug 17, 2021 By PagerDuty In PagerDuty

Digital operations maturity is a journey. The first step is to understand where you are, where you want to get to, and what’s keeping you from getting there. Only then can you make strategic decisions and lay out a plan for how to approach any hurdles and land where you want your organization to be. For many organizations, upleveling operational maturity requires investment in driving cultural change with fundamental shifts to operating models.

Full-cycle observability with the Elastic Stack and Lightrun

Aug 17, 2021 By Daliya Spasova In Elastic

An application running in production is a difficult beast to tame. Most experienced developers–ones who spent enough late nights or Saturday mornings trying to break apart a nasty production bug–will try and create the clearest possible picture for their later selves while writing their code, so that they could understand what’s actually going on in the system during an incident.

Chapter Ten: In Which Sarah Resigns from Animapanions and Heads Off to Start Up a Competitor

Aug 16, 2021 By Helen Beal In Moogsoft

This is the tenth chapter in The Observability Odyssey, a book exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this chapter, our DevOps Engineer, Sarah, throws in the towel at C&Js and moves on to build her own business.

Chapter Eleven: In Which James Speaks with the Industry Analysts

Aug 16, 2021 By Helen Beal In Moogsoft

This is the eleventh chapter in The Observability Odyssey, a book exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this chapter, our IT Ops Leader, James, speaks with the analysts about what’s happening in the AIOps space.

Getting Started with Site Reliability Engineering

Aug 16, 2021 By Robert Ross In FireHydrant

Site Reliability Engineer (SRE) is one of the fastest growing jobs in tech, with LinkedIn reporting 34% growth YoY in 2020 and over 9000 openings in their Emerging Jobs Report. If you’re new to SRE and exploring it as a career path, understand that it can be a challenging but rewarding experience. Here are some quick tips on how you can get started with SRE and jump-start a rewarding career.

Strategies to Strengthen Nurse Mental Health and Safety

Aug 13, 2021 By Ritika Bramhe In OnPage

No job is easy, but the job of a nurse is even more challenging, especially during a global health crisis. Nurses are at a higher risk of developing burnout due to the psychological trauma and cognitive overload that comes with the nursing profession. The situation is further exacerbated when nurses take on more responsibility during a pandemic or other large-scale incidents.

SLOs, SLIs, and where to find them with Jacob Plicque III

Aug 12, 2021 By Grafana In Grafana

Identifying the right Service-Level Indicators is mission-critical for any SRE team responsible for meeting Service-Level Objectives and reporting on them. Find out how to sift through mountains of metrics and fill gaps in your data in order to visualize SLIs that actually matter for effective error budget tracking and actionable alerts in Grafana. Presented by Jacob Plicque III, Senior Engineer at Grafana Labs, at the Grafana East Coast Virtual Meetup, August 2021.

Real-time digital operations management puts connected vehicles on the road to success

Aug 12, 2021 By PagerDuty In PagerDuty

As technology advances and applications for the Internet of Things (IoT) continue to expand, industrial and manufacturing companies are embedding more digital systems into their operations. From smart factories and intelligent shipping to automation and 3D printing, Industry 4.0 has arrived.

Lone Workers vs. Remote Workers: Knowing the Difference and Keeping Both Safe

Aug 12, 2021 By Everbridge In Everbridge

The Covid-19 pandemic increased opportunities for remote work four to five times more than before, according to a report from McKinsey & Co. Although many office-based workers had no choice but to leave their desk jobs and make the move to work from home in early 2020, remote work appears to be here to stay. The rapid transformation brought forward by the pandemic has muddied the definition of remote workers versus lone workers, but it’s essential not to confuse the two.

A Complete Guide to DevOps (Explained Simply)

Aug 12, 2021 By Blameless Community In Blameless

Wondering what DevOps is all about? We will explain what it is, how it works, why it matters, and how it can help your organization.

Everbridge for Financial Services

Aug 11, 2021 By Everbridge In Everbridge

Effective communication in banking and all financial services drives behavioral change, reduces risk, improves employee wellness, employee retention, and productivity. For better internal communications in financial services, discover Everbridge.

Charting a Resilience Roadmap for 2021 - PwC's Global Crisis Survey Results with Everbridge

Aug 11, 2021 By Everbridge In Everbridge

Learn more at everbridge.com

Examples of Critical Event Management in 2021 - PwC's Global Crisis Survey Results with Everbridge

Aug 11, 2021 By Everbridge In Everbridge

Learn more at everbridge.com

Breaking down silos with CEM - PwC's Global Crisis Survey Results with Everbridge

Aug 11, 2021 By Everbridge In Everbridge

Learn more at everbridge.com

How to Avoid the Executive 'Swoop and Poop' and Other Best Practices for Operational Maturity

Aug 11, 2021 By Hannah Culver In PagerDuty

We’re eating at restaurants again. We’re seeing family after too long apart. Some of us may even be returning to the office. But that doesn’t mean that the pressure is off for digital services, and growing in operational maturity remains top of mind. While digital transformations have been taking place for the last two decades, COVID-19 added pressure to speed up initiatives.

3 Focus Areas for Improving Business Resilience

Aug 11, 2021 By Everbridge In Everbridge

More than 2,800 senior executives in organizations of all sizes across 29 industries and 73 countries weighed in on their 2020 crisis response plans in PricewaterhouseCoopers’ (PwC) annual impact survey. The results offer valuable insight into resiliency planning, business operations, and the future of the workplace.

Are You Spending Enough on Cybersecurity?

Aug 11, 2021 By Christopher Gonzalez In OnPage

Cybercriminals do not discriminate against the organization, people or industry they target. These actors look to exploit vulnerabilities in resources to intercept valuable data from small and medium-sized businesses (SMBs). Cyberattacks are inevitable, and organizations must have the right controls and information security systems to mitigate the impact of an attack.

FireHydrant Platform Demo - August 2021

Aug 11, 2021 By FireHydrant In FireHydrant

FireHydrant is the only comprehensive reliability platform that allows teams to achieve reliability at scale by creating speed and consistency across the entire incident response lifecycle.

Improving your team's on-call experience

Aug 10, 2021 By Max Rozen In OnlineOrNot

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call, and deals with recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Have You Herd? Episode 3 | Observability from Bare Metal to Cloud

Aug 10, 2021 By Moogsoft In Moogsoft

Join us for the third episode of Have You Herd? as our team breaks down the observability journey from the old days of bare metal racks all the way to what may come in the future. Also, as a fun bonus, a new "guess the cow" game kicks us off!

SREview Issue #16 August 2021

Aug 10, 2021 By Blameless Community In Blameless

We’re kicking off August with some thrilling news: Blameless has closed a $30M Series B fund raise! Learn more about how we’re entering the next phase of our journey to advance reliability for engineering teams here. We’re so grateful to our customers, collaborators, and the entire SRE community for their support! Let’s dive in with our favorite content for the month!

Supercharging incident response with runbook automation

Aug 10, 2021 By PagerDuty In PagerDuty

The global pandemic is estimated to have accelerated digital transformation by at least seven years—and it’s showing no signs of stopping. In fact, companies are investing even more into software-driven experiences. A recent Gartner forecast points to worldwide IT spending increasing 8.4% to $4.1 trillion in 2021, with much of that spend on mission-critical, customer-facing services.

We've raised a $23M Series B to help us get to a world where all software is reliable

Aug 10, 2021 By Robert Ross In FireHydrant

At FireHydrant, we envision a world where all software is reliable, and we’re on a mission to help every company that builds or operates software get closer to 100% reliability. Today, we’re thrilled to announce that we’ve raised $23 million to help us further our goal.

Coffee Break Webinar Series: "Intelligent Observability - Blamefree Retrospectives"

Aug 9, 2021 By Taylor Urban In Moogsoft

A selection of questions and answers from our recent webinar on leveraging AIOps to run sustainable, blameless retrospectives.

Timely Delivery with Enterprise Alert

Aug 9, 2021 By Derdack In Derdack

Murphy’s Law states that anything that can go wrong, will go wrong. The challenge for most businesses is putting the right method of communication in place for when the inevitable happens. The only way to handle this is to expect the worst and then prepare for it. A key question when deciding on any alerting solution is: can my team be notified properly when a major outage happens?

Read Post

Derdack

Read more about Timely Delivery with Enterprise Alert

Announcing our $6M investment to double down on IT incident and Reliability needs

Aug 6, 2021 By Amiya Adwitiya In Squadcast

When Squadcast was founded back in 2018, we had a concise yet clear goal—we wanted to make it as easy as possible for companies to manage their IT incident and reliability needs. In the spirit of continuing that mission, today I’m excited to announce our $6M fundraise led by DNX Ventures and backed by Wipro Ventures, Nexus Ventures, and Chiratae Ventures. We’re also pleased to announce the addition of DNX and Q Motiwala to our Board of Directors.

Read Post

Squadcast

Read more about Announcing our $6M investment to double down on IT incident and Reliability needs

Timely Delivery with Enterprise Alert

Aug 6, 2021 By Derdack In Derdack

Setting up Time-Based notification profiles inside of Enterprise Alert to ensure users receive alerts in the most timely manner possible.

View Video

Derdack

Read more about Timely Delivery with Enterprise Alert

Incident Management Goes to the Olympics

Aug 5, 2021 By Quentin Rousseau In Rootly

A look at outages and disruptions to the IT systems that power the Olympics, from 1996 to today.

Read Post

Rootly

Read more about Incident Management Goes to the Olympics

Everbridge is the place to be

Aug 5, 2021 By Everbridge In Everbridge

Culture is about more than just a fancy office, benefits or team activities. It’s about the people. Our Bridgers build and own the company culture, enforce our values, and their passion fuels our continued innovation and growth. We wouldn’t be where we are without them, and our growth and great culture are because of what we’ve achieved as a team together! Individually we are amazing but together we are remarkable.

View Video

Everbridge

Incident Management

Read more about Everbridge is the place to be

What's the ROI? How Operational Maturity Improves Customer and Team Satisfaction

Aug 5, 2021 By Hannah Culver In PagerDuty

Are we looking at the new normal now? In the last 18 months, organizations all over the world were compelled to undergo a rapid digital transformation and mature their operations to support services that were under unprecedented strain. Digital transformation allows companies to embark on large-scale cloud migrations and adopt modern development methods like DevOps and Agile.

Read Post

PagerDuty

Read more about What's the ROI? How Operational Maturity Improves Customer and Team Satisfaction

Demystifying DevOps and SRE

Aug 4, 2021 By James Samuel In Squadcast

How different are DevOps and SRE? Are they related to each other? In this blog, James Samuel sheds light on the similarities and differences between SRE and DevOps, followed by possible ways to structure an SRE team in your organization. Two terms that people often find confusing are SRE and DevOps. People often ask: should I hire a DevOps Engineer or a Site Reliability Engineer? What is the difference between SRE and DevOps, and which one do I need? In this post, I attempt to shed some light.

Read Post

Squadcast

Read more about Demystifying DevOps and SRE

New Features: My On-Call Shifts, Critical Alerts, Support-Hours Based Call Routing, and More

Aug 4, 2021 By iLert In iLert

This post covers some of the highlights that we have released in the last 6 months.

Read Post

iLert

Read more about New Features: My On-Call Shifts, Critical Alerts, Support-Hours Based Call Routing, and More

How PagerDuty Helps Manage Hybrid Infrastructure and Complex Ops Across Industries

Aug 4, 2021 By PagerDuty In PagerDuty

If there’s one thing we learned from the 80+ sessions at Summit 2021, it’s that across industries, companies are continuing to accelerate innovation in a bid to meet growing customer expectations of always-on services across all channels. In financial services, disrupting traditional banking or rethinking access to advisory services comes with operational and regulatory challenges.

Read Post

PagerDuty

Read more about How PagerDuty Helps Manage Hybrid Infrastructure and Complex Ops Across Industries

Contextual Intelligence and Observability: Without the Former, You Really Don't Have the Latter

Aug 4, 2021 By Richard Whitehead In Moogsoft

Observability is a hot term in the industry, but don’t let it fool you: having visibility into your organization's apps and services only gives you partial clarity into a system’s overall performance. To get a full understanding of your monitoring data, you need to apply contextual intelligence.

Read Post

Moogsoft

Read more about Contextual Intelligence and Observability: Without the Former, You Really Don't Have the Latter

New Product Integration! Microsoft Teams Video

Aug 3, 2021 By Emily Arnott In Blameless

On the heels of our Microsoft Teams integration release to streamline incident management, we’re excited to share that we now support Microsoft Teams Video capabilities. We generate Microsoft Teams video conference links for each Blameless incident for fast and easy collaboration. Microsoft Teams Video joins Zoom, Google Meet, and GoToMeeting in our video integration suite.

Read Post

Blameless

Read more about New Product Integration! Microsoft Teams Video

Kemp Flowmon ADS and Check Point Integration: Automated incident detection and response full video

Aug 2, 2021 By Flowmon In Flowmon

See how Kemp Flowmon ADS can work in tandem with a Check Point firewall to automatically quarantine an IP address upon security event detection.

View Video

Flowmon

Read more about Kemp Flowmon ADS and Check Point Integration: Automated incident detection and response full video

Hear From Product PagerDuty for Customer Service Operations Lightning Talk

Aug 2, 2021 By PagerDuty In PagerDuty

Learn about what's new with PagerDuty for Customer Service Operations from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements and enjoy demos that were recorded live from Summit 2021 featuring the PagerDuty Salesforce Service Cloud Integration V3, New Customer Service SKU, and Round Robin Workflows (Round Robin Scheduling).

View Video

PagerDuty

Read more about Hear From Product PagerDuty for Customer Service Operations Lightning Talk

PagerDuty Pulse Q1 FY22 Full Webinar

Aug 2, 2021 By PagerDuty In PagerDuty

In this edition of PagerDuty Pulse, you’ll get to view our most recent platform updates and enhancements (March 2021 – June 2021) that extend from AIOps and automation to a variety of new integrations. Teams must leverage PagerDuty and Modern Digital Operations to automate the day-to-day toil of repetitive tasks, master modern operations with full-service ownership, seamlessly collaborate across the organization, and accelerate enterprise-wide response by enabling customer service operations and business stakeholders.

View Video

PagerDuty

Read more about PagerDuty Pulse Q1 FY22 Full Webinar

Less is more: Incident management and monitoring in hybrid IT infrastructures

Aug 2, 2021 By iLert In iLert

Many companies are continuously modernizing their infrastructure, but there is no standard path to the perfect IT infrastructure. Still, hybrid architectures have become the status quo in enterprises. Almost all organizations have migrated at least parts of their assets to the cloud or run applications as cloud services. At the same time, businesses want to dovetail their IT architecture with software development and are therefore embracing dynamic infrastructures.

Read Post

iLert

Read more about Less is more: Incident management and monitoring in hybrid IT infrastructures

Resilience in Action E9: Vulnerability, Compassion, and Post-Incident Reviews in the Emergency Room with Dr. Al'ai Alvarez

Aug 2, 2021 By Christina Tan In Blameless

What can software engineers learn from post-incident reviews that physicians do in the emergency room? In our ninth episode, Christina, member of the Blameless strategy team, guest-hosts the podcast to interview both Kurt Andersen and Al'ai Alvarez, MD (@alvarezzzy). Dr. Alvarez is an assistant clinical professor of Emergency Medicine at Stanford. Clinically, he’s an emergency physician.

Read Post

Blameless

Read more about Resilience in Action E9: Vulnerability, Compassion, and Post-Incident Reviews in the Emergency Room with Dr. Al'ai Alvarez
