July 2021

What is Incident Management in IT and Why does it matter?

Jul 30, 2021 By Gurubaran Baskaran In CloudFabrix

Incident management is the process of identifying and resolving problems that occur in IT services. Incident Management is also used as a metric to measure the health of the IT Service Desk. Let’s discuss what incident management is, why it matters to your business, and how you can apply it to your organization.

Read Post

The Unique Reliability Engineering Requirements of Microservices

Jul 30, 2021 By JJ Tang In Rootly

Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.

Read Post

Splunk On-Call prevents and cuts downtime episode length by half

Jul 30, 2021 By Splunk In Splunk

Your Answer: Escalate the right alerts to the right on-call people for fast collaboration and issue resolution with Splunk On-Call. Reduce burn-out and make on-call suck less with a complete ChatOps experience that's integrated with your IT stack and incident reporting.

View Video

Mattermost Playbooks Roadmap Update June 2021

Jul 30, 2021 By Mattermost In Mattermost

Mattermost PM Ian Tao shares the roadmap for incident collaboration functionality -- now called Playbooks -- on the Mattermost platform.

View Video

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Jul 30, 2021 By Helen Beal In Moogsoft

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.

Read Post

What are the Four Golden Signals?

Jul 29, 2021 By Blameless In Blameless

SRE’s Golden Signals are four key metrics used to monitor the health of your service and underlying systems. We will explain what they are, and how they can help you improve service performance.

Read Post

Dun & Bradstreet Reduces Mean Time to Resolution with xMatters

Jul 29, 2021 By xMatters In xMatters

How does a business continue to improve its incident management processes, when it’s already using some of the best tools on the market? Join Nick Romanelli, Site Reliability Engineering Lead at Dun & Bradstreet, and Zoe Na, Customer Success Manager at xMatters, as they discuss how Dun & Bradstreet has been able to use xMatters to reduce MTTR and streamline major incident management. With their innovative use of Flow Designer, Dun & Bradstreet have created unique workflows that you’re going to want to know about!

View Video

Incident Management

Most frequently asked questions surrounding Google's Cloud Operations Sandbox

Jul 29, 2021 By Nir Sharma In Squadcast

Cloud Operations Sandbox serves as a simulation tool for budding SREs to learn the best practices from Google and apply them to real cloud services. In this blog, we have compiled a list of FAQs surrounding the use of Google's Cloud Operations Sandbox. The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE.

Read Post

Hear From Product Automation & AIOps Lightning Talk

Jul 28, 2021 By PagerDuty In PagerDuty

Learn what's new with PagerDuty Runbook Automation & AIOps from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements, with demos recorded live at Summit 2021 featuring PagerDuty Runbook Actions, Customer Change Event Transformer, Change Correlation, and Outlier Incident.

View Video

Hear From Product Incident Response Lightning Talk

Jul 28, 2021 By PagerDuty In PagerDuty

Learn what's new with PagerDuty Incident Response from the Summit 2021 Launch. Our Product team shares how you can benefit from our latest updates and enhancements, with demos recorded live at Summit 2021 featuring PagerDuty Incident Context in MS Teams, Slack Insights previews, Stakeholder Updates in ChatOps, Priority-based Business Service Subscription, Past Incidents on Mobile, and Add Responder Notification Rules.

View Video

Three Key Takeaways from The State of Digital Operations Report 2021

Jul 28, 2021 By Hannah Culver In PagerDuty

2020 heralded a year of increased complexity and customer demands, which isn’t going away. In this new normal, organizations will still be tasked with keeping up this break-neck pace. So, what did digital operations look like in 2020 compared to 2019?

Read Post

7 Ways Your Status Page Can Save You

Jul 27, 2021 By Emily Blitstein In uptime

Having a Status Page is like having a dog. A dog alerts you to an incident: sudden noise, approaching neighbor, squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. As a dog fetches the same stick over and over, so does a status page fetch the attention of your users, especially during a live incident, with each browser refresh as they wait for the status to change.

Read Post

Reliability Matters. Blameless is Growing with Series B $30M Funding

Jul 27, 2021 By Lyon Wong In Blameless

When Blameless started in 2018, the team set out on a mission to help all engineers achieve reliability with less toil and risk. Three years in, that mission has become more important than ever. What has changed is the rate of SRE adoption, now the fastest growing team and practice inside engineering. This represents a clear recognition of the many upsides that an SRE practice brings with its combination of continuous learning, velocity, and resilience.

Read Post

What's New: Introducing Next-Gen ChatOps With PagerDuty and Slack

Jul 27, 2021 By Hadijah Creary In PagerDuty

In this new world of digital everything, new application versions usually mean that you’re going to get bigger and better features, more capabilities, and an uplifted user experience, right? When I talk to customers, many can’t wait to upgrade the PagerDuty integrations that they depend on to test new features. If you’re a PagerDuty for Slack user, the next-generation version of our Slack integration will certainly be an exciting development.

Read Post

Getting over on-call anxiety

Jul 26, 2021 By Max Rozen In OnlineOrNot

You've joined a company, or worked there a little while, and you've just now realised that you'll have to do on-call. You feel like you don't know much about how everything fits together. How are you supposed to fix it at 2am when you get paged? So you're a little nervous. Understandable. Here are a few tips to help you become less nervous.

Read Post

Experiencing Turbulence? Hypercare Helps Travel and Hospitality Firms Manage Sky-High Demand

Jul 26, 2021 By PagerDuty In PagerDuty

Many sectors suffered during the COVID-19 pandemic, but the travel and hospitality industry was struck particularly hard as the world went into lockdown and governments urged us to stay home. According to the International Air Transport Association, global air passenger demand in 2020 was down a record 65.9% from the previous year, and the tourism industry saw an estimated loss of 100.8 million jobs worldwide.

Read Post

How to Reduce Alert Fatigue: Preventing Noisy Alerts and Error Messages

Jul 26, 2021 By LogDNA In Mezmo

Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.

Read Post

How Grafana helps organizations manage SLOs across multiple monitoring data sources

Jul 23, 2021 By Michelle Tan In Grafana

“SLO is a favorite word of SREs,” Grafana Labs Principal Software Engineer Björn “Beorn” Rabenstein said during his talk at KubeCon + CloudNativeCon NA 2019. “Of course, it’s also great for design decisions, to set the right goals, and to set alerting in the right way. It’s everything that is good.” So what happens when things go bad?

Read Post

OnPage Corporation Celebrates 10 Years of Growth and Innovation

Jul 22, 2021 By OnPage In OnPage

Incident Alert Management Company Is Celebrating the 10th Anniversary of Its Founding in 2021.

Read Post

Can I Make Comments in ServiceNow Incident Tickets with xMatters? - Ask Adam

Jul 22, 2021 By xMatters In xMatters

Don't settle for those default comments the integration makes in ServiceNow. Get creative and make your own comments in ServiceNow incidents.

View Video

What's New: Updates to Event Intelligence, Integrations, and More!

Jul 22, 2021 By Vera Chan In PagerDuty

If you thought that the product announcements from PagerDuty’s largest event of the year, PagerDuty Summit 2021, were all we had in store for you, think again! We’re excited to announce that the July Release comes with a new set of updates and enhancements to the PagerDuty platform! You can learn about our latest capabilities via the Q1 PagerDuty Pulse or read below for the highlights.

Read Post

Operational Resilience: Grow Your Business Despite Increasing Threats

Jul 22, 2021 By Everbridge In Everbridge

While most businesses have an emergency preparedness plan in place, organizations have to wonder if their current plans are enough to defend against the growing list of major incidents and critical events affecting business. According to the 2020-21 Major Incident Management Annual report, an emergency preparedness plan isn’t enough to combat the growing threat landscape. To combat the rise in critical events, organizations must maximize operational resilience.

Read Post

Monitoring and Alerting 101: Monitoring Best Practices

Jul 22, 2021 By Ritika Bramhe In OnPage

An effective monitoring system is paramount to smooth business operations. As the need for a fast, responsive software experience gains momentum, monitoring becomes an indispensable driving force. Monitoring systems enable IT teams to proactively observe the health and responsiveness of critical environments and applications. Without monitoring, organizations must depend on customers or internal departments to receive notice of system issues.

Read Post

PD Summit21: Transforming Infrastructure Teams Through Observability

Jul 22, 2021 By PagerDuty In PagerDuty

What is this "observability" thing that everyone is talking about? Observability allows you to navigate the dark unknowns with echolocation while others attempt to fly blindly without it. Are your dashboards all green, but you still have an issue brewing? Do you need instant feedback based on the Core Analysis loop? Are your engineers tired of waking up at 3 AM for the expected issues? Is there a lack of time for experimentation? Generate your own answers and create a meaningful course of action with observability.

View Video

PD Summit21: The Netflix Reliability Story: A Brief History of How We Evolved Resilience to Failure

Jul 22, 2021 By PagerDuty In PagerDuty

In Netflix engineering, we’re driven by ensuring Netflix is there when you need it to be. We strive to provide a service that people love and can enjoy anytime, anywhere. An important foundation for bringing our customers joy is a strong focus on reliability that ensures Netflix will be available when they need it. In this talk, I’ll tell the story of how we've grown our reliability practices over time to meet the changing demands of microservices and distributed computing.

View Video

PD Summit21: Adopting and Maturing to Service Ownership with PagerDuty and Rundeck

Jul 22, 2021 By PagerDuty In PagerDuty

Among the common goals of today's engineering and operations teams is to adopt a culture of service ownership: "You build it, you own it." As with many ancillary objectives to driving DevOps across an organization, this is easier said than done. Sometimes this is in small part due to the technology stack/architecture of a given company. But more often than not, this is because teams lack the human-to-technology mechanisms that allow for a culture of service ownership.

View Video

PD Summit21: Migrating to L1 Support to PagerDuty

Jul 22, 2021 By PagerDuty In PagerDuty

Learn how Maersk transitioned from operating with an L1 support team to using PagerDuty to drive an efficient operational support model. In this talk you will learn how implementing PagerDuty within the platform SRE team was part of a major re-org with the goal of driving a new operations model for a highly available (99.999%) platform that led to outstanding results. At Maersk, we saw increased efficiencies and reduced TTR along with other significant advantages of using PagerDuty from both on-call and management perspectives.

View Video

PD Summit21: AWS and PagerDuty: Better Together -- A Digital Transformation Journey

Jul 22, 2021 By PagerDuty In PagerDuty

PagerDuty’s platform for real-time operations helps teams manage a complex transition from siloed and centralized approaches to multiple, distributed teams supporting a hybrid cloud infrastructure. To make this journey successful, one thing is clear: your people, technology, and operational processes need to be aligned in real time. That’s why we’re continuing to invest in our partnership with AWS. The integrations we’re bringing to market have always been centered on unlocking AWS’s unprecedented scale and agility for our joint customers.

View Video

PD Summit21: Sumo Logic: Streamline Incident Management to Drive Application Modernization

Jul 22, 2021 By PagerDuty In PagerDuty

As application modernization drives an increase in complexity, managing the signals applications generate becomes increasingly important in order to manage alert fatigue, maintain reliability, and accelerate innovation. Sumo Logic provides a unique, two-way integration with PagerDuty that collects incident messages from PagerDuty and populates pre-configured dashboards, giving a complete view of alerts by displaying top incidents, escalations, teams, and urgency. It also lets users send notifications to PagerDuty when critical conditions in their applications or infrastructure are detected in Sumo Logic.

View Video

PD Summit21: MUX: Video Observability: Operational Alerting for Responding to Issues In Real-time

Jul 22, 2021 By PagerDuty In PagerDuty

Streaming video accounts for the majority of internet traffic, and your applications and infrastructure almost certainly include video. Mux Data allows you to easily monitor the real-time quality of experience delivered to your video viewers, and by integrating with PagerDuty you can automate a response and reduce the time to resolution when something goes wrong. We will cover the basics of video monitoring and how integrating with PagerDuty can ensure a great experience for viewers.

View Video

PD Summit21: Responding to Chaos with Gremlin and PagerDuty

Jul 21, 2021 By PagerDuty In PagerDuty

Incident response is something you hope to never need, but when you do, you want it to go smoothly and seamlessly. Normally the knowledge of how to handle incidents within your company will be built up over time, getting better with each incident. While tools such as PagerDuty's Major Incidents Application can help you recover quickly, the process you follow is just as important. This documentation will allow you to learn from the start something which has taken us years to build up, giving you a head start on how to deal with a major incident in a way which leads to the fastest possible incident recovery.

View Video

PD Summit21: Strengthen Stakeholder Communications with PagerDuty Business Response

Jul 21, 2021 By PagerDuty In PagerDuty

Executives interrupting incident calls? Distracted technical responders? Uninformed customer support team? No visibility into ongoing incidents in your organization at all?

View Video

When You Do DevSecOps, Don't Forget the SREs

Jul 21, 2021 By Quentin Rousseau In Rootly

It's time to break down the silos separating SREs from security engineers.

Read Post

Evolving in CloudOps Maturity? Investing in People and Teams Pays Off

Jul 21, 2021 By Inga Weizman In PagerDuty

CloudOps is on the up. This is in part due to the rapid acceleration of the shift to cloud that was caused by the pandemic. The shift allowed companies to innovate faster, enjoy greater flexibility and scalability, and become more cost efficient. Many organizations who rapidly adopted cloud or increased their usage now realize that they need to better manage their cloud investments in order to fully embrace these benefits.

Read Post

PagerDuty Summit Keynote Demo: A Day in the Life Of

Jul 20, 2021 By PagerDuty In PagerDuty

Enjoy this product demo that highlights a subset of new capabilities from PagerDuty's Summit 2021 Product Launch as well as some of PagerDuty's core capabilities within a retail industry IoT use case workflow.

View Video

HUG Relies on PagerDuty When Healthcare Incidents Arise

Jul 20, 2021 By PagerDuty In PagerDuty

The Geneva University Hospital (HUG) is one of the five university hospitals in Switzerland and one of the largest hospitals in Europe. Pierryves Fournier, SRE Team Lead at HUG, explains how PagerDuty and Rundeck help automate his team's incident response process, empowering the right action when seconds matter.

View Video

Maximize Digital Service Uptime with Google Cloud Operations and xMatters

Jul 20, 2021 By xMatters In xMatters

Looking to speed up app development without sacrificing security? Maybe streamline operations with release channels? Even manage infrastructure with Google SREs? With Google Cloud Operations and xMatters, you can do all those things and more.

View Video

Achieving Best in Enterprise Resilience is a Competitive Differentiator

Jul 20, 2021 By Everbridge In Everbridge

The Best in Enterprise Resilience™ Certification program affirms your organization’s readiness to manage critical events across a number of domains.

Read Post

What's New With Runbook Automation: Rundeck 3.4.1

Jul 19, 2021 By Vera Chan In PagerDuty

Technical teams are under more pressure than ever to move faster, protect revenue and availability, and push mean time to resolve (MTTR) ever lower. However, teams frequently find themselves encumbered by complex, repetitive, and manual tasks, rather than innovating. When urgent incidents arise, organizations often have to wait for specific developers or subject matter experts (SMEs) to deploy a fix.

Read Post

Upcoming trends in DevOps and SRE

Jul 15, 2021 By Biju Chacko In Squadcast

DevOps and SRE are domains with rapid growth and frequent innovations. With this blog you can explore the latest trends in DevOps and SRE and stay ahead of the curve. The past decade has seen widespread adoption of DevOps methodologies in software development. Unsurprisingly, as the needs of users change, DevOps techniques have evolved as well. In this blog we will look at the trends that are most likely to have a significant impact in the coming years.

Read Post

PagerDuty Customer Mashup Video

Jul 15, 2021 By PagerDuty In PagerDuty

Organizations need a solution that’s designed for today’s dynamic digital reality. Hear customers like Carrefour Bank, IG, The Trevor Project, Vodafone, and Zoom explain how PagerDuty empowers them in an always-on, real-time world.

View Video

De-Siloing Incident Management: How to Make Reliability Engineering Everyone's Job

Jul 15, 2021 By JJ Tang In Rootly

4 best practices for breaking down silos and establishing a culture of shared responsibility toward reliability.

Read Post

Coffee Break Webinar Series: "Intelligent Observability - What the Analysts Say"

Jul 15, 2021 By Taylor Urban In Moogsoft

We know commitment issues are the real deal, especially when it comes to significant and costly tech investments. Understanding how the market is performing and what’s up ahead is critical for investing in AIOps. Our crew is here to help you through the challenging decision-making days and offer up the best analyst guidance.

Read Post

Pragmatic Incident Response: 3 Lessons Learned from Failures

Jul 15, 2021 By Robert Ross In FireHydrant

In my past experience as an SRE I’ve learned some valuable lessons about how to respond to and learn from incidents. Declare and run retros for the small incidents. It's less stressful, and action items become much more actionable. Decrease the time it takes to analyze an incident. You'll remember more, and will learn more from the incident. Alert on pain felt by people, not computers. The only reason we declare incidents at all is because of the people on the other side of them.

Read Post

BigPanda Event Enrichment Engine

Jul 14, 2021 By BigPanda In BigPanda

The success of AIOps tools relies heavily on the quality of data fed to their AI/ML algorithms. BigPanda’s best-in-class Event Enrichment Engine offers cross-domain enrichment capabilities at scale to assure AIOps success.

View Video

Enabling Faster Incident Response and Mitigating Security Risks in Financial Services

Jul 14, 2021 By Joe Pusateri In PagerDuty

Software is eating the world. Digital Transformation is top of mind for companies looking to meet ever-growing consumer demands and digitize manual processes. This isn’t unique to the technology industry. Ecommerce, finance, healthcare, and other industries are all moving in this direction.

Read Post

BigPanda's Event Enrichment Engine: The secret ingredient for AIOps

Jul 14, 2021 By Mohan Kompella and Darren Fox In BigPanda

James Beard, the pioneer of television cooking shows, once asked, “Where would we be without salt?”. Salt is often underrated, even though it is the ingredient that has the greatest impact on food and flavor in the modern world. It has its own taste, but also balances and enhances the flavor of other ingredients. Salt boosts sweetness and blocks bitterness, it has scientifically proven capabilities to intensify flavor compounds that are too subtle to detect (i.e.

Read Post

Monthly Moo Update | July 2021

Jul 14, 2021 By Adam Frank In Moogsoft

We hope June was as good to you as it was to us. Our latest updates, available now, will keep you relaxing poolside this summer knowing that your monitoring, event correlation, and incident workflows are all connected and automated through the cloud. If you’re not relaxing with a little cloud coverage keeping you cool, then come check out Moogsoft to see how you can keep your services available and your customers happy, so you can get to relax with a little more time in your day.

Read Post

What is a Blameless Postmortem?

Jul 13, 2021 By Noor-ul-Anam Ruqayya In Blameless

Do blameless retrospectives (or postmortems) help your team? We will explain what they are, if they really work, and how to do them right. A blameless postmortem (or retrospective) is a post-incident document that helps teams figure out why an incident happened, and brainstorm how to improve the process to prevent similar incidents from happening again. In most engineering organizations, everyone agrees that in complex systems, failure is inevitable.

Read Post

How Linaro Reduced Triage & Call-Out Time with Flow Designer - xMatters Demo

Jul 13, 2021 By xMatters In xMatters

A test server fails and your customers are relying on it. How long does it take your team to get it back up and running? Does that answer differ depending on the hour of the day, or maybe the day of the week? It doesn’t have to. Join Philip Colmer, Director of Information Services at Linaro, Laura Meadows, VP EMEA at xMatters, and Stephen Walters, Solutions Architect at xMatters, as they discuss the innovative ways Linaro has utilized Flow Designer to reduce triage and call-out time!

View Video

Incident Management

The 9 most common MSSP security services

Jul 12, 2021 By Eyal Katz In Exigence

When considering the fact that 2020 was a record breaker in the number of cyberattacks that occurred and the resulting cost to organizations that was incurred, it is clear that the state of cybersecurity readiness is not very encouraging, to say the least.

Read Post

Happy 10th birthday, OnPage!

Jul 9, 2021 By OnPage In OnPage

View Video

Microsoft Azure & xMatters Flow Designer Integration - Product Integrations Made Easy

Jul 9, 2021 By xMatters In xMatters

Join Christine Astle, Product Manager at xMatters, as she discusses how easy it can be to integrate Microsoft Azure and xMatters Flow Designer. With a little more than a few clicks, Azure can be integrated into any workflow you build to simplify processes.

View Video

Using CC&C Platforms to Transform Metrics Into Valuable Insights

Jul 9, 2021 By Christopher Gonzalez In OnPage

Healthcare institutions are increasingly implementing clinical communication and collaboration (CC&C) platforms to improve the productivity of care teams. Automated CC&C platforms perfect care orchestration plans to ensure providers have the means to satisfy the ever-changing needs of patients. Key features of CC&C platforms include real-time, secure mobile messaging and alerting; digital, intelligent on-call schedules; time-stamped message statuses; and automated alert escalations.

Read Post

Previstar Vaccine Inventory Management

Jul 8, 2021 By Everbridge In Everbridge

Previstar Vaccine Inventory Management presented by Himadri Banerjee, CTO

View Video

Incident Management

Error Budgets That Work for You. Plus Support for New Relic Metrics and NR Query Language

Jul 8, 2021 By Blameless Community In Blameless

Did you know that error budget policy is the key to making SLOs actionable? In fact, Twitter’s engineering team did not successfully adopt SLOs until they introduced error budgets. SLOs enable teams to quantify customer happiness, and error budgets enable teams to make data-backed tradeoffs between reliability and feature velocity. We believe that teams optimizing for reliability must adopt both.
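
As a rough illustration of the idea (not taken from the Blameless post), an error budget for a request-based SLO is commonly computed as the share of requests that may fail before the target is missed. The sketch below assumes that request-based definition; the function names and sample numbers are hypothetical.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the window under the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical numbers: 99.9% SLO, 1,000,000 requests, 250 failures.
    allowed = error_budget(0.999, 1_000_000)
    remaining = budget_remaining(0.999, 1_000_000, 250)
    print(f"allowed failures: {allowed:.0f}, budget remaining: {remaining:.0%}")
```

A team could, for instance, slow feature releases once the remaining fraction falls below a threshold it has agreed on; the threshold itself is a policy choice, not something this sketch prescribes.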

Read Post

Rootly Announces $3.2 Million in Seed Funding from XYZ Venture Capital, 8VC, & Y Combinator

Jul 8, 2021 By Quentin Rousseau In Rootly

Rootly is on a mission to create a world where maintaining reliability is frictionless, delightful, and accessible to anyone, making resolving and learning from incidents every organization's superpower.

Read Post

PagerDuty Summit 2021: Jennifer Tejada Opening Keynote

Jul 8, 2021 By PagerDuty In PagerDuty

To succeed in a world of digital-first customer experiences, operations must also be digital-first. Join PagerDuty CEO Jennifer Tejada for her keynote at PagerDuty Summit 2021.

View Video

PagerDuty

Read more about PagerDuty Summit 2021: Jennifer Tejada Opening Keynote

Cherwell Monitoring in Production

Jul 7, 2021 By Nathan Foreman In Cookdown

I have been working on a couple of monitoring ideas for Cherwell. I didn’t see anything with a quick online search, and I enjoy authoring MPs to monitor applications; it is the closest I’ll get to 007. I’ve hit a major hurdle and I need to ask for a hand from the community. We have a lab environment that’s worked great while developing the Cherwell integration for Connection Center; however, it is not a good simulation of an actual deployment.

Read Post

Cookdown

Read more about Cherwell Monitoring in Production

July 2021 Update: Users can be members of multiple teams

Jul 7, 2021 By René In SIGNL4

The time has come! Users in SIGNL4 can now be members of multiple teams. This allows staff to be on duty in multiple groups or departments in parallel and to receive alert notifications for incidents that occur in the different teams. In addition, you can now also send Signls to multiple teams. All details are available in this article.

Read Post

SIGNL4

Read more about July 2021 Update: Users can be members of multiple teams

The Incident Review: 4 Incidents in Outer Space

Jul 6, 2021 By JJ Tang In Rootly

From network problems to computer failures, a variety of incidents can disrupt operations for systems in outer space.

Read Post

Rootly

Read more about The Incident Review: 4 Incidents in Outer Space

Can MS Teams be Connected to xMatters to Post Messages? - Ask Adam

Jul 6, 2021 By xMatters In xMatters

Using MS Teams Connectors to post messages from xMatters is a simple way to keep people updated about what's going on in your process. You can configure the Connectors in Teams yourself without the help of your Teams admin when you know where to look! When I set up the Connector I used this image to represent xMatters. I'd love to hear any ideas you have for more videos, what you're working on, or anything you've done but would've really liked to have some help with in the comments below!

View Video

xMatters

Read more about Can MS Teams be Connected to xMatters to Post Messages? - Ask Adam

Duty Schedule Import from Third-Party Systems

Jul 5, 2021 By Ronald In SIGNL4

SIGNL4 offers powerful duty scheduling for routing alerts to the right people at the right time. In some cases, customers use other tools as the leading system for duty scheduling, e.g. SAP, Excel, etc. Here we describe how to import duty schedules from .csv files. If you use other tools or other formats, you can first export your schedule into a .csv file and proceed from there.

Read Post

SIGNL4

Read more about Duty Schedule Import from Third-Party Systems

Automatic BMC Incident creation with Enterprise Alert

Jul 2, 2021 By Derdack In Derdack

How to use Enterprise Alert to automatically create incidents inside of BMC.

View Video

Derdack

Read more about Automatic BMC Incident creation with Enterprise Alert

Enterprise Alert's Automation Engine: Creating BMC Incidents

Jul 2, 2021 By Derdack In Derdack

Recently we have received a lot of requests for Enterprise Alert to not only alert on critical situations but to also take a proactive approach to initiate, record and track those situations through ITSM tools such as ServiceNow and BMC Remedy. This post will center around what happens when critical systems fail and tickets are not being created in BMC due to a break in the workflow.

Read Post

Derdack

Read more about Enterprise Alert's Automation Engine: Creating BMC Incidents

Elephant in the Blameless War Room: Accountability

Jul 1, 2021 By Christina Tan In Blameless

We’ve always advocated that every company can benefit from a blameless culture. Fostering a blameless culture can profoundly boost your organization in powerful ways, from employee retention to developer velocity and innovation. However, there’s an elephant in the room when we talk about blamelessness with executives: accountability. When things go wrong, people still need to get fired, right?

Read Post

Blameless

Read more about Elephant in the Blameless War Room: Accountability

Investigating the Scene of an Incident: Using a Time-Traveling Topology to Create Escalation Graphs

Jul 1, 2021 By Lodewijk Bogaards In StackState

Yes, time travel is possible...through data. My ability to time travel began when I started coding at age 10. Back then, all of my code ran on my own little computer. Like many ten-year-olds, I coded to create and play games. I also coded cool graphics to accompany music to impress my friends and utilities for copying. I launched my first commercial website in 1996 and made 25 guilders, which was good money for a 15-year-old. Life was so easy.

Read Post

StackState

Read more about Investigating the Scene of an Incident: Using a Time-Traveling Topology to Create Escalation Graphs

Chapter Eight: In Which James Embarks on a Service Desk Migration to Improve Incident Management with AIOps

Jul 1, 2021 By Helen Beal In Moogsoft

It’s been a month since Dinesh and I humbly high-fived leaving the meeting with Charlie and Lucia, in which they gave us the green light to roll Moogsoft out across the whole of C&Js, and I’m feeling a little weary. Change is hard. I’ve also made it harder on myself by persuading Charlie we should also migrate our service desk solution.

Read Post

Moogsoft

Read more about Chapter Eight: In Which James Embarks on a Service Desk Migration to Improve Incident Management with AIOps
