Monthly Archive

SRE vs. SWE: Similarities and Differences

Oct 29, 2021 By Quentin Rousseau In Rootly

SREs and SWEs complement each other, but they perform different tasks and focus on different priorities.

Read Post

Rootly


Managing Your xMatters Subscriptions - xMatters Support

Oct 29, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he highlights how subscriptions in xMatters work, and how to get the most out of yours.

View Video

xMatters

Incident Management


Panel: Handling Incident Response - Dash 2021 (Datadog, PagerDuty)

Oct 28, 2021 By Datadog In Datadog

When customer-impacting downtime happens, it’s crucial that responders are prepared and can resolve issues as quickly as possible. Knowing the right tools to use, wherever you are working from, helps you put a well-defined strategy in place to come together as a team, work the problem, and get to a solution quickly. In this roundtable discussion, PagerDuty and Datadog engineers chat about incident response and how we use all the tools at our disposal to respond quickly and effectively.

View Video

Datadog


Is it a ghost or is it Flow Designer?

Oct 27, 2021 By xMatters In xMatters

Maybe it’s the time of year or the change in temperature, but sometimes using xMatters Flow Designer can seem a little… spooky? Maybe it’s the unlimited capability it offers, or maybe it’s that it can make changes for you without you being aware they’re taking place. But every once in a while, we’re not sure if we’ve just set up workflows too effectively, or that something a touch paranormal is happening with xMatters.

Read Post

xMatters


Viewing Your xMatters Schedule - xMatters Support

Oct 27, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he outlines how you can utilize the schedule tab in xMatters to view your on-call status, ensure your schedule is input correctly, and find coverage when you’re not available.

View Video

xMatters


Improve your on-call experience with Datadog mobile dashboard widgets

Oct 26, 2021 By Sacha Guyon In Datadog

Life happens—even when you’re on-call. You can’t take your laptop everywhere, but whether you’re on the train, at dinner, or at the gym, you can count on the Datadog mobile app for access to key data about the status and performance of your applications. Now, you can use Datadog mobile widgets to build an on-call mobile dashboard directly on your phone’s home screen, so it’s even easier to track the data you care about from anywhere.

Read Post

Datadog


Customer Service Ops & PagerDuty Zendesk Integration v3 Full Case Ownership Use Case

Oct 26, 2021 By PagerDuty In PagerDuty

PagerDuty's Zendesk Integration enhances communication between engineering and support teams by providing visibility to high-impact incidents via the PagerDuty Status Dashboard that is integrated into the Zendesk interface. Automate workflows for a fast-paced support team and provide the right level of information so they can interact knowledgeably with their customers while also reducing time and effort.

View Video

PagerDuty


PD, Salesforce Service Cloud, Slack: Proactive Case Escalation & Slack-First Intelligent Swarming

Oct 26, 2021 By PagerDuty In PagerDuty

Learn about and see how PagerDuty, Salesforce Service Cloud, and Slack empower collaboration across your organization to accelerate time to resolution. Proactively improve customer satisfaction in real time and break down silos to connect customer service teams with engineering teams to address incidents quickly when seconds matter. Enjoy greater control when resolving issues and anticipating customers' needs through an incident command console that gives customer service agents and stakeholders instant updates on critical, customer-impacting issues.

View Video

PagerDuty

Incident Management


Five steps to better customer communication

Oct 26, 2021 By Chris Evans In Incident.io

When you’re deep into an incident and there are alerts firing, decisions to be made, and people to escalate to, it’s easy for outward communication with your customers to fall off the priority list. In many regards this makes sense; it seems natural to put all of your focus and energy into minimising the impact and getting things back on track as soon as possible.

Read Post

Incident.io


What's New: Extending our Datadog Capabilities With New PagerDuty Widgets

Oct 26, 2021 By Hadijah Creary In PagerDuty

In the last two years, we have seen the rise of remote and hybrid work, and with that, a proliferation of tools and apps needed to support critical communication and collaboration. Finding that app-life balance has become increasingly complex, so simplifying “how” we work is key for every organization.

Read Post

PagerDuty


Strategies to Reduce Hospital Readmission Rates

Oct 26, 2021 By Ritika Bramhe In OnPage

The Centers for Medicare & Medicaid Services (CMS) scrutinizes hospital readmission rates across the U.S. each year, and it levies financial penalties on organizations that overshoot acceptable hospital readmission rates. As healthcare systems across the country embark on a journey to introduce patient-centric models to their organizations, they must align their resources with ever-changing regulations for them to thrive.

Read Post

OnPage


Now Available: Private Slack Channels

Oct 26, 2021 By Julia Tran In FireHydrant

Ever heard the saying “Too many cooks”? If you’ve responded to incidents, you’ll likely understand the parallels. There are cases when incident command on a public channel isn’t the best option. Whatever your reason, we’ve got you covered. Users can now spin up a private Slack channel for an incident.

Read Post

FireHydrant


Differences between Site Reliability Engineer Vs. Software Engineer Vs. Cloud Engineer Vs. DevOps Engineer

Oct 26, 2021 By Squadcast Community In Squadcast

The evolution of software engineering over the last decade has led to the emergence of numerous job roles. So how different are a Software Engineer, a DevOps Engineer, a Site Reliability Engineer, and a Cloud Engineer from each other? In this blog, we drill down and compare the differences between these roles and their functions.

Read Post

Squadcast


Why ChatOps & Incident Management are the Perfect Pair

Oct 25, 2021 By Megan Lo In xMatters

ChatOps has become an integral part of software development and IT operations, as teams rely on automated notifications to take the place of manual alerts. In the past, if there was an alert, someone would need to manually find that notification. Then, they would have to contact team members one by one so they could start working on a resolution. In this complex network of communications, it was easy to lose information, duplicate work, and simply waste time coordinating the team.

Read Post

xMatters


Service Profile: Activity Tab Updates

Oct 25, 2021 By PagerDuty In PagerDuty

PagerDuty's new service profile enhancements allow you to better command and control incidents directly from the Service Profile. Now you can perform bulk actions on incidents like acknowledge or resolve, search by incident ID, add and view change integrations, browse resolved incidents, view related escalation policies from the service profile header, and more.

View Video

PagerDuty

Incident Management


Managing Your Devices - xMatters Support

Oct 25, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he showcases how to utilize the devices tab in xMatters, how to add devices and tips for setting them up properly.

View Video

xMatters

Incident Management


Next Generation Slack Migration Tool and Stakeholder Updates Demo

Oct 22, 2021 By PagerDuty In PagerDuty

Learn more about PagerDuty's Collaboration Applications that help you streamline incident remediation. Enjoy these demos of our latest updates to our PagerDuty Slack and Microsoft Teams Applications including the Webhook Migration Tool, Stakeholder Updates, and Resolution Notes.

View Video

PagerDuty


An Introduction to Incident Response Roles

Oct 22, 2021 By JJ Tang In Rootly

Learn about the key roles within an incident response team, as well as optional incident roles you may not have thought about.

Read Post

Rootly


Automated Diagnostics for Incident Response Demo

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how you can speed up resolution times with Automated Diagnostics. Automate away as much manual toil as possible so teams can work more productively. Learn how teams across the organization can embrace workflows that help to diagnose and remediate incidents.

View Video

PagerDuty


Runbook Automation: Change Customer Portal Name Demo

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how service owners can leverage PagerDuty Runbook Automation to streamline common asks from their internal business users. This demo introduces an example where an account manager requests that a customer portal name be changed.

View Video

PagerDuty


Runbook Automation: Managing an Incident in PagerDuty

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how PagerDuty can help an organization manage incidents and get ahead of customer issues that occur with their product website. #IncidentResponse #CustomerSupport

View Video

PagerDuty


Runbook Automation: Rundeck ETL Pipeline Demo

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how a service owner of an internal service can leverage PagerDuty Runbook Automation to provide end-users with secure access to self-service mechanisms to allow them to get answers to questions without needing to interrupt or assign tickets to service owners.

View Video

PagerDuty

Incident Management


Runbook Automation: Rundeck Jira Retrieve Info Demo

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how PagerDuty Runbook Automation allows you to trigger Rundeck jobs within Jira to pull information from various sources and pass the requested information back into Jira.

View Video

PagerDuty


Runbook Automation: Rundeck Service Ownership Demo

Oct 21, 2021 By PagerDuty In PagerDuty

Learn how PagerDuty Runbook Automation enables developers and service owners to equip other engineers, such as operations engineers or other developers, with mechanisms to help them support their services. Service owners can allow other team members to help support their services via automated runbooks that let others apply short-term fixes, reducing escalations to service owners.

View Video

PagerDuty


Rundeck Ruleset Designer

Oct 21, 2021 By PagerDuty In PagerDuty

The UI-based Ruleset Designer helps users better visualize potential pathways according to step rules and conditions. Leverage the Rundeck GUI to design Rulesets, generate the rules automatically, and easily view how your jobs will progress based on rules and conditions set for each step.

View Video

PagerDuty


Slack Insights Previews

Oct 21, 2021 By PagerDuty In PagerDuty

Share Insights through Slack on an ad hoc basis to improve collaboration and shareability. Drop links to specific filtered views from insight tables and have a summarized view unfurl in Slack (in both DMs and channels) for immediate value.

View Video

PagerDuty


Your xMatters Profile - xMatters Support

Oct 21, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he outlines how you can navigate your xMatters profile to view and edit your personal information in xMatters.

View Video

xMatters

Incident Management


When built-in alerting is not enough

Oct 21, 2021 By Derdack In Derdack

Many ITOM or ITSM tools come with built-in features for alerting and notifications and are able to send at least an email or text notification upon incidents to operations teams. But is this enough reliability to respond to and handle major and critical incidents? Recently, we have been surprised to see more and more monitoring tools listed as alerting tools on review platforms like G2.

Read Post

Derdack


OnPage Clinical Communication and Collaboration Platform

Oct 20, 2021 By OnPage In OnPage

Modern healthcare teams require a modern solution to streamline clinical communications and medical workflows. In life and death situations, it’s critical that physicians receive immediate alerts and messages to provide patient care promptly. OnPage is the industry’s most trusted clinical communications platform. OnPage is more reliable and secure than traditional pagers. The system enables care teams to easily communicate and achieve maximum patient satisfaction.

View Video

OnPage


Postmortem Pitfalls

Oct 20, 2021 By Chris Evans In Incident.io

Last week, we spent some time talking to Gergely Orosz about our thoughts on what happens when an incident is over, and you're looking back on how things went. If you haven't read it already, grab a coffee, get comfortable, and read Gergely's full post Postmortem Best Practices here. But before you do that, here's some bonus material on some of our points.

Read Post

Incident.io


A developer's guide to programmatically overcome fear of failure

Oct 20, 2021 By Mandeep Kaur In PagerDuty

People are more than happy to talk about their successes, but if you ask them about their failures, they can be much more hesitant to share. Failure is a subject that, interestingly enough, is entangled with the emotion of shame. Yet it’s integral to achieving anything novel, and the learnings that come from failure are unparalleled. So, let’s find ways to get more comfortable with failing, and figure out why people fear it.

Read Post

PagerDuty


Incident Management Metrics That Matter - 2021

Oct 20, 2021 By AlertOps In AlertOps

What are the key incident management metrics and KPIs? How important is it to track your team’s performance? If you are not doing so already, the time is right to get your finger on the pulse by better understanding and managing your organization’s key incident management metrics. How a company manages IT incidents matters, and most importantly, the process has the power to impact sales – recent studies indicate 52% of U.S.

Read Post

AlertOps


Uptime/SLA calculator: what is an SLA and how to calculate it?

Oct 19, 2021 By Sancho Lerena In Pandora FMS

A Service Level Agreement (SLA) is a document that details the expected level of service guaranteed by a vendor or product. This document generally sets out metrics such as uptime expectations and any compensation due if these levels are not met. For example, if a provider advertises an uptime of 99.9% and exceeds 43 minutes and 50 seconds of service downtime in a month, technically the SLA has been breached and the customer may be entitled to some type of remuneration depending on the agreement.
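The arithmetic behind that example can be sketched in a few lines of Python. This is an illustration only, not the calculator from the post: the function names are hypothetical, and it assumes the 43 minutes and 50 seconds figure refers to a monthly window with an average month of 365.25 / 12 days.

```python
# Hypothetical helpers for illustration; not taken from the original post.
# Assumption: the SLA window is one average month (365.25 / 12 days).
AVERAGE_MONTH_DAYS = 365.25 / 12


def allowed_downtime_seconds(uptime_percent: float, period_days: float) -> float:
    """Return the downtime budget, in seconds, for an uptime target over a period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - uptime_percent / 100)


def format_duration(seconds: float) -> str:
    """Format a duration as whole minutes and seconds, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


def is_breached(downtime_seconds: float, uptime_percent: float, period_days: float) -> bool:
    """Return True when observed downtime exceeds the budget for the period."""
    return downtime_seconds > allowed_downtime_seconds(uptime_percent, period_days)


budget = allowed_downtime_seconds(99.9, AVERAGE_MONTH_DAYS)
print(format_duration(budget))  # 43 min 50 s
```

Under the same assumption, `is_breached(45 * 60, 99.9, AVERAGE_MONTH_DAYS)` reports a breach for 45 minutes of downtime in the month. A real agreement may define the measurement window and exclusions (for example, scheduled maintenance) differently.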

Read Post

Pandora FMS


Intelligent Alert Grouping: What It Is and How To Use It

Oct 18, 2021 By Quintessence Anx In PagerDuty

It’s 2 AM and you’re paged when you’re still awake – how well can you find what you need to fix the latest mistake? When the incident begins it might only be impacting a single service, but as time progresses, your brain boots, the coffee is poured, the docs are read, and all the while the incident is escalating to other services and teams whose alerts you might not see if they’re not in your scope of ownership.

Read Post

PagerDuty


PagerDuty Pulse Q1 + Q2 FY2022

Oct 15, 2021 By PagerDuty In PagerDuty

View Video

PagerDuty


Your xMatters Inbox - xMatters Support

Oct 14, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he showcases how to navigate your xMatters inbox, view messages past and present, and tips for getting the most out of its features and possibilities!

View Video

xMatters

Incident Management


What Operational Maturity Looks Like Today With PagerDuty's Kyle Duffy

Oct 14, 2021 By Hannah Culver In PagerDuty

Companies that underwent accelerated digital transformations during the past 18 months are looking to understand how they can improve their operational maturity to handle the increase in complexity. This is paramount to an organization’s future success.

Read Post

PagerDuty


4 Pressures at Tech Companies xMatters Can Help Relieve

Oct 13, 2021 By Michael Geller In xMatters

Technology companies are at the forefront of innovation, changing the way consumers and the general public interact with their everyday lives. As the late Stan Lee so wisely stated, “with great power comes great responsibility,” and this heightened pressure often leaves little room for error when an issue arises—which happens more often than you’d think.

Read Post

xMatters


OnPage for Clinical Communication and Collaboration

Oct 13, 2021 By OnPage In OnPage

View Video

OnPage


Process binds technology and people in cloud maturity success

Oct 13, 2021 By Inga Weizman In PagerDuty

This is the final blog in our series focusing on CloudOps maturity, where we’ve been looking at the key findings from a recent IDC study, commissioned by PagerDuty. In our previous blogs, we discussed the people-based transformations and the technological changes that organizations must undergo to mature their CloudOps practices.

Read Post

PagerDuty


AIOps - What It Is, Why It Matters, and Advice for Adopting It

Oct 12, 2021 By Rutuja Rajwade In xMatters

The link between DevOps and artificial intelligence for operations (AIOps) has only started to become clear within the last few years. Monitoring and alerting has evolved from a "black box approach," where you don't actually know what's happening, into observability, where you have access to data that provides everything you possibly need to know about your IT systems. How does AIOps come into play? AIOps is the practice of applying artificial intelligence, machine learning, and advanced analytics to automate and improve IT operations. Since it entered as a formal discipline with Gartner in 2016, IT teams have been trying to figure out how to employ it to make their lives easier.

Read Post

xMatters


PagerDuty Partner Twitch Stream with Gremlin

Oct 12, 2021 By PagerDuty In PagerDuty

Chaos Engineering can help you improve your incident response workflows. Don’t wait until an event happens to flex your response muscles. Create real-world scenarios with Gremlin and practice your response with PagerDuty.

View Video

PagerDuty


Incident Management Process- 6 Tips to Better Prepare Your IM Process for The Holiday Season.

Oct 12, 2021 By AlertOps In AlertOps

Holiday retail sales are likely to increase between 7% and 9% in 2021, according to Deloitte’s annual holiday retail forecast with holiday sales totaling $1.28 to $1.3 trillion during the November to January timeframe. Deloitte also forecasts that e-commerce sales will grow by 11-15%, year-over-year, during the 2021-2022 holiday season.

Read Post

AlertOps


How Patient-Centered Care Improves Patient Outcomes

Oct 12, 2021 By Christopher Gonzalez In OnPage

The patient-centered care (PCC) model enhances the way providers interact with patients during the care delivery process. Clinicians that show compassion and empathy toward patients are more likely to achieve meaningful, positive doctor-patient relationships. Indeed, care teams that prioritize PCC have a proven approach to improving patient satisfaction and increasing patient retention.

Read Post

OnPage


Should you care about AIOps? Obviously.

Oct 12, 2021 By ScienceLogic In ScienceLogic

There's a lot of hype in the marketplace about AIOps right now, and there are a lot of people who've got some interesting ideas about what it should be. The most common idea that I hear is that it's essentially a layer of AI magic that sits across everything that you've got in your IT tooling today, makes sense of all of that for you, and then decreases the number of incidents you have and reduces your MTTR...

View Video

ScienceLogic


How Your ITSM Tool & PagerDuty Make a Dynamic Duo for Real-Time Work

Oct 11, 2021 By Hannah Culver In PagerDuty

There’s an incident. Your teams need to communicate with the development team that owns the service, but that team is too busy to stop and chat. Meanwhile, you in central IT have business leaders asking for updates, angry internal users calling the help desk, and customer service representatives asking for information. You have hundreds of tickets all pertaining to the incident in your ticketing system.

Read Post

PagerDuty


xMatters AIOps Tools Buying Guide

Oct 9, 2021 By xMatters

Download the new AIOps Buyer's Guide to learn about the use cases and key capabilities of the right AIOps tool.

Get White Paper

xMatters


Getting the SRE Model Right

Oct 9, 2021 By xMatters

Read the new xMatters guide, Getting the SRE Model Right, to learn how the right approach can help you minimize incidents and limit their severity, free up valuable engineering resources, and lay the groundwork for incident response at scale.

Get Guide

xMatters


What SREs Can Learn from Facebook's Largest Outage

Oct 8, 2021 By JJ Tang In Rootly

Facebook’s October 2021 outage was the type of event that gives SREs nightmares: A series of critical business apps crashed in minutes and remained unavailable for hours, disrupting more than 3.5 billion users around the world and costing about 60 million dollars. As incidents go, this was a pretty big one.

Read Post

Rootly


PagerDuty Integration Spotlight: Honeycomb

Oct 7, 2021 By PagerDuty In PagerDuty

Honeycomb delivers observability for modern engineering and DevOps teams to observe, debug, and improve production systems efficiently. The PagerDuty + Honeycomb integration uses Honeycomb Triggers to notify on-call responders based on alerts sent from Honeycomb. This integration is maintained and supported by Honeycomb. Liz Fong-Jones from Honeycomb joined us live on Twitch to share more about how Honeycomb and PagerDuty can be used together to help your teams and to do some live investigation into Honeycomb’s own performance data.

View Video

PagerDuty


4 xMatters Use Cases That May Surprise You

Oct 6, 2021 By Megan Lo In xMatters

xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ll likely have seen a number of valuable use cases for the platform—it can alert SREs when there’s a website outage, it can accelerate product development for DevOps teams, it can manage on-call schedules and alerts for support teams.

Read Post

xMatters


The Cost of Increasing Incidents: How COVID-19 Affected MTTR, MTTA, and More

Oct 6, 2021 By Hannah Culver In PagerDuty

Digital transformation accelerated for many companies during the last 18 months. While it may have been on the agenda prior to COVID-19, teams were pushed to extreme speeds to digitize and meet the rising online demand. During this time, organizations learned important lessons that they’ll carry on with them into this new future. Leaders can take these learnings and use them to build better products, healthier and more efficient teams, and a happier customer base.

Read Post

PagerDuty


5 AIOps Use-Cases: How AIOps Helps IT Teams

Oct 6, 2021 By Phil Tee In Moogsoft

In a world with everything digital, you need AIOps to help ensure uptime and break through the noise. Still not sold? Let's explore 5 ways SRE and DevOps teams are using AIOps to boost existing monitoring tools.

Read Post

Moogsoft


Monthly Moo Update | October 2021

Oct 6, 2021 By Adam Frank In Moogsoft

There’s a number of monitoring and observability solutions on the market today. It almost reminds me of the automobile market and the endless number of automobiles available. Sure, they all get you from point A to point B, in some way. But some automobiles do it faster, smoother, more efficiently, with guidance, more comfort, storage space, perhaps towing capability, and even autonomously. Moogsoft is the automobile you’ve been dreaming about in the monitoring and observability market.

Read Post

Moogsoft


FireHydrant expands Reliability Platform with Service Catalog

Oct 6, 2021 By Julia Tran In FireHydrant

Today, we are happy to announce the launch of Service Catalog to help you better manage, query, and learn about the services that exist in your infrastructure. At FireHydrant, we envision a world where all software is reliable, and we’re on a mission to help every company that builds or operates software get closer to 100% reliability. Service Catalog helps you get closer to 100% reliability.

Read Post

FireHydrant


Facebook, Instagram, and WhatsApp's Outage - Understanding MTTR

Oct 5, 2021 By Deepa Ramachandra In ObservIQ

Yesterday the most used social media platforms in the world were inaccessible for 6 hours straight. Later, in a press release, Facebook revealed that the outage was due to configuration changes in their routers. There is no doubt that Facebook has an intense incident response plan, yet a small blind spot resulted in a significant business interruption. So how do we avoid this? The truth is, outages and performance issues are bound to happen in any network.

Read Post

ObservIQ


PagerDuty Integration Spotlight: HashiCorp Terraform

Oct 5, 2021 By PagerDuty In PagerDuty

Manage your PagerDuty account objects with Terraform! Reap all the benefits of infrastructure as code and give your teams the flexibility they need to manage their services in real time. As infrastructure stacks grow increasingly complex and involve an ever-growing number of services and systems, teams have looked to abstract configuration to its own layer of code. This concept of configuring infrastructure as code is gaining traction throughout the industry for a variety of reasons.

View Video

PagerDuty


xMatters Communication Center Dashboard - xMatters Support

Oct 5, 2021 By xMatters In xMatters

Join Chris Patch, xMatters’ Senior eLearning Specialist, as he teaches you how to use xMatters dashboards to quickly resolve incidents, view your on-call information, and find critical data quickly.

View Video

xMatters


The Aftermath of the Facebook 6-Hour Outage

Oct 5, 2021 By Victoria Oiknine In Komodor

Less than 24 hours ago, the world came to a “social standstill” as Facebook, and its sister companies, WhatsApp and Instagram, became unavailable, leaving its 3.5 billion users in a flap. The outage, which lasted almost 6 hours, shut off access for users and businesses all over the world and caused ripple effects that we will likely continue to see in the immediate (and perhaps not-so-immediate) future.

Read Post

Komodor


PagerDuty's Engineering Management Handbook for Healthier Teams and Services

Oct 5, 2021 By Hannah Culver In PagerDuty

This July, we launched The State of Digital Operations, which sheds light on the volume of real-time work, its growth over time, and how that increasingly burdens technical teams.

Read Post

PagerDuty


PagerDuty Integration Spotlight: InfluxData

Oct 5, 2021 By PagerDuty In PagerDuty

InfluxData is an Open Source Platform built for metrics and events — a platform that is purpose-built for time series data. The essential time series toolkit — dashboards, queries, tasks and agents all in one place. InfluxDB is even more programmable and performant with a common API across OSS, cloud and enterprise editions. Send events to PagerDuty to keep your teams informed. Check out InfluxData’s integration.

View Video

PagerDuty


Signl4 and Dynatrace 2 way integration

Oct 5, 2021 By SIGNL4 In SIGNL4

Configuring a two-way connection between Dynatrace and the SIGNL4 alerting app to improve resolution time on incoming incidents.

View Video

SIGNL4


Moogsoft Anomaly Detection

Oct 5, 2021 By Moogsoft In Moogsoft

Pinpoint the root cause to reduce downtime with Moogsoft anomaly detection.

View Video

Moogsoft


Evaluating Splunk On-Call Alternatives

Oct 4, 2021 By xMatters In xMatters

Splunk On-Call (Formerly VictorOps) is a popular incident response and on-call management platform that allows engineering and operations teams to collaborate with ease and resolve issues faster. As part of the Splunk Observability Suite, Splunk On-Call is combined with related products to achieve the goal of bringing monitoring, troubleshooting, and investigation, into a single, comprehensive view — simplifying the process from incident detection to resolution.

Read Post

xMatters


PagerDuty Integration Spotlight: LogDNA

Oct 3, 2021 By PagerDuty In PagerDuty

LogDNA’s Cloud logging platform helps your DevOps teams find and fix production issues faster so your teams can get back to doing what they do best, building amazing products. Send incident alerts from LogDNA directly to PagerDuty. Check out the LogDNA integration with PagerDuty to get started.

View Video

PagerDuty


Google's State of DevOps 2021 Report: What SREs Need to Know

Oct 1, 2021 By Quentin Rousseau In Rootly

The four key takeaways for SREs from Google’s State of DevOps 2021 report

Read Post

Rootly


How to monitor IoT devices with Fyipe?

Oct 1, 2021 By OneUptime In OneUptime

A demo on monitoring your IoT devices, creating incidents, and showcasing reliability on a status page. About Fyipe: Fyipe is a complete Site Reliability Engineering (SRE) platform. It gives you a beautiful status page for your business, monitors your web apps, and alerts your team when downtime happens.

View Video

OneUptime


How Service Catalog Increases Productivity

Oct 1, 2021 By Max Tilka In FireHydrant

Productivity is defined by measuring the amount of output over a given time frame. However, this discounts the quality of output, which is crucial in moving toward a more complete definition of productivity. Relating to services, increases in productivity generally highlight the number of feature releases over time. This leaves out the critical measurement of quality compared to quantity. This is where a Service Catalog can greatly enhance true productivity within an engineering organization.

Read Post

FireHydrant


Learn where you rank and how it affects digital service resilience

Oct 1, 2021 By xMatters

We evaluated where enterprises are positioned in the Incident Management Spectrum and in their journey to digital service resilience and found that incident management needs its own transformation. In the report, you'll learn which approach to incident management is the best for meeting today's business imperatives.

Get Report

xMatters

Incident Management


Digital Transformation Secrets: Balancing Innovation and Uptime

Oct 1, 2021 By xMatters

Providing a superior digital customer experience is a critical component of business success for technology and digital service providers. But an enjoyable, effective, and reliable customer experience demands new IT architectures and places new expectations on the way SREs, development teams, ITOps, executives, and other previously siloed groups work together. And at what costs? To understand, we asked over 300 DevOps, ITOps and business leaders for perspectives.

Get Report

xMatters

