May 2022

DevOps Team Structure | Roles & Responsibilities

May 31, 2022 By Noor-ul-Anam Ruqayya In Blameless

We explain how a DevOps team is structured, the roles and responsibilities within the team, and the balance between an individual contributor and the needs of the team.

Read Post

Blameless

xMatters Notification Override Feature

May 31, 2022 By xMatters In xMatters

Now you can sleep easy knowing xMatters notification override will let you know when a critical alert happens, regardless of your device status. Discover more about how xMatters can help ensure applications are always working, automate workflows, and deliver remarkable products at scale with the xMatters service reliability platform.

View Video

xMatters

Incident Management

StatusCast expands product offering with Incident Management for IT Platform

May 31, 2022 By StatusCast In StatusCast

May 31, 2022 – Columbia, MD – StatusCast today announced the release of its IT Incident Management service, expanding its flagship offering from best-of-breed Status Page services to include the full incident management life-cycle. The new offering goes beyond standard status updates, allowing IT teams to respond faster and more effectively when systems fail or go offline.

Read Post

StatusCast

What's New: Updates to Incident Response, AIOps, PagerDuty Process Automation, and More!

May 31, 2022 By Vera Chan In PagerDuty

Summit’s right around the corner (have you registered yet?) but the shipping doesn’t stop! We’re excited to announce a new set of updates and enhancements to PagerDuty’s Digital Operations Platform. Recent updates from the product team span On-Call Management, Incident Response, Process Automation, Integrations, and PagerDuty Community & Advocacy Events. New capabilities enable users and customers to resolve incidents faster, and more.

Read Post

PagerDuty

Getting AWS CloudTrail alerts via SNS Endpoint

May 31, 2022 By Vishal Padghan In Squadcast

Logging and auditing have been an essential part of troubleshooting application and infrastructure performance. You can instantly spot areas of risk to ensure quick correction and prevention of issues. In this blog, we will explore the AWS CloudTrail service and discuss how integrating it with Squadcast can help you route alerts to the right users for quick and efficient incident response. Let's get started.

Read Post

Squadcast

Simplifying SLO and Error Budget tracking for SRE teams

May 28, 2022 By Vishal Padghan In Squadcast

Service level objectives (SLOs) and their associated service level indicators (SLIs) are the foundation of a strong SRE culture, promoting accountability, trust, and timely innovation. We are on a mission to simplify SLO and Error Budget tracking and with that aim in mind, we have added the SLO Tracker feature to the Squadcast platform. SLO Tracker seeks to provide a simple and effective way to keep track of your error budget burn rate without the hassle of configuring and aggregating multiple data sources.
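
The error budget burn rate mentioned above can be illustrated with a short calculation. This is a minimal sketch of the commonly used definition (observed error rate divided by the error rate the SLO allows), not Squadcast's SLO Tracker implementation; the function name and the sample numbers are hypothetical.

```python
def burn_rate(slo_target: float, bad_events: int, total_events: int) -> float:
    """Return how fast the error budget is being consumed.

    A sustained value of 1.0 means the budget would be used up exactly at
    the end of the SLO window; values above 1.0 mean it runs out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events == 0:
        return 0.0  # no traffic, so no budget was consumed
    allowed_error_rate = 1.0 - slo_target  # the error budget as a fraction
    observed_error_rate = bad_events / total_events
    return observed_error_rate / allowed_error_rate


# Hypothetical sample: a 99% SLO with 20 failed requests out of 1,000.
rate = burn_rate(0.99, bad_events=20, total_events=1000)
print(f"burn rate: {rate:.2f}")
```

In this sample the service fails 2% of requests against a 1% allowance, so the budget is being spent at roughly twice the sustainable pace.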

Read Post

Squadcast

5 Tips If You're the 1st SRE Hire by Instacart's First SRE

May 27, 2022 By Quentin Rousseau In Rootly

Site Reliability Engineers (SREs) have a considerable set of tasks to juggle no matter where they work or how long their company has had an SRE practice. But if you’re the very first SRE to join an organization – as many SREs are these days, given that the SRE trend is trickling down into smaller and smaller companies – you face a special group of challenges. You may find it difficult to get buy-in for SRE from other technical teams.

Read Post

Rootly

Simulated Incident Call Recording

May 26, 2022 By PagerDuty In PagerDuty

This is a simulated incident call recording based on a real PagerDuty incident from Jan 2017. The purpose of this simulation is to show how Incident Command System principles are applied to technical product outages.

View Video

PagerDuty

Incident Management

Introducing Incident Types

May 26, 2022 By Martha Lambert In Incident.io

We believe incident.io should be used across an organisation, from SRE teams to Customer Success and People Ops. Until now, the way you set up your incident response flows has relied on having one set of roles and fields for every incident, meaning you have to choose between having lots of irrelevant fields to cover every use-case, or not getting the full incident.io experience on some incidents. That’s changing today with incident types, conditional fields and roles!

Read Post

Incident.io

What Is DevOps Automation & What Are The Benefits?

May 26, 2022 By Myra Nizami In Blameless

Looking into DevOps automation? We explain how automation can improve your process, how to prioritize which tasks to automate, best practices, and how to avoid common mistakes.

Read Post

Blameless

Webinar: combating tool sprawl with AIOps

May 26, 2022 By Derrick Arakaki In BigPanda

Dexcom is more than a business. For its customers, the organization’s innovative continuous glucose monitoring platform provides a way to take control of their health and better manage their diabetes. Given the critical services Dexcom provides to its customers, its IT Operations teams have highly specific needs when it comes to the many tools and platforms they rely on to keep the organization’s services up and running.

Read Post

BigPanda

Everbridge Live: Don't Be Afraid of the Dark Web

May 25, 2022 By Everbridge In Everbridge

The dark web is often seen as a den of iniquity and an underworld of illegal activity. However, this den can be a valuable source of information. Right now, more than 4 billion active internet users are online, creating posts, pictures, and videos. That means billions of different points of view in real time across the open web, social media, and the dark and deep web. Watch our short webinar where you will see just how easy it is to use Everbridge Signal to find valuable open source intelligence.

View Video

Everbridge

Incident Management

We can't all be Shaq: why it's time for the SRE hero to pass the ball and how to get there

May 25, 2022 By Malcolm Preston In FireHydrant

At a going away party from a job I was leaving a few years back, my VP of engineering told a story I didn’t even remember but that I know subconsciously shaped how I viewed my role on that team: Toward the end of my very first day at the company, there was some internal system issue, and with pretty much zero context, I pulled out my laptop, figured out what was going on, and helped fix the issue.

Read Post

FireHydrant

When incident response requires business response, who should you notify?

May 25, 2022 By Hannah Culver In PagerDuty

From a single on-call engineer hopping online to resolve a problem, to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO), incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.

Read Post

PagerDuty

PagerDuty Terraform Time: Write HCL in Go with hclwrite

May 25, 2022 By PagerDuty In PagerDuty

Scott McAllister, Developer Advocate, PagerDuty

View Video

PagerDuty

SRE: From Theory to Practice | What's difficult about incident command

May 24, 2022 By Emily Arnott In Blameless

A few weeks ago we released episode two of our ongoing webinar series, SRE: From Theory to Practice. In this series, we break down a challenge facing SREs through an open and honest discussion. Our topic this episode was “what’s difficult about incident command?” When things go wrong, who is in charge? And what does it feel like to do that role?

Read Post

Blameless

Introducing Status Pages

May 24, 2022 By iLert In iLert

We are super excited to announce a major milestone in our company history. 10 years ago, iLert started with a simple mission: help companies to increase their uptime and deliver a seamless digital experience. Every feature in iLert is built to help you to respond to critical alerts faster and increase your uptime.

Read Post

iLert

List of Potential Incident Management Issues

May 24, 2022 By Roxana González In InvGate

Incident management is the process followed by the area of IT service management to respond to a service disruption, in order to restore it to normal as quickly as possible, minimizing the negative impact on the business. An incident is a single unplanned event that generates a service disruption, whereas a problem is a cause or potential cause of one or more incidents, as defined by ITIL incident management guidelines.

Read Post

InvGate

xMatters Surpasses the Competition

May 24, 2022 By Everbridge In Everbridge

We are excited to announce that xMatters, an Everbridge company, outperformed PagerDuty in a recent GigaOm benchmark report! The report took a deep dive into the features and functionality of the xMatters and PagerDuty platforms in a comprehensive head-to-head study based on tests conducted on both platforms.

Read Post

Everbridge

Tracking On-Call Health

May 24, 2022 By Fred Hebert In Honeycomb

If you have an on-call rotation, you want it to be a healthy one. But this is sort of hard to measure because it has very abstract qualities to it. For example, are you feeling burnt out? Does it feel like you’re supported properly? Is there a sense of impending doom? Do you think everything is under control? Is it clashing with your own private life? Do you feel adequately equipped to deal with the challenges you may be asked to meet? Is there enough room given to recover after incidents?

Read Post

Honeycomb

4 Best Practices for Root Cause Analysis

May 24, 2022 By Special contributor In Scout

Failures are a common part of any system’s lifecycle, so what does Root Cause Analysis look like for this type of problem? If you build and deploy a system, there is a high chance that you'll have to deal with a failure in the near future. However, what matters is how you handle such failures. As an organization, you need to have pre-formulated strategies to handle failures as and when they occur.

Read Post

Scout

Major Incident Process Is at the Heart of Effectiveness

May 23, 2022 By xMatters In xMatters

Read the new white paper on major incident management. Businesses need to be prepared for minor and major incidents to happen to their technologies, be it an integration disconnecting or an entire system being taken offline. Preparation ensures that businesses can not only minimize losses but also protect themselves and potentially their clients from risky impacts.

Read Post

xMatters

May 2022 Update 2 - New category features and improved zooming in the scheduler

May 23, 2022 By René In SIGNL4

Our second May update brings fragment search for category keywords (wildcards), content enrichments for Signls via categories, and an improved zoom level toggle in the duty scheduler. All the details are in this blog article.

Read Post

SIGNL4

Making waves in IT Ops

May 23, 2022 By Mike Hurley In BigPanda

It feels a bit surreal stepping into the Regional Vice President of Sales position here at BigPanda just a few months after the company achieved Unicorn status. In more than 15 years of managing enterprise software sales, this is the first time I knew I was going to play a critical role in facilitating a company’s ascension to the top of their sector. Even in college, I knew this is what I wanted.

Read Post

BigPanda

How StatusCast makes managing incidents smarter in Slack

May 20, 2022 By StatusCast In StatusCast

These days, more and more IT teams spend much of their workday in Slack. It’s essentially a second virtual home. For those employees who find Slack their main source of communication, it stands to reason that you need to access tools, bots, apps, and more – directly within the Slack environment. You shouldn’t have to leave your home to get your work done, and you shouldn’t have to leave Slack to communicate with and update your team and your clients.

Read Post

StatusCast

Time Based Profiles in SIGNL4

May 20, 2022 By SIGNL4 In SIGNL4

A brief description of how to set up and use the new Time Based Profiles inside of SIGNL4, including the new Holiday scheduling and Import options.

View Video

SIGNL4

Severity vs. Priority | Understanding the Differences

May 19, 2022 By Myra Nizami In Blameless

Wondering about severity vs. priority? We explain severity and priority and discuss their differences and their impact on the incident management process.

Read Post

Blameless

Now Available on AWS Marketplace: PagerDuty Runbook Automation and PagerDuty Process Automation On Prem

May 19, 2022 By Inga Weizman In PagerDuty

We are excited to announce that PagerDuty® Runbook Automation and PagerDuty® Process Automation On Prem are now available on the AWS Marketplace, the leading global cloud provider. With more than 200 different cloud services, AWS makes it simple and attractive to build and grow your cloud-native business and/or migrate your existing infrastructure to the cloud, so you can begin to take advantage of the unlimited scale, agility, and flexibility the cloud offers.

Read Post

PagerDuty

The not-so-obvious positive outcomes of great incident management

May 18, 2022 By Robert Ross In FireHydrant

Inflation is running rampant, the world stage is unpredictable, and what’s happening in the U.S. markets has been dubbed the “tech wreck.” A common theme I’m hearing come up in conversations across industries right now is value — we’re all looking to maximize every dollar spent, every hire made, every hour logged. For a lot of companies, this means looking at processes and tools with a critical eye for not only cost savings but also cost avoidance.

Read Post

FireHydrant

When Does a Problem Become an Incident?

May 18, 2022 By xMatters In xMatters

Incident management is a practice that seeks to resolve business-impacting events in the most efficient manner possible. But not every problem that arises requires an incident response, and it’s crucial that teams know the difference between a problem and an incident. Responding to problems may be part of daily routines, or small ad hoc projects that don’t require more than one resource or a significant time commitment.

Read Post

xMatters

Is It Really An Incident?

May 18, 2022 By Kurt Andersen In Blameless

At first glance, people tend to think that incidents are cut-and-dried, relatively objective occurrences. But if you look closely, incidents are highly varied, often require unique handling, and often defy clear answers to something as seemingly simple as knowing when they even start.

Read Post

Blameless

Monthly Moo - Special Edition | May 2022

May 18, 2022 By John Haley In Moogsoft

Welcome to a special Monthly Moo - Product Edition. We have so much to share that we needed to create a special edition of the Monthly Moo to cover all the latest features that are now available. And to see these in action, sign up for our webinar that's coming on Tuesday. More details below. With many new features recently rolled out we need to group them in logical order to cover each of the following categories.

Read Post

Moogsoft

A Chat with Lex Neva of SRE Weekly

May 17, 2022 By Emily Arnott In Blameless

Since 2015, Lex Neva has been publishing SRE Weekly. If you’re interested enough in reading about SRE to have found this post, you’re probably familiar with it. If not, there are a lot of great articles to catch up on! Lex selects around 10 entries from across the internet for each issue, focusing on everything from SRE best practices to the socio- side of systems to major outages in the news. I had always figured Lex must be among the most well-read people in SRE, and likely #1.

Read Post

Blameless

Announcing our new Webex Meetings integration

May 17, 2022 By Michelle Peot In FireHydrant

Previously, FireHydrant supported video collaboration tool integrations for Zoom and Google Meet. In response to customer asks, today we are pleased to introduce our new Cisco Webex Meetings integration for all paid plans. With the new integration, teams can automate Webex bridge creation as part of incident response.

Read Post

FireHydrant

AlertOps Partners With Cisco AppDynamics to Enhance Major Incident Resolution

May 17, 2022 By AlertOps In AlertOps

Chicago, IL – May 17, 2022 – AlertOps, a major incident response management platform, announced today a new technology integration partnership with Cisco AppDynamics, the leading Application Performance Monitoring (APM) and full-stack, business-centric observability solution. This new relationship empowers joint AlertOps and AppDynamics users with intelligent alerting, escalation policies, workflows, and scheduling to rapidly remediate major incidents.

Read Post

AlertOps

On-Prem? Cloud? Hybrid? What is my best option?

May 16, 2022 By Derdack In Derdack

With the Cloud Bridge introduction, I started reaching out to our customer base to make sure people are aware of what this feature can do. Usually, I try to keep our customers up to date through blogs or the occasional webinar, but with the Cloud Bridge, I went for a more personal approach. Reaching out to our customers individually presented a unique opportunity to educate them on our Cloud Bridge and by extension what SIGNL4 can bring to the table.

Read Post

Derdack

How to empower your team to own incident response

May 16, 2022 By Martha Lambert In Incident.io

Responding to and managing incidents feels fairly straightforward when you’re in a small team. As your team grows, it becomes harder to figure out the ownership of your services, especially during critical times. In those moments, you need everyone to know exactly what their role is in order to recover fast. Moving to incident.io as the 7th engineer, from a scaleup of around 70 engineers, has given me a new perspective on what it means to own your code.

Read Post

Incident.io

New Feature: Adding more options to informational status updates

May 16, 2022 By StatusCast In StatusCast

Read Post

StatusCast

Tips and Tricks with SIGNL4 and ConnectWise

May 16, 2022 By SIGNL4 In SIGNL4

Detailing some tips and tricks on using the SIGNL4 and ConnectWise connector.

View Video

SIGNL4

New Feature: Adding more options to informational status updates

May 13, 2022 By StatusCast In StatusCast

Not all status updates are published because of an incident or scheduled maintenance event. Sometimes, IT teams simply want to cast an informational status update without affecting the overall status. Now, with StatusCast’s newly released option, you can opt for informational updates to have no effect on your status.

View Video

StatusCast

What SREs Can Learn from the Atlassian Nightmare Outage of 2022

May 13, 2022 By Weihan Li In Rootly

What happens when the tools and services you depend on to drive Site Reliability Engineering turn out to be susceptible to reliability failures of their own? That’s the question that teams at about 400 businesses have presumably had to ask themselves this month in the wake of a major outage in Atlassian Cloud.

Read Post

Rootly

Alert Log - xMatters Support

May 13, 2022 By xMatters In xMatters

The xMatters alert log is a time-stamped list of all system messages that went out during an event. This can be used for auditing purposes or to help improve your incident resolution processes.

View Video

xMatters

Incident Management

Sending Alerts - xMatters Support

May 13, 2022 By xMatters In xMatters

'Send Alert' is a sample message form used to show how xMatters can help automate your communication processes. You can use this form to send targeted notifications to users, define how xMatters handles their responses, and determine how those responses affect your notification flow.

View Video

xMatters

Incident Management

Who's on Call Report - xMatters Support

May 13, 2022 By xMatters In xMatters

The ‘Who’s on Call?’ report in xMatters gives you an at-a-glance view into the on-call status across the groups in your organization.

View Video

xMatters

Your xMatters Inbox - xMatters Support

May 13, 2022 By xMatters In xMatters

Your xMatters inbox is more than just a message center — it’s a centralized location for all your communications, past and present, and an alternate way to respond to notifications if a device is unavailable.

View Video

xMatters

Incident Management

Service Catalog - xMatters Support

May 13, 2022 By xMatters In xMatters

The Service Catalog lets you add and define your services to match your organization's infrastructure and architecture and then assign a group to take ownership of each service. This makes sure that when you identify the service at the root cause of an incident, there's no question about exactly who is responsible for that service.

View Video

xMatters

Incident Management

Whiskey and Wisdom: Justifying AIOps

May 12, 2022 By Derrick Arakaki In BigPanda

Whiskey and Wisdom is a monthly executive-only forum where IT Operations leaders can network independently and discuss high-level AI operations and IT Ops strategies with their industry peers. In our most recent session, the discussion was around justifying AIOps—proving the value the technology brings to the table.

Read Post

BigPanda

Incident Commanders: where are they now?

May 12, 2022 By Stephanie Clegg In BigPanda

BigPanda gives the Incident Commander award to IT Ops superstars—people who go above and beyond in this high-pressure, critical line of work. In 2021, Ben Narramore, Director of Operations/Service Management at PlayStation, was a recipient for his ability to handle high-impact global incidents with exemplary professionalism and skill. Let’s find out what he’s been up to…

Read Post

BigPanda

How to Make Your Incident Response Plan with Mattermost

May 11, 2022 By Andrew Zigler In Mattermost

For teams who deploy software to users around the world, every second counts when responding to outages and other incidents. It’s important that you have tools in your arsenal that are up to the challenge. Service monitoring, alerting, collaboration, and visibility are all essential components of a well-implemented incident response plan.

Read Post

Mattermost

IT outages are a fact of life - it's how you handle them

May 11, 2022 By StatusCast In StatusCast

In the IT world, outages and service disruption are a fact of life. Stuff hits the fan… Stuff happens! And it can happen to any service provider – even the most well designed and managed SaaS applications and platforms. One of the reasons why stuff happens is failing to adhere to best practices. To minimize the potential for problems, here we run over some of the key points from the cloud platform management best practice playbook.

Read Post

StatusCast

Improve Customer Support with a Digital Operations Platform

May 11, 2022 By Everbridge In Everbridge

One of the biggest roadblocks to providing consistently excellent customer experience is a breakdown of communication between customer support and technical teams. Too often support teams are left out of the loop when it comes to detailed incident information. When customers experience an incident and call support, they expect status updates immediately. But without access to the right information, representatives are left guessing at best and providing incorrect answers at worst.

Read Post

Everbridge

Holiday Import from iCal Files

May 10, 2022 By Ronald In SIGNL4

SIGNL4 offers powerful duty scheduling and time-based overrides for routing alerts to the right people at the right time. With time-based overrides, for example, you can apply different alerting workflows during business hours, weekends, holidays, etc. Holidays in general can bring other requirements for signaling and must also be considered separately when planning shifts. You can add and edit holidays manually in SIGNL4 or you can import them from iCal files.

Read Post

SIGNL4

Understanding Service Level Objectives

May 10, 2022 By Robert Ross In FireHydrant

True reliability takes into account all of the services that exist in your software environment — which is why it can get so complicated. An ecommerce site, for example, might have services that update current inventory in near real time, process payments in the shopping cart, trigger email receipts to send, kick off fulfillment orders, etc. And if one of these services isn’t operating at its best, that can mean money — and in some cases, customers — lost for the company.

Read Post

FireHydrant

DevOps Pipeline | Best Practices, Tips, & Techniques

May 10, 2022 By Noor-ul-Anam Ruqayya In Blameless

Looking into DevOps pipelines? We explain what a DevOps pipeline is, how to build one, and the best practices for building one for your team.

Read Post

Blameless

How Sumo SREs manage and monitor SLOs as Code with OpenSLO

May 10, 2022 By Drew Horn In Sumo Logic

At Nobl9’s annual SLOconf—the first conference dedicated to helping SREs quantify the reliability of their applications through service level objectives (SLOs)—Sumo Logic shared our contribution of slogen to the OpenSLO community, as well as our commitment to OpenSLO as an emerging standard for expressing SLOs as Code. slogen is an open source, SLO-as-code CLI tool based on the OpenSLO specification.

Read Post

Sumo Logic

5 Greatest Challenges of Effective Incident Management and Tips and Tools on Overcoming Them

May 8, 2022 By AlertOps In AlertOps

Planning for potential security incidents has become a crucial element in every organization’s business strategy in today’s complex landscape of data theft, security breaches, and cybercrime. Surveys revealed 41% of business investors and analysts are becoming increasingly worried about cyber threats. One way for organizations to achieve cybersecurity readiness and instill confidence among stakeholders is to build a robust security incident management plan.

Read Post

AlertOps

How to: Reliability Insights Overview in Blameless

May 6, 2022 By Blameless In Blameless

In this video, our Solutions Engineer walks you through the Reliability Insights view in Blameless. Discover how to create custom data dashboards. You might start with MTTX metrics, but what other metrics are reliability teams following closely? We'll show you how to get those set up in Blameless.

View Video

Blameless

More Tools + More People = Increased Complexity

May 5, 2022 By Moogsoft Team In Moogsoft

Consider what happens if digital apps or services go down. Companies lose revenue, decrease productivity, compromise customer loyalty and the list of repercussions goes on, depending on the business. Indeed, modern business continuity is contingent on a well-functioning suite of consumer and commercial apps and services.

Read Post

Moogsoft

Micro Lesson: Troubleshoot an Incident Using Root Cause Explorer

May 4, 2022 By Sumo Logic In Sumo Logic

The video uses a scenario to demonstrate how to use Root Cause Explorer to analyse and troubleshoot an incident faster. The video shows how Root Cause Explorer helps you dig deeper into the relevant logs and traces in order to isolate the root cause using various dashboards.

View Video

Sumo Logic

Micro Lesson: Understanding Root Cause Explorer

May 4, 2022 By Sumo Logic In Sumo Logic

This video explains the main features of Root Cause Explorer in great detail.

View Video

Sumo Logic

The wait is over.... Zenduty Web v2.0 is here!

May 4, 2022 By Zenduty In Zenduty

"New is always better!" While we can't say this about a scotch that hasn't been aged, we can definitely say it about Zenduty's gorgeous new redesign! Zenduty Web V2.0 is here! What's new?

View Video

Zenduty

Are your SLOs realistic? How to analyze your risks like an SRE

May 4, 2022 By Ayelet Sachto In Google Operations

Setting up Service Level Objectives (SLOs) is one of the foundational tasks of Site Reliability Engineering (SRE) practices, giving the SRE team a target against which to evaluate whether or not a service is running reliably enough. The inverse of your SLO is your error budget — how much unreliability you are willing to tolerate.
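To make the error-budget idea concrete, here is a minimal sketch that is not taken from the original post. It assumes an availability SLO expressed as a percentage and measured over a rolling window; the function name `error_budget_minutes` and the 30-day default are illustrative assumptions.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows.

    Hypothetical helper for illustration only: it treats the error budget
    as the complement of the SLO target over the chosen window.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in the range (0, 100]")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    total_minutes = window_days * 24 * 60
    # Error budget = 100% minus the SLO target, as a fraction of the window.
    budget_fraction = (100 - slo_percent) / 100
    return total_minutes * budget_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = error_budget_minutes(target)
        print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of tolerated unreliability, which is the number a risk analysis would then be compared against.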

Read Post

Google Operations

Read more about Are your SLOs realistic? How to analyze your risks like an SRE

How to Achieve Measurable Reliability Results

May 4, 2022 By Emily Arnott In Blameless

Reliability is more important than ever. As users depend on services more and more, and competition in every sector grows, a great digital experience becomes the baseline for expectations, not the ceiling. It’s crucial to invest in making your software reliable enough to keep customers happy. But what does investing in reliability look like?

Read Post

Blameless

Read more about How to Achieve Measurable Reliability Results

Launched - Zenduty Web v2.0

May 4, 2022 By Menahi Shayan In Zenduty

We constantly update the platform to provide a best-in-class experience to our users. These updates are not simply what we feel is right for the client; they are based on the usage data, behavior, and requests that our users provide. We are always excited to bring new updates and share them with people, but this one is special! We bring you Zenduty Web v2.0!

Read Post

Zenduty

Read more about Launched - Zenduty Web v2.0

The Reverse Red Herring

May 4, 2022 By Geoff Townsend In Blameless

During an incident, time is fungible. At points it seems to go way too fast, and at others it seems to take an eternity for a command to complete. More important, however, is how it feels to be in an incident. It’s a heightened state of being, where any and every piece of information could be “the one” that helps crack open what is really going on. Likewise, there is an inherent distrust of incoming information.

Read Post

Blameless

Read more about The Reverse Red Herring

How to implement a Blameless Postmortem (part two)

May 3, 2022 By Dave Harrison In Raygun

This is Part 2 of a two-part series on Blameless Postmortems. The previous article explained why blameless postmortems are so effective; this second part details how to build your own postmortem process and kick it into overdrive. So you've read our first installment and recognized the value of the blameless postmortem for efficiency, culture, and output. Now you're ready to get off the blame train and kickstart a blameless postmortem process of your own. Where to begin?

Read Post

Raygun

Read more about How to implement a Blameless Postmortem (part two)

May 2022 Update - Templates, scheduler enhancements, landline numbers, and more

May 3, 2022 By René In SIGNL4

Our May update brings Signl templates for manual alerting, improvements to duty scheduling, and various enhancements in the web portal. Another new feature is the ability to notify by calling landline numbers. All details can be found in this blog article.

Read Post

SIGNL4

Read more about May 2022 Update - Templates, scheduler enhancements, landline numbers, and more

SIEM: Introduction to SIEM and 4 Top SIEM Tools

May 3, 2022 By Ritika Bramhe In OnPage

Security Information and Event Management (SIEM) technology has become a fundamental part of identifying and guarding against cyber attacks. It is one of the essential technologies powering the modern security operations center (SOC). SIEM is an umbrella term that covers multiple technologies, including log management, security log aggregation, event management, event correlation, behavioral analytics, and security automation.

Read Post

OnPage

Read more about SIEM: Introduction to SIEM and 4 Top SIEM Tools

Derdack SIGNL4 Joins Microsoft Intelligent Security Association (MISA)

May 1, 2022 By stefanie In SIGNL4

Today, Derdack SIGNL4 (www.signl4.com), provider of critical alerting and anywhere incident response for SecOps teams, announced it has joined the Microsoft Intelligent Security Association (MISA), an ecosystem of independent software vendors and managed security service providers that have integrated their solutions to better defend against a world of increasing threats.

Read Post

SIGNL4

Read more about Derdack SIGNL4 Joins Microsoft Intelligent Security Association (MISA)
