May 2021

The 7 SRE Principles [And How to Put Them Into Practice]

May 31, 2021 By Emily Arnott In Blameless

Whether you're just adopting SRE or optimizing your current processes, we can help. We’ll explain the 7 key principles of SRE and how to put them into practice. So, what are the SRE principles? SRE is a method that operates through principles: instead of prescribing specific solutions, it guides you with best practices. These principles help organizations decide what's best for them, and once you understand them, you can apply them in many areas.

Read Post

Blameless

Summit 2020 Product Keynote

May 29, 2021 By PagerDuty In PagerDuty

Check out the product news we rolled out at PDSummit20 with Jonathan Rende, PagerDuty's Senior VP of Product.

View Video

PagerDuty

Polystream Stopped Implementing Technology and Implemented A Mindset - Customer Stories

May 28, 2021 By xMatters In xMatters

Polystream is changing the way we think about video game streaming, and with xMatters, they know that incidents won't keep them from achieving their goals. In this customer chat, join Tracey McGarrigan, Chief Marketing Officer at Polystream, Cheryl Razzell, VP of Engineering, and xMatters' own Laura Meadows, VP EMEA Region, as they discuss Polystream's ongoing ambitions and how xMatters helps their growth. Plus, make sure you don't miss why Tracey would describe xMatters as Polystream's comfort blanket!

View Video

xMatters

Deliver Real-Time Alerts From Facility Management Systems

May 28, 2021 By Ritika Bramhe In OnPage

Facility managers, including service technicians, are expected to operate their facilities safely to meet the expectations of customers. They focus on the smooth functioning and maintenance of many components that fall within the scope of their facility. Typical components include roads, pavements, HVAC and plumbing systems. As a facility manager, staying on top of these siloed and geographically dispersed systems can be challenging.

Read Post

OnPage

Shorter Incidents & Fewer Escalations with Runbook Automation

May 28, 2021 By PagerDuty In PagerDuty

Damon Edwards, Co-founder and Chief Product Officer at Rundeck, walks us through how your organization can have shorter incidents and fewer escalations in this Summit 20 presentation.

View Video

PagerDuty

Incident Management vs. Incident Response - What's the Difference?

May 28, 2021 By Quentin Rousseau In Rootly

What are the differences between incident management and incident response? The answer varies widely depending on whom you ask.

Read Post

Rootly

How, Not Why: An Alternative to the Five Whys for Post-Mortem Analysis

May 28, 2021 By PagerDuty In PagerDuty

Listen in on Robert Blumen, Lead DevOps Engineer at Salesforce, in his Summit 20 session, How, Not Why: An Alternative to the Five Whys for Post-mortem Analysis.

View Video

PagerDuty

FireHydrant May 2021 Product Updates: The summer of integrations

May 27, 2021 By Julia Tran In FireHydrant

With 50% of the US adult population vaccinated, there’s a lot to look forward to this summer. Life no longer feels like it’s on hold, and we’re fully embracing that. Get your fire hoses ready, 'cause extinguishing incidents just got easier. We’re rolling out a summer full of new integrations, product releases, events, and more.

Read Post

FireHydrant

Be ready for anything in a world of digital everything

May 26, 2021 By PagerDuty In PagerDuty

PagerDuty is a digital operations management platform that empowers the right action, when seconds matter. With over 500 integrations and powerful automation capabilities, we make it easy to stay on top of urgent, mission-critical work and keep your digital services always on. For the developers and IT teams working in real-time operations, PagerDuty makes sure you can focus on what matters most. And stay ready for what’s next.

View Video

PagerDuty

Four things to consider when evaluating incident management platforms

May 26, 2021 By Daniel Condomitti In FireHydrant

When you’re feeling the stress and pain around incidents, making the decision to find an incident management tool is a no-brainer. But how do you choose the one that will work for you, your team, and your business? You might be asking yourself: Where do I start? What do I need to know? What questions do I ask? What are the options? How can I be sure we’re choosing the right tool?

Read Post

FireHydrant

What do site reliability engineers do?

May 25, 2021 By Emily Arnott In Blameless

Are you considering adopting SRE? We will explain the roles and responsibilities of an SRE team within your organization, and how to start building one. So what does an SRE team do? An SRE team is responsible for building software that improves the resiliency of systems, implementing fixes, responding to incidents, and automating processes whenever possible. Site reliability engineering is a holistic practice that incorporates various types of work.

Read Post

Blameless

Blameless Runbook Documentation is Now Generally Available!

May 25, 2021 By Blameless Community In Blameless

At Blameless, our mission is to provide teams with the tools they need to operationalize SRE and embrace a culture of resilience. We help teams automate toil and adopt best practices across integrated incident management, comprehensive retrospectives, service level objectives, reliability insights, and more. We are very excited to announce that Blameless Runbook Documentation is now generally available for all customers.

Read Post

Blameless

SRE Culture [How to Build a Better Team]

May 24, 2021 By Emily Arnott In Blameless

If you're just adopting SRE or improving your current environment, we’ll help explain SRE culture and how to create a blameless development process. So what is SRE Culture? SRE Culture is founded on these main tenets.

Read Post

Blameless

Discover Everbridge Digital Wayfinding for Higher Education

May 24, 2021 By Everbridge In Everbridge

Creating a positive visitor experience is a key component of the administrative health of a school. Despite advances in technology, campus visits have remained mostly formulaic. Digital Wayfinding takes mobile mapping technology the public is used to and applies it to your school, creating an easy-to-use, attractive, interactive tool for your visitors.

View Video

Everbridge

Everbridge Engage Platform Overview

May 24, 2021 By Everbridge In Everbridge

View Video

Everbridge

Incident Management

ITSM Buyers' Guide: 7 Use Cases to Define Your ITSM Goals

May 24, 2021 By Kari Nelson In Ivanti

Attempting an upgrade or switch to a new ITSM tool is obstacle-ridden for IT directors. From having to address fears surrounding the cost of switching vendors to assessing service management maturity, building a case around why and how an ITSM tool can advance the business can be a harrowing feat. Thankfully, Info-Tech pulled together this selection guide.

Read Post

Ivanti

The Incident Review: 4 Odd Incidents Caused by Animals

May 21, 2021 By JJ Tang In Rootly

Incidents and outages caused by animals highlight the importance of flexibility and out-of-the-box thinking when it comes to SRE.

Read Post

Rootly

Single Sign-On Now Available on OnPage Enterprise-Level Accounts

May 21, 2021 By Ritika Bramhe In OnPage

Single sign-on (SSO) services provide a unified view into applications, logins and devices through a secure identity cloud. SSO allows users to access SaaS-based applications through one simple login process. We, at OnPage, are excited to announce that we’ve extended our integration catalog to include SSO services like Okta and OneLogin. Through a single sign-on process, OnPage enterprise-level users can access the OnPage dashboard from their Okta and OneLogin accounts.

Read Post

OnPage

New Integration: Declare FireHydrant Incidents from Checkly Alerts

May 21, 2021 By FireHydrant In FireHydrant

Streamlining your incident management process is what we do best, and one of the ways we do that is by acting as the connective tissue across all of your applications. We’ve partnered with Checkly to bring you a new integration that empowers you to detect problems and resolve incidents faster.

Read Post

FireHydrant

New Integration: Create Google Meet Incident Bridges Automatically

May 21, 2021 By Julia Tran In FireHydrant

We’re happy to announce our integration with Google Meet to create incident bridges automatically. Using the power of FireHydrant Runbooks, a Google Meet bridge can be added with fully customizable titles and agendas based on your incident details.

Read Post

FireHydrant

Use Datadog's Notebooks API to programmatically manage your notebooks

May 20, 2021 By Stephanie Niu In Datadog

Datadog Notebooks simplify the way teams across an organization find and share knowledge. By bringing together live data and rich Markdown text, Notebooks help teams create powerful, data-driven documents—from runbooks and support playbooks to incident postmortems and data reports. And with collaboration functionalities like real-time editing and commenting, team members can simultaneously make changes to a document and gather feedback along the way.
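As a rough illustration of managing notebooks programmatically, the sketch below builds a request that would create a notebook with a single Markdown cell, but does not send it. It makes several assumptions that you should check against Datadog's API reference before relying on it: it targets the v1 Notebooks endpoint on the US1 site (`https://api.datadoghq.com/api/v1/notebooks`), it passes keys in the `DD-API-KEY` and `DD-APPLICATION-KEY` headers, and it uses the payload shape shown. Confirm the exact schema and your site's host. The function name, keys, and content are hypothetical.

```python
import json
import urllib.request

# Assumption: Datadog's v1 Notebooks endpoint on the US1 site.
# Other Datadog sites use a different host.
NOTEBOOKS_URL = "https://api.datadoghq.com/api/v1/notebooks"


def build_create_notebook_request(
    api_key: str, app_key: str, name: str, markdown: str
) -> urllib.request.Request:
    """Build, but do not send, a POST that creates a notebook with one Markdown cell.

    The payload shape is an assumption based on the v1 Notebooks API.
    """
    payload = {
        "data": {
            "type": "notebooks",
            "attributes": {
                "name": name,
                "time": {"live_span": "1h"},  # default time frame for graph cells
                "cells": [
                    {
                        "type": "notebook_cells",
                        "attributes": {
                            "definition": {"type": "markdown", "text": markdown}
                        },
                    }
                ],
            },
        }
    }
    return urllib.request.Request(
        NOTEBOOKS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,          # organization API key
            "DD-APPLICATION-KEY": app_key,  # application key for the acting user
        },
        method="POST",
    )


# Hypothetical placeholder keys and content.
req = build_create_notebook_request(
    "<API_KEY>", "<APP_KEY>", "Incident postmortem", "## Timeline\n- 10:02 alert fired"
)
# urllib.request.urlopen(req) would send it; this sketch stops short of that.
```

Sending is left to the caller. That keeps the payload easy to inspect or test before any real key is involved.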

Read Post

Datadog

Resilience in Action Episode 7: Killing Ops with Tony Hansmann

May 19, 2021 By Blameless Community In Blameless

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.

Read Post

Blameless

If everyone is AIOps, which AIOps is right for you?

May 19, 2021 By BigPanda In BigPanda

With so many IT vendors claiming they provide AIOps platforms, how do you understand the differences between them and decide which flavor of AIOps to choose for your organization? Join us in a CTO Perspective with Elik Eizenberg, CTO and co-founder at BigPanda, to find out.

View Video

BigPanda

If everyone is AIOps - which AIOps is right for you?

May 19, 2021 By Yoram Pollack In BigPanda

With so many IT vendors claiming they provide AIOps platforms, how do you understand the differences between them and decide what flavor of AIOps to choose for your organization? Join us in a CTO Perspective discussion with Elik Eizenberg, CTO and co-founder at BigPanda, to find the answer. Read the skinny for a brief summary, then either lean back and watch the interview, or if you prefer to continue reading, take a few minutes to read the transcript. Enjoy!

Read Post

BigPanda

SREview Issue #13 May 2021

May 18, 2021 By Blameless Community In Blameless

Is it a coincidence that “May” and “yay” rhyme? Probably not. This month has been pretty exciting for us here at Blameless, and we’d love to share why. We also have some of our favorite Tweets, content, and events happening in the SRE and resilience engineering community this month.

Read Post

Blameless

SRE vs. DevOps [Understanding Differences & Similarities]

May 17, 2021 By Emily Arnott In Blameless

Site Reliability Engineering (SRE) and DevOps share a goal of building a bridge between development and operations. We'll explore and compare both approaches. Wondering which is better for your company, SRE or DevOps? Neither SRE nor DevOps is “better,” exactly, since they’re similar yet different in a few key ways: SRE, or site reliability engineering, is a methodology developed by Google engineer Ben Treynor Sloss in 2003.

Read Post

Blameless

Make your Onboarding Experience Better with a Murder Mystery Game

May 17, 2021 By Blameless Community In Blameless

Onboarding a new tool can be boring. Or stressful. Or both. When onboarding an incident response tool, it can be difficult to make sure that your team is getting the most from the experience. Do you opt for a run-of-the-mill meeting, or try to learn while in an incident? Neither option is ideal. That’s why Petal’s DevOps Engineer Michael Cole found a new way to get his team using Blameless for their incident response process.

Read Post

Blameless

SRE Availability Metrics

May 17, 2021 By John Hasinsky In PagerTree

How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.
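The arithmetic behind an availability chart is simple. An availability target fixes how much downtime a period can absorb, and measured availability is uptime divided by total time. Below is a minimal sketch that assumes a 30-day period by default; the function names are illustrative and not taken from the post.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime a period can absorb for a given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100.0)


def availability_pct(uptime_minutes: float, total_minutes: float) -> float:
    """Measured availability: the share of the period the service was up."""
    return 100.0 * uptime_minutes / total_minutes


# Downtime budgets for a few common targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% over 30 days -> {allowed_downtime_minutes(target):.1f} min")
```

For 99.9% over 30 days, this gives about 43.2 minutes. Each additional nine divides the downtime budget by ten.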

Read Post

PagerTree

Mattermost v5.35 is now available

May 17, 2021 By Katie Wiersgalla In Mattermost

Mattermost v5.35 is generally available today. Incident Collaboration: Ad hoc tasks, stakeholder overview, and more (Cloud and E20 Edition). We are excited to release multiple new features for the Incident Collaboration product:

Read Post

Mattermost

A Day in the Life: Intelligent Observability at Work with our SRE, Dinesh

May 17, 2021 By Helen Beal In Moogsoft

When I asked Charlie for permission to attend this year’s AICon (virtual, natch) I thought it would be a shoo-in; learning’s part of my OKRs after all. But he never makes things easy and his ‘yes’ came with a caveat that’s typical when dealing with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.

Read Post

Moogsoft

WTF is Incident Management? Post-Panel Wrap-Up

May 17, 2021 By FireHydrant In FireHydrant

That's a wrap! We hosted "WTF is Incident Management" on May 12, 2021. We invited four very knowledgeable panelists to discuss how they define incident management, what changes they'd make if they could start again from scratch, how to manage team stress after an incident, and other subjects. Our panelists were: host Matt Stratton (Staff Developer Advocate at Pulumi), Emily Ruppe (Incident Commander at Twilio), Alina Anderson (Sr.

Read Post

FireHydrant

Enterprise Alert Alarm Center. A NOC's best friend.

May 17, 2021 By Derdack In Derdack

As Enterprise Alert grows, more and more teams are benefiting from its reliable alerting. Along the way, it almost always becomes a central component of the NOC, and NOC admins are usually already well versed in it. For this reason, here in support we rarely have the pleasure of presenting the features of our alarm center.

Read Post

Derdack

New Event Source - Website Monitoring

May 17, 2021 By Derdack In Derdack

Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. These include the new “Website Monitoring” event source.

Read Post

Derdack

New Event Source - Alert Timer

May 17, 2021 By Derdack In Derdack

Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. One of them is the new event source “Alert Timer”.

Read Post

Derdack

Self-Service for Teams in Enterprise Alert

May 17, 2021 By Derdack In Derdack

A few days ago I had an insightful conversation with one of our customers who inspired me to write this blog. Like so many other customers, he was facing a growing Enterprise Alert management overhead with each new team he added, because he had been managing resources such as event sources, notification channels and alert policies for the new teams as well. His question to us, therefore, was whether he could hand these management tasks over to the teams themselves.

Read Post

Derdack

Enhance NOC Alerts With Incident Management and Alert Automation

May 14, 2021 By Ritika Bramhe In OnPage

In a network operations center (NOC), alerts originating from hundreds of servers, application monitoring systems, emails and ticketing services compete to catch a NOC analyst’s attention. NOCs face many challenges in parsing through alerts to identify actionable notifications and mobilize the right response team into action.

Read Post

OnPage

Understanding a Microsoft Service Outage

May 14, 2021 By Stephen Burke In Martello Technologies

Maintaining business continuity when an issue arises has proven to be a challenge many organizations struggle with. A global pandemic being thrown into the mix in Q1 of 2020 (one that many businesses are still navigating through) introduced a new set of problems for both service providers and businesses reliant on those services.

Read Post

Martello Technologies

What Are MTTR and MTTD?

May 14, 2021 By Allyson Barr In StackState

There are several metrics in use to determine incident management success. Two of them are MTTD and MTTR, which we will be discussing in this piece.
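Both metrics are averages over incident timestamps. The sketch below assumes each incident records three times: when the failure started, when it was detected, and when it was resolved. It measures MTTR from failure start to resolution. Some teams measure MTTR from detection instead, so treat that choice as an assumption. The `Incident` type and sample times are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Sequence


@dataclass
class Incident:
    started: datetime   # when the failure began
    detected: datetime  # when an alert fired or someone noticed
    resolved: datetime  # when service was restored


def mttd(incidents: Sequence[Incident]) -> timedelta:
    """Mean time to detect: average of (detected - started)."""
    return timedelta(seconds=mean((i.detected - i.started).total_seconds() for i in incidents))


def mttr(incidents: Sequence[Incident]) -> timedelta:
    """Mean time to resolve, measured here from failure start to resolution."""
    return timedelta(seconds=mean((i.resolved - i.started).total_seconds() for i in incidents))


# Hypothetical sample data: two incidents on the same day.
incidents = [
    Incident(datetime(2021, 5, 14, 10, 0), datetime(2021, 5, 14, 10, 5), datetime(2021, 5, 14, 10, 45)),
    Incident(datetime(2021, 5, 14, 14, 0), datetime(2021, 5, 14, 14, 15), datetime(2021, 5, 14, 15, 0)),
]
print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

For the sample data, this gives an MTTD of 10 minutes and an MTTR of 52.5 minutes.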

Read Post

StackState

Practical Guide to SRE: Using SLOs to Increase Reliability

May 13, 2021 By Quentin Rousseau In Rootly

Service Level Objectives (SLOs) are a key component of any successful Site Reliability Engineering initiative. The question is, what are SLOs, and how do you determine what your SLOs should be? Once you've done that, how should you use them?
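One common way to use an SLO is to treat the gap between the target and 100% as an error budget: the share of events allowed to fail. Below is a minimal sketch that assumes a request-based SLI (good events divided by total events); the function name and numbers are illustrative.

```python
def error_budget_report(slo_target: float, good_events: int, total_events: int) -> dict:
    """Compare a measured SLI with an SLO target and report the remaining error budget."""
    if total_events <= 0:
        raise ValueError("need at least one event")
    sli = good_events / total_events      # measured success ratio
    budget_ratio = 1.0 - slo_target       # share of events allowed to fail
    bad_events = total_events - good_events
    allowed_bad = budget_ratio * total_events
    if allowed_bad == 0:
        # A 100% target leaves no budget: it is either untouched or gone.
        remaining = 1.0 if bad_events == 0 else 0.0
    else:
        remaining = 1.0 - bad_events / allowed_bad  # negative once overspent
    return {"sli": sli, "slo_met": sli >= slo_target, "budget_remaining": remaining}


report = error_budget_report(0.999, good_events=999_400, total_events=1_000_000)
print(report)
```

With a 99.9% target over 1,000,000 requests, 1,000 failures are allowed. 600 failures therefore leave roughly 40% of the budget. A negative value means the budget is overspent, which is a common signal to prioritize reliability work over new features.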

Read Post

Rootly

Care Converge: Secure Clinical Communication and Collaboration

May 13, 2021 By Everbridge In Everbridge

Everbridge’s CareConverge speeds diagnosis and care, enabling time- and resource-constrained providers to manage capacity and deliver quality patient care in less time, while exceeding healthcare compliance standards and patient satisfaction expectations. Whether responding to a daily, non-emergent clinical case or a high-acuity clinical case, collaboration across the health system is seamless, reliable, and HIPAA-compliant.

View Video

Everbridge

Why AHEAD customers use BigPanda with their ServiceNow deployment

May 12, 2021 By BigPanda In BigPanda

Johnny Hatch, Practice Director – Enterprise Monitoring & Analytics at AHEAD, explains why AHEAD's Fortune 500 customers use BigPanda to augment their ServiceNow installations.

View Video

BigPanda

What is Opsgenie?

May 12, 2021 By Opsgenie In Opsgenie

Opsgenie is an on-call and alert management and incident response solution to keep services always on. It empowers Dev and Ops teams to plan for service disruptions and stay in control during incidents. With over 200 deep integrations and a highly flexible rules engine, Opsgenie centralizes alerts, notifies the right people reliably, and enables them to collaborate and take rapid action.

View Video

Opsgenie

Celebrities Explain WTF is Incident Management

May 12, 2021 By FireHydrant In FireHydrant

Our friends Felicia Day, Steve Wozniak, and Brian Baumgartner help us explain what the heck incident management is. FireHydrant is the only comprehensive incident management platform that allows you to create consistency for the entire incident response lifecycle to focus on fighting fires faster. From alert to retrospective, tracking, communicating, and reporting on results: FireHydrant will automate the process so you can focus on resolution. Visit firehydrant.io to learn how you can manage the mayhem.

View Video

FireHydrant

SRE Leaders Panel: Business Agility is what matters, SRE can help you get there

May 11, 2021 By Blameless Community In Blameless

Blameless recently had the privilege of hosting SRE leaders Garima Bajpai, Founder at Community of Practice - DevOps Canada and Jason Fraser, Delivery Lead at VMware Tanzu to discuss the value of crisis during incident response, the best and worst tech transformations they’ve seen, how reliability impacts the flow of value, and more.

Read Post

Blameless

Welcome to the PagerDuty Community

May 11, 2021 By PagerDuty In PagerDuty

View Video

PagerDuty

Concrete Steps to Reducing MTTR

May 11, 2021 By Kumar Harsh In Scout

In today’s data-centric world, metrics define performance benchmarks. The time between when an event starts and when it ends shows how well a system can handle and process such events. One such metric is MTTR. MTTR usually stands for Mean Time To Resolution, but it has held several meanings over the years. It measures how well a system can bounce back from errors and provide long-lasting solutions.

Read Post

Scout

Monthly Moo Update | April 2021

May 11, 2021 By Adam Frank In Moogsoft

I don’t know about you, but April traveled at the speed of light. A blink and it happened. Our teams have been working at the same speed throughout one of our favorite months of the year. With an incredible amount of updates, we’ve made our product even more transparent and easier to use. It’s not just our world-class documentation that enables you, it’s also the in-product visualizations and enablement that help guide you without you even realizing it.

Read Post

Moogsoft

Creating a Better Incident Response Plan

May 10, 2021 By Biju Chacko In Squadcast

A few minutes of unexpected downtime can have catastrophic effects! Having a great incident response plan is more than a luxury - it is a necessity for organisations of all sizes today. This blog outlines key activities that can help you formulate a better incident response plan.

Read Post

Squadcast

Top SRE Toolchain Used By Site Reliability Engineers

May 7, 2021 By Biju Chacko In Squadcast

We have compiled a list of the most popular and sought-after tools (some you may have heard of) that SREs need in their toolkit at every phase of a production system to keep up with SRE best practices. Site reliability engineering (SRE) practices help organizations ensure the smooth functioning of their deliverables with the utmost reliability and resilience. This is achieved with a set of well-defined tools deployed at every phase of the production system.

Read Post

Squadcast

OnPage Recognized in Gartner's Latest Report on CC&C Systems

May 7, 2021 By Ritika Bramhe In OnPage

Gartner’s latest “Quick Answer” report discusses how clinical communication and collaboration (CC&C) systems can enhance pandemic-related provider and patient engagement. Modern healthcare delivery organizations (HDO) invest in CC&C solutions to simplify communication among care teams consisting of physicians, nurses and critical support personnel. The OnPage team is pleased to be recognized as a vendor in Gartner’s latest CC&C publication.

Read Post

OnPage

Failover Conf 2021 Wrap-Up

May 7, 2021 By FireHydrant In FireHydrant

That’s a wrap! Gremlin hosted Failover Conf 2: Fail Smarter on April 27, 2021. In attendance were over 500 SREs, developers, sales engineers, product managers, DevOps experts, C-level execs, and other reliability pros from around the globe! This year’s conference included discussions around the future of DevOps, strategies for building reliable teams, analyzing human error to create better systems, and more.

Read Post

FireHydrant

OnPage Showcased as One of Massachusetts' Top Messaging and Communication Companies

May 6, 2021 By Ritika Bramhe In OnPage

Cutting-edge messaging systems simplify communication and collaboration for organizations with complex communication needs. These systems are equipped with secure mobile messaging and a full suite of automation capabilities that can route notifications and voice calls across on-call teams. These platforms simplify on-call management through digital on-call schedules and escalation policies.
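The two mechanisms named here can be pictured with a toy model. A round-robin schedule decides who is on call, and an escalation policy moves an unacknowledged alert up a chain at fixed intervals. This is a hypothetical sketch under those assumptions, not how OnPage or any particular product implements them; all names and intervals are illustrative.

```python
from datetime import datetime, timedelta
from typing import Sequence


def on_call(rotation: Sequence[str], rotation_start: datetime, shift: timedelta, at: datetime) -> str:
    """Return who is on call at time `at` in a fixed-length round-robin rotation."""
    if at < rotation_start:
        raise ValueError("time precedes the start of the rotation")
    shifts_elapsed = (at - rotation_start) // shift  # timedelta // timedelta -> int
    return rotation[shifts_elapsed % len(rotation)]


def escalation_target(levels: Sequence[str], minutes_unacknowledged: float, step_minutes: float) -> str:
    """Escalate one level every `step_minutes` without acknowledgement, stopping at the last level."""
    level = int(minutes_unacknowledged // step_minutes)
    return levels[min(level, len(levels) - 1)]


# Hypothetical weekly rotation starting Monday, May 3, 2021 at 09:00.
team = ["alice", "bob", "carol"]
start = datetime(2021, 5, 3, 9, 0)
print(on_call(team, start, timedelta(days=7), datetime(2021, 5, 12, 12, 0)))
print(escalation_target(["primary", "secondary", "manager"], minutes_unacknowledged=20, step_minutes=15))
```

Real platforms layer overrides, time zones, and multiple notification channels on top of this, but the core lookup is the same modular arithmetic.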

Read Post

OnPage

Practical Guide to SRE: Automating On-Call

May 6, 2021 By JJ Tang In Rootly

Let's face it: on-call work isn't fun. But it can be better. Even if you have to work on call, it would be nice to have at least some of the work done for you before you drag yourself out of bed at 3am to respond to an incident.

Read Post

Rootly

Ivanti Gives Voice to IT Incident Management Software

May 6, 2021 By Brent Bluth In Ivanti

A protracted, exasperating customer service experience popped into my mind while reading this sentence in the Ivanti Voice data sheet: “One of the most frequent customer complaints about call centers is having to repeat information.” Ain’t that the truth. Here’s a brief personal experience.

Read Post

Ivanti

Domain-agnostic and here to stay: Gartner outlines the current state and future of AIOps

May 6, 2021 By Yoram Pollack In BigPanda

Coined by Gartner in 2016, the term ‘AIOps’ refers to combining big data, AI, and machine learning to automate and improve IT operations processes. Back then, this very broad definition led to some confusion, with different IT vendors characterizing AIOps differently depending on what they were actually offering.

Read Post

BigPanda

Webinar (UK) - Silence the Noise: Simplify Your Crisis Response

May 4, 2021 By Everbridge In Everbridge

Silence the Noise: Simplify Your Crisis Response aims to educate you on simplifying the complexities of managing information during an incident. Since COVID, all organisations have experienced the cumbersome process of managing long-term, ongoing incidents. This webinar will address how to simplify information management and apply these practices to a real-life scenario.

View Video

Everbridge

DevOps vs. Agile

May 4, 2021 By AlertOps In AlertOps

DevOps is a term for “a cross-disciplinary practice dedicated to the study of building, evolving and operating, rapidly-changing resilient systems at scale” (Jez Humble). There is no wall between development and operations, so they work simultaneously and without silos. The practice focuses on uniting the development and operations teams in a continuous process. Agile is a software development strategy that focuses on responding to change with cross-functional team communication.

Read Post

AlertOps

How Blameless Integrates with Prometheus

May 3, 2021 By Blameless Community In Blameless

Blameless is excited to announce a new source for monitoring data for your SLIs and SLOs. Prometheus is an open source monitoring and alerting solution which is highly customizable.
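To make Prometheus as an SLI data source concrete, the sketch below shows one way an availability SLI could be read through Prometheus's HTTP API (`GET /api/v1/query`). It rests on assumptions about a typical setup: the server address, the `http_requests_total` metric, and its `code` label. It does not describe how Blameless's integration works. A canned response is parsed so the example runs without a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical server address and metric name; adjust to your environment.
PROMETHEUS = "http://localhost:9090"
# Ratio of non-5xx requests to all requests over the last 5 minutes.
SLI_QUERY = (
    'sum(rate(http_requests_total{code!~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)


def instant_query_url(base: str, promql: str) -> str:
    """Build the URL for Prometheus's instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar_result(body: str) -> float:
    """Extract the single sample value from an instant-query vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if len(result) != 1:
        raise ValueError(f"expected one series, got {len(result)}")
    _timestamp, value = result[0]["value"]  # sample values arrive as strings
    return float(value)


# Canned response in the shape Prometheus returns for a vector result.
sample = (
    '{"status":"success","data":{"resultType":"vector",'
    '"result":[{"metric":{},"value":[1620000000.0,"0.9987"]}]}}'
)
print(instant_query_url(PROMETHEUS, SLI_QUERY))
print(parse_scalar_result(sample))
```

In practice, you would fetch the URL with an HTTP client and feed the response body to `parse_scalar_result`. The resulting ratio can then be compared against an SLO target.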

Read Post

Blameless

How Blameless Integrates with New Relic

May 3, 2021 By Blameless Community In Blameless

Blameless is excited to announce a new source for monitoring data for your SLIs and SLOs. New Relic is an observability platform that helps engineers instrument, analyze, troubleshoot, and optimize their entire software stack.

Read Post

Blameless

How Blameless Integrates with Pingdom

May 3, 2021 By Blameless Community In Blameless

Blameless is excited to announce a new source for monitoring data for your SLIs and SLOs. Pingdom is a leading monitoring platform that lets users monitor both applications and infrastructure, using synthetic checks as well as real user data.

Read Post

Blameless

How Blameless Integrates with Datadog

May 3, 2021 By Blameless Community In Blameless

Blameless is excited to announce a new source for monitoring data for your SLIs and SLOs. Datadog is a monitoring and security platform for cloud applications. It brings together end-to-end traces, metrics, and logs to make applications, infrastructure, and third-party services observable.

Read Post

Blameless

PagerDuty Zoom Integration

May 3, 2021 By PagerDuty In PagerDuty

PagerDuty Zoom Integration Overview (Short Overview)

View Video

PagerDuty

Improve your Reliability with Blameless SLOs, Now Generally Available

May 3, 2021 By Blameless Community In Blameless

Blameless is excited to announce that our SLO Manager is now generally available! SLO Manager is a new service added to the Blameless platform. This service helps SRE and engineering teams proactively make data-driven decisions about reliability efforts. According to a survey Blameless conducted, over 80% of organizations use SLOs or will in the next 1-2 years.

Read Post

Blameless

SLOs: What, Why, and How?

May 3, 2021 By Blameless In Blameless

What are SLOs, why are they important, and how can I start crafting them? We get these questions every day. In response, we’re hosting a webinar titled, “SLOs: What, Why, and How?” May 3, 2021 at 1 PM PDT. Kurt Andersen (SRE Architect), Dan Genzale (Director of Infrastructure), and Nicolas Philip (Director PM) will be speaking with one another in a fireside chat about SLO best practices.

View Video

Blameless

Incident Management
