August 2023

PagerDuty Expands Generative AI Offerings and Enhances Analytics Capabilities

Aug 31, 2023 By PagerDuty In PagerDuty

Operations Cloud's AI-generated runbooks accelerate automation efforts to drive cost optimization and efficiency for customers.

A better Grafana OnCall: web-based scheduling, mobile app, email support

Aug 31, 2023 By Devin Cheevers In Grafana

Does anyone really enjoy being on-call? That looming dread over what could go wrong? The alarms in the middle of the night when everything does in fact go wrong? Of course not! But that doesn’t mean on-call shifts need to be a giant bundle of anxiety and exhaustion. This is something near and dear to our hearts at Grafana Labs, since the majority of our engineers participate in on-call shifts.

What's New: Enhanced PagerDuty Analytics for Faster Insights and Smarter Recommendations

Aug 31, 2023 By Cristina Dias In PagerDuty

Data has become the lifeblood of businesses, empowering organizations to make more informed decisions, drive innovation, and gain a competitive edge. McKinsey touts the benefits of adopting data-supported capabilities, referring to the various ways data is utilized to enable and enhance the functioning of an organization.

Democratize Automation with AI-Generated Runbooks

Aug 31, 2023 By Ranjana Devaji In PagerDuty

Operational efficiency is as critical within IT and engineering teams as in any other part of the business. Automating repetitive tasks and reducing escalations within and to these teams is of immense value. While automation saves time and boosts productivity, the complexity of developing it can become a limiting factor and a bottleneck. Generative AI is a paradigm shift here: it brings consumer-style simplicity to the development of enterprise-grade automation.

Incident Management Today: Benefits, 6-Step Process & Best Practices

Aug 31, 2023 By Laiba Siddiqui In Splunk

Disruptive cybersecurity incidents are becoming more commonplace every day. Even if nothing is directly hacked, these incidents can harm your systems and networks. Navigating cybersecurity incidents is a constant challenge, and the best way to stay ahead of the game is with effective incident management.

Reimagining Retrospectives

Aug 31, 2023 By Blameless In Blameless

The Blameless retrospective is one of the most often discussed and least often executed components of SRE practice. Getting real value from the retrospective process takes time, focus and the right approach. This webinar features Ken Gavranovic and Lee Atchison, author of Architecting for Scale, who discuss a blueprint for high-performing engineering teams to get the most value out of retrospectives.

Process Automation v4.16.0 Release Notes

Aug 30, 2023 By PagerDuty In PagerDuty

Product Managers Jake and Forrest are back to share exciting new improvements in the 4.16.0 release of PagerDuty Process Automation. Big news in this release: your Process Automation Enterprise Runners can now access secrets from HashiCorp Vault services local to the Runner. Jake shows us how it’s done.

Notification Alternatives to SMS

Aug 30, 2023 By PagerDuty In PagerDuty

Product Managers Abby Allen and Vivek Raj Saxena join Engineering Manager Girish Shankarraman to talk with Mandi Walls about the challenges and regulatory environment of SMS notifications. SMS is a popular notification channel for PagerDuty users, but other types of notifications can get you important information faster.

Enriching Incidents with Event Orchestration

Aug 30, 2023 By PagerDuty In PagerDuty

Raw event data isn’t always helpful. Instead of parsing it by hand and hoping you’re reading it right, enrich your incidents: use Event Orchestration to add key fields that tell you exactly what’s happening the moment the incident opens.
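
A minimal Python sketch of the enrichment idea, for illustration only; it is not Event Orchestration's rule syntax. It assumes an event shaped loosely like a PagerDuty Events API v2 payload (`summary`, `source`, `custom_details`) and a hypothetical summary format containing `host=` and `code=` tokens.

```python
import re
from typing import Any

# Hypothetical pattern: assumes summaries contain "host=<name> code=<digits>".
FIELD_PATTERN = re.compile(r"host=(?P<host>\S+)\s+code=(?P<code>\d+)")


def enrich_event(event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the event with parsed fields added to custom_details."""
    enriched = {**event, "custom_details": dict(event.get("custom_details", {}))}
    match = FIELD_PATTERN.search(event.get("summary", ""))
    if match:
        enriched["custom_details"]["host"] = match.group("host")
        enriched["custom_details"]["status_code"] = int(match.group("code"))
    return enriched


raw = {"summary": "check failed host=web-3 code=503 upstream timeout", "source": "lb-monitor"}
print(enrich_event(raw)["custom_details"])  # {'host': 'web-3', 'status_code': 503}
```

The original event is left untouched; the responder sees explicit `host` and `status_code` fields instead of having to read them out of a free-text line.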

Pausing Incidents with Event Orchestration

Aug 30, 2023 By PagerDuty In PagerDuty

Some incidents resolve themselves without any intervention. Getting alerted for these incidents is just an interruption that you don’t need. Pause incident creation and only get alerted if the incident doesn’t resolve in the usual timeframe with Event Orchestration.
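
The pause-then-alert behaviour can be sketched in a few lines of Python. This is a hedged illustration, not PagerDuty's implementation: the 300-second window and the `Event` shape are assumptions. A trigger becomes an incident only if no matching resolve arrives within the window.

```python
from dataclasses import dataclass

PAUSE_SECONDS = 300  # hypothetical "usual timeframe" for self-resolution


@dataclass
class Event:
    key: str        # identifies the alert, e.g. a dedup key
    action: str     # "trigger" or "resolve"
    timestamp: int  # seconds since epoch


def incidents_to_create(events: list[Event], now: int) -> list[str]:
    """Return keys whose trigger was not resolved within the pause window."""
    triggered: dict[str, int] = {}
    resolved_in_time: set[str] = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "trigger":
            triggered.setdefault(ev.key, ev.timestamp)
        elif ev.action == "resolve" and ev.key in triggered:
            if ev.timestamp - triggered[ev.key] <= PAUSE_SECONDS:
                resolved_in_time.add(ev.key)
    # Only alerts that outlived the window (and never self-resolved in it) page someone.
    return [
        key for key, start in triggered.items()
        if key not in resolved_in_time and now - start > PAUSE_SECONDS
    ]
```

For example, a `disk-full` alert that resolves after 120 seconds never pages anyone, while a `cpu-high` alert that is still open at 600 seconds does.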

10 Reasons AlertOps is the Preferred PagerDuty Competitor

Aug 30, 2023 By AlertOps In AlertOps

PagerDuty recently made changes to their pricing plans by moving rules-based noise suppression features out of their Professional Plan into the Event Intelligence add-on module. AlertOps includes rules-based noise suppression features beginning in the Premium Plan. AlertOps plans offer more competitive noise suppression features than PagerDuty plans.

PagerDuty Pricing: 10 Reasons AlertOps Values Your Money

Aug 30, 2023 By AlertOps In AlertOps

Customers are not sure exactly which features they will need when they sign up. After using PagerDuty, they discover they need to upgrade to another plan to get some of the features they need. Compared to PagerDuty's pricing, AlertOps pricing plans are simpler and less confusing.

Why the Blameless Mission Matters Today

Aug 30, 2023 By Emily Arnott In Blameless

Blameless was founded over 5 years ago, in a world that looked very different from today's. We were the first mover in the incident management space, setting the standards for what these tools should achieve. These days, concerns about reliability, incidents, and toil have hit the mainstream. Why have we seen the tech world enter an era where reliability is priority #1? Why do we believe that the Blameless mission matters more today than ever before?

Latest Developments in Site Reliability Engineering, 2023

Aug 30, 2023 By Halle Katz In OnPage

Gartner recently published its Hype Cycle for Site Reliability Engineering, 2023 report (July 2023). Inspired by this report, OnPage shares its predictions about the future of site reliability engineering. In this blog, OnPage reviews emerging tools that can improve site reliability engineering practices.

How to configure Grafana Incident with Microsoft Teams

Aug 30, 2023 By Jackson Coelho In Grafana

Grafana Incident, the powerful incident response tool that is part of the Grafana IRM suite in Grafana Cloud, comes with a range of integrations out of the box, including Zoom and Google Meet spaces, GitHub and JIRA issues, and even a Google Doc template for post-incident review documents. One of the key features in Grafana Incident is the chatbot integration, which previously only supported Slack.

Routing Incidents with Event Orchestration

Aug 30, 2023 By PagerDuty In PagerDuty

Have events that you know should go to one team or another based on set criteria? Stop bouncing those from team to team and immediately route them to the right person dynamically with Event Orchestration.
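
Rules-based routing boils down to an ordered list of conditions with a fallback. The Python sketch below is an assumption-laden illustration (the rules, team names and event fields are hypothetical), not Event Orchestration's configuration format: the first matching rule decides the team.

```python
from typing import Any, Callable

# A rule pairs a predicate over the event with the team that should own it.
Rule = tuple[Callable[[dict[str, Any]], bool], str]

# Hypothetical rules, evaluated in order; the first match wins.
ROUTING_RULES: list[Rule] = [
    (lambda e: e.get("source", "").startswith("db-"), "database-team"),
    (lambda e: "payment" in e.get("summary", "").lower(), "payments-team"),
]
DEFAULT_TEAM = "sre-triage"  # hypothetical catch-all


def route(event: dict[str, Any]) -> str:
    """Return the team for an event using ordered, first-match rules."""
    for predicate, team in ROUTING_RULES:
        if predicate(event):
            return team
    return DEFAULT_TEAM
```

Keeping a catch-all team means an event that matches no rule still reaches a human instead of being dropped.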

Suppressing Alerts with Event Orchestration

Aug 30, 2023 By PagerDuty In PagerDuty

Some events are just informational. Do they need to wake you up or interrupt you from other work? No. Suppress them with Event Orchestration.
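
A hedged Python sketch of severity-based suppression. It assumes events carry a `severity` field using the PagerDuty Events API v2 values (`critical`, `error`, `warning`, `info`) and treats only `info` as suppressible. Suppressed events are kept in a separate list rather than discarded, a design choice made here so informational events stay visible for later review.

```python
from typing import Any

# Assumption: only purely informational events are suppressed.
SUPPRESSED_SEVERITIES = {"info"}


def split_by_suppression(
    events: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split events into (notify, suppressed) lists by severity."""
    notify: list[dict[str, Any]] = []
    suppressed: list[dict[str, Any]] = []
    for event in events:
        target = suppressed if event.get("severity") in SUPPRESSED_SEVERITIES else notify
        target.append(event)
    return notify, suppressed
```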

Add Notes to an Incident with Event Orchestration

Aug 30, 2023 By PagerDuty In PagerDuty

Kick-start your incident response with the right context. Populate incident notes with wikis, runbooks, or additional notes to reduce tribal knowledge and codify response processes with Event Orchestration.
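
A small Python sketch of the idea: look up a runbook for the affected service and build the note a responder sees first. The service names and `wiki.example.com` URLs are placeholders, and this is not how Event Orchestration stores notes.

```python
# Hypothetical lookup table; the URLs are placeholders, not real runbooks.
RUNBOOKS: dict[str, str] = {
    "checkout-api": "https://wiki.example.com/runbooks/checkout-api",
    "search": "https://wiki.example.com/runbooks/search",
}


def build_incident_note(service: str, summary: str) -> str:
    """Compose a note to attach to a new incident for the given service."""
    lines = [f"Triggered by: {summary}"]
    runbook = RUNBOOKS.get(service)
    if runbook:
        lines.append(f"Runbook: {runbook}")
    else:
        # Surfacing the gap nudges the team to codify what they learned.
        lines.append("No runbook on file; consider adding one after the incident.")
    return "\n".join(lines)
```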

The Unplanned Show, Episode 11: Donnie Berkholz on ITIL, DevOps and Platforms

Aug 29, 2023 By PagerDuty In PagerDuty

In this episode, Donnie breaks down where ITIL came from and where it’s starting to go, and why that’s useful for teams that are trying to adopt DevOps practices in ITIL-oriented organizations. Donnie gives some great examples of building empathy and bringing the ITIL teams along for automating changes and decentralizing Sev 2 incident management. He also lays out his core philosophies on Platform Engineering and how to justify the effort.

Three Teams That Can Use AIOps to Work Smarter, Not Harder

Aug 28, 2023 By Hannah Culver In PagerDuty

There isn’t a boardroom today that isn’t asking how AI and generative AI applications can help drive efficiency and accelerate the business. For organizations looking to capitalize on ML and automation to work more efficiently during incidents, AIOps is a tangible, proven application and an exciting opportunity for ITOps teams. As we’ve seen across market landscape evaluations, there are a number of ways these solutions can be implemented.

A Practical Guide to Incident Communication

Aug 28, 2023 By Emily Arnott In Blameless

Even the best software fails sometimes. How quickly those failures get addressed, and how your teammates and customers feel about you after the fact, comes down to how well you communicate with them. Users, customer success managers, Ops team members, IT, security, engineering leadership, even the executive team: each has a vested interest in resolving engineering incidents quickly, and all need to be updated with the right information at the right time.

Continuous Deployment vs. Delivery | Differences Explained

Aug 28, 2023 By Noor-ul-Anam Ruqayya In Blameless

Curious about continuous deployment vs delivery? We explain what each is, what happens in each step, and their importance in the DevOps lifecycle.

What is MTTR? The Different Meanings Explained

Aug 28, 2023 By Noor-ul-Anam Ruqayya In Blameless

Curious about MTTR? We explain what the mean time to recovery is, why it matters to your development team, and how to reduce it.
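
As a worked example, MTTR is the average of (recovered minus started) across incidents. Because MTTR has several meanings (recovery, repair, resolve), which two timestamps you subtract is an assumption you should make explicit; this Python sketch uses incident start and service recovery.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered - started) over incidents; raises on empty input."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


incidents = [
    (datetime(2023, 8, 1, 9, 0), datetime(2023, 8, 1, 9, 30)),    # 30 minutes
    (datetime(2023, 8, 3, 14, 0), datetime(2023, 8, 3, 15, 30)),  # 90 minutes
]
print(mean_time_to_recovery(incidents))  # 1:00:00
```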

Incident Management KPIs | Choosing Metrics that Matter

Aug 28, 2023 By Noor-ul-Anam Ruqayya In Blameless

Wondering about incident management KPIs? We explain what incident management metrics are, how to track them, and what to do with the information.
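
Tracking a KPI such as mean time to acknowledge (MTTA) can be as simple as the Python sketch below. It assumes you can export (triggered, acknowledged) timestamp pairs from your tooling. The median is reported alongside the mean because a single slow acknowledgement can drag the mean far away from the typical experience.

```python
from datetime import datetime
from statistics import median


def acknowledgement_kpis(pairs: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """Return mean and median time-to-acknowledge, in minutes."""
    if not pairs:
        raise ValueError("need at least one acknowledged incident")
    minutes = [(ack - trig).total_seconds() / 60 for trig, ack in pairs]
    return {"mtta_min": sum(minutes) / len(minutes), "median_tta_min": median(minutes)}


t0 = datetime(2023, 8, 28, 9, 0)
pairs = [
    (t0, datetime(2023, 8, 28, 9, 2)),   # 2 minutes
    (t0, datetime(2023, 8, 28, 9, 4)),   # 4 minutes
    (t0, datetime(2023, 8, 28, 9, 30)),  # 30 minutes
]
print(acknowledgement_kpis(pairs))  # {'mtta_min': 12.0, 'median_tta_min': 4.0}
```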

How to use Key-Based Deduplication in Squadcast | Deduplication Rules | Squadcast

Aug 28, 2023 By Squadcast In Squadcast

Key-Based Deduplication is an efficient way to avoid duplicate entries when incoming Events are processed alongside existing Incidents. It generates a Deduplication Key from a user-defined template specific to events from an Alert Source, and this key is used to identify and group duplicates. This video explains how Key-Based Deduplication works and how to set it up effectively.
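
The mechanism described above (render a key from a per-source template, then group events that share a key) can be sketched in Python. The template below uses Python's `str.format` placeholders as a stand-in; Squadcast's own template syntax may differ, and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Any

# Hypothetical per-source template; fields missing from an event raise KeyError.
DEDUP_TEMPLATE = "{source}:{check}:{host}"


def dedup_key(event: dict[str, Any], template: str = DEDUP_TEMPLATE) -> str:
    """Render the deduplication key from event fields (extra fields are ignored)."""
    return template.format(**event)


def group_events(events: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group events by key: the first event opens an incident, the rest are duplicates."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        groups[dedup_key(event)].append(event)
    return dict(groups)
```

Fields left out of the template, such as a timestamp, do not affect the key, which is exactly why repeated firings of the same check collapse into one group.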

Helm Dry Run: Guide & Best Practices

Aug 27, 2023 By Squadcast Community In Squadcast

Kubernetes, the de facto standard for container orchestration, supports two deployment options: imperative and declarative. Because they are more conducive to automation, declarative deployments are typically considered better than imperative ones. A declarative paradigm involves: The issue with the declarative approach is that YAML manifest files are static.
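
To see why static YAML is limiting, and what a dry run buys you, here is a small Python analogy. It uses `string.Template`, whereas Helm itself uses Go templates; the values dictionary plays the role of a hypothetical values file. With Helm, the real equivalents are `helm template` or `helm install --dry-run`, which render a chart without installing it.

```python
from string import Template

# A manifest with placeholders. Helm charts use Go templates, not string.Template.
CONFIGMAP = Template("""\
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name-config
data:
  LOG_LEVEL: "$log_level"
""")

# Hypothetical per-environment values, playing the role of a values file.
VALUES = {
    "staging": {"name": "web", "log_level": "debug"},
    "production": {"name": "web", "log_level": "warn"},
}


def render(environment: str) -> str:
    """Render the manifest for one environment; nothing is sent to a cluster."""
    return CONFIGMAP.substitute(VALUES[environment])


# A "dry run" in spirit: inspect the rendered output before applying it.
print(render("production"))
```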

Managing On-Call Rotations: Navigating Incident Management from Chaos to Calm

Aug 25, 2023 By Chitra Bisht In Squadcast

Navigating On-Call rotations can often feel like taming a storm of alerts and constant disruptions, leaving teams overwhelmed and stressed. Hence the need to streamline On-Call rotations and use the right software to restore order and calm. In this guide, you'll explore practical tips, best practices, and smart strategies to transform your Incident Management process. Let's get to a more efficient On-Call experience.
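
At its core, a rotation is a round-robin over a list of people with a fixed shift length. This Python sketch assumes a hypothetical weekly rotation that starts Monday, 28 August 2023 at 09:00, using naive datetimes in a single time zone. Real schedules add overrides, layers and time-zone handling on top of it.

```python
from datetime import datetime, timedelta

ROTATION = ["alice", "bob", "carol"]          # hypothetical engineers
ROTATION_START = datetime(2023, 8, 28, 9, 0)  # hypothetical handoff: Monday 09:00
SHIFT_LENGTH = timedelta(weeks=1)


def on_call_at(when: datetime) -> str:
    """Return who is on call at `when` in a simple round-robin weekly rotation."""
    if when < ROTATION_START:
        raise ValueError("time is before the rotation started")
    shifts_elapsed = (when - ROTATION_START) // SHIFT_LENGTH  # whole shifts completed
    return ROTATION[shifts_elapsed % len(ROTATION)]
```

For example, `on_call_at(datetime(2023, 9, 4, 9, 0))` falls exactly on the first handoff and returns `"bob"`.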

Demo Roundup: What's new in the PagerDuty Operations Cloud, August 2023

Aug 25, 2023 By PagerDuty In PagerDuty

Anywhere from 20% to 90%+ of customer-impacting issues are detected and reported by customers! In this episode of our quarterly demo roundup, we'll see how to quickly take action on a customer-reported issue with the help of #GenerativeAI and more great new capabilities in the PagerDuty Operations Cloud. Six of PagerDuty’s product managers give live demos.

Everbridge Business Operations

Aug 25, 2023 By Everbridge In Everbridge

Everbridge Business Operations helps businesses prepare for, and respond to, critical events, protecting facilities and business operations. Built on Everbridge’s industry-leading critical event management (CEM) platform, it lets businesses detect potential risks that might impact business operations and orchestrate a response in seconds across teams and digital/physical systems.

Everbridge Smart Security

Aug 25, 2023 By Everbridge In Everbridge

Everbridge Smart Security helps organizations maintain control of their security by identifying threats to their people or assets and orchestrating a rapid response across teams and systems, all within an easy-to-use common platform.

Everbridge People Resilience

Aug 25, 2023 By Everbridge In Everbridge

Everbridge People Resilience solutions help businesses prepare for, and respond to, critical events, keeping people healthy, safe, and productive wherever they work or travel around the globe. Built on Everbridge’s industry-leading Critical Event Management (CEM) platform, these solutions let businesses detect potential threats that might impact their people and orchestrate a rapid response across teams and digital/physical systems.

Everbridge Digital Operations

Aug 25, 2023 By Everbridge In Everbridge

Everbridge Digital Operations Platform allows teams to open fewer tickets and spend less time reinventing iterative resolutions, leading to faster Mean Time to Repair (MTTR)—so impacted services get resolved before users become aware. Less time spent on incidents means more time for innovation.

The Unplanned Show, Episode 10: Mitra Goswami on Generative AI

Aug 25, 2023 By PagerDuty In PagerDuty

In this episode, Mitra shares a bunch of valuable insights on how to successfully adopt generative AI, from selecting use cases that deliver value and having foundational data infrastructure in place to establishing design and privacy guidelines. Grab a pen and paper and take some notes!

Streamlining incident response: the power of integration in engineering tools

Aug 24, 2023 By Mike Lacsamana In FireHydrant

In the ever-evolving world of software development, incidents are bound to happen. Whether it's an unexpected server crash, a critical bug impacting user experience, or a security breach, handling incidents swiftly and effectively is crucial for maintaining a seamless user experience and preserving business reputation. That's where incident response tools come in — to help you automate, document, communicate, and mitigate.

More than downtime: the opportunity costs of poor incident management

Aug 24, 2023 By Robert Ross In FireHydrant

In my last blog post, I wrote about the explicit costs of incidents — the ones you can easily track based on dollars lost. But the cost of incidents goes beyond the time spent resolving them. While we’re spending our time managing incidents (that includes mitigating and retrospectives), we’re incurring a large opportunity cost in terms of releasing the next big thing.

Connect OneUptime to Slack using Workflows

Aug 24, 2023 By OneUptime In OneUptime

Connect OneUptime to Slack using Workflows.

August 2023 update - calendar export and video attachments in the mobile app

Aug 23, 2023 By René In SIGNL4

Our August update brings the ability to export shift scheduling from SIGNL4 to other calendar apps. In addition, video attachments are now supported for Signls. As always, all the details are in this blog article.

New features summer wrap-up: Evolving ChatOps, AI-assisted Incident Comms, and Time-based alert grouping

Aug 23, 2023 By Daria Yankevich In iLert

It is time to sum up the product updates that we introduced during summer 2023. As always, our focus has been on minimizing limitations in the incident response process and accelerating the workflow from acknowledgment to resolution. We invite you to contribute to the ilert roadmap by submitting your feature and improvement ideas here.

OnPage - Datto's Autotask PSA Integration [UPDATED] Setup, workflow creation and ticketing

Aug 23, 2023 By OnPage In OnPage

OnPage partners with Datto’s Autotask PSA to convert tickets into intelligent alerts, reducing operating costs, improving response rates and SLA compliance.

We Need to Talk About the Hero Pattern Among SREs

Aug 22, 2023 By Hans Chung In Rootly

Let’s be honest. When you see an alert pop up on your phone, you aren’t thinking “according to section 12 of our most recent SRE handbook used at training 6 months ago I need to keep in mind who should be Incident Commander and who should be Ops Lead”. You’re an engineer at heart.

The Iceberg of Engineering Incident Costs

Aug 22, 2023 By Aaron Lober In Blameless

I've long been fascinated with the metaphor of an iceberg to describe a problem whose true magnitude is obscured beneath the surface. If you’re not familiar with this phenomenon, when water freezes it decreases in density. This allows the solid ice to float, partially, atop the water with only a small fraction of it exposed. In fact, icebergs hold nearly 90% of their mass hidden below the water.
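
The "nearly 90%" figure follows from Archimedes' principle: a floating body displaces its own weight of water, so the submerged fraction equals the ratio of the densities. This quick Python check uses typical textbook densities (about 917 kg/m³ for ice and 1025 kg/m³ for seawater); real values vary with temperature, salinity and trapped air.

```python
ICE_DENSITY = 917.0        # kg/m^3, approximate
SEAWATER_DENSITY = 1025.0  # kg/m^3, approximate


def submerged_fraction(ice: float = ICE_DENSITY, water: float = SEAWATER_DENSITY) -> float:
    """Fraction of a floating body's volume below the waterline."""
    # Equilibrium: rho_ice * V_total = rho_water * V_submerged.
    # For uniform ice, the volume fraction equals the mass fraction.
    return ice / water


print(f"{submerged_fraction():.0%}")  # 89%
```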

Advancements in Real-Time Health System Technologies, 2023

Aug 22, 2023 By Zoe Collins In OnPage

The OnPage team is pleased to inform you that we’ve been acknowledged in the Gartner® Hype Cycle™ for Real-Time Health System Technologies, 2023 report, as a Sample Vendor in the Clinical Communication and Collaboration category. As per the Gartner report, “This Hype Cycle includes technologies pivotal to the real-time health system vision.”

3 New Updates to the PagerDuty Scheduling Experience

Aug 18, 2023 By Débora Cambé In PagerDuty

With the acceleration of cloud and digital transformation initiatives, enterprises are under pressure to adopt more agile, DevOps practices to be responsive to the business. But the increased complexity of digital systems and reliance on digital business only make incidents more expensive.

Incident Management: A Complete Introduction

Aug 17, 2023 By Guest Author In Netreo

In the dynamic landscape of IT operations, incidents are bound to occur. Incident management is a structured and proactive approach to address and resolve these unexpected events promptly and effectively. It forms a crucial component of IT service management (ITSM), ensuring smooth operations and minimizing the impact of incidents on an organization’s productivity and customer experience.

10 Observability Tools in 2023: Features, Market Share and Choose the Right One for You

Aug 17, 2023 By Anjali Udasi In Zenduty

Understanding what's happening within your systems is a necessity. Have you ever wondered how experts keep an eye on systems to make sure everything's running smoothly? That's where observability tools come in! Observability tools are like helpers that give you a peek inside your tech. In this blog, we will talk about observability tools and how they can be used in different situations so it's easier for you to choose the right one for your organization.

PagerDuty Recognized in 12 2023 Gartner Hype Cycle Reports

Aug 16, 2023 By Sean Scott In PagerDuty

While most of the world knows us for on-call management, we’ve been hard at work expanding the PagerDuty Operations Cloud to other areas like AIOps, Process Automation and Customer Service Operations (CSOps). Our investment in R&D and in delivering the best products and platform has resulted in PagerDuty being recognized in 12 distinct 2023 Gartner Hype Cycle reports across nine unique categories, underscoring our commitment to redefining digital operations management for our customers.

More than downtime: the explicit costs of poor incident management

Aug 16, 2023 By Robert Ross In FireHydrant

A cold fact of SaaS Life™ is that you can’t make money when your product or website doesn’t work — and those lost dollars add up fast. Downtime, SLA breach paybacks, compliance fines, and other explicit costs are the easiest to quantify and they’re what most people think of when they think about incidents.

Reduce MTTR with Grafana, Grafana k6, and Prometheus: Inside DHL's observability stack

Aug 16, 2023 By Lauren Johnson In Grafana

Each year, more than 296 million packages are shipped around the world via DHL and their premium service, Time Definite International. And at DHL Express Switzerland, a local unit of the international logistics and shipping company, the IT team provides solutions for tracking customs clearance progress, analytics, mobile and optical character recognition (OCR) scanning, and warehouse management on every package that moves through Switzerland.

The Unplanned Show, Episode 9: James Urquhart on Flow Architectures

Aug 16, 2023 By PagerDuty In PagerDuty

“All data is valuable when it’s generated, but the question is how fast does the value of the data decay?”

PagerDuty Schedules: How to Add or Remove Teams from a Schedule

Aug 16, 2023 By PagerDuty In PagerDuty

Learn how users with at least an admin base role can add or remove teams when editing a schedule.

CloudOps: Transforming IT Operations in the Cloud

Aug 15, 2023 By OnPage Corporation In OnPage

CloudOps, or Cloud Operations, is quickly becoming the standard for managing IT operations in the cloud computing ecosystem. By transforming traditional IT operations to harness the full potential of the cloud, businesses are experiencing greater automation, collaboration, agility, and resilience. This article is a deep dive into the concept of CloudOps, its core components, the advantages it offers, and the steps necessary to implement it effectively within an organization.

Welcome To xMatters - Ep4 - Initiating Incidents

Aug 15, 2023 By xMatters In xMatters

Everyone makes mistakes. So, it is important that when they do, we can act quickly, resolve the problem, and understand what went wrong to reduce the chances of it happening again. When your business is suddenly impacted by an unforeseen event, it’s important that you can efficiently report the problem and call for help as soon as possible. With xMatters, you can initiate incidents quickly and target specific groups with the vital information they need.

Webinar: Status Page Customization

Aug 15, 2023 By StatusCast In StatusCast

Reimagining Status Pages - Beyond the Cookie-Cutter Approach. Don't limit yourself to a status page you don’t fully control. Leverage customizable status pages for superior UX and design. Own everything that matters; shape user experience, represent your brand and communicate effectively.

But It's Not Our Fault! When Third-party Incidents Affect Your Service

Aug 14, 2023 By Ashley Sawatsky In Rootly

Very few SaaS products exist completely independently. Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. When these systems fail, it can leave you feeling pretty helpless. In some cases you might have fallback options, but oftentimes all you can do is wait for recovery and clean up the fallout.

Azure Monitoring Agent: Key Features & Benefits

Aug 13, 2023 By Squadcast Community In Squadcast

In today's rapidly evolving digital landscape, businesses increasingly rely on cloud computing and infrastructure to support their operations. As organizations migrate their workloads to the cloud, robust monitoring and management tools are paramount to ensure optimal performance, security, and efficiency. In response to this demand, Microsoft Azure has introduced the Azure Monitor Agent (AMA), a powerful and versatile solution designed to enhance the monitoring capabilities of Azure resources.

How To Write Incident Postmortems

Aug 10, 2023 By Anjali Udasi In Zenduty

Writing a public postmortem regarding an outage is essential to maintaining transparency and accountability when things go wrong in a service or system. The purpose of writing a postmortem is to analyze and document an incident or event that has occurred, usually with a focus on identifying its root causes, understanding what went wrong, and outlining steps to prevent similar issues from happening in the future.

The Unplanned Show, Episode 8: Platform Engineering with Martin Van Son

Aug 10, 2023 By PagerDuty In PagerDuty

In this episode, Martin Van Son provides a simplified definition of platforms in this context: a way for internal users to request anything from environments to deployments. The platform engineering comes in because someone needs to own stitching together and automating away all the complexity involved to complete that action. In the end, both the consumers and the creators save time. Furthermore, platform engineers have an opportunity to encode best practices and cost saving measures that are often forgotten when users are left to their own devices.

New OnPage + ConnectWise Incident Alerting Workflow

Aug 10, 2023 By OnPage In OnPage

OnPage has combined the power of voicemail transcription with keyword-based triggers to identify and prioritize after-hours incidents. The new OnPage + ConnectWise workflow enhances incident alert management for IT and Managed IT clients by drastically decreasing incident response times. By streamlining after-hours on-call communication, OnPage's critical alerting platform has revolutionized the on-call IT industry.

Rootly Raises $12 Million from Renegade Partners, Google Gradient Ventures, & XYZ Ventures

Aug 10, 2023 By JJ Tang In Rootly

We are excited to announce that we have raised a $12M round of financing led by Renegade Partners with participation from Google Gradient Ventures (Google’s AI-focused venture fund) and XYZ Ventures. This brings our total funding to date to $15.2M ($20M CAD) alongside our other existing investors Y Combinator and 8VC.

July 2023 newsletter: Changelog-The Deluxe Edition

Aug 10, 2023 By incident.io In Incident.io

🎵 Gotta give the people, give the people what they want! 🎵 You've been asking. And we've been listening. Over the past few weeks, we've been shipping frequently requested features to help you bring your incident management to the next level. It may be the dog days of summer, but let's ignore that, yeah? Just take a look at this recent changelog. Note that this is the biggest one we've ever published.

From On-call to Non-call: Resolving Incidents Before They Even Happen

Aug 9, 2023 By Datadog In Datadog

Artificial intelligence has captured the world's attention, with tools like ChatGPT and large language models (LLMs) driving the conversation. But you don’t need to wait for the future or for new LLM-powered features to start working smarter: the tech industry has been investing in intelligent, automated tools for years, and they’re ready for production now. In this talk, you’ll learn how the engineering teams at Toyota Connected use tools like Datadog Watchdog, Anomaly Detection, and Workflows to make their lives easier and keep their platform stable.

Tools and Trends in Site Reliability Engineering according to Gartner's 2023 Hype Cycle

Aug 9, 2023 By Halle Katz In OnPage

Gartner recently published its Hype Cycle for Site Reliability Engineering, 2023, report. This blog reviews the future of site reliability engineering based on Gartner’s Hype Cycle. Additionally, the OnPage team is pleased that Gartner mentioned OnPage as a sample vendor in the Automated Incident Response category.

Exploring distributed vs centralized incident command models

Aug 8, 2023 By Robert Ross In FireHydrant

Recently in our Better Incidents Slack channel, there’s been some chatter around how people structure dedicated incident commanders at their company: distributed or centralized. The way I see it, there are two types of commanders: the temporary, distributed role — a hat that an on-call engineer or an engineering manager puts on during an incident. Then there’s the centralized, full-time role, where someone is the designated incident commander (or one of a few) for all incidents.

BigPanda's Resources for Navigating Change Through the AI Revolution

Aug 8, 2023 By Alec Down In BigPanda

AI has revolutionized the way we engage online in 2023. From ChatGPT and AI art generators to healthcare, finance, and business, you can hardly read the news without seeing the latest proclamation of how AI is poised to change every aspect of our lives. AI has brought fundamental changes to how we live and work, and we’re still scrambling to understand their impact. Change can be difficult for people to embrace, especially where their work is concerned.

Getting Started with PagerDuty

Aug 8, 2023 By PagerDuty In PagerDuty

In this video, you will gain a baseline understanding of what PagerDuty does and how to configure your PagerDuty account. To dive deeper into the PagerDuty platform, select relevant topics in our complimentary on-demand e-learning center at university.pagerduty.com. The PagerDuty Operations Cloud is essential infrastructure that detects and diagnoses disruptive events, mobilizes the right team members to respond, and automates workflows across your digital operations, so that your business moves forward faster. Get started now!

What's missing from your incident management workflow

Aug 8, 2023 By Cortex In Cortex

The first fifteen minutes of an incident set the tone for the rest of the resolution process. But the factor that separates a rapid response from a stressful scramble, clear ownership, hasn't always been easy to ascertain. In this article, we’ll cover how Cortex, an internal developer portal, can be your team’s source of truth to accelerate the incident management process and reduce MTTR.

Synced for Success: OnPage & Slack for Incident Response

Aug 7, 2023 By Ritika Bramhe In OnPage

As the post-pandemic world finds its footing again, a resilient spirit is driving the recovery and propelling businesses to embrace a new era of technological innovation. Notably, IT teams are swiftly digitizing their processes, particularly incident response. From virtual collaboration tools and remote IT support to automated incident management, teams have found innovative ways to ensure seamless business continuity while delivering IT services with minimal downtime.

Scaling Up to Keep Costs Down: Automation for Web Application Incident Management

Aug 7, 2023 By Derek Pascarella In Resolve

Any organization that’s keeping up with today’s sharp rise in business demands (or better yet, getting ahead of the game) is doing so by getting innovative and jumping at the chance to do things differently. They’re not relying on the old ways or trying to use their existing toolbox. Instead, organizations are looking to the newest technologies and means of adding efficiency to as many day-to-day functions as possible.

Evolution of Site Reliability - Incidentally Reliable with Manoj Sebastian

Aug 4, 2023 By Zenduty In Zenduty

Catch Manoj Sebastian (ex-Flipkart, Amazon, Atlassian, Intuit, Yahoo) talking about the evolution of SRE over 20 years, incident response and post-incident culture at Big Tech, and the future of reliability as AI ramps up at full speed. The freshest podcast for Site Reliability Engineers, hosted by Vishwa and Shubham from Zenduty.

incident.io: A scalable incident management solution built for enterprises

Aug 4, 2023 By Luis Gonzalez In Incident.io

For enterprise businesses, a lot is riding on the efficiency of their incident response. These organizations have large customer bases, complex products, and many incidents. They also have loads of incident responders across various roles, making it difficult to coordinate internally.

Group Performance Report - xMatters Support

Aug 4, 2023 By xMatters In xMatters

The Group Performance Report in xMatters displays a group's event response statistics, letting organizations compare how groups are handling the events assigned to them.

Unveiling Squadcast's Enhanced Status Pages

Aug 3, 2023 By Sanjog Sandhu In Squadcast

Meet Kevin and Mai (again): Navigating the Troublesome Waters of Platform Downtime. Kevin is a Site Reliability Engineer (SRE), constantly on the lookout for potential downtime that could impact their platform, kryptobro.com. Mai is his adept partner, ever ready to troubleshoot. On their journey, the previous version of Squadcast Status Pages served as a helpful tool, but they soon found room for improvement.

Discover what's driving the recognition behind BigPanda's AIOps innovations

Aug 3, 2023 By Joel McKelvey In BigPanda

Every day, BigPanda is transforming the way our customers operate. Our advanced AIOps technology redefines incident management, prevents service disruptions, and elevates customer satisfaction – and I couldn’t be more thrilled to see industry experts take notice. I’m particularly proud to see BigPanda mentioned in nine of the highly esteemed 2023 Gartner Hype Cycle reports.

Process Automation v4.15 Release Notes

Aug 3, 2023 By PagerDuty In PagerDuty

What’s new in PagerDuty Process Automation? New configuration workflow for the secrets storage you’re already using! New IAM authentication in AWS! Jake and Forrest join Tiago and Mandi for a tour of what’s new in this release and a demo of the new features for AWS and secret storage.

Demo Roundup: PagerDuty Operations Cloud for Kubernetes

Aug 3, 2023 By PagerDuty In PagerDuty

In this demo, Corbin Mills shows how to use the PagerDuty Operations Cloud to streamline and automate the resolution of a node failure. You’ll see how he uses event orchestration (in PagerDuty AIOps) to enrich an alert with pod names and automatically run a job that checks the Kube API status, so that a responder has instant context. AIOps also groups and suppresses alerts. Then you’ll see how the responder can run additional health status checks without needing to SSH into the environment or interrupt a co-worker for access.

Kubernetes Incident Management Best Practices

Aug 3, 2023 By Rajesh Tilwani In Rootly

Creating just any infrastructure on Kubernetes is not enough. You could apply a handful of basic configurations to stand up infrastructure for your application, and it might work just fine for the time being. But your incident response won’t always remain 100% reliable. You will run into new potholes, and that’s okay.

Introducing Personalized Service Health: Upleveling incident response communications

Aug 3, 2023 By Daniel Dobalian In Google Operations

Personalized Service Health sends custom granular alerts about Google Cloud service disruptions, and integrates with incident management tooling.

Understanding Blameless Postmortems

Aug 2, 2023 By Anjali Udasi In Zenduty

Progress in organizations is often accompanied by unforeseen challenges and mishaps. Traditionally, these setbacks resulted in finger-pointing, hindered progress, and a negative work atmosphere. However, a "Blameless Postmortems" approach transforms how organizations respond to failure. In this blog, we will delve into the importance of cultivating a blameless postmortem culture when faced with setbacks.

Transnetyx Case Study: Using BigPanda Starter Pack for a 96% email alert reduction within weeks

Aug 2, 2023 By Elli Dugger In BigPanda

Transnetyx is an automated genotyping company dedicated to providing biomedical researchers with faster, easier, and more accurate results worldwide.

Introducing Squadcast's Key Based Deduplication

Aug 1, 2023 By Vishal Padghan In Squadcast

We are excited to share another feature update with all our valued customers! We have recently gone live with our Key Based Deduplication feature, enabling you to define dedup keys using customizable templates for configured alert sources. With this feature, you can automatically group similar incidents and effectively deduplicate alerts.

Best Practices for SaaS and Network Incident Management

Aug 1, 2023 By Simon Dion In Exoprise

Computer and network systems have (obviously) become vital to business operations. Occasionally, SaaS or network incidents occur and these systems do not operate as needed. Enterprises want to minimize the potential damage and get their systems back online ASAP. Integrated incident management, combined with a strong End User Experience Management (EUEM) platform that provides synthetic and real-user monitoring, is a foundation for meeting that objective.

Why you need an internal status page

Aug 1, 2023 By Isaac Seymour In Incident.io

When we launched incident.io Status Pages a few months ago, we stressed the importance of communicating clearly with your customers about ongoing issues. To help with this, we spent a lot of time carefully designing a status page that’s easy to understand for everyone - whether they come from a technical background, work in a different area, or just want to get on with their day.
