May 2023

How to build organizational resilience: six proven steps

May 31, 2023 By Everbridge In Everbridge

In today’s world, where natural disasters, terrorist threats, and cyberattacks are becoming increasingly common, business leaders must prioritize building resilience to ensure the long-term success of their organizations. However, the ways an organization can adapt, recover, and thrive in the face of adversity are often unclear.

Read Post

Reduce MTTR and Address the Talent Gap with Logz.io Alert Recommendations

May 31, 2023 By Matt Hines In logz.io

When our CEO and co-founder Tomer Levy delivered his “Observability is Broken” presentation at last year’s AWS re:Invent, he highlighted numerous challenges faced by today’s organizations as they seek to advance their observability practices. Of the six individual points that he noted, two specifically dealt with the current shortage of available engineering expertise, with another two focused on data overload.

Read Post

Use incident cycle time to optimize your incident response process

May 31, 2023 By Jouhné Scott In FireHydrant

Although the causes and solutions for incidents vary widely, most incidents follow a similar timeline from declaration to resolution. We call the time it takes to move from one phase or milestone of an incident to the next “cycle time.”
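
As an illustration of the idea (not FireHydrant's implementation), here is a minimal Python sketch that derives cycle times from an incident's milestone timestamps. The function name `cycle_times`, the milestone names, and the timestamps are all hypothetical examples.

```python
from datetime import datetime, timedelta


def cycle_times(milestones: list[tuple[str, datetime]]) -> dict[str, timedelta]:
    """Return the time spent moving from each milestone to the next.

    `milestones` is a list of (name, timestamp) pairs. It is sorted by time first,
    so each result key reads like "started -> detected".
    """
    ordered = sorted(milestones, key=lambda m: m[1])
    return {
        f"{prev_name} -> {next_name}": next_time - prev_time
        for (prev_name, prev_time), (next_name, next_time) in zip(ordered, ordered[1:])
    }


# Hypothetical milestones for a single incident.
incident = [
    ("started", datetime(2023, 5, 31, 9, 0)),
    ("detected", datetime(2023, 5, 31, 9, 12)),
    ("acknowledged", datetime(2023, 5, 31, 9, 15)),
    ("mitigated", datetime(2023, 5, 31, 10, 5)),
    ("resolved", datetime(2023, 5, 31, 11, 30)),
]

for phase, duration in cycle_times(incident).items():
    print(f"{phase}: {duration}")  # e.g. "started -> detected: 0:12:00"
```

Aggregating these per-phase durations across many incidents is one way to see which phase slows a team's response the most.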

Read Post

SIGNL4 Onboarding: 3rd Party Integration: Webhook & Email

May 30, 2023 By SIGNL4 In SIGNL4

The SIGNL4 Onboarding series walks users through SIGNL4, from signup to alerts to settings. Today's video focuses on third-party integration via webhook and email: learn how to create an app inside SIGNL4 that receives events from third-party systems and turns those events into alerts. This video is packed with helpful tips to help you get the most out of your account.

View Video

Getting started with Squadcast's On-Call Scheduling

May 29, 2023 By Vishal Padghan In Squadcast

We understand that everyone values a simple and straightforward approach when it comes to setting up schedules. We at Squadcast are fully aware of the difficulties involved in creating an on-call schedule from scratch or migrating it to a new platform. Hence, we have written this blog post to help you set up your on-call schedule in Squadcast seamlessly. Our goal is to provide guidance and support to make the process as effortless as possible for you.

Read Post

Prometheus Blackbox Exporter: Guide & Tutorial

May 29, 2023 By Squadcast Community In Squadcast

Prometheus is a popular open-source monitoring system that collects, stores, and queries metrics from various sources. In Prometheus, an exporter is a component that collects metrics and exposes them in a format Prometheus can scrape. The Prometheus Blackbox Exporter is designed to monitor “black box” systems whose internal workings are not accessible to Prometheus. It sends HTTP, TCP, and ICMP requests to these external systems and measures their response times and statuses.
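
To make the black-box probing idea concrete, here is a minimal Python sketch of a single HTTP probe whose result is rendered in the Prometheus text exposition format. It is not the Blackbox Exporter's code: `ProbeResult`, `probe_http`, and `render_metrics` are hypothetical names, the HELP strings are made up, and treating any 2xx status as success is an assumption. Only the metric names `probe_success`, `probe_duration_seconds`, and `probe_http_status_code` mirror ones the exporter exposes.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ProbeResult:
    success: bool
    status_code: int  # 0 when no HTTP response was received
    duration_seconds: float


def probe_http(url: str, timeout: float = 5.0) -> ProbeResult:
    """Send one GET request and time it, treating any 2xx status as success."""
    # Ignore proxy environment variables so the probe measures the target directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measured duration
            status = resp.status
    except urllib.error.HTTPError as err:  # the server answered with 4xx/5xx
        status = err.code
    except OSError:  # connection refused, DNS failure, timeout, ...
        status = 0
    duration = time.monotonic() - start
    return ProbeResult(200 <= status < 300, status, duration)


def render_metrics(result: ProbeResult) -> str:
    """Format a probe result in the Prometheus text exposition format."""
    lines = [
        "# HELP probe_success Whether the probe succeeded",
        "# TYPE probe_success gauge",
        f"probe_success {int(result.success)}",
        "# HELP probe_duration_seconds How long the probe took in seconds",
        "# TYPE probe_duration_seconds gauge",
        f"probe_duration_seconds {result.duration_seconds:.6f}",
        "# HELP probe_http_status_code HTTP status code of the response, 0 if none",
        "# TYPE probe_http_status_code gauge",
        f"probe_http_status_code {result.status_code}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `print(render_metrics(probe_http("https://example.com/")))` would print one scrape's worth of metrics for that URL. The real exporter adds modules, TCP/ICMP/DNS probers, and many more metrics.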

Read Post

Understanding SLAs, SLOs, and SLIs: What's the Difference?

May 29, 2023 By Anjali Udasi In Zenduty

An SLA (service level agreement) is a written contract that sets out quantifiable service quality standards between a service provider and a client. It typically covers response times, uptime, and error reporting.
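
To show how the three terms commonly relate, here is a minimal Python sketch: an availability SLI measured from request counts, an SLO as an internal target for that SLI, and an SLA as the contractual threshold, which is often set looser than the SLO. The function names, the 99.9% and 99.5% targets, and the request counts are hypothetical examples, not taken from the linked post.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevels:
    slo_target: float  # internal objective, e.g. 0.999
    sla_target: float  # contractual commitment, often looser than the SLO


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the measurement window."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget (1 - SLO) still unspent; negative means overspent."""
    budget = 1.0 - slo_target
    spent = 1.0 - sli
    return (budget - spent) / budget


levels = ServiceLevels(slo_target=0.999, sla_target=0.995)
sli = availability_sli(good_requests=999_400, total_requests=1_000_000)
print(f"SLI: {sli:.4%}")
print(f"Error budget remaining: {error_budget_remaining(sli, levels.slo_target):.0%}")
print("SLO met" if sli >= levels.slo_target else "SLO missed")
print("SLA met" if sli >= levels.sla_target else "SLA breached")
```

Keeping the SLO stricter than the SLA gives a team an internal warning margin: it can miss its own objective and react before the contractual commitment is breached.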

Read Post

Data Shows Outage Time & Costs are Increasing - 3 Solutions You Should Consider

May 29, 2023 By Heather Miller In Circonus

The Uptime Institute recently released its Annual Outage Analysis 2023 report. Overall, the report highlights the increasing costs, frequency, and duration of outages, the prominent role of cloud and digital services in outages, the shortcomings of service providers, and the need to address human error and management failures. It also underscores the ongoing challenges of handling failures in complex distributed architectures.

Read Post

10 Incident Management Best Practices

May 29, 2023 By Diana Bocco In Uptime Robot

Before we dive into the nitty-gritty of incident management, let’s look a bit closer at the actual meaning of ‘incident.’ In the world of IT service management, the official definition for ‘incident’ is an “unplanned interruption to an IT service or reduction in the quality of an IT service.” Whether that means a slowdown in response time or a total system crash, you’re looking at an incident.

Read Post

The Swedbank Outage shows that Change Controls don't work

May 29, 2023 By Mike Long In Kosli

This week I’ve been reading through the recent judgment from the Swedish FSA on the Swedbank outage. If you’re unfamiliar with this story, Swedbank had a major outage in April 2022 that was caused by an unapproved change to their IT systems. It temporarily left nearly a million customers with incorrect balances, many of whom were unable to meet payments.

Read Post

Hello World

May 28, 2023 By Kaushik Thirthappa In Spike

It feels great writing this. It's hard to believe that we have been working on Spike.sh full-time for 3 years now. It's been the most rewarding experience of my life. A big thank you to all of our users for your constant feedback, which has made Spike.sh better month after month. Over the years, we have always kept our heads down and built. Throughout this process, we have learnt a great deal about incidents and how they are managed.

Read Post

Admin Panel - Security Settings - xMatters Support

May 26, 2023 By xMatters In xMatters

Keeping your security settings up to date helps ensure that your company's specific security regulations are met. You can specify which protocols to require so that you are confident in the level of protection on your devices.

View Video

Manan Verma's First Major Incident | Incidentally Reliable Podcast Ep.2

May 26, 2023 By Zenduty In Zenduty

Watch Manan Verma talk about his first major incident, reliability at unicorns, and the cost of incidents for different stakeholders.

View Video

Debug State Capture for Traditional Infrastructure & Apps

May 25, 2023 By Justyn Roberts In PagerDuty

In our previous blogs on Capturing Application State and using Ephemeral Containers for Debugging Kubernetes, we discussed the value of being able to deploy specific tools to gather diagnostics for later analysis, while also providing the responder to the incident the means to resolve infrastructure or application issues.

Read Post

5 Immediate Business Benefits of Leveraging Domain-Agnostic AIOps

May 25, 2023 By Moogsoft Team In Moogsoft

Legacy systems and point solutions are part of any business. And while they have their history and benefits, it’s critical to find a balance for your organization. IT teams have been acclimated to disparate event management and monitoring tools. Now, with massive and rapidly increasing data flow, this disconnect is slowing and paralyzing IT teams.

Read Post

The Ultimate Guide to Automating and Mobilizing Your Secops Processes with Derdack SIGNL4 and Microsoft Sentinel

May 24, 2023 By emily In SIGNL4

The threat and security landscape is becoming increasingly cluttered. As incidents increase, so do alerts and notifications, leaving too many alerts and too few hours to address them. Many businesses work remotely, and with smartphones ever-present, we are always on the go. Yet security teams must receive and prioritize meaningful threats, a task that is easier said than done.

Read Post

Updating Your Tools for API Scopes

May 24, 2023 By Mandi Walls In PagerDuty

The PagerDuty REST API provides 200+ endpoints for users to programmatically access objects and workflows in the PagerDuty platform. Teams leverage these APIs to streamline creating and managing users, teams, services and other components for their environment. Up until now, access to the REST API has been authorized and authenticated via API Keys.

Read Post

How Runbook Automation can Simplify CloudOps Use

May 23, 2023 By Avi Shalisman In MoovingON

Organizations in every industry continue their transition to cloud services, and while this may be a step forward in general, it brings its own unique set of challenges. Cloud use, and CloudOps in particular, relies on a complex, intricate infrastructure that is difficult to manage and maintain, yet it is critical to keeping a business's networks functioning. This makes simplifying CloudOps a top priority for many businesses, but does a solution exist?

Read Post

Exploring Key Concepts of Site Reliability Engineering (SRE)

May 23, 2023 By Anjali Udasi In Zenduty

Site Reliability Engineering (SRE) is the practice of using software tools to automate IT infrastructure functions, including system management and application monitoring. Businesses use it to keep their software applications reliable even as development teams ship frequent updates. SRE lets engineers automate activities that operations teams have traditionally performed manually to manage production systems and handle issues.

Read Post

PagerDuty via Terraform - Intro to API Scopes

May 23, 2023 By PagerDuty In PagerDuty

PagerDuty Innovation Software Engineer and Terraform Provider Maintainer José Antonio Reyes talks about improvements to the PagerDuty API with API Scopes and previews its implementation on PagerDuty’s Terraform Provider.

View Video

Why an incident response plan is a security must-have for every organization

May 23, 2023 By Noam Morginstin In Exigence

“By failing to prepare, you are preparing to fail. Preparation prior to a breach is critical to reducing recovery time and costs.” (RSA Conference) For 83% of companies, a cyber incident is just a matter of time (IBM). And when one does happen, it can cost the organization millions: the global average is $4.35 million per breach. The damage isn’t only financial, nor solely related to customer loyalty and brand equity.

Read Post

PagerDuty Launches New Innovations to Reduce Tool Sprawl and Optimize Operations

May 23, 2023 By Ariel Russo In PagerDuty

The number of tools used by distributed teams to manage incidents has multiplied over the years, leading to a valley of tool sprawl. Throw in manual processes and you’ve got too much toil and multiple points of failure. Maintaining disparate tools and systems isn’t just unwieldy, it’s expensive. Our latest capabilities add to the PagerDuty Operations Cloud to make it easier than ever for teams to consolidate their incident management stack.

Read Post

Our social resurgence: activating our social media presence to revamp Incident Management

May 19, 2023 By Kaushik Thirthappa In Spike

Over the past year, Spike.sh's social media activity has been nil. As a bunch of shy nerds in a small team working remotely across the world, we never really bothered with social media or our presence on it. We always kept our heads low and maneuvered around it. But no more. As of today, we are returning to social media channels such as LinkedIn, Twitter, and Reddit.

Read Post

Learn How PagerDuty Customers Save Money and Achieve Fast ROI

May 19, 2023 By Rachel Schmitz In PagerDuty

Saving time and money is always important, but these days, it’s a mission-critical business imperative. At PagerDuty, we help organizations realize transformational gains in efficiency that drive both immediate financial impact and long-term business success. PagerDuty delivers clear value for any organization at any stage of operational maturity. But you don’t have to take our word for it – the real-life experiences of our customers speak volumes.

Read Post

How Helpdesks Facilitate Major Incident Management

May 19, 2023 By Ritika Bramhe In OnPage

Helpdesks serve as the initial line of defense for IT incidents, responsible for facilitating incident management, including logging, categorizing, and prioritizing incidents. In the event of a major incident, the helpdesk plays a crucial role in escalating the incident to the appropriate major incident management (MIM) team. The success of this process relies on the expertise of the helpdesk staff in providing situational context to expedite resolution.

Read Post

Building A DevTools Saas Company Today! Incidentally Reliable Podcast | Zenduty

May 19, 2023 By Zenduty In Zenduty

Watch Rajesh Tilwani talk about building a DevTools SaaS company and all things reliability, only on the Incidentally Reliable Podcast, now live on all major platforms! About Zenduty: Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle. With the Zenduty API, you can supplement and deploy Zenduty in sync with other tools and services, allowing you to create and update incidents, users, teams, services, integrations, schedules, and more, and automate your workflows using simple scripts.

View Video

Establishing Zero Trust out of the box at Enterprise scale

May 18, 2023 By Alex Greer In Blameless

At most enterprises, CIOs are already several waves into enforcing Zero Trust policies across their processes, configurations, and teams. As a DevOps lead responsible for balancing user empowerment with adherence to your executives’ policies across many SaaS tools, you may find the job tricky. The problem is especially challenging in incident management, where highly sensitive data is shared, incidents rely on many different types of team members, and response teams change from incident to incident.

Read Post

Admin Panel - Calendars - xMatters Support

May 18, 2023 By xMatters In xMatters

The calendar section in the xMatters admin tab helps you define custom holidays, company holidays, and site holidays specific to your annual company schedule.

View Video

The fastest and most robust path to incident declaration from monitoring tools

May 18, 2023 By Joel Smith In FireHydrant

Here’s a crazy question: why do we still require a human to manually declare an incident for the things that we know are incidents? If we have enough confidence to build SLOs and high-severity alert routes for these specific scenarios, why are we still asking a human to confirm it’s an incident and get the assembly process in motion? Isn’t that just another button to push when we could be problem solving instead?

Read Post

Insights into Observability Tools: Commercial vs. Open-Source

May 17, 2023 By Vishwa Krishnakumar In Zenduty

Observability has become a critical aspect of modern software development and operations, allowing organizations to gain insights into the health and performance of their applications and systems. One of the key decisions when implementing observability is choosing between commercial or open-source tools. We spoke to several professionals who shared their experiences and insights on this topic, shedding light on the pros and cons of each approach.

Read Post

Process Automation v4.12.0 and v4.13.0 Release Notes

May 17, 2023 By PagerDuty In PagerDuty

Product Managers Jake Cohen and Forrest Evans are back to update us on what’s new in the 4.12.0 and 4.13.0 releases of PagerDuty Process Automation. New in these releases are features to support Kubernetes automation, managing resources in multiple AWS accounts, and a new plugin suite for Sensu.

View Video

What's New in the PagerDuty Operations Cloud? May 2023 Demo Round-up

May 16, 2023 By PagerDuty In PagerDuty

In a one-hour guided tour, watch eight demos of new capabilities coming to the PagerDuty Operations Cloud.

View Video

Major Incident Management with Zenduty, Grafana, Slack and Zendesk

May 15, 2023 By Anjali Udasi In Zenduty

In today's fast-paced world, businesses are seeking ways to increase their efficiency and simplify their processes. But sometimes teams don't notice an issue in its early stages, which leads to a poor customer experience. For example, suppose you are part of the infrastructure team, and your primary responsibility is to monitor resources and notify others when they reach maximum capacity. Let's say an anomalous traffic load pushes a resource's CPU utilization above 90%.

Read Post

7 Types of Incident Response Tools

May 15, 2023 By OnPage Corporation In OnPage

Incident response tools are software applications or platforms designed to assist security teams in identifying, managing, and resolving cybersecurity incidents. Incident response is a crucial part of an organization’s cybersecurity strategy, making it possible to detect threats, analyze vulnerabilities, respond to attacks, and recover from security breaches. Incident response tools are vital for safeguarding organizations against evolving cyber threats.

Read Post

Welcome To xMatters - Ep 2 - Organizing Your Teams

May 15, 2023 By xMatters In xMatters

Even the most gifted and powerful people could do with a helping hand now and again. Thankfully, they are not alone in the multiverse! xMatters makes organizing your teams and creating a customized on-call schedule feel like magic, so that when help is urgently needed, the appropriate on-call individual quickly joins the team to save the day. To learn more about organizing your teams with xMatters, check out our tutorial videos on how to get started.

View Video

Forget MTTR - focus on assembly time

May 15, 2023 By FireHydrant In FireHydrant

View Video

Intro to API Scopes

May 12, 2023 By PagerDuty In PagerDuty

PagerDuty Technical Product Manager Nakul Bhagat joins DevOps Advocate Mandi Walls to talk about improvements to the PagerDuty API and API Scopes. API Scopes will be generally available in May 2023.

View Video

How Sony Interactive Entertainment drives better IT operations based on alert data

May 12, 2023 By Elli Dugger In BigPanda

Sony Interactive Entertainment (SIE) is a multinational video game and digital entertainment company owned by global conglomerate Sony. SIE primarily operates the PlayStation brand of video game consoles and products.

Read Post

Insights on Hiring Engineers with Different Tech Stacks

May 11, 2023 By Vishwa Krishnakumar In Zenduty

In the world of software engineering, the choice of programming languages, frameworks, and technologies is constantly evolving. As a result, hiring engineers who have experience in different tech stacks has become a common practice for many companies. However, this practice also raises questions and concerns about the potential challenges and advantages of hiring engineers who work in predominantly different stacks.

Read Post

Learning from incidents is not the goal

May 11, 2023 By Chris Evans In Incident.io

Learning from incidents has become something of a hot topic within the software industry, and for good reason. Analyzing mistakes and mishaps can help organizations avoid similar issues in the future, leading to improved operations and increased safety. But too often we treat learning from incidents as the end goal, rather than a means to achieving greater business success. The goal is not for our organisations to learn from incidents: it’s for them to be better, more successful businesses.

Read Post

Status page best practices

May 10, 2023 By Daniel Condomitti In FireHydrant

Although some organizations may hesitate to publicly announce when they have an incident — afraid that acknowledging outages will scare customers away — the opposite is often true. When you proactively communicate with your customers, even during bad times, you have the opportunity to not only build trust but also buy grace during the incident.

Read Post

Will Prioritising Reliability Slow Down Your Deployment?

May 10, 2023 By Zenduty In Zenduty

Learn about reliability, SRE, and DevOps in our podcast, “Incidentally Reliable.”

View Video

Admin Panel - Location Settings - xMatters Support

May 10, 2023 By xMatters In xMatters

In xMatters, site and region settings represent physical locations such as street addresses or geographic coordinates. Every user in the system belongs to a single site, which controls some default settings on their profile page, such as their language and time zone. Let’s take a dive into xMatters location settings.

View Video

Our Opsgenie integration is now available

May 10, 2023 By Freek Van der Herten In Oh Dear

When we detect a problem with your site, we can notify you via email, a Slack message, a webhook, or any of our other notification channels. For most of our users this is enough, but those who work in larger teams often need more flexibility. Today, we are launching our integration with Opsgenie, a modern incident management platform.

Read Post

Automate your DevOps processes, and let go (a little)

May 5, 2023 By xMatters In xMatters

As the demand for instant innovation and real-time delivery of mission-critical processes continues to grow, your organization risks falling behind if it can’t adapt to an automation-centric strategy. To be successful, managers have to loosen the reins and enable teams to automate their DevOps processes. Automating DevOps processes isn’t an all-or-nothing decision, and implementing automation gradually lets teams adapt to the changing environment and let go, little by little.

Read Post

Squadcast's Improved Slack (V2) Integration | Better Collaboration & Incident Management | Squadcast

May 5, 2023 By Squadcast In Squadcast

This video will give you an overview of the latest improvements supported by the Squadcast-Slack integration, which we hope will help in better collaboration and Incident Management.

View Video

Why Incident Management is an Essential Part of Risk Management

May 5, 2023 By OnPage Corporation In OnPage

In any operation or activity, unforeseen happenings can derail progress. The job of a good manager is to try their best to make the hitherto unforeseen visible and planned for. It’s all too easy to find yourself reacting to occurrences that can throw you and the company into turmoil, with frantic fixing on the back foot being the result. The best managers can make it look like they don’t do much.

Read Post

See Global Event Orchestration End-to-End

May 5, 2023 By PagerDuty In PagerDuty

Global Event Orchestration’s powerful decision engine enriches events, controls their routing, and triggers self-healing actions based on event data. Teams can use this functionality across any or all services within PagerDuty. This feature is a continued investment in Event Orchestration, demonstrating PagerDuty’s commitment to providing customers with best-in-class automation capabilities. Check out this live demo from Principal Product Manager Frank Emery.

View Video

Assembly time is where you have the most control of an incident

May 4, 2023 By Robert Ross In FireHydrant

The FDNY EMS Command responds to more than 4,000 calls per day. They range from car accidents to building fires to cats stuck in trees, and responses vary accordingly. Sometimes they might take hours, sometimes they take just a few minutes. With such unpredictable conditions, the FDNY focuses on improving what they call “response time.” That’s the amount of time between a 911 call being made and emergency responders arriving on the scene. This might sound familiar.

Read Post

Trust shouldn't start at zero

May 4, 2023 By Pete Hamilton In Incident.io

How often have you heard the phrase “trust is earned” in life? While well-meaning, I think this can actually lead to some strange behaviour at work, especially when you’re on a fast-growing team. Startups involve a lot of chaos and unknowns that your teams need to navigate, so it’s vital to know you can trust the people around you. As you grow, how you set expectations around trust as people join your team can affect your ability to hire, onboard, ship and, ultimately, survive.

Read Post

How to Manage Customer Support Channels in Slack: A Step-by-Step Plan

May 3, 2023 By Vishwa Krishnakumar In Zenduty

As more and more teams transition to remote work, collaboration tools like Slack have become increasingly popular. Slack's chat-based communication platform makes it easy to keep teams connected and informed, but it can also create challenges when it comes to managing support channels. In this post, we'll explore different approaches to building a Slack-based support system and provide some tips for success.

Read Post

10 Mistakes to avoid when framing your IT Incident Management Strategy

May 2, 2023 By Shashidhar Reddy In eG Innovations

An IT incident is an unplanned disruption that negatively impacts an IT service. As the importance of IT to the business has increased, the impact of IT incidents has become greater. IT incidents can result in revenue loss, lost employee productivity, SLA financial penalties, government fines, and more. An effective IT incident management strategy is now essential in every organization. For a company like Amazon, whose entire business relies on IT, a single second of slowness can cost over $15,000.

Read Post

Four steps for organizations to proactively address chronic hazards

May 2, 2023 By Everbridge In Everbridge

Global climate change continues to have a profound impact on businesses worldwide, with chronic hazards such as flooding, wildfires, and extreme weather conditions posing a significant risk to industries. As organizations continue to operate in an increasingly interconnected world, they face a growing range of challenges. One such challenge is the impact of chronic hazards on their operations.

Read Post

How to get started with incident management metrics

May 2, 2023 By Jouhné Scott In FireHydrant

Tracking incident metrics can help you discover patterns in the causes and costs of incidents and help you understand brittle parts of your organization. We've seen them help teams zero in on things like: But it can be intimidating to get started. Do you really need metrics if you're a small team or just beginning to formalize your incident management program? I say yes. The key is to start with something manageable and grow.

Read Post

How Abbott transformed its incident management process with Workflow Automation

May 2, 2023 By BigPanda In BigPanda

Eliminating errors and streamlining the incident management process are top priorities for many ITOps, NOC, SRE, and DevOps teams. With organizations using multiple tools in their IT stack, manually finding the right information at the right time becomes crucial during incident triage. By automating tasks and workflows, businesses can eliminate manual tasks that are time-consuming, repetitive, and prone to mistakes.

Read Post

Debugging Kubernetes with Automated Runbooks & Ephemeral Containers

May 2, 2023 By Jake Cohen In PagerDuty

In our previous blog, we discussed the difficulty in capturing all relevant diagnostics during an incident before a “band-aid” fix is applied. The most common, concrete example of this is an application running in a container and the container is redeployed—perhaps to a prior version or the same version—simply to solve the immediate issue.

Read Post

The Rise of ServiceOps: Unifying IT Service Delivery

May 1, 2023 By xMatters In xMatters

With the complex and steadfast growth of IT service delivery processes, organizations and their internal teams have come to rely on several tools in their toolbox to deliver best-in-class products and services. The use of AIOps, AI/ML, and overall automation has shaped modern delivery methods, but what we call this process, and how we grow to advance it, has yet to find a definition that’s universally recognized.

Read Post

Reflecting on one of the biggest incidents in our history

May 1, 2023 By Luis Gonzalez In Incident.io

We have to come clean. During KubeCon, we experienced an incident that we weren’t ready to discuss until now. This incident caused quite a disruption and, had it been left unresolved, would have had a massive snowball effect. At the time, we didn’t want to raise any alarms, so we kept it quiet while our team rallied to resolve it. And to be honest, most folks probably didn’t even realize that it happened since we moved so quickly.

Read Post

It's time to rethink the way you do external comms

May 1, 2023 By incident.io In Incident.io

April was a month to remember at incident.io. Not only did we attend our second conference ever with KubeCon in Amsterdam, but we also very subtly released our brand-new Status Pages product. OK, it probably wasn't subtle. Both moments required months of preparation, feedback loops, iteration, and so much more behind-the-scenes work to get right. So if you ran into us at KubeCon, thank you for stopping by and meeting with our team.

Read Post

Mastering IT Response Time

May 1, 2023 By Ritika Bramhe In OnPage

In today’s fast-paced digital landscape, businesses heavily rely on their IT departments to ensure smooth operations and deliver exceptional customer experiences. When it comes to IT support, one critical metric stands out: response time. A prompt and efficient response can be the difference between a satisfied customer and a frustrated one. In this blog post, we will explore strategies to improve IT response times, enhance customer satisfaction, and optimize overall productivity.

Read Post
