January 2023

Terraform Time with Jon Bass from Sym

Jan 31, 2023 By PagerDuty In PagerDuty

Jon Bass, CEO of Sym, joins PagerDuty Developer Advocate Scott McAllister on Terraform Time to talk about Sym’s use of Terraform and integration with PagerDuty.

View Video

PagerDuty

Read more about Terraform Time with Jon Bass from Sym

Analytics in Squadcast | Visualize Team and Organization Level Analytics | MTTA MTTR | Squadcast

Jan 31, 2023 By Squadcast In Squadcast

Analyzing incident data plays a key role in better SRE practice. Squadcast's Analytics Dashboard helps you analyze the performance of your Organization/Team for a given time period. It also gives you more insight into past outages that affected your systems.

View Video

Squadcast

Read more about Analytics in Squadcast | Visualize Team and Organization Level Analytics | MTTA MTTR | Squadcast

OnPage - Never Miss a Critical Alert Again (For IT, Clinical Comm. and Collab. & Crisis Comm.)

Jan 31, 2023 By OnPage In OnPage

OnPage is an Incident Alert Management platform that elevates critical notifications to the right person on call to remediate critical events. With Alert-Until-Read capabilities, dynamic digital schedules, escalation policies, incident reports, and redundancies, OnPage aims to ensure that critical alerts are never missed. OnPage serves many industries, including healthcare, information technology, managed services, IoT, and manufacturing. With more than 250 integrations, the solution extends incident alert management to popular ITSM (ticketing), RMM, monitoring and cybersecurity tools. On the healthcare front, OnPage integrates with popular scheduling, IoT, nurse call, and EMR systems.

View Video

OnPage

Read more about OnPage - Never Miss a Critical Alert Again (For IT, Clinical Comm. and Collab. & Crisis Comm.)

A Complete Guide to PagerDuty Alternatives

Jan 31, 2023 By Aman In Zenduty

Exploring Options for Incident Management: A Comparison of PagerDuty and Other Tools Effective incident response is crucial for managing operational issues and resolving them in a complex technology environment. With the increasing complexity of systems built from numerous services, it is important for companies to have a way to keep these systems running smoothly.

Read Post

Zenduty

Read more about A Complete Guide to PagerDuty Alternatives

What's New: January 2023

Jan 31, 2023 By PagerDuty In PagerDuty

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team include Incident Response, PagerDuty® Process Automation, the PagerDuty Mobile App, Integrations, as well as Community & Advocacy Events updates. We continue to help customers further automate to optimize cloud operations and reduce the number of issues escalated to other teams. Get started now and learn about.

Read Post

PagerDuty

Read more about What's New: January 2023

What are Network Operation Centers (NOC) and how do NOC teams work?

Jan 30, 2023 By Vishal Padghan In Squadcast

Modern-day markets are highly competitive and in order to foster stronger customer relations, we see businesses striving hard to be always available and operational. Hence, businesses invest heavily to ensure higher uptime and to have dedicated teams that constantly monitor the performance of an organization's IT resources. In this blog, we will explore what NOC teams are and why they are important.

Read Post

Squadcast

Read more about What are Network Operation Centers (NOC) and how do NOC teams work?

Create a Status Update Notification Template in less than 2 minutes

Jan 30, 2023 By PagerDuty In PagerDuty

Now generally available! With organization-based templates, companies can now customize and standardize communications based on impact, service areas, and more. This functionality will also be available via API, so teams are able to customize and leverage status update notification templates to fit their needs in any tool or context.

View Video

PagerDuty

Read more about Create a Status Update Notification Template in less than 2 minutes

What is Incident Management?

Jan 28, 2023 By Alloy Software In Alloy Software

Ensuring continuity of IT services through an effective incident management process.

Read Post

Alloy Software

Read more about What is Incident Management?

What are AIOps use cases?

Jan 27, 2023 By BigPanda In BigPanda

The past decade has seen organizations embrace AI and data analytics at scale. In 2022, IBM found that 35% of organizations have embraced AI—a 4% increase from 2021. The trend of AI adoption will continue to play out in the next several years across virtually every organizational function. At the vanguard of this movement is AIOps, which sees AI used to improve IT operations (ITOps).

Read Post

BigPanda

Read more about What are AIOps use cases?

Unified event & alert visibility - across your entire technology stack

Jan 27, 2023 By Interlink In Interlink

Disparate, siloed monitoring tools - no coherent view of threats to service availability? Elevate your approach: Hybrid IT Infrastructure Monitoring brings all your IT event & alert information together for unified visibility.

View Video

Interlink

Read more about Unified event & alert visibility - across your entire technology stack

Signl4 Signup and Mobile App download

Jan 27, 2023 By SIGNL4 In SIGNL4

The Signl4 Customer Journey series walks users through Signl4's processes, from signup to alerts to settings. Today's video focuses on signing up for a new subscription and downloading the mobile app.

View Video

SIGNL4

Read more about Signl4 Signup and Mobile App download

Signl4 Users and Teams Setup

Jan 27, 2023 By SIGNL4 In SIGNL4

The Signl4 Customer Journey series walks users through Signl4's processes, from signup to alerting to settings. This video covers adding, editing, and removing users and teams inside Signl4.

View Video

SIGNL4

Read more about Signl4 Users and Teams Setup

What Is IT Mapping and How Can it Prevent the Next Production Incident?

Jan 26, 2023 By OnPage Corporation In OnPage

IT infrastructure mapping is the process of creating a visual topology of a network infrastructure. This mapping process helps you understand the geographic and interactive layout of the network that applications depend on. Using infrastructure mapping for troubleshooting, you can quickly understand the relationship between application issues and hardware issues.

Read Post

OnPage

Read more about What Is IT Mapping and How Can it Prevent the Next Production Incident?

Create Better UX with Incident Response and Service Intelligence

Jan 25, 2023 By xMatters In xMatters

Incidents that impact user experience are some of the most common challenges that IT, security, and operations teams must face. Users have high expectations for application uptime, and organizations are responsible for ensuring applications are available for them. From application performance to user interface design, many factors can affect a customer’s experience—and resulting confidence—in your product’s capabilities.

Read Post

xMatters

Read more about Create Better UX with Incident Response and Service Intelligence

Say Goodbye to the 'Executive Swoop and Poop' with Status Update Notification Templates

Jan 25, 2023 By Hannah Culver In PagerDuty

Incidents are unpredictable, but how you share updates with stakeholders doesn’t have to be. Status Update Notifications Templates help teams streamline communication with internal stakeholders during a major incident. We are excited to announce that this feature has added new capabilities.

Read Post

PagerDuty

Read more about Say Goodbye to the 'Executive Swoop and Poop' with Status Update Notification Templates

DataScan transforms incident response & business continuity tests

Jan 24, 2023 By Noam Morginstin In Exigence

With more than $80 billion of loan collateral in its systems, DataScan is an industry leader in providing solutions for wholesale asset financing and inventory risk management. The company’s InfoSec leadership understood that they needed to take a whole new approach to incident response and to advance its security maturity. Having multiple tools for managing incidents and conducting business was translating into inefficiencies, prolonged resolutions, and stress.

Read Post

Exigence

Read more about DataScan transforms incident response & business continuity tests

5 Exciting Predictions for SRE in 2023

Jan 24, 2023 By Emily Arnott In Blameless

SRE is a field defined by its constant evolution: from Google’s in-house secret recipe, to the hottest new practice for the biggest enterprise orgs, to a diverse and holistic mentality practiced by orgs of all sizes. Earlier this year, we co-sponsored the Catchpoint State of SRE survey, where we took the temperature of SRE where it was. Now, as we did in 2021 and 2020, we’ll turn to the future to speculate on what 2023 will bring for SRE.

Read Post

Blameless

Read more about 5 Exciting Predictions for SRE in 2023

Using AIOps for Better Adaptive Incident Management

Jan 23, 2023 By xMatters In xMatters

An effective incident management strategy is crucial for any business, especially those offering consumer-facing digital services. This is because when incidents occur, they may be easily detected by your users, impact your reputation, and ultimately affect your bottom line. So, to minimize the reach and severity of incidents, your response needs to be swift and effective. One way to ensure your approach meets these requirements is to implement AIOps.

Read Post

xMatters

Read more about Using AIOps for Better Adaptive Incident Management

Runbook Automation as a Baseline for Controllability and Observability

Jan 23, 2023 By Amalya Shnaps In MoovingON

Some of the highest priorities for engineers - from NOC Engineers, DevOps & Site Reliability Engineers - are the automation and optimization of their production environments. Many companies today face tough challenges with their Network Operations Centers (NOCs) or production environments. These challenges fall into the hands of engineering teams.

Read Post

MoovingON

Read more about Runbook Automation as a Baseline for Controllability and Observability

What is Observability Engineering all about? One minute overview.

Jan 23, 2023 By Interlink In Interlink

Observability Engineering: strengthen your capabilities to better understand the health of your business-critical applications and head customer-impacting issues off at the pass!

View Video

Interlink

Read more about What is Observability Engineering all about? One minute overview.

ITIL vs. ITSM - What's the difference?

Jan 23, 2023 By iLert In iLert

Companies depend on IT services to support their business operations, and to meet the demands of their customers. ITIL (Information Technology Infrastructure Library) and ITSM (Information Technology Service Management) are frameworks to help organizations manage their IT services. While these two do have elements in common, they also have important differences. ITIL is a set of best practices for IT service management which emphasizes the alignment of IT with the needs of the business.

Read Post

iLert

Read more about ITIL vs. ITSM - What's the difference?

How we approach integrations at incident.io

Jan 23, 2023 By incident.io In Incident.io

If you pick a random SaaS company out of a jar and go to their website, chances are they integrate with another tool. Typically, the end goal of integrations is to meet users in the middle by working with other tools they’re already using day to day. Put another way, integrations are a strategic business decision. But the question remains: why don’t companies just build a tool with similar functionality in order to make the product stickier?

Read Post

Incident.io

Read more about How we approach integrations at incident.io

The Risks Of Using Small Status Page Vendors

Jan 22, 2023 By StatusCast In StatusCast

Servers are down. Employees are scrambling. Customers are upset. The pressure is on. When internal operations are in disarray, and your business is experiencing a service outage, the last thing you need to worry about is the reliability of your incident communication solution. Keeping users informed when services are down is mission-critical, in order to prevent a flood of support requests, which compound the effects of the incident, straining employee productivity and bandwidth.

Read Post

StatusCast

Read more about The Risks Of Using Small Status Page Vendors

What are Webhooks and why should developers use them?

Jan 20, 2023 By Vardhan NS In Squadcast

Webhooks and APIs are a developer-friendly approach to building modern-day web applications. In this blog, we explain what a webhook is, do a detailed webhooks vs. API comparison, and explain why we recommend developers use them with Squadcast.

Read Post

Squadcast

Read more about What are Webhooks and why should developers use them?

PagerDuty and FiberPlane Integration Demo

Jan 20, 2023 By PagerDuty In PagerDuty

Presenter: Aparna Valsala, Solutions Engineer at Fiberplane. Using the PagerDuty and Fiberplane integration, the responding engineer can immediately start the investigation using a predefined and configurable Fiberplane template visible to all while allowing multiple engineers to collaborate on the investigation with complete visibility and context.

View Video

PagerDuty

Read more about PagerDuty and FiberPlane Integration Demo

TTS Dictionary - xMatters Support

Jan 20, 2023 By xMatters In xMatters

The TTS dictionary in xMatters is used to customize how the Text-to-Speech engine pronounces words used in your voice notifications. You can use this to change common acronyms or abbreviations used by your company into something that listeners can understand better when spoken aloud.

View Video

xMatters

Incident Management

Read more about TTS Dictionary - xMatters Support

A powerful new incident analytics experience is now generally available for Enterprise customers

Jan 19, 2023 By Dylan Nielsen In FireHydrant

Automatically measure MTTR, impacted infrastructure, task completion, and more with new incident analytics.

Read Post

FireHydrant

Read more about A powerful new incident analytics experience is now generally available for Enterprise customers

Causes of Data Center Outages and How to Overcome Them

Jan 19, 2023 By Chandana In Infraon

With the increasing computing requirements and complexity of data center systems, unplanned downtime has become a severe threat to enterprises in terms of process violations, revenue losses, and reputational issues. Although data center failures are quite common, it can be difficult to predict every scenario that might have a severe impact on the expansion of your company. Especially when some factors, like a natural disaster, can simply be beyond your control and result in data center outages.

Read Post

Infraon

Read more about Causes of Data Center Outages and How to Overcome Them

APIs Impact on DevOps: Exploring APIs Continuous Evolution

Jan 18, 2023 By xMatters In xMatters

An application programming interface (API) is a set of rules and protocols that enables different software applications to communicate and share data and functionality. The concept of an API has been around for a long time. However, APIs as you know them emerged in the late 1990s and early 2000s with the rise of the internet and web-based services. As more businesses began to offer online services, the need for a standardized way for these services to interact and share data became apparent.

Read Post

xMatters

Read more about APIs Impact on DevOps: Exploring APIs Continuous Evolution

Interlink Software arrives on the Cisco AppDynamics Marketplace!

Jan 17, 2023 By David Arrowsmith In Interlink

We are delighted to share the news that our integration with leading, real-time Application Performance Monitoring (APM) vendor Cisco AppDynamics is now listed on the AppDynamics Marketplace.

Read Post

Interlink

Read more about Interlink Software arrives on the Cisco AppDynamics Marketplace!

How to talk to your executive leadership team about reliability

Jan 17, 2023 By Blameless In Blameless

Product reliability requires investment from all areas of the business. Technology leaders must effectively communicate the implications of service reliability to the rest of the organization. As a leader, how do you prove that a more reliable product is critical to success? Experts from BetterCloud, Machinify and Blameless come together to discuss how to talk to your executive leadership team about reliability in this webinar.

View Video

Blameless

Read more about How to talk to your executive leadership team about reliability

3 tips for reducing stress during incident response efforts

Jan 17, 2023 By Malcolm Preston In FireHydrant

Panic takes time and energy away from swift incident response, leading to second-guessing, a higher likelihood of mistakes, and analysis paralysis. Here are three tips to minimize it.

Read Post

FireHydrant

Read more about 3 tips for reducing stress during incident response efforts

The Inevitable - Failures in Distributed Systems

Jan 16, 2023 By Aman Swami In Zenduty

Experiencing failure at scale is, as the popular Marvel character Thanos would say, “Inevitable”. Memory leaks and software, hardware, or network I/O failures are just a few examples. It’s a problem of simple mathematics: the probability of failing rises as the total number of operations performed increases. With each component used to scale the application, the failure quotient increases. So how do you tackle this so-called “Inevitable” problem that comes with scaling?

Read Post

Zenduty

Read more about The Inevitable - Failures in Distributed Systems

IT Workflow Explanation

Jan 15, 2023 By Interlink In Interlink

IT Workflow Automation automates the execution of IT tasks and processes. This can include everything from provisioning new servers and deploying software updates to monitoring and troubleshooting IT systems. Workflow automation helps organizations reduce the time and effort required to perform these tasks by automating manual processes and eliminating the need for manual intervention. It can also improve the accuracy and consistency of these processes, as there is less room for human error.

View Video

Interlink

Read more about IT Workflow Explanation

10 Points of consideration for investing in an Observability Platform for your organization.

Jan 15, 2023 By Interlink In Interlink

10 Points of consideration for investing in an Observability Platform for your organization: Scalability Can the observability platform handle the volume of data that your organization generates? Compatibility Is the observability platform compatible with your organization's existing systems and technologies? Ease of use Is the observability platform user-friendly and easy for your team to adopt and use?

View Video

Interlink

Read more about 10 Points of consideration for investing in an Observability Platform for your organization.

[PODCAST] Episode 1 Season 2; How to successfully build and defend your 2023 ITOps budget

Jan 13, 2023 By BigPanda In BigPanda

It’s that time of year when ITOps leaders quantify their plans in budgets that must compete with other equally hungry groups for limited corporate resources. How can the thankless task of proactively preventing outages and speeding time to resolution win against funding flashier projects? Real-world facts can make that difference. Among the major topics Nigel and Craig will discuss is how to help organizations successfully build and defend their 2023 ITOps budget for investments in tooling, headcount, and workflow improvements.

View Video

BigPanda

Read more about [PODCAST] Episode 1 Season 2; How to successfully build and defend your 2023 ITOps budget

PagerDuty Status Pages Enable Real-Time, Proactive Customer Communication During Incidents

Jan 12, 2023 By PagerDuty In PagerDuty

Integrated, Intuitive Feature Saves Time and Money, Aligning Technical and Customer-Facing Teams, Allowing Further Consolidation on to the PagerDuty Platform, and Building Customer Trust During Large-Scale Events.

Read Post

PagerDuty

Read more about PagerDuty Status Pages Enable Real-Time, Proactive Customer Communication During Incidents

5 best incident management tools of 2023

Jan 12, 2023 By incident.io In Incident.io

Put simply, managing incidents—big or small—is good for business. Not only is it a regulatory requirement, but also a factor in your profits. Your customers expect smooth operations, good customer service and protection. A dedicated incident management tool can help protect all of these. While many may think of incidents as an IT or DevOps issue, it’s hard to overemphasize that they can happen in any department.

Read Post

Incident.io

Read more about 5 best incident management tools of 2023

Incident Management Tools - Do I Even Need Them?

Jan 12, 2023 By Aaron Lober In Blameless

Software is hard… Maintaining software reliability is harder than it used to be. Software systems have grown dramatically in complexity as they’re applied in a wider range of applications and environments, many of which have become fundamental to the everyday function of our society. On the other hand, the pace of software development and release is also faster than ever. Innovating new features faster than competitors has become the key to success in a rapidly-changing market.

Read Post

Blameless

Read more about Incident Management Tools - Do I Even Need Them?

Managing incidents in a growing organisation - incident.fm

Jan 12, 2023 By Incident.io In Incident.io

In this week's episode, we're joined by Matt Huxtable, CTO at Ziglu (an e-money issuer, offering a variety of digital finance services, particularly well known for its cryptocurrency services). Matt talks about how the engineering team at Ziglu has evolved over time, building an agile culture and why "keep it boring" is his mantra. Chris, Pete and Matt cover how to context switch between solving and communicating during an incident, their most creative incident fixes and why AI isn't ready to solve incidents for us just yet.

View Video

Incident.io

Incident Management

Read more about Managing incidents in a growing organisation - incident.fm

Applying AIOps: Sending alerts from Moogsoft to Mattermost

Jan 12, 2023 By Mattermost In Mattermost

In this video, you'll learn how to send alerts from Moogsoft to Mattermost and leverage AIOps in your incident resolution workflow.

View Video

Mattermost

Read more about Applying AIOps: Sending alerts from Moogsoft to Mattermost

Easy to manage fine-grained access control and roles

Jan 12, 2023 By Kaushik Thirthappa In Spike

A neatly set up access control scheme that spells out exactly what each user can do on an incident management platform can save a lot of time and hassle in the future. In the past, Spike.sh had only 2 roles - Admin and Member. The only difference between these roles was that only Admins could remove members. It was fairly simple and most users liked it. However, with larger teams coming onboard, it gets a little difficult for admins to control. So, we have extended the existing system by adding two more roles.

Read Post

Spike

Read more about Easy to manage fine-grained access control and roles

Need your own incident post-mortem template? Here's ours

Jan 12, 2023 By incident.io In Incident.io

Having a dedicated incident post-mortem is just as important as having a robust incident response plan. The post-mortem is key to understanding exactly what went wrong, why it happened in the first place, and what you can do to avoid it in the future. It’s an essential document but many organizations either haphazardly put together post-incident notes that live in disparate places or don’t know where to start in creating their own post-mortems.

Read Post

Incident.io

Read more about Need your own incident post-mortem template? Here's ours

CDI's evolution with BigPanda: from partner to customer

Jan 11, 2023 By Fred Koopmans In BigPanda

CDI’s partnership with BigPanda has catapulted them to the forefront of modern IT operations. Through reselling and implementing BigPanda’s technology for customers, CDI saw the remarkable value of the platform and began to integrate it into their own business. In the process, they’ve become a partner and a customer—leveraging the product to transform their own operations in ways that previously seemed unimaginable.

Read Post

BigPanda

Read more about CDI's evolution with BigPanda: from partner to customer

Introducing PagerDuty Status Pages for Improved Customer Communication and Savings

Jan 11, 2023 By Hadijah Creary In PagerDuty

In 2023, the fight to retain customers will be one of the biggest factors determining whether a business can survive the recession all are predicting. One of the key findings from the 2022 State of Service Report from Salesforce is that great service is at the heart of customer retention: 48% of customers will switch brands for better customer service when something goes wrong, and they view open communication as a key factor in how a customer might gauge the quality of customer service.

Read Post

PagerDuty

Read more about Introducing PagerDuty Status Pages for Improved Customer Communication and Savings

What is incident management? Maximize uptime and minimize disruptions with ServiceDesk Plus

Jan 11, 2023 By ManageEngine In ManageEngine

Incident management is the process of restoring IT services to normalcy as quickly as possible. You can check out our comprehensive guide on incident management to learn more about how you can implement incident management best practices in your organization.

View Video

ManageEngine

Read more about What is incident management? Maximize uptime and minimize disruptions with ServiceDesk Plus

Lessons from the CircleCI Security Incident

Jan 9, 2023 By Quentin Rousseau In Rootly

In some respects, security and reliability are competing priorities. Security controls may reduce reliability, and responding to security incidents may require mission-critical systems to be paused or shut down until they're secure. The recent security incident involving CircleCI, however, shows that it's not always necessary to choose between prioritizing security or reliability.

Read Post

Rootly

Read more about Lessons from the CircleCI Security Incident

How to create a Weekly On-call Schedule for Business & Non-Business Hours | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to set up Weekly On-call rotational shifts for both Business and Non-business hours on Squadcast.

View Video

Squadcast

Read more about How to create a Weekly On-call Schedule for Business & Non-Business Hours | Squadcast

How to Create Weekly On-Call Schedules in Squadcast | SRE | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to set up Weekly Schedules for your team's On-Call rotations on Squadcast.

View Video

Squadcast

Read more about How to Create Weekly On-Call Schedules in Squadcast | SRE | Squadcast

How to Create Weekend On-Call Schedule on Squadcast | Squadcast On-call Schedules | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to set up Weekend On-call rotational shifts on Squadcast.

View Video

Squadcast

Read more about How to Create Weekend On-Call Schedule on Squadcast | Squadcast On-call Schedules | Squadcast

How to Create Schedule Overrides in Squadcast | Override an existing On-Call Schedule | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to override an existing On-call Schedule in Squadcast.

View Video

Squadcast

Read more about How to Create Schedule Overrides in Squadcast | Override an existing On-Call Schedule | Squadcast

How to Create a Daily On-Call Schedule | On-Call Rotation | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to create a Daily On-Call Schedule on Squadcast.

View Video

Squadcast

Read more about How to Create a Daily On-Call Schedule | On-Call Rotation | Squadcast

How to adjust Day Light Savings in Squadcast's On-Call Schedules | Squadcast

Jan 9, 2023 By Squadcast In Squadcast

In this video, you will understand how to adjust your Schedule timings to account for Daylight Saving Time in Squadcast's platform.

View Video

Squadcast

Read more about How to adjust Day Light Savings in Squadcast's On-Call Schedules | Squadcast

2022 BigPanda product year in review

Jan 9, 2023 By Fred Koopmans In BigPanda

The start of a new year often includes reflecting on what you accomplished over the past year and setting new goals for the year ahead. In 2022, BigPanda set big goals to help organizations prevent and resolve IT and service outages through our innovative Incident Intelligence and Automation platform, powered by AIOps. On average, our customers sent us 2.3 billion events and changes per month, with our largest customers by volume sending us approximately 165 million events each.

Read Post

BigPanda

Read more about 2022 BigPanda product year in review

2023 Predictions: Doing more with less

Jan 6, 2023 By BigPanda In BigPanda

As we head into 2023, it’s clear that one of the challenges many businesses will face is figuring out how to do more with less. According to Business Insider, layoffs loom for many industries, including tech. All of this can add up to an increased chance for potential outages and disruptions.

Read Post

BigPanda

Read more about 2023 Predictions: Doing more with less

Why SRE Benefits Your Organization's Teams & Your Customers

Jan 5, 2023 By Emily Arnott In Blameless

Wondering why you should choose SRE for your organization? We will explain what it is and all the benefits it can bring to your organization. What are the benefits of SRE?

Read Post

Blameless

Read more about Why SRE Benefits Your Organization's Teams & Your Customers

Critical Metrics and Alerts in the Continuous Delivery Process

Jan 5, 2023 By Gilad Maayan In OnPage

Continuous delivery is a software development approach in which code changes are automatically staged for production release. A foundation for modern application development, continuous delivery extends continuous integration by automatically deploying code changes to test and production environments after the build phase. When properly implemented, developers have deployable build artifacts that have passed a standardized testing process and can be deployed to environments as needed.

Read Post

OnPage

Read more about Critical Metrics and Alerts in the Continuous Delivery Process

Playbooks: A new superpower for designers

Jan 5, 2023 By Michael Gamble In Mattermost

From one designer to another, you should know why Playbooks is a fantastic addition to your design tool belt. Playbooks was designed with technical workflows in mind, from incident response to release management, but its flexibility makes it a perfect fit for any repeated process. I love it for creating reusable templates of design checklists and as an excellent way to do design review sign-off.

Read Post

Mattermost

Read more about Playbooks: A new superpower for designers

Failure Analysis: Engineering incidents are a bigger problem than you think

Jan 5, 2023 By Aaron Lober In Blameless

Engineering incidents can be quite harmful for companies, both in terms of financial costs and reputational damage. In some cases, engineering incidents can even put people's lives at risk, which can have serious legal and moral implications for the company involved.

Read Post

Blameless

Read more about Failure Analysis: Engineering incidents are a bigger problem than you think

How communication can make or break your incidents - incident.fm

Jan 4, 2023 By Incident.io In Incident.io

In this episode, Pete and Lisa discuss why great communication is essential to the success of any incident management process. From keeping your wider team in the loop to minimise disruption, to using customer communication to strengthen your brand when things go wrong, the team share their experiences and top tips for having a transparent incident communication culture.

View Video

Incident.io

Incident Management

Read more about How communication can make or break your incidents - incident.fm

PagerTree Broadcasts

Jan 4, 2023 By PagerTree In PagerTree

PagerTree broadcasts are a great way to send mass messages to multiple teams or users (think of an all-hands-on-deck situation). When using the broadcasts feature, you can send one-way messages and optionally request a response. PagerTree intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

View Video

PagerTree

Read more about PagerTree Broadcasts

How to Avoid Common Software Deployment Challenges

Jan 4, 2023 By xMatters In xMatters

Software deployment is the manual or automated process of making software available to its intended users. It’s often the final, and most important, stage in the Software Development Lifecycle (SDLC). Software deployment is a three-stage process. All software deployments pose challenges, and issues can arise in any of the three stages.

Read Post

xMatters

Read more about How to Avoid Common Software Deployment Challenges

The State of AIOps: A New Years' Message from Chief Moo Phil Tee

Jan 4, 2023 By Phil Tee In Moogsoft

Well, that was fast! Another year has come and gone. It is safe to say 2020, ’21 and ’22 were exceptional, and only sometimes for good reasons. But I take heart in society’s steady progress toward digital maturity through it all. Nearly 100% of IT leaders say the pandemic accelerated their organization’s rate of digital transformation.

Read Post

Moogsoft

Read more about The State of AIOps: A New Years' Message from Chief Moo Phil Tee

PagerTree Routers

Jan 4, 2023 By PagerTree In PagerTree

With routers, you can perform complex matching and actions on alerts. PagerTree intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

View Video

PagerTree

Read more about PagerTree Routers

PagerTree Notification Rules

Jan 4, 2023 By PagerTree In PagerTree

With notification rules, you can perform custom notification sequences. PagerTree intelligent on-call alert routing gives teams flexible schedules, escalations, & reliable notifications via email, SMS, voice, chatbots, & smartphone app.

View Video

PagerTree

Read more about PagerTree Notification Rules

How JPMorgan Chase uses Grafana and AI to monitor SLOs, SLIs, and more

Jan 3, 2023 By Mary Margaret In Grafana

For the team at JPMorgan Chase, the daily stakes of having a stable system are high. “We are in the business of making sure that trades are executed, and systems are stable and up and running for a positive client experience,” said Askari Imam, VP, Asset Wealth Management (Product and Integration Delivery).

Read Post

Grafana

Read more about How JPMorgan Chase uses Grafana and AI to monitor SLOs, SLIs, and more

A better way: 3 incident response areas prime for automation

Jan 3, 2023 By Robert Ross In FireHydrant

By automating some rote parts of incident response, you reduce decision fatigue and help responders get to solving the problem faster with less stress. In this post, we talk about three areas of the incident response process that are prime for automation.

Read Post

FireHydrant

Read more about A better way: 3 incident response areas prime for automation

Identify and resolve incidents faster with InsightFinder's offering in the Datadog Marketplace

Jan 3, 2023 By Bowen Chen In Datadog

InsightFinder is a SaaS platform that uses AI-backed predictive analytics to predict and prevent production incidents. Using InsightFinder with Datadog, you can quickly identify hidden correlations in your application metrics, logs, and events and address application issues before they devolve into production outages and create customer impact.

Read Post

Datadog

Read more about Identify and resolve incidents faster with InsightFinder's offering in the Datadog Marketplace

Gartner IOCS Blog - Lucid Motors Case Study

Jan 3, 2023 By Adam Blau In BigPanda

Assaf Resnick, CEO and co-founder of BigPanda, sat down with Sanjay Chandra, vice president of information technology at luxury electric automaker Lucid Motors, at Gartner IT IOCS 2022. They discussed Lucid’s unique ITOps journey and how BigPanda helps minimize downtime of critical applications and services. Sanjay is a visionary ITOps leader, responsible for IT, enterprise systems, global infrastructure, operations and security at Lucid Motors.

Read Post

BigPanda

Read more about Gartner IOCS Blog - Lucid Motors Case Study

What is Automated Diagnostics? How to reduce escalations and accelerate resolution with automation

Jan 3, 2023 By PagerDuty In PagerDuty

Join PagerDuty’s Jake Cohen (Senior Product Manager) with RedMonk’s Kelly Fitzpatrick for a conversation and demo on automated diagnostics, process automation, and incident response. It’s all about automation helping first responders determine whether there is an issue, decide which domain experts (if any) should be brought in to assist, and resolve the issue as quickly as possible.

View Video

PagerDuty

Read more about What is Automated Diagnostics? How to reduce escalations and accelerate resolution with automation

PagerDuty and RedMonk Present: What is Automated Diagnostics? Part 1 - Use Case

Jan 3, 2023 By PagerDuty In PagerDuty

View Video

PagerDuty

Read more about PagerDuty and RedMonk Present: What is Automated Diagnostics? Part 1 - Use Case

PagerDuty and RedMonk Present: What is Automated Diagnostics? Part 2 - Demo

Jan 3, 2023 By PagerDuty In PagerDuty

View Video

PagerDuty

Read more about PagerDuty and RedMonk Present: What is Automated Diagnostics? Part 2 - Demo

How communication can make or break your incidents

Jan 3, 2023 By Charlie Kingston In Incident.io

In this episode, Pete and Lisa discuss why great communication (both internally and externally) is essential to the success of any incident management process. From keeping your wider team in the loop to minimise disruption, to using customer communication to strengthen your brand when things go wrong, the team share their experiences and top tips for having a transparent incident communication culture.

Read Post

Incident.io

Read more about How communication can make or break your incidents
