Facility managers, including service technicians, are expected to operate their facilities safely to meet the expectations of customers. They focus on the smooth functioning and maintenance of many components that fall within the scope of their facility. Typical components include roads, pavements, HVAC and plumbing systems. As a facility manager, staying on top of these siloed and geographically dispersed systems can be challenging.
What are the differences between incident management and incident response? The answer varies widely depending on whom you ask.
With 50% of the US adult population vaccinated, there’s a lot to look forward to this summer. Life no longer feels like it’s on hold, and we’re fully embracing that. Get your fire hoses ready, 'cause extinguishing incidents just got easier. We’re rolling out a summer full of new integrations, product releases, events, and more.
When you’re feeling the stress and pain around incidents, making the decision to find an incident management tool is a no-brainer. But how do you choose the one that will work for you, your team, and your business? You might be asking yourself: Where do I start? What do I need to know? What questions do I ask? What are the options? How can I be sure we’re choosing the right tool?
Attempting an upgrade or switch to a new ITSM tool is obstacle-ridden for IT directors. From having to address fears surrounding the cost of switching vendors to assessing service management maturity, building a case around why and how an ITSM tool can advance the business can be a harrowing feat. Thankfully, Info-Tech pulled together this selection guide.
Incidents and outages caused by animals highlight the importance of flexibility and out-of-the-box thinking when it comes to SRE.
Single sign-on (SSO) services provide a unified view into applications, logins and devices through a secure identity cloud. SSO allows users to access SaaS-based applications through one simple login process. We, at OnPage, are excited to announce that we’ve extended our integration catalog to include SSO services like Okta and OneLogin. Through a single sign-on process, OnPage enterprise-level users can access the OnPage dashboard from their Okta and OneLogin accounts.
Streamlining your incident management process is what we do best, and one of the ways we do that is by acting as the connective tissue across all of your applications. We’ve partnered with Checkly to bring you a new integration that empowers you to detect problems and resolve incidents faster.
We’re happy to announce our integration with Google Meet to create incident bridges automatically. Using the power of FireHydrant Runbooks, a Google Meet can be added with fully customizable titles and agendas based on your incident details.
Datadog Notebooks simplify the way teams across an organization find and share knowledge. By bringing together live data and rich Markdown text, Notebooks help teams create powerful, data-driven documents—from runbooks and support playbooks to incident postmortems and data reports. And with collaboration functionalities like real-time editing and commenting, team members can simultaneously make changes to a document and gather feedback along the way.
With so many IT vendors claiming they provide AIOps platforms, how do you understand the differences between them, and decide what flavor of AIOps to choose for your organization? Join us in a CTO Perspective discussion with Elik Eizenberg, CTO and co-founder at BigPanda, to find the answer. Read the skinny for a brief summary, then either lean back and watch the interview, or if you prefer to continue reading, take a few minutes to read the transcript. Enjoy!
How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.
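As a rough illustration of how uptime translates into availability, here is a minimal Python sketch. It assumes availability is simply uptime divided by total time in the measurement window, and that a year has 365 days; the function names are hypothetical and are not taken from the chart the post refers to.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def availability_percent(uptime_seconds: float, total_seconds: float) -> float:
    """Availability as a percentage: uptime divided by the total window."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * uptime_seconds / total_seconds


def allowed_downtime_seconds(target_percent: float, period_seconds: float) -> float:
    """Downtime budget, in seconds, that still meets the availability target."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    return period_seconds * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common "nines" targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_seconds(target, SECONDS_PER_YEAR) / 60
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, a 99.9% target ("three nines") leaves a budget of roughly 8.76 hours of downtime per year.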
Mattermost v5.35 is generally available today. Incident Collaboration: Ad hoc tasks, stakeholder overview, and more (Cloud and E20 Edition). We are excited to release multiple new features for the Incident Collaboration product:
When I asked Charlie for permission to attend this year’s AICon (virtual, natch) I thought it would be a shoo-in; learning’s part of my OKRs after all. But he never makes things easy and his ‘yes’ came with a caveat that’s typical when dealing with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.
That's a wrap! We hosted "WTF is Incident Management" on May 12, 2021. We invited four very knowledgeable panelists to discuss how they define incident management, what changes they'd make if they could start again from scratch, how to manage team stress after an incident, and other subjects. Our panelists were: host Matt Stratton (Staff Developer Advocate at Pulumi), Emily Ruppe (Incident Commander at Twilio), Alina Anderson (Sr.
Over time, Enterprise Alert continues to grow and more and more teams are starting to benefit from Enterprise Alert’s reliable alerting. As part of this process, Enterprise Alert almost always becomes a central component of the NOC and has practically trained the NOC admins. For this reason, here in support we rarely have the pleasure of presenting the features of our alarm center.
Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. These include the new “Website Monitoring” event source.
Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. One of them is the new event source “Alert Timer”.
A few days ago I had an insightful conversation with one of our customers who inspired me to write this blog. He, like so many other customers, was facing the problem that his Enterprise Alert management overhead was increasing with each new team he added, as he had been managing resources such as event sources, notification channels and alert policies for the new teams as well. His question to us, therefore, was whether he could put these management tasks in the hands of the teams themselves.
In a network operations center (NOC), alerts originating from hundreds of servers, application monitoring systems, emails and ticketing services compete to catch a NOC analyst’s attention. NOCs face many challenges in parsing through alerts to identify actionable notifications and mobilize the right response team into action.
Maintaining business continuity when an issue arises has proven to be a challenge many organizations struggle with. A global pandemic being thrown into the mix in Q1 of 2020 (one that many businesses are still navigating through) introduced a new set of problems for both service providers and businesses reliant on those services.
There are several metrics in use to determine incident management success. Two of them are MTTD and MTTR, which we will be discussing in this piece.
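To make the two metrics concrete, here is a minimal Python sketch. It assumes MTTD is the mean time from when an incident starts to when it is detected, and MTTR is the mean time from start to resolution; definitions vary between teams, so treat these as one possible reading. The `Incident` class and the sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record with the three timestamps the metrics need."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_timedelta(deltas: Iterable[timedelta]) -> timedelta:
    """Arithmetic mean of a collection of durations."""
    items: List[timedelta] = list(deltas)
    if not items:
        raise ValueError("need at least one incident")
    return sum(items, timedelta()) / len(items)


def mttd(incidents: Iterable[Incident]) -> timedelta:
    """Mean time to detect: average of (detected - started)."""
    return mean_timedelta(i.detected - i.started for i in incidents)


def mttr(incidents: Iterable[Incident]) -> timedelta:
    """Mean time to resolve: average of (resolved - started), by assumption."""
    return mean_timedelta(i.resolved - i.started for i in incidents)


# Made-up sample data, for illustration only.
incidents = [
    Incident(datetime(2021, 5, 1, 9, 0), datetime(2021, 5, 1, 9, 10), datetime(2021, 5, 1, 10, 0)),
    Incident(datetime(2021, 5, 3, 14, 0), datetime(2021, 5, 3, 14, 20), datetime(2021, 5, 3, 16, 0)),
]

if __name__ == "__main__":
    print("MTTD:", mttd(incidents))
    print("MTTR:", mttr(incidents))
```

Some teams measure MTTR from detection rather than from the start of the incident; swapping `i.started` for `i.detected` in `mttr` would give that variant.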
Service Level Objectives (SLOs) are a key component of any successful Site Reliability Engineering initiative. The question is, what are SLOs; and how do you determine what your SLOs should be? Once you've done that, how should you use them?
In today’s data-centric world, metrics or numbers define all performance benchmarks. The time between when an event starts and ends shows how well a system can handle and process such events. One such metric is MTTR. MTTR usually stands for Mean Time To Resolution, but it has held several meanings over the years. MTTR is a metric used to measure how well a system can bounce back from errors and provide long-lasting solutions.
I don’t know about you, but April traveled at the speed of light. A blink and it happened. Our teams have been working at the same speed throughout one of our favorite months of the year. With an incredible amount of updates, we’ve made our product even more transparent and easier to use. It’s not just our world-class documentation that enables you, it’s also the in-product visualizations and enablement that help guide you without you even realizing it.
Gartner’s latest “Quick Answer” report discusses how clinical communication and collaboration (CC&C) systems can enhance pandemic-related provider and patient engagement. Modern healthcare delivery organizations (HDO) invest in CC&C solutions to simplify communication among care teams consisting of physicians, nurses and critical support personnel. The OnPage team is pleased to be recognized as a vendor in Gartner’s latest CC&C publication.
That’s a wrap! Gremlin hosted Failover Conf 2: Fail Smarter on April 27, 2021. In attendance were over 500 SREs, developers, sales engineers, product managers, DevOps experts, C-level execs, and other reliability pros from around the globe! This year’s conference included discussions around the future of DevOps, strategies for building reliable teams, analyzing human error to create better systems, and more.
Cutting-edge messaging systems simplify communication and collaboration for organizations with complex communication needs. These systems are equipped with secure mobile messaging and a full suite of automation capabilities that can route notifications and voice calls across on-call teams. These platforms simplify on-call management through digital on-call schedules and escalation policies.
Let's face it: on-call work isn't fun. But it can be better. Even if you have to work on call, it would be nice to have at least some of the work done for you before you drag yourself out of bed at 3am to respond to an incident.
A protracted, exasperating customer service experience popped into my mind while reading this sentence in the Ivanti Voice data sheet: “One of the most frequent customer complaints about call centers is having to repeat information.” Ain’t that the truth. Here’s a brief personal experience.
Coined by Gartner in 2016, the term ‘AIOps’ refers to the combining of big data, AI and machine learning to automate and improve IT operations processes. Back then, this very broad definition led to some confusion, with different IT vendors characterizing AIOps differently, depending on what they were actually offering.