Operations Management

The latest news and information on IT Operations Management and related technologies.

PD Summit21: Responding to Chaos with Gremlin and PagerDuty

Incident response is something you hope never to need, but when you do, you want it to go smoothly. Normally, the knowledge of how to handle incidents within your company is built up over time, improving with each incident. While tools such as PagerDuty's Major Incidents Application can help you recover quickly, the process you follow is just as important. This documentation lets you learn from the start what has taken us years to build up, giving you a head start on handling a major incident in a way that leads to the fastest possible recovery.

Evolving in CloudOps Maturity? Investing in People and Teams Pays Off

CloudOps is on the up. This is due in part to the pandemic, which rapidly accelerated the shift to cloud. The shift allowed companies to innovate faster, enjoy greater flexibility and scalability, and become more cost efficient. Many organizations that rapidly adopted cloud or increased their usage now realize that they need to manage their cloud investments better in order to fully embrace these benefits.

HUG Relies on PagerDuty When Healthcare Incidents Arise

The Geneva University Hospital (HUG) is one of the five university hospitals in Switzerland and one of the largest hospitals in Europe. Pierryves Fournier, SRE Team Lead at HUG, explains how PagerDuty and Rundeck help automate his team's incident response process, empowering the right action when seconds matter.

Enabling Faster Incident Response and Mitigating Security Risks in Financial Services

Software is eating the world. Digital Transformation is top of mind for companies looking to meet ever-growing consumer demands and digitize manual processes. This isn’t unique to the technology industry. Ecommerce, finance, healthcare, and other industries are all moving in this direction.

A guide to installing recycling equipment in the factory

While the current health crisis has seen a slowdown across many sectors of the business community, many vital service operations have had to continue in often trying conditions. For manufacturing and industrial operators, as well as those involved in transport and logistics, the need to maintain supply remains a priority. One consequence of heavy demand is the need to manage waste materials, and the supporting plant and equipment must be maintained and upgraded.

Enterprise Alert's Automation Engine: Creating BMC Incidents

Recently we have received many requests for Enterprise Alert not only to alert on critical situations but also to proactively initiate, record, and track those situations through ITSM tools such as ServiceNow and BMC Remedy. This post centers on what happens when critical systems fail and tickets are not created in BMC due to a break in the workflow.