
Keep Your Business Stakeholders Updated While You Save the Day

Imagine this: An airline encounters a major IT incident in a data center that affects their ticketing system. Behind the scenes, technical responders are scrambling to diagnose and fix the issue. However, because today’s systems are so complex, this issue is taking longer than expected to resolve, and hours have passed since the system went down. Meanwhile, passengers are stranded and taking their anger out on customer service agents and sharing their frustrations on social media.

Building better software with automated monitoring and alerting

This is a guest article by Dan Holloran from VictorOps, an on-call alerting and incident response tool recently acquired by Splunk. They are experts in incident management. In software development and IT operations, we tend to focus much of our time on the delivery and deployment pipeline. But what happens after you deploy new services? How are you responding to incidents in production and identifying reliability concerns?

Intent-based Capacity Planning and Autoscaling with Kubernetes

Intent-based Capacity Planning is Google's approach to declare reliability intent for a service and then solve for the most efficient resource allocation plan dynamically. Learn how you can start using this approach to effectively manage the reliability of your services running on your Kubernetes cluster.

6 Best Practices For Outstanding Critical Incident Management

"Businesses need to face the inevitability of being hacked at some point. It's not a question of if, but when — and that's why being proactive to minimize the risk is essential." Robert Egan. When a critical incident hits, what happens to an organization without an efficient incident management plan? Essentially, all stakeholders are left "fighting fires," trying to recover their systems, and get their business back up and running.

Four Healthcare Workflows for Better Clinical Communications

Healthcare organizations strive to enhance patient experience, ensuring that patients receive proper treatment at the right time, every time. However, due to antiquated communication tools, such as the pager, this goal is often difficult to achieve for some healthcare providers. Today’s healthcare facilities require an advanced pager replacement solution, integrating with intelligent scheduling systems and EMR solutions for better patient outcomes.

HBO's "Chernobyl": Is there a lesson here for IT incident management?

I’m a big fan of historical TV dramas and last week I finished watching the stunning and shattering HBO TV miniseries about the 1986 Chernobyl disaster. As a monitoring expert and a product manager, I have visited dozens of IT operations centers, control rooms and NOCs, so I couldn’t help but compare them to the Chernobyl control room scenes in the show.

What's All the Fuss About Business Continuity Planning?

Digital transformation has created more gateways for vulnerability and risk. So in addition to natural disasters that can impact a business, organizations are faced with cyberattacks that can truly cripple their business. A solid business continuity plan makes sure that your company is ready for whatever may come its way, be it fire, flood, critical technical failure, or a cyberattack.

StatusCast Updates Status Page Service

StatusCast is always working to improve how IT Managers and Helpdesk Teams keep users apprised of system statuses during incidents and scheduled downtime. As a leading provider of corporate and SaaS status pages, we interview users and managers to better understand the status page landscape and use that information to constantly improve our corporate status board service.

How No-Code Integrations Help Incident Management Scale

Do you think no-code is just another buzzword with no real meaning? Well, maybe it is in some contexts. But if you want an example of how no-code solutions can matter in the real world, look no further than incident management. Let us explain by walking through what no-code solutions mean in the context of incident management, how they work, and how they can help teams scale and streamline their operations.