Operations | Monitoring | ITSM | DevOps | Cloud

Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Did you know anyone can be affected by IT Downtime?

Discover the hidden risks of IT downtime that affect everyone! Whether you're a tech enthusiast, business owner, or just curious about the digital world, this video is a must-watch. IT downtime is more than just a technical glitch – it's a phenomenon that can impact individuals and businesses alike.

Simplifying Service Dependency With Squadcast's Service Graph

Microservices are fantastic for agility and innovation, but the trade-off is complex service management and ownership. With hundreds of interconnected services, troubleshooting and Incident Response can become a potential blocker. The traditional siloed approach to service ownership and the increasing deployment makes service management more complex.

Navigating Challenges with Precision: A Guide to Remote Incident Response for Data Center Operations Managers

In the era of distributed workforces, the need for effective remote incident response is more critical than ever. This blog serves as a comprehensive guide for data center operations managers, offering insights and strategies to navigate incidents with precision and efficiency, regardless of the geographical location.

Mastering Remote Management and Monitoring: A Guide for Data Center Operations Managers

In the fast-paced world of data center operations, the landscape is constantly evolving, and with the rise of remote work, the challenges and opportunities for operations managers have reached new heights. In this blog, we’ll explore the ins and outs of remote management and monitoring, providing insights and strategies to help data center operations managers navigate this dynamic terrain seamlessly.

Safeguarding Operations: A Comprehensive Guide to Disaster Recovery and Business Continuity for Data Center Managers

In the dynamic world of data center operations, preparedness is key. This blog serves as a comprehensive guide for data center operations managers, exploring the critical aspects of disaster recovery (DR) and business continuity (BC) planning. Learn how to fortify your data center against unforeseen events and ensure seamless operations even in the face of adversity.

The Debrief: Building AI-Related Incidents

Recently we went live with one of our biggest product launches to date AI. And this product was unique in that it was broken up into four smaller projects: So naturally most folks might be wondering: What were the biggest differences between these projects and what went into actually building out each of these features? In this episode, you'll hear from Rob and Isaac, both Product Engineers who played a really critical role in the building out of related incidents, to get a peek behind the curtain.

APAC Retrospective: Learnings from a Year of Tech Outages, Restore: Repair vs Root Cause

As our exploration of 2023 continues from the third-part of our blog series, Dismantling Knowledge Silos, one undeniable fact persists: Incidents are an unavoidable reality for organisations, irrespective of their industry or size. Recent APAC trends show that regulatory bodies are cracking down harder on large corporations for poor service delivery, imposing harsh penalties as a result of the negative consequences.

Finding relationships in your data with embeddings

With the world still working out the limits of LLMs and ever more powerful models being released each month, it’s a little hard to know where to begin. Whether it’s summarising and generating text, building a useful chat assistant, or comparing the relatedness of strings with embeddings, almost all of this now can be done via a few simple API calls. It has never been easier to incorporate these new technologies into your own product.