VictorOps

The Technical Guide to Chaos Engineering

Simplicity is dead – accept it. I know it feels wrong, but the only area you can simplify is the customer’s experience when they interact with your organization. Your life in technology will never be as simple as that first “hello world” Python program. Modern development and deployment best practices, like microservices on cloud-native infrastructure, are exponentially more complex than any previous generation of application architecture.

The Comprehensive Guide to Monitoring Signals in Production

In some ways, modern application monitoring has become a paradox. On one hand, today’s applications and the environments that host them spew out more data than ever, which in theory gives IT teams an unprecedented ability to monitor and observe them. On the other hand, there’s often so much data to parse that gaining meaningful insight through monitoring becomes impossible in practice.

Making the Most of Holidays While You're On-Call

Software engineers and IT professionals know the pains of being on-call during the holidays all too well. Many parents are woken at the crack of dawn by kids jumping on their bed, but on-call engineers also have to worry about being woken by critical notifications. The holidays are a great time for family and friends, yet IT professionals and DevOps engineers also know how stressful they can be.

The Production Environment Review Checklist

You’ve written your code, tested it and built it. Now, your release is ready to deploy into production. But, is your production environment ready for the release? That’s a question every IT professional and platform engineer should ask before accepting a new release – whether the release is an update to an existing app or a totally new deployment.

Showing the Value of Site Reliability Engineering (SRE)

For a while, the world of Agile software development and continuous integration and continuous delivery (CI/CD) valued speed over reliability. For years before that, IT professionals and developers worked together for long periods of time – six months to a year – to create, test and release software to customers. As cloud-based systems and Agile practices became more common, developers realized they could deliver new services to customers faster than ever before.

Incident Management in a Complex Serverless Framework

Serverless frameworks can lead to highly efficient, scalable systems that allow developers to build complex software faster and more reliably. Serverless frameworks let engineering teams focus on individual functions across multiple applications or microservices, and they eliminate many of the problems of maintaining physical hardware. Serverless capabilities are also often referred to as Functions as a Service (FaaS).

Top Metrics for Measuring DevOps Delivery Value

Software developers and operations teams are constantly improving the way they move code into production and execute tests to maintain consistent delivery of reliable services. But, how do most organizations track the success of organizational changes? When a company adopts DevOps principles, how do they show the value of these changes to the engineering teams and the overall business?

Healthcare as a Guide for Incident Response and Incident Management

The core of the DevOps movement is about breaking down barriers between developers and operations, allowing both sides to work as a team. This means making sure that everyone has access to the systems they require while enabling cross-tool visibility and collaboration.

IT Alerting From One-to-Many to One-to-Right

Remember the WUPHF app from “The Office” Season 7 Ep 9? With the advent of notifications, it seemed for a long time that the best path for any alert was to send every notification through every channel. Woof! Even today, the typical NOC operates on the “Woof!” notification strategy, where everyone in the company receives a notification for every alert through every notification method, which, most of the time, is just email.

Configuration Drift: Why It Happens and How It Complicates Life for Incident Response Engineers

Keeping systems constantly up while maintaining total system security is vital to modern businesses. But, speed and consistency sometimes clash. Do you fix a problem fast or do you fix it right? Fast fixes usually win and often cause further systemic problems. And, because system health is so important, every issue becomes a red alert. This leads to alert fatigue, or the “exposure to a high volume of frequent alerts, causing desensitization to critical issues.”