The ultimate guide to incident management KPIs and metrics

IT incident management aims to swiftly identify, address, and resolve IT disruptions to restore normal service operations. Tracking IT incident management key performance indicators (KPIs) is a vital step toward minimizing disruptions for customers and users. But with so many KPIs and metrics to choose from, it isn't easy to identify the ones that drive meaningful improvements in incident management.

Getting started with IT operations automation

Tech companies face a daunting challenge: a staggering 90% of their IT teams are stuck doing mundane, repetitive tasks, leaving only 10% to focus on strategic innovation. Companies know that automation is the answer to these repetitive, low-level incident response actions, but many need help getting started.

What you can't do with Kubernetes network policies (unless you use Calico): Policies applied to all namespaces or pods

Continuing my blog series, What you can't do with Kubernetes network policies (unless you use Calico), this post focuses on use case number five: default policies that are applied to all namespaces or pods.

Network Latency & How To Improve Latency

Cloud-based services have changed how individuals and businesses get things done. That doesn't mean it's all positive: cloud services and the internet come with tradeoffs and compromises. One major tradeoff is speed. For instance, if your website fails to load within three seconds, 40% of your visitors will abandon it. That's a serious dent for anyone doing business online, and the culprit here is latency.

Lessons in Incident Response I Learned While Waiting Tables

Before I stumbled into the tech industry (a story for another day), I spent several years in the customer service world as a server and front-of-house manager in restaurants. It was in these jobs that I first honed some critical skills that would later lead me on the path to incident response.

Database Observability Provides the Features Customers Need for Effective Monitoring

I began working with database customers back in the days of VividCortex, before it was acquired by SolarWinds. Since then, I've had the opportunity to work with tons of our database solution customers, first as an account manager and now as the lead of our DPM renewals initiative. In these roles, I've helped customers transition from VividCortex to Database Performance Monitor (DPM) and now migrate to Database Observability.

Lumigo Releases 1-Click OpenTelemetry for Microservices Troubleshooting

Lumigo is excited to announce that its microservice troubleshooting platform now gives developers and DevOps teams the power of OpenTelemetry (OTel) with a single click. Lumigo has long been the leading troubleshooting platform for serverless, and now users can harness its best-in-class debugging and observability platform for all microservices-based environments.