Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

PagerDuty Integration Spotlight: InfluxData

InfluxData is an Open Source Platform built for metrics and events — a platform that is purpose-built for time series data. The essential time series toolkit — dashboards, queries, tasks and agents all in one place. InfluxDB is even more programmable and performant with a common API across OSS, cloud and enterprise editions. Send events to PagerDuty to keep your teams informed. Check out InfluxData’s integration.

Evaluating Splunk On-Call Alternatives

Splunk On-Call (Formerly VictorOps) is a popular incident response and on-call management platform that allows engineering and operations teams to collaborate with ease and resolve issues faster. As part of the Splunk Observability Suite, Splunk On-Call is combined with related products to achieve the goal of bringing monitoring, troubleshooting, and investigation, into a single, comprehensive view — simplifying the process from incident detection to resolution.

PagerDuty Integration Spotlight: LogDNA

LogDNA’s Cloud logging platform helps your DevOps teams find and fix production issues faster so your teams can get back to doing what they do best, building amazing products. Send incident alerts from LogDNA directly to PagerDuty. Check out the LogDNA integration with PagerDuty to get started.

How Service Catalog Increases Productivity

Productivity is defined by measuring the amount of output over a given time frame. However, this discounts the quality of output, which is crucial in moving toward a more complete definition of productivity. Relating to services, increases in productivity generally highlight the amount of feature releases over time. This leaves out the critical measurement of quality compared to quantity. This is where a Service Catalog can greatly enhance true productivity within an engineering organization.

Reliability is not an engineering metric

If you're an engineer reading this, you might be wondering what I mean by the title. You might be a Site Reliability Engineer whose primary responsibility is to maintain the reliability of your company’s product/solution. You might be a software builder, a programmer responsible for building new capabilities and shipping them to production. All of these are important for any business to remain competitive.

Then and Now: Distributed Systems Alerting and Monitoring

Distributed systems are everywhere. Although many teams don’t think of their applications as distributed systems, if they’re developing using container-based microservices and serverless functions instead of a monolith, they’re creating a distributed system. This change also means that monitoring needs are becoming more complex.