The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Building Auvik Into Your MSP's SOP (Video)

Standard operating procedures—more commonly known as SOPs—are written, step-by-step instructions that describe how to perform a routine activity. While you can create an SOP for anything, an MSP SOP that outlines technical procedures is a well-known path to increasing efficiency in your business. Whether you’ve got existing MSP SOPs you’re interested in updating, or you’re looking for some basic steps to build brand-new SOPs around, you’ve come to the right place.

What's the Perfect IT Support Staff Ratio?

On a fairly regular basis, users post to Reddit, Spiceworks, or another IT forum to ask about the best ratio of support techs to users, and what other companies are finding sustainable. The question usually comes from an overworked tech who’s drowning in tickets and trying to understand what’s considered normal. The answers can get interesting. On Spiceworks, many people responded with details about the environments they support, and the range was notable.

Authors' Cut: Actionable SLOs Based on What Matters Most

SLOs—or Service Level Objectives—can be pretty powerful. They provide a safety net that helps teams identify and fix issues before they reach unacceptable levels and degrade the user experience. But SLOs can also be intimidating. Here’s how a lot of teams feel about them: We know we want SLOs, we’re not sure how to really use them, and we don’t know how to debug SLO-based alerts. Don’t worry, we’ve got your answer—observability!
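To make the "safety net" idea concrete, here is a minimal sketch of error-budget arithmetic, one common way to turn an SLO into a number a team can alert on. The function name, parameters, and the sample figures are illustrative assumptions, not something taken from the article.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the desired success ratio, e.g. 0.99 for a 99% SLO.
    The result is 1.0 when no budget is used, 0.0 when it is exactly
    spent, and negative once the SLO has been violated.
    All names here are hypothetical and chosen for illustration.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        # No traffic yet, so nothing has been spent.
        return 1.0
    # Number of bad events the SLO tolerates over this window.
    allowed_bad = (1 - slo_target) * total_events
    return 1 - bad_events / allowed_bad


# Assumed example: 99% SLO, 10,000 requests, 50 failures in the window.
remaining = error_budget_remaining(0.99, 10_000, 50)
print(f"Error budget remaining: {remaining:.0%}")
```

An alert could then fire when the remaining budget drops below a chosen threshold, well before the SLO itself is breached.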

Telegraf Tips from InfluxDB University Experts

Telegraf is a very powerful open source plugin-based agent that gathers data from stacks, sensors, and systems and sends it to a database. It collects data from an input and sends it to an output, and gives you the option to transform data with aggregators and processors before it reaches its endpoint.
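The input, processor, aggregator, output flow described above can be modeled in a few lines of Python. This is only a conceptual sketch: Telegraf itself is configured through its own config file and plugins, and every function name, field, and the processor-then-aggregator ordering below is a hypothetical stand-in rather than Telegraf's API.

```python
from collections import defaultdict
from statistics import mean


def read_input():
    """Hypothetical input stage: returns raw metrics from some source."""
    return [
        {"name": "cpu", "host": "web-1", "usage": 10.0},
        {"name": "cpu", "host": "web-1", "usage": 30.0},
        {"name": "cpu", "host": "web-2", "usage": 50.0},
    ]


def add_environment_tag(metric, env="prod"):
    """Hypothetical processor: transforms each metric as it passes through."""
    return {**metric, "env": env}


def mean_usage_per_host(metrics):
    """Hypothetical aggregator: reduces many metrics to one per host."""
    by_host = defaultdict(list)
    for m in metrics:
        by_host[m["host"]].append(m["usage"])
    return [
        {"name": "cpu_mean", "host": host, "usage": mean(values)}
        for host, values in sorted(by_host.items())
    ]


def run_pipeline(sink):
    """Wire the stages together and write the results to the sink list."""
    processed = [add_environment_tag(m) for m in read_input()]
    aggregated = mean_usage_per_host(processed)
    sink.extend(aggregated)  # stands in for an output stage (a database)
    return sink


database = run_pipeline([])
print(database)
```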

New in Grafana 9.1: Service accounts are now GA

With the Grafana 8.5 release, we introduced the concept of service accounts. Now with the Grafana 9.1 release, we’re making service accounts generally available. This is a project that came out of technical necessity, but it has given us the opportunity to reflect on API tokens and machine-to-machine interaction across Grafana Labs.

How much does RPKI ROV reduce the propagation of invalid routes?

Earlier this year, Job Snijders and I published an analysis that estimated the proportion of internet traffic destined for BGP routes with ROAs. The conclusion was that the majority of internet traffic goes to routes covered by ROAs and is thus eligible for the protection that RPKI ROV offers. However, ROAs alone are useless if only a few networks are rejecting invalid routes.
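For readers unfamiliar with what makes a route "invalid", the following is a minimal sketch of route origin validation in the spirit of RFC 6811. It is not the authors' tooling; the function name, the tuple layout for ROAs, and the documentation-range prefixes and AS numbers are assumptions made for illustration.

```python
import ipaddress


def validate_route(prefix: str, origin_asn: int, roas) -> str:
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'.

    roas is an iterable of (roa_prefix, max_length, roa_asn) tuples,
    a simplified, hypothetical representation of validated ROA data.
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # IPv4 and IPv6 ROAs never cover each other.
        if route.version != roa_net.version:
            continue
        # A ROA covers the route if the route is equal to or inside the ROA prefix.
        if not route.subnet_of(roa_net):
            continue
        covered = True
        # A match needs the same origin AS (AS 0 never matches) and a
        # prefix length no longer than the ROA's max length.
        if roa_asn != 0 and roa_asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"


# Assumed sample data using documentation prefixes and AS numbers.
roas = [("192.0.2.0/24", 24, 64496)]
for prefix, asn in [("192.0.2.0/24", 64496), ("192.0.2.0/24", 64511),
                    ("192.0.2.0/25", 64496), ("198.51.100.0/24", 64496)]:
    print(prefix, asn, validate_route(prefix, asn, roas))
```

A network that enforces ROV would drop the routes classified as invalid, which is the behavior whose deployment the post sets out to measure.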

How the right monitoring tools can bolster operational resilience in finance

The financial services industry has been under increasing pressure over the past several years to treat operational resilience and risk management as symbiotic, in the wake of rising operational incidents and increasingly frequent security threats.

Find the root cause faster with Datadog and Zebrium

When troubleshooting an incident, DevOps teams often get bogged down searching for errors and unexpected events in an ever-increasing volume of logs. The painstaking nature of this work can result in teams struggling to resolve issues before new incidents appear, potentially leading to an incident backlog, longer MTTR, and a degraded end-user experience.

Search all apps - understand the impact of an error across your entire tech stack

One of the most requested features for Crash Reporting has been the ability to search across all of your applications rather than one application at a time (the default behavior). It’s not hard to see why it’s a popular request: rather than manually repeating the same search across many applications, you can run one search and immediately understand the impact of an error across all of your applications.