
Blog

What Slack Downtime Costs, and What We Can Do About It

This morning, though, all of our backlogs were a little harder to sift through thanks to a Slack outage in Europe and the US. To calm down, some of us might have turned to our Google Home or Chromecast to unwind while the outage hours piled up, only to find those were down too! What a morning! Now that Slack is running again, let’s take a moment to reflect on what the outage means and what we can learn from it.

Introducing the All-New, Reimagined ChangeTower - Here's What's Changed

Meet the new ChangeTower. It’s everything that’s important to you across the internet, in one visual stream. Website content monitoring is about more than just knowing when a change occurs. It’s knowing what changes were made and how they’re important to you. We’re excited to introduce an all-new, reimagined ChangeTower that helps you filter out the ‘noise’ and stay effortlessly in-the-know across websites.

Check output metric extraction with InfluxDB & Grafana

Sensu is an extremely powerful standalone monitoring framework, but the real beauty of Sensu lies in its ability to harmoniously interact with, support, and instrument other tools to create a customized and complete monitoring solution. Take metrics for example: Sensu offers multiple mechanisms to monitor performance metrics such as check output metric extraction and StatsD.

Using the Content Match feature to detect website defacement

The Content Match feature has the potential to detect and protect against page defacement, as well as fulfilling a few other handy use-cases. Today we’re going to take you through a few of the most common uses of this feature to help you ensure you are getting the most out of it!

CFEngine 3.12.0 LTS Released

Today we are happy to announce the general availability of CFEngine 3.12.0 LTS! This release has a lot of new features, and we are very excited about all the new possibilities you get with CFEngine 3.12.0 LTS. If you are using the previous LTS, 3.10, you will also benefit from all the new features, improvements and testing of the 3.11 release, which you can read more about in the CFEngine 3.11 release post.

6 Reasons Why PagerDuty Engineering Stands Out From the Crowd

The other day, a newer Engineering Manager here at PagerDuty, Dileshni Jayasinghe, started a Slack thread expressing joy at how fantastic our engineering team is after attending a conference with engineering folk from other organizations. She explained that she’d shared our practice of owning what we build with someone—who then responded by gazing off into the distance and saying, “That’s my dream.”

Don't Worry About Your (Con)figure, Have The PI!

Congratulations! You have Foglight installed and it is collecting meaningful performance data. There’s no doubt it is providing a ‘smorgasbord’ of actionable information about your mission-critical environment. Help yourself to delicious servings of baseline data, proactive alerts, and custom dashboards and reports. Now make room for dessert! Foglight is probably best known for its generous helping of PI. PI is an abbreviation for Performance Investigator.

Metrics At Scale: How to Scale and Manage Millions of Metrics (Part 2)

With businesses collecting millions of metrics, let’s look at how they can efficiently scale and deal with these amounts. As covered in the previous article (A Spike in Sales Is Not Always Good News), analyzing millions of metrics for changes may result in alert storms, notifying users about EVERY change, not just the most significant ones. To bring order to this situation, Anodot groups correlated anomalies together, in a unified alert.

6 Things You Need in an IT Incident Management Platform

Your incident management process is greatly impacted by the tools you have available, and technology is key when it comes to gaining visibility and obtaining contextual data. You need tools to send alerts when incidents arise, as well as track activity for compliance reporting purposes. Whether you’re in healthcare, information technology, or work at a small MSP, you need a robust incident management platform that gives you results and helps reduce MTTR.