Find and Fix Fastify Slowdowns with AppSignal for Node.js

In part one of this series, we set up basic performance monitoring for our Fastify application using AppSignal and explored key performance indicators. Now that we have our monitoring foundation in place, it's time to leverage these insights to actively improve application performance. You'll learn how to detect performance regressions, find optimization opportunities, and implement custom instrumentation with OpenTelemetry.

Sliding Through Log-Time Space

This post kicks off a new series written by the Graylog Development Team. In these updates, we’ll highlight the features and fixes that make daily work in Graylog smoother. We want to show the work we care so much about and present the challenges we faced and overcame. Today, we’re starting with one of those minor but functional enhancements: Graylog time-range stepping.

Jira Service Management (JSM) Review for Alerting (2025)

Atlassian is shutting down OpsGenie. New sales stopped on June 4, 2025, and the platform will be completely offline by April 5, 2027. As an OpsGenie user, you now face a critical decision: migrate to Jira Service Management (JSM), Atlassian’s recommended path, or choose a different solution. If you’re not sure JSM is the right fit for your team’s alerting needs, this review will help you decide. I signed up for JSM and put it through real-world testing.

Product Update - Turn Off Alerts, Use Microsoft Teams, and Custom Domains

Over the last few months, IncidentHub has added several new features that make it easier to fine-tune your alerts. IncidentHub now also integrates with Microsoft Teams and supports custom domains for your public status pages. Let's take a comprehensive look at what's new.

Whose Fault Is It When the Cloud Fails? Does It Matter?

On Monday, October 20th, a significant portion of the digital services we use every day became inaccessible. For hours, banking, communication, and entertainment applications were unavailable. The root cause was later identified as a major outage within Amazon Web Services (AWS), the infrastructure that powers a vast number of online services. The initial response for any business affected by such an event is a frantic effort to diagnose the problem. Is it our application? Is our network down?

Your Root Cause Analysis is Flawed by Design

There’s a nagging feeling of déjà vu that haunts every network operations leader. You invest significant time and resources to resolve a major performance issue. Your best engineers isolate a culprit—a misbehaving load balancer, perhaps—and after a frantic effort, service is restored. You close the ticket, confident the problem is solved. Then, two weeks later, it’s back.

5 Best Practices for Incorporating AI Into Your Team

Honeycomb’s Jessica Kerr and Fred Hebert recently hosted a webinar with Courtney Nash of The VOID where they dug into one of the biggest questions in tech right now: How do we build systems (and teams) that actually learn with AI, not just use it? The conversation was surprisingly optimistic about what happens when we stop treating AI as a productivity tool and start seeing it as a teammate. You can watch the full webinar, or read on for a quick recap.

APM in 2026: The New Standard for Business Reliability and Growth

Global IT spending is expected to reach a record $6.08 trillion by 2026, with software investments growing by 15.2%. This shows how critical application performance has become for businesses today. For almost 80% of companies, even one hour of downtime can cost more than $300,000. In a world where every digital experience affects your revenue and brand reputation, keeping your applications performing well is no longer optional.

Sidecar or Agent for OpenTelemetry: How to Decide

Getting telemetry out of a distributed system isn’t the hard part. Getting it out cleanly, without noise, drop-offs, or odd performance side-effects — that’s where things get interesting. Before you worry about processors or storage costs, you need a clear plan for where the OTel Collector should run. Most teams narrow this down to two options: a sidecar that sits next to each service, or a node-level agent that handles data for everything running on the node. Both patterns are solid.