The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Datadog On Datadog

At Datadog, over 2,000 engineers deploy and ship new features daily. For a leading observability and security platform used by thousands of companies, ensuring quality and reliability is no small feat. Part of our commitment to excellence lies in our dogfooding culture: our engineering organization is one of the largest and most demanding users of the Datadog platform.

Why Monitoring iManage is Critical for Enhancing End-User Experience in Legal Firms

As a Performance Field Technical Consultant working with customers in the legal industry, my primary focus is to ensure that technology enhances productivity rather than hinders it. Legal professionals rely on iManage as a business-critical application for document management, collaboration, and compliance. However, with the increasing shift to the cloud and integration with platforms like O365, ensuring a seamless user experience has become more complex.

How we responded to a 2+ hour partial outage in Grafana Cloud

On Tuesday, Feb. 18, 2025, we experienced an outage that lasted approximately 150 minutes and impacted roughly 25% of our Grafana Cloud services. To our customers: we are very sorry and more than a little embarrassed that we stepped outside our own processes and advice to cause this. You rely on us to help monitor and troubleshoot your environments, and this type of incident obviously makes it harder for you to do that.

Combine Fixtures & Page Object Models for DRYer Test Code in Playwright

If you're using Playwright for end-to-end testing or synthetic monitoring with Checkly, you've likely considered reusing your test code across different test cases. A common approach for this is using Page Object Models (POMs). However, if you're like me, you might have mixed feelings about POMs: while they help organize your code, they can sometimes feel cumbersome to set up and maintain.

Escaping the technical debt black hole with APM

Technical debt accumulates when short-term solutions lead to long-term software inefficiencies, increasing maintenance costs, slowing development, and degrading performance. To effectively manage technical debt, teams need full-stack observability, from a high-level application view down to code execution and thread-level analysis. Tackling technical debt ensures long-term software sustainability.

How to Set Up Logging in Node.js (Without Overthinking It)

Logging in Node.js might not be the most exciting part of development, but it’s one of the most important. Whether you're troubleshooting bugs or keeping track of how your app is running, good logs make life easier. Let’s break down how to set up logging the right way.
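
To make the idea concrete, here is a minimal sketch of a leveled, JSON-formatted logger that uses only built-in APIs. It is not taken from the article itself: the names `createLogger` and `formatEntry`, the four level names, and the `port` field in the usage line are all assumptions chosen for illustration. A real project might instead reach for an established logging library.

```typescript
// Minimal leveled JSON logger (illustrative sketch, no external packages).
// All names here are hypothetical and not from the article.
type Level = "debug" | "info" | "warn" | "error";
type Fields = Record<string, unknown>;

// Numeric ranks let us compare severities.
const LEVEL_ORDER: Record<Level, number> = {
  debug: 10,
  info: 20,
  warn: 30,
  error: 40,
};

// Build one JSON line; core keys are written last so they cannot be
// overwritten by caller-supplied fields.
function formatEntry(
  level: Level,
  message: string,
  fields: Fields = {},
  now: Date = new Date()
): string {
  return JSON.stringify({ ...fields, time: now.toISOString(), level, message });
}

// Create a logger that drops entries below minLevel and writes the rest
// to the given sink (console.log by default).
function createLogger(
  minLevel: Level = "info",
  sink: (line: string) => void = console.log
) {
  const log = (level: Level, message: string, fields?: Fields): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    sink(formatEntry(level, message, fields));
  };
  return {
    debug: (m: string, f?: Fields) => log("debug", m, f),
    info: (m: string, f?: Fields) => log("info", m, f),
    warn: (m: string, f?: Fields) => log("warn", m, f),
    error: (m: string, f?: Fields) => log("error", m, f),
  };
}

// Usage: the port value is a made-up example field.
const appLogger = createLogger("info");
appLogger.debug("not printed at the info level");
appLogger.info("server started", { port: 3000 });
```

Passing the sink as a parameter keeps the sketch easy to test and makes it simple to redirect output to a file or stream later without touching the call sites.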

What Is a Status Page Aggregator?

Businesses today rely on multiple cloud services to manage their operations. Whether it's hosted services like AWS, customer relationship tools like Salesforce, or marketing platforms like HubSpot, these services play a crucial role in day-to-day business functions. However, businesses can suffer significant disruptions when a third-party service experiences downtime. A single outage in a critical service can halt operations, causing frustration for both employees and customers.