How to use Gremlin's Reliability Report

Modern applications can easily include hundreds of discrete services, all of which need to be reliable for the application to function correctly. While running tests on a handful of critical services can lead to small reliability improvements, real impact requires testing, and visibility into reliability, across your entire organization. That’s the logic behind the new, improved Reliability Reports within Gremlin.

Reliability lessons from the 2025 Cloudflare outage

On November 18, 2025, X, ChatGPT, Shopify, and many other major sites went offline simultaneously. Even Downdetector, Ookla’s popular outage tracking website, briefly went offline. What caused this issue? Why were so many major websites affected by it? And what steps can you take to reduce the impact on your own applications?

Automating Chaos Engineering with Terraform

Automating chaos engineering with Terraform eliminates manual setup across environments by enabling you to version control your entire chaos infrastructure, from service discovery to security governance policies. The Harness Terraform provider supports end-to-end automation including Kubernetes infrastructure setup, custom image registries, Git-based ChaosHub management, and granular security controls that ensure safe experiment execution in production.

Reliability lessons from the 2025 Microsoft Azure Front Door outage

On October 29, 2025, Azure Front Door suffered an outage that impacted Microsoft services globally, including Microsoft 365, Outlook, Xbox Live, Copilot, and more. It also affected Microsoft Azure, meaning companies like Costco, Starbucks, and Alaska Airlines ran into issues with both customer-facing and internal systems. The root of the issue was a misconfiguration in the data plane for Azure Front Door and the Azure Content Delivery Network.

Improve Kubernetes reliability faster with Gremlin and Dynatrace

It’s now easier than ever to start testing Kubernetes with Dynatrace and Gremlin. With a new strategic integration, Kubernetes services set up in Dynatrace are automatically discovered in Gremlin, making test setup simple and fast. At a time when AI is driving massive expansions in infrastructure and dramatically increasing deployment speed, being able to set up and test new services quickly is more important than ever.