Operations | Monitoring | ITSM | DevOps | Cloud

DOES Cache Rule Everything Around Me? - Using Compression for our Prometheus Cache

Checkly is a key part of a professional developer’s workflow, making it easy to know if your service is up or down, and measure performance. As we integrate with almost any development workflow, we also have Prometheus endpoints to let you use the popular Grafana stack to keep track of your site checks’ status. As large enterprise users grew in usage, their check performance data grew in parallel, and our endpoint started returning occasional 429 status codes.

Why Monitoring as Code Is the Future of Application Reliability for Modern Teams... and how it can save you $1 million!

I recently talked to a customer of Checkly and he shared some thoughts about Monitoring as Code. Let’s call him Karl in this article. Karl and I talked about why Monitoring as Code (MaC) is becoming essential for teams operating at scale. As the Head of Platform Engineering at a major e-commerce company processing millions of transactions daily, his experience shows how MaC solves a lot of the messy challenges that come with traditional synthetic monitoring setups.

How to Run Playwright Test in "Parallel," "Serial," or "Default" Mode

Join Stefan Judis, Playwright Ambassador, as he looks into different Playwright test order execution modes. Learn how to effectively use the "fullyParallel" option and understand the differences between "parallel", "serial" and "default" test case execution. If you have questions or feedback, drop a comment below! And don't forget to subscribe for more Playwright tips!

How LinkedIn Stopped Relying on Users to Report Bugs

When making changes to your production services, it’s important to have a plan for how to detect problems and roll back changes. How many roll out plans would include: “if it breaks, don’t worry, the users will tell us!” But if your monitoring coverage of production services isn’t complete, you’re implicitly relying on your users to tell you when something breaks.

How good is GitHub Copilot at generating Playwright code?

People keep asking us here at Checkly if and how AI can help create solid and maintainable Playwright tests. To answer all these questions, we started by looking at ChatGPT and Claude to conclude that AI tools have the potential to help with test generation but that "normal AI consumer tools" aren't code-focused enough. High-quality results require too complex prompts to be a maintainable solution.

Prometheus Blackbox Exporter vs Kuberhealthy for K8s monitoring

We all implement tools to monitor our nodes and keep our entire cluster up and running. But how often do updates, failures, or errors mean that users suffer outages, even though our status boards look green? As Kubernetes has enabled more complex microservice architecture, the gap between the state of the dashboard, and the health of services for the user, has grown wider.