AI-Powered Monitoring with Checkly

Most monitoring tools weren't built for the AI-first world. Traditional monitoring platforms force you out of your natural coding environment and trap you in clunky web interfaces, brittle configuration panels, and rigid APIs. And sadly, when monitoring providers do offer "AI features," it's usually a chatbot bolted onto their existing UI, nothing more than a pale imitation of the AI tools you’re reading about every day on Hacker News. All this creates friction.

InfluxDB 3 Core: a complete rewrite designed for speed and simplicity

InfluxDB has been a popular time series database for the better part of a decade, and the latest release represents years of behind-the-scenes work to deliver several major features users have been requesting since the database's earliest days.

Best Ways to Find Troublesome Containers and Virtual Machines Using Cycle's Portal

The best problems are the ones you never have to deal with. That's why smart teams catch issues early, before they impact production. Cycle gives you the visibility to spot troublesome workloads, control resource usage, and take action before things go sideways.

Opsgenie is shutting down: Complete guide to alternatives in 2025

Atlassian just pulled the plug on Opsgenie. On December 3, 2024, they announced that Opsgenie will reach end-of-life by April 2027. New sales stopped on June 4, 2025, and if you're using the JSM-bundled version, you'll lose access even sooner—October 2025. Here's the kicker: Atlassian wants you to migrate to their fragmented JSM + Compass combo, which splits your incident management across multiple tools. The reality? Teams are frustrated.

Maximizing Uptime: How to Monitor Network Ports

Keeping critical services running smoothly starts with visibility, and that begins at the port level. Whether you're managing a lean environment or a complex network infrastructure, knowing which ports are active, listening, or down can make or break your response time. In this video, we walk through how to fully configure port discovery and monitoring in SL1. You'll learn how to track availability, respond to port failures with automated alerts, and ensure your systems are always one step ahead of potential issues.

How we created a single app to automate repetitive tasks with Datadog Workflow Automation, Datastore, and App Builder

For many organizations, scaling up their systems means incorporating new tools to build out infrastructure, optimize code performance and security, improve communication, and track cost changes. While these changes are necessary to support an increased workload, they often mean that even the most basic tasks involve switching between multiple platforms.

Choosing the Right APM Software: 5 Key Factors to Consider

When applications slow down, users leave and engineering teams scramble. Whether you're troubleshooting a spike in response times or chasing down intermittent backend failures, Application Performance Monitoring (APM) provides the visibility you need to detect, diagnose, and resolve performance issues before they impact your users or business goals. For engineers, APM isn’t just a convenience; it’s essential. But not all APM tools are created equal.

Balancing Reliability at the Crypto-Finance Frontier with Brian Shaw (Uphold)

Sylvain Kalache sits down with Brian Shaw, Senior Engineering Leader at Uphold, to explore the reliability challenges that arise when operating at the intersection of traditional finance and crypto markets. Brian shares how unexpected market events can create massive traffic spikes, how their platform architecture and Kubernetes setup help them stay resilient, and why Uphold's transparency and regulatory approach make them both trustworthy and a high-profile target.