
Two AI agents, one incident: Rocky AI comes to the terminal

A Playwright Check fails at 2 a.m. The login flow is broken. Until today, that alert meant a human had to get up, open the Checkly dashboard, copy Rocky AI's root cause analysis (RCA), and then tell an agent to get to work. There were two AI agents, one incident, and no way for them to talk to each other. The extended checkly checks and new checkly rca CLI commands close that gap. Your coding agent can now pull Rocky AI's analysis into its ongoing work, read the diagnosis, and go fix the code.

VM Migration to Kubernetes: What Breaks and How to Prevent It

Here is what nobody putting together the business case for a VM migration to Kubernetes will tell you upfront: the compute is the easy part. Moving workloads off vSphere and onto Kubernetes is conceptually straightforward. The tooling has matured. The architecture is proven. Compute moves, storage remaps, and the platform team has a plan. The network is where projects quietly stall.

How to run a proof of concept that de-risks your monitoring decision

Part 3, key insights from a fireside chat with Chris Yates. Read part 1 here, and part 2 here. Most database monitoring proofs of concept (POCs) answer the wrong questions. Here's how to structure a proof of concept that genuinely de-risks your vendor decision, including the questions to ask during the process. A POC is often treated as the final hurdle in vendor evaluation, but too often it becomes theatre: a guided tour of the flashiest features, run by one person, under unrealistic conditions.

End-to-End Trace Propagation Across SQS and Lambda with OpenTelemetry

SQS doesn't propagate trace context automatically. You instrument both sides, deploy, and get two disconnected traces. This post shows how to wire them into one waterfall — and the ESM format gotcha that silently breaks it every time. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.
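As a rough illustration of what carrying trace context across SQS by hand can look like, the sketch below builds a W3C `traceparent` value, places it in the attribute shape that an SQS SendMessage call accepts, and reads it back from the record shape Lambda delivers. The function names are hypothetical and the code uses only the Python standard library. A real setup would more likely rely on the OpenTelemetry propagation API than on hand-built strings. Note that the key casing differs between the two sides (`StringValue` when sending, `stringValue` in the Lambda event record).

```python
import secrets
from typing import Optional


def new_traceparent() -> str:
    """Build a W3C traceparent value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"  # 01 = sampled flag


def to_sqs_attributes(traceparent: str) -> dict:
    """Producer side: shape expected for MessageAttributes on SendMessage."""
    return {"traceparent": {"DataType": "String", "StringValue": traceparent}}


def from_lambda_record(record: dict) -> Optional[str]:
    """Consumer side: read traceparent from one record of a Lambda SQS event.

    Lambda delivers message attributes with lower camel case keys.
    """
    attr = record.get("messageAttributes", {}).get("traceparent")
    return attr.get("stringValue") if attr else None


def parse_traceparent(value: str) -> dict:
    """Split a traceparent value into its four fields."""
    version, trace_id, span_id, flags = value.split("-")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "flags": flags,
    }


if __name__ == "__main__":
    tp = new_traceparent()
    outgoing = to_sqs_attributes(tp)

    # Simulated Lambda record carrying the same attribute (hypothetical data).
    record = {
        "messageAttributes": {
            "traceparent": {"stringValue": tp, "dataType": "String"}
        }
    }
    incoming = from_lambda_record(record)

    # Both sides now share one trace id, so their spans can join one trace.
    print(parse_traceparent(incoming)["trace_id"] == parse_traceparent(tp)["trace_id"])
```

If the consumer cannot find the attribute, `from_lambda_record` returns `None`, which is the case where the two sides would end up as disconnected traces.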

5 Best SOC 2 Continuous Monitoring Tools for SaaS: Closing the 20% Manual Evidence Gap

Landing a big-logo customer feels great, until their security questionnaire hits your inbox. For most B2B SaaS teams, SOC 2 compliance is the roadblock. You connect a tool, dashboards turn green, and then progress stalls: about 20% of evidence still needs screenshots, sign-offs, or frantic Slack chases. That last-mile grind drags engineers back into spreadsheets just when the audit seems done.

Why Copilot alone won't fix your business workflows

Microsoft has been pushing Copilot hard over the past year. Between the rebrand of Office to Microsoft 365 Copilot, the launch of Copilot Tasks, and the more recent arrival of Copilot Cowork, there is a clear message: AI is supposed to handle the heavy lifting. For many businesses, though, the reality is more complicated than the marketing suggests. Copilot is a strong productivity tool within its own ecosystem, but expecting it to fix workflows that span multiple disconnected systems is where things start to fall apart.

The Role Played by Artificial Intelligence in Product Design Nowadays

Ever since artificial intelligence became the new normal, building products has taken a completely different form. Designers used to depend on guesswork and long testing periods. That isn't the case anymore. AI can study data, spot patterns in it, and suggest better options. It isn't surprising that it has become a necessity for many companies.