Sponsored Post

DEX in IT Routine: How Digital Experience-Driven Decisions Elevate Operational Quality and Results

As IT teams face growing pressure to deliver positive business outcomes, relying solely on technical metrics is no longer enough. In a webinar held on March 26, 2026, Leandro Silva and Bob Kruger discussed how Digital Employee Experience (DEX), a tangible discipline supported by specialized tools, transforms IT decision-making, resource prioritization, and strategic value delivery for organizations.

Sponsored Post

Increase customer retention & stop leaving money in the shopping cart

We all know the pain and frustration associated with broken software. It's no secret that the internet is rife with broken links, slow pages, and broken shopping carts, often feeling like it's being held together with glue and duct tape. These issues don't just frustrate customers; they cost businesses millions. According to the Consortium for Information and Software Quality, poor software quality cost US companies $2.08 trillion in 2020. Every interaction between a customer and your technology is an opportunity to build or destroy trust.

Level up your Code on Arm and Ubuntu | Ubuntu Summit 26.04

What are the latest developments in Arm tooling on Ubuntu? In this talk, David explores Arm tooling to analyze and optimize workload performance, and how AI-assisted development using agentic AI and static analysis can accelerate porting and tuning applications for the Arm architecture.

About David: David Haikney is a Technical Product Director at Arm. He is responsible for Arm Performix, a free performance toolkit that helps developers understand and improve real-world performance on Arm architectures.

NVIDIA Earth-2: OSS and Science for AI Weather and Climate | Ubuntu Summit 26.04

Discover how NVIDIA Earth-2 brings open source software and open science to weather and climate forecasting. Niall Robinson (NVIDIA) introduces a new way of making production-ready weather AI fully accessible for organizations to run, fine-tune, and deploy on their own infrastructure: NVIDIA Earth-2.

AI ROI: How to measure and prove the return on AI investments in 2026

Every quarter, the same scene plays out in boardrooms across the Fortune 500. The CEO asks: “What is the return on everything the company is spending on AI?” The CTO talks about productivity gains and developer velocity. The CFO points at a cloud bill that doubled but cannot isolate which line items are AI. The board nods politely and tables the discussion until next quarter, when the same question will produce the same non-answer. (If this sounds familiar, you are not alone. Keep reading.)

CloudZero AI Hub: The nexus of autonomous AI cost control

CloudZero originated as a way to make sense of your cloud costs. Costs spread across bills with billions of line items belonging to resources that might or might not have been tagged (or taggable), spun up by engineers working across teams, on different microservices, features, and products, that served a wide range of customers. Kubernetes. Multi-cloud. Check, check, check.

Your AI App Is Lying to You - Here's How to Fix That

You shipped your AI app. But do you have all the answers? Do you actually know which model ran, how many tokens it consumed, or why it stopped? This is what LLM observability gives you, and most AI engineers are skipping it entirely. I built an SOS detection app and used OpenTelemetry to get full visibility into every single call. Token usage, model version, finish reason, and cost per call all in one place, standardised across any provider. Check out the OpenTelemetry GenAI docs; there is a lot more you can track than you think.
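
To make the fields mentioned above concrete, here is a minimal sketch of the per-call attributes such instrumentation might record. It is not the code from the video. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions as we understand them; the function name, the `app.gen_ai.cost_usd` key, and the per-token prices are hypothetical, since cost is not a standard attribute.

```python
def genai_span_attributes(request_model, response_model, input_tokens,
                          output_tokens, finish_reasons,
                          usd_per_1k_input, usd_per_1k_output):
    """Build a dict of span attributes describing one LLM call.

    The prices are caller-supplied assumptions; real prices vary by provider.
    """
    # Cost is derived locally from token counts and the assumed prices.
    cost = ((input_tokens / 1000) * usd_per_1k_input
            + (output_tokens / 1000) * usd_per_1k_output)
    return {
        "gen_ai.request.model": request_model,      # model that was asked for
        "gen_ai.response.model": response_model,    # model that actually ran
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": list(finish_reasons),
        "app.gen_ai.cost_usd": round(cost, 6),      # hypothetical custom key
    }


attrs = genai_span_attributes(
    "model-a", "model-a-2026", 1000, 500, ["stop"],
    usd_per_1k_input=0.5, usd_per_1k_output=1.5,
)
```

With the OpenTelemetry Python SDK installed, a dict like this could be attached to the active span via `span.set_attributes(attrs)`.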

How to generate real-world load tests using Grafana Cloud k6 and production telemetry

For many development teams, a load test starts with a set of assumptions. You pick 100 virtual users because it sounds reasonable. You ramp for 30 seconds because that's what the tutorial showed. You set a 500ms threshold because it feels like a good target. The test passes, you ship the release, and production falls over at 6 p.m. on a Tuesday because your synthetic load never resembled how real users interact with your application.
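
As a rough illustration of replacing guessed numbers with measured ones, the sketch below derives a peak request rate and a p95 latency from request samples. It is not the Grafana Cloud k6 feature itself; the `load_profile` function and the `(timestamp, latency_ms)` input shape are assumptions made for this example.

```python
import math
from collections import Counter


def load_profile(samples):
    """Summarise telemetry samples given as (unix_seconds, latency_ms) pairs."""
    samples = list(samples)
    if not samples:
        raise ValueError("no telemetry samples")
    # Count requests per whole second to find the busiest second observed.
    per_second = Counter(int(ts) for ts, _ in samples)
    latencies = sorted(latency for _, latency in samples)
    # Nearest-rank 95th percentile of the observed latencies.
    rank = max(1, math.ceil(0.95 * len(latencies)))
    return {
        "peak_rps": max(per_second.values()),
        "p95_latency_ms": latencies[rank - 1],
    }


profile = load_profile([(0.1, 100), (0.5, 120), (0.9, 110), (1.2, 400), (2.0, 90)])
```

The peak rate could inform the target load of a test, and the observed p95 could replace an arbitrary latency threshold.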

The Bug Hiding in Your Production Traffic

Your logs showed 500 errors. The traces showed the dependency graph. Neither showed the actual bug, a DEL control character getting appended to the query string. This is how I found it. In this video I walk through Speedscale BYOC (bring your own cloud): capture real production traffic, store it in your own Elasticsearch cluster inside your VPC, pull it down locally with a single script, and reproduce the exact bug using proxymock. The data never leaves your environment.
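
For readers unfamiliar with this class of bug, here is a minimal sketch of how a stray control character in a query string could be spotted once the raw request has been captured. It is not how Speedscale or proxymock detect it; the function name and the example URL are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def control_chars_in_query(url):
    """Return (index, hex code) for each ASCII control character in the query."""
    # Decode percent-escapes so that an encoded DEL (%7F) is also caught.
    query = unquote(urlsplit(url).query)
    return [
        (index, f"0x{ord(ch):02x}")
        for index, ch in enumerate(query)
        if ord(ch) < 0x20 or ord(ch) == 0x7F  # C0 controls and DEL
    ]


findings = control_chars_in_query("https://example.com/search?q=shoes%7F")
```

A clean URL yields an empty list, so a check like this could be run over captured traffic to flag suspicious requests.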

21 AI concepts every beginner should know before their first interview

If you’re prepping for your first AI or MLOps interview, the hardest part isn’t always the hands-on element. For me, it’s the vocabulary. Interviewers sometimes lob single-word concepts at you (“what’s quantization?”) and watch how far you can carry the thread. The questions sound clear-cut, but each one is really a doorway into a bigger topic, and the interviewer is judging how cleanly you walk through it.