Kubernetes for AI Workloads

Kubernetes has orchestrated containers for around a decade, for both stateful and stateless application workloads. With the recent rise of AI and the advent of tools like Kubeflow and Argo Workflows, Kubernetes is also becoming a first-class platform for running AI workloads. When you train a model on K8s, you may tweak many parameters and have to test each of them one by one.

February 2025 Box Outage: Timeline and Post-Mortem

Box.com is a cloud-based content management and file-sharing platform designed for the enterprise and used by nearly 100,000 companies around the world. When a Box outage strikes, businesses can experience costly disruptions. On February 19, 2025, a disruption in core Box services, including uploads, downloads, and the All Files page, affected thousands of users who depend on the cloud storage and collaboration platform.

How IoT Brands Waste Money

IoT margins are already tight, so why make them worse? Many companies throw away money on preventable costs such as unnecessary RMAs, bloated customer support, and costly technician visits. But there’s a better way: observability and OTA updates can help reduce churn, cut support costs, and eliminate waste. We just watched a customer slash support tickets by 30% and RMAs by 50% using Memfault’s observability data. These are real numbers, real savings, and real impact.

How to Monitor Snowflake with OpenTelemetry

Snowflake is a powerful, cloud-based data platform designed for high-performance analytics. Whether you're running massive analytical queries, managing structured and semi-structured data, or optimizing data pipelines, visibility into your Snowflake instance is essential. Performance bottlenecks, query execution delays, and unexpected cost spikes can quickly become issues without proper monitoring.

Maximizing Azure Network Insights with VNet Flow Logs

Join Kentik’s Phil Gervasi and Chris O’Brien in this LinkedIn Live replay as they discuss how VNet flow logs in Microsoft Azure boost network observability far beyond what’s possible with NSG flow logs. Learn how easier deployment, comprehensive visibility, and advanced analytics—integrated with AI-driven query capabilities—can help optimize your Azure (and multi-cloud) environment.

Spoiler Alert: How "Zero Day" Might Have Played Out Differently with Teneo and Palo Alto Cortex XDR

This weekend, I binge-watched Netflix’s new series Zero Day, starring Robert De Niro. The series has sparked excitement and curiosity among cybersecurity enthusiasts and political thriller fans alike. As the title suggests, the show revolves around a cyberattack that exploits unknown vulnerabilities—so-called “zero days”—to wreak havoc on critical systems. But what if the organizations targeted in Zero Day had the right cybersecurity strategy in place?