
Optimize and troubleshoot AI infrastructure with Datadog GPU Monitoring

As organizations bring more AI and LLM workloads into production, the underlying GPU infrastructure becomes even more critical to keeping those workloads fast, reliable, and scalable. Inefficient GPU resource usage, for instance, can lead to longer runtimes and reduced throughput, negatively impacting overall model performance. Additionally, idle and underutilized GPUs can quickly drive up costs.

How to Monitor Kafka Producer Metrics

Your Kafka producer pushed a million messages yesterday. Nice. But can you tell if they all made it? Or why latency spiked at 2 PM? Producer metrics help you answer those questions. They expose how long messages take to send, whether messages are getting stuck, and whether retries are piling up. Let’s go over which ones help while debugging and how to monitor them.

The Future of IT Is Human + Agentic: How Zero Ticket IT Is Reshaping Tech Careers

Automation has always stirred up fears of job loss. For IT professionals, the conversation has only grown louder with the rise of AI. But the truth is that the future of IT is not about replacement—it’s about reinvention. For decades, IT has been defined by its firefighting: manually resolving tickets, managing endless alerts, and fielding repetitive service requests. These tasks are ripe for automation, but automation doesn’t eliminate the need for IT talent.

Apache Spark security: start with a solid foundation

Everyone agrees security matters – yet when it comes to big data analytics with Apache Spark, it’s not just another checkbox. Spark’s open source Java architecture introduces special security concerns that, if neglected, can quietly reveal sensitive information and interrupt vital functions.

Implementing Grafana Play privacy policies with Grafana k6: A behind-the-scenes look

Grafana Play is a free and publicly accessible sandbox environment that allows users to explore and learn Grafana without setting up their own instance. Grafana Play comes preloaded with ready-made sample dashboards, and showcases how to work with different data sources, create visualizations, and use advanced Grafana features.

Yes, Sentry has an MCP Server (...and it's pretty good)

Unless you’ve been living under a rock, “MCP” is probably a term you’ve heard thrown around in the AI space. Editors and LLM providers have all been racing to add and enhance their MCP support. Sentry was fortunate enough to be included in Anthropic’s release announcements for MCP.

Cisco and Splunk Strengthen Enterprise Digital Resilience in the AI Era

In an era where hybrid environments and AI-driven innovations redefine enterprise operations, organizations face increasing complexity, disruption, and vulnerability in their systems. To overcome this growing challenge, Cisco and Splunk are working together to harness the power of AI to help customers ensure that digital resilience is an inherent part of their systems.

OWASP CI/CD Part 6: Insufficient Credential Hygiene

This post, part six of our OWASP CI/CD Top 10 series, looks at some of the common risks associated with Insufficient Credential Hygiene. By better understanding the flaws that affect credential hygiene, we can better understand how even the most sophisticated pipelines have been compromised.

Getting OpenTelemetry Data Into Graylog

OpenTelemetry is emerging as the common framework for collecting observability data, and for good reason. It’s vendor-neutral, open source, and designed to collect traces, metrics, and logs in a consistent way. But while most of the buzz is around tracing and metrics, let’s not forget: logs are still the backbone of investigation and response. That’s why Graylog now supports native collection of OpenTelemetry data over gRPC.

The truth you can't afford to miss: Listen as your logs spill the tea

When you hear “spill the tea,” you probably think of pop culture, not outages or anomalies. But the origin may surprise you: before it was slang for juicy gossip, ‘tea’ was actually ‘T,’ which represents truth. We know what you’re thinking: “Are you trying to say ‘spilling the tea’ is a good thing?” And yes, that’s exactly what we’re saying, especially when your logs are doing the talking.