
Create and monitor LLM experiments with Datadog

To efficiently optimize your LLM application before pushing to production, you need a comprehensive testing and evaluation framework. By running experiments, you can optimize prompts, fine-tune temperature and other key parameters, test complex agent architectures, and understand how your application may respond to atypical, complex, or adversarial inputs. However, it can be difficult to manage your experiment runs and aggregate the results for meaningful analysis.
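The kind of experiment run described above — sweeping prompts and temperature and aggregating results — can be sketched in a few lines. This is a minimal, hypothetical harness: `run_app` and its toy scoring function are stand-ins for a real LLM call, and nothing here reflects Datadog's actual experiments API.

```python
from itertools import product

# Hypothetical stand-in for calling an LLM application; a real experiment
# would invoke a model API and apply a task-specific evaluation metric.
def run_app(prompt: str, temperature: float) -> dict:
    # Toy "score" so the sweep has something observable to aggregate:
    # shorter prompts at lower temperature score higher here.
    score = round(1.0 / (len(prompt) * (1 + temperature)), 4)
    return {"prompt": prompt, "temperature": temperature, "score": score}

def run_experiments(prompts: list[str], temperatures: list[float]) -> list[dict]:
    """Run every prompt/temperature combination and collect the results."""
    return [run_app(p, t) for p, t in product(prompts, temperatures)]

results = run_experiments(["Summarize:", "TL;DR:"], [0.0, 0.7])
best = max(results, key=lambda r: r["score"])
print(f"best config: prompt={best['prompt']!r} temperature={best['temperature']}")
```

A real framework would add the pieces the teaser alludes to: adversarial input sets, per-run metadata, and centralized aggregation of results across many such sweeps.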

Introducing Bits AI SRE, your AI on-call teammate

Getting paged pulls engineers away from meaningful work, yet incident response in many organizations remains manual, reactive, and draining. An alert fires and teams scramble to find the root cause, relying on siloed knowledge, incomplete context, and a few on-call experts who are already stretched thin. The rise of AI coding agents has only intensified this challenge: As teams ship code faster with less human oversight, production systems grow increasingly complex and harder to understand.

Retail's Next Bold Move: Embracing Artificial Intelligence for the Frontline

In the fast-paced world of retail, staying ahead of changing consumer demands is more challenging than ever. As retailers strive to enhance customer experiences and maintain competitiveness, many are turning to the transformative power of artificial intelligence (AI). Zebra Technologies is at the forefront of this movement, offering AI solutions that unlock the potential of frontline operations, allowing retailers to make smarter business decisions.

Cisco and Splunk Strengthen Enterprise Digital Resilience in the AI Era

In an era where hybrid environments and AI-driven innovations redefine enterprise operations, organizations face increasing complexity, disruption, and vulnerability in their systems. To overcome this growing challenge, Cisco and Splunk are working together to harness the power of AI to help customers ensure that digital resilience is an inherent part of their systems.

Yes, Sentry has an MCP Server (...and it's pretty good)

Unless you’ve been living under a rock, “MCP” is probably a term you’ve heard thrown around in the AI space. Editors and LLM providers alike have been racing to add and enhance their MCP support, and Sentry was fortunate enough to be included in Anthropic’s release announcements for MCP.

How IPM helped a top tech brand catch an OpenAI outage before it became a crisis

Today’s digital businesses are more interconnected than ever. Industry research shows that 74% of organizations now take an “API-first” approach, and the average application is powered by between 26 and 50 APIs. While this accelerates innovation, it also introduces new risks: when an external provider fails, the impact can be immediate and far-reaching.

You Can Build Your Own AI Agent for ITOps – But Should You?

Most internal AI projects for IT operations never exit the pilot stage. Budgets stretch, priorities shift, key hires fall through, and what started as a strategic initiative turns into a maintenance burden—or worse, shelfware. Not because the teams lacked vision, but because building a production-grade AI agent is an open-ended commitment. It’s not just model tuning or pipeline orchestration. It’s everything: architecture, integrations, testing frameworks, feedback loops, governance, compliance.