Measure the real impact of AI coding tools on software delivery with Datadog AI Impact

Engineering teams have rapidly adopted AI coding tools, but organizations still struggle to understand their impact. Existing dashboards focus on activity, such as daily active users, acceptance rates, or lines of generated code, but these metrics don’t answer a more important question: Are teams actually shipping more, faster, and with fewer issues?

Your agent can't fix what it can't see

Agents are getting better and better at fixing bugs. They’re even getting better at testing their work, thanks to headless browsers, sandboxes, simulators, etc. But what about the bugs that only show up once you bring in different browsers, languages, extensions, internet speeds, and all the other variables that get mixed in the second you ship to prod? Or all the bugs that only show up when you account for… well, humans being humans and doing weird stuff you didn’t expect them to do?

How to Reduce Help Desk Demand (Hint: It's Not a Help Desk Issue)

Most IT organizations are trying to reduce help desk demand the same way they have for years: by making the help desk itself more efficient. They improve routing, tighten SLAs, expand self-service, and add AI into the support flow. These changes can make the queue move faster, but they do not stop the work from arriving in the first place. The same problems keep finding their way back to IT. Employees lose time to slow devices, unreliable apps, failed updates, access issues, or confusion after a rollout.

What Is Internet Congestion and How to Fix It

Your VoIP calls are choppy. File uploads are crawling. Your team is complaining that the CRM is sluggish, and remote desktop sessions keep freezing. You check your firewall, your switches look clean, and there are no alerts on your LAN. The problem isn't inside your network. It's upstream, and it's happening quietly every day during peak hours.

Preview launch: the Agent Impact Leaderboard and the Business Impact & ROI Dashboard

The Agent Impact Leaderboard and the Business Impact & ROI Dashboard are live in preview inside GitKraken Insights today. We built them because the questions engineering leaders are getting asked about AI shifted faster than the tools to answer them. Here’s what shipped and how to get access.

Lessons From a CI/CD Supply Chain Attack at Grafana Labs

When a compromised GitHub Actions workflow targets your CI/CD pipeline, how do you respond — and what do you change so it never happens again? Nick and David from Grafana Security walk through a real supply chain incident triggered by a pull_request_target misconfiguration, showing exactly what broke, what tools caught it, and what the team rebuilt afterward.

Getting Started with gcx: A CLI for AI Agents and Grafana Telemetry | Demo

AI agents are only as useful as the context they can access. With gcx, your coding agents can connect to Grafana and query real-time production telemetry from your Cloud, Enterprise, or OSS environment. The best part: it avoids the upfront context bloat that can come with loading tools before you even send a prompt. gcx uses a CLI approach, so there’s zero token cost until your agent actually needs to run a query.

You probably don't need private PKI for internal infrastructure

Running your own certificate authority sounds like the responsible choice for internal infrastructure. Distribute your root cert to every machine and issue certs internally. In practice, you spend the next six months chasing down every device, contractor laptop, and vendor console that didn’t get the root cert installed. The warnings come back. And when they do, people click through them, because they always have. There’s a simpler path, and most teams don’t know it exists.

Operator now has a Long-Term Support (LTS) version

VictoriaMetrics Operator has been developing at a breakneck pace, bringing numerous improvements, features, and fixes to our community. We usually ship at least one release every two weeks. While this rapid iteration cycle is great for delivering fixes and improvements quickly, it can be challenging for administrators managing critical production environments.

Best Practices in the Slack Experience

PagerDuty’s Slack experience is evolving to help your teams organize better and resolve incidents faster. Use Triage Channels to collect telemetry and updates from your systems. Create dedicated Incident Channels for coordination and resolution. Give stakeholders the updates they need in Announcements Channels. Everyone in your organization can get the information they need easily.