Latest posts

How to Reduce Help Desk Demand (Hint: It's Not a Help Desk Issue)

Most IT organizations are trying to reduce help desk demand the same way they have for years: by making the help desk itself more efficient. They improve routing, tighten SLAs, expand self-service, and add AI into the support flow. These changes can make the queue move faster, but they do not stop the work from arriving in the first place. The same problems keep finding their way back to IT. Employees lose time to slow devices, unreliable apps, failed updates, access issues, or confusion after a rollout.

What Is Internet Congestion and How to Fix It

Your VoIP calls are choppy. File uploads are crawling. Your team is complaining that the CRM is sluggish, and remote desktop sessions keep freezing. You check your firewall, your switches look clean, and there are no alerts on your LAN. The problem isn't inside your network. It's upstream, and it's happening quietly every day during peak hours.

Preview launch: the Agent Impact Leaderboard and the Business Impact & ROI Dashboard

The Agent Impact Leaderboard and the Business Impact & ROI Dashboard are live in preview inside GitKraken Insights today. We built them because the questions engineering leaders are getting asked about AI shifted faster than the tools to answer them. Here’s what shipped and how to get access.

Lessons From a CI/CD Supply Chain Attack at Grafana Labs

When a compromised GitHub Actions workflow targets your CI/CD pipeline, how do you respond — and what do you change so it never happens again? Nick and David from Grafana Security walk through a real supply chain incident triggered by a pull_request_target misconfiguration, showing exactly what broke, what tools caught it, and what the team rebuilt afterward.

Getting Started with gcx: A CLI for AI Agents and Grafana Telemetry | Demo

AI agents are only as useful as the context they can access. With gcx, your coding agents can connect to Grafana and query real-time production telemetry from your Cloud, Enterprise, or OSS environment. The best part: it avoids the upfront context bloat that can come with loading tools before you even send a prompt. gcx uses a CLI approach, so there’s zero token cost until your agent actually needs to run a query.

You probably don't need private PKI for internal infrastructure

Running your own certificate authority sounds like the responsible choice for internal infrastructure. Distribute your root cert to every machine and issue certs internally. In practice, you spend the next six months chasing down every device, contractor laptop, and vendor console that didn’t get the root cert installed. The warnings come back. And when they do, people click through them, because they always have. There’s a simpler path, and most teams don’t know it exists.

Operator now has Long-Term Support (LTS) version

VictoriaMetrics Operator has been developing at a breakneck pace, bringing numerous improvements, features, and fixes to our community. We usually make at least one release every two weeks. While this rapid iteration cycle is great for delivering fixes and improvements quickly, it can be challenging for administrators managing critical production environments.

Best Practices in the Slack Experience

PagerDuty’s Slack experience is evolving to help your teams organize better and resolve incidents faster. Use Triage Channels to collect telemetry and updates from your systems. Create dedicated Incident Channels for coordination and resolution. Give stakeholders the updates they need in Announcements Channels. Everyone in your organization can get the information they need easily.

Your developers are using AI agents, your data exposure just multiplied

Your developers are already using AI agents. GitHub Copilot, Cursor, Claude Code. Not just for autocomplete, but to generate features, run test suites, and iterate across branches. Each agent needs a database to work against. And in most organizations, nobody has checked what's actually in that database, or whether it should be there.

What Is Hybrid Cloud Monitoring (And How To Actually Do It Well)

Most IT teams running a real hybrid setup are not short on data. They are short on a place where the data agrees with itself. By the end of this post, you will know what to ask a vendor for, where teams usually trip, and how to scope a proof of concept that does not burn a quarter. Hybrid cloud monitoring is the ongoing collection of telemetry across your on-prem kit and one or more public clouds, treated as one environment instead of two or three. The goal is not just visibility.