How to Manage Complex On-Call Rotations and Schedules

A simple round-robin rotation works well when you have a small team with a single service and predictable incident patterns. It breaks down quickly when you have engineers across three continents, multiple services with different criticality levels, a mix of senior and junior responders, and a team that expects fair, sustainable coverage across weekends, holidays, and different time zones.
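As a rough illustration of the simple case described above, a weekly round-robin rotation can be sketched in a few lines of Python. The function name `round_robin_schedule`, the team members, and the start date are all hypothetical, chosen only for this example; the sketch does not reflect any particular on-call tool.

```python
from datetime import date, timedelta
from itertools import cycle


def round_robin_schedule(engineers, start, weeks):
    """Assign one engineer per week, cycling through the list in order."""
    rotation = cycle(engineers)
    return [
        (start + timedelta(weeks=i), next(rotation))
        for i in range(weeks)
    ]


# Hypothetical team and start date, for illustration only.
schedule = round_robin_schedule(["ana", "bo", "chen"], date(2026, 1, 5), 5)
for week_start, engineer in schedule:
    print(week_start.isoformat(), engineer)
```

The sketch knows nothing about time zones, service criticality, seniority, or holidays, which is the kind of gap that shows up once a team grows beyond the small, single-service case.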

AI Dev Tools: What 100K Engineers at Google Really Taught Us

AI developer productivity, agentic workflows, and the lessons learned running engineering tools for 100,000+ software engineers at Google. John Montgomery, CCO at GitKraken, sits down with Asim Hussain, co-founder of Alterion AI and former Google VP of Engineering Productivity, to get real about what AI actually changes for engineering teams in 2025.

Resolve Reels - Ep. 4 - Agent Lab

Episode 4 of Resolve Reels is live! See how Agent Builder helps teams create purpose-built AI agents with the right guardrails, routing logic, and orchestration for enterprise operations. In this episode: build specialized agents with defined responsibilities; improve routing with conversation starters and guardrails; test and operationalize agentic AI at scale. This is how enterprises move toward Autonomous Operations and Zero Ticket IT.

12 IT Infrastructure Best Practices Every IT Leader Should Follow

Why do IT infrastructure issues continue to slow down teams even when tools keep improving? In most IT environments, the challenge is not a single failure. It is a set of ongoing operational gaps that are easy to overlook but difficult to control over time. A few of the common challenges include: In 2026, IT environments are more distributed and faster-changing than before. Hybrid infrastructure, cloud adoption, and strict compliance requirements make consistency harder to maintain.

Keep your Agents Under Control with agent-belt

You’re shipping a product with an AI-facing interface, or embedding AI-facing interfaces across your existing product line – skills your customers trigger, MCP servers their agent reaches for. Indie author or enterprise, your code runs in someone else’s agent runtime, against a model that updates every other day and a CLI that updates every other week. Cursor 2026.05.05-84a231c rolls out. Claude Code 2.1.132 lands the same week. OpenAI bumps gpt-5.5.

How much engineering time is your infrastructure consuming?

Most engineering teams underestimate the time infrastructure demands from them. The hidden cost isn't in provisioning; it's in the accumulated friction of environment drift, manual handoffs, and repetitive infrastructure maintenance that quietly consumes hours your team should be spending on product.

Cloud has a climate cost. Here's our plan to reduce ours.

Cloud hosting is not invisible. Every project deployed, every resource provisioned, every region selected carries a real energy cost, and that energy cost has a climate cost. At Upsun, we've known this for a while. What we're sharing today is where we stand, what we measured, and what we've committed to doing differently from 2026 onwards. Our ambition is calibrated to what we can credibly deliver, and we think being upfront about that matters more than overpromising.

Why SRE agents need orchestration, not just more tools

Single agents are a useful starting point for SRE workflows. They are not where the architecture should end. The first version is simple enough: connect an LLM to a few tools, give it a system prompt, and point it at your infrastructure. It can summarize an alert, pull logs, answer questions, and draft a useful next step. Then the workflow gets real. You add GitHub for runbooks, Kubernetes for cluster state, PagerDuty for incident context, Prometheus for metrics, and Mezmo for telemetry.