Sponsored Post

How Right-Sizing Ephemeral Environments Reduces Cloud Costs

Ephemeral environments supercharge development velocity, but if left unchecked, they can quietly drain your cloud budget. The answer? Right-sizing: a strategy that tailors resource allocation to real-world usage. Done right, it can slash cloud expenses by 30% to 70%. Let's dive into how this works and why more teams are making it part of their CI/CD pipelines.
Sponsored Post

How to Reduce Continuous Monitoring Costs

Continuous monitoring is a crucial practice in the fields of DevOps, cybersecurity, and compliance. It involves the proactive and ongoing process of observing, assessing, and collecting data from various systems, applications, and infrastructure components in real-time or near real-time. Continuous monitoring is closely related to observability, which goes beyond simple monitoring to provide a deep understanding of complex and dynamic systems.

Ribbon Expands Portfolio of DISA JITC-Certified Solutions in Support of U.S. Department of Defense Network Deployments

Ribbon Communications Inc. announces the expansion of its portfolio of Joint Interoperability Test Command (JITC)-certified solutions. The Ribbon Policy Engine Server (PSX), Ribbon Application Management Platform (RAMP), and Ribbon Analytics have been added to the U.S. Department of Defense (DoD) Defense Information Systems Agency (DISA) Approved Products List (APL), reinforcing Ribbon's commitment to delivering secure, mission-critical communications infrastructure.
Sponsored Post

How to Choose the Right Incident Management Tool for Your Team

IT disruptions are inevitable. What separates a resilient organization from the rest is its ability to respond quickly, efficiently, and collaboratively to incidents. The cornerstone of such responsiveness? The right incident management tool. But with a market flooded with tools, each promising to revolutionize your workflows, how do you pick the one that truly fits your team's needs? In this blog, we'll break down the key factors to consider when selecting an incident management tool, ensuring you make an informed decision that enhances your team's effectiveness and reliability.

A Practical Guide to Python Application Performance Monitoring (APM)

When your Python app starts slowing down (maybe queries are taking longer, memory keeps creeping up, or API calls are lagging), basic server metrics won’t tell you why. You need to see what’s happening inside the application itself. That’s the role of Application Performance Monitoring (APM). It gives you a breakdown of database queries, external API calls, memory usage, error rates, and more, so you can connect the dots between code and performance.

The Journey to Zero Ticket IT with Agentic Automation

See how RITA, Resolve’s intelligent AI agent, instantly resolves IT issues before they become tickets. In this live demo, you'll get a front-row seat to how RITA automates routine service desk requests, helps technicians troubleshoot faster, and brings the Zero Ticket IT vision to life. Hosted by Ian Coppock and demoed by Derek Pascarella, this is automation in action.

Redgate Monitor Adds Support for Oracle Multitenant

Redgate Monitor now gives you full visibility into activity across Oracle Multitenant environments. With metrics and alerts at both the Container Database (CDB) and Pluggable Database (PDB) level, you can quickly pinpoint performance issues and ensure compliance. With the introduction of Oracle multitenant architecture in Oracle 12c, many teams began consolidating databases into a single Container database (CDB). This helped them to optimize resource usage and simplify patching.

DeepSeek Pricing: Models, How It Works, And Saving Tips

Some teams won’t touch DeepSeek because it’s Chinese. Others are quietly running pilots and rethinking how much reasoning and context they actually need, or can afford. For SaaS teams staring down runaway AI costs, DeepSeek’s mix of open-source freedom, massive context windows, and token rates 10–30X cheaper than OpenAI or Anthropic is tough to ignore. However, DeepSeek pricing comes with cache hits, cache misses, off-peak discounts, that September pricing shift, and more.

Automating GDPR compliance for web applications with CircleCI

Since 2018, the General Data Protection Regulation (GDPR) has been an important milestone in the evolution of privacy laws for web application users across Europe. GDPR requires companies to obtain explicit user consent for data collection and processing, and only for specified, legitimate purposes. It’s a law based on principles of transparency and purpose limitation. This law applies to global companies dealing with EU citizen data, giving individuals control over personal data.

NHibernate Tutorial: Database-First & Model-First Using Visual ORM Designer

Learn how to create and configure an NHibernate model with Entity Developer — Devart’s powerful visual ORM designer for .NET. In this step-by-step tutorial, we’ll cover both Database-First and Model-First approaches, show how to update models, generate mappings and classes, and even produce SQL scripts directly from your model. Whether you’re using Visual Studio or working in VS Code, Entity Developer makes NHibernate development faster, easier, and more reliable.

dotConnect for SQLite: Fast .NET data access with ORM and encryption support

Supercharge your .NET applications with dotConnect for SQLite — the high-performance data provider trusted by thousands of developers. Whether you're building lightweight apps or enterprise-scale solutions, dotConnect delivers secure, encrypted access to SQLite with seamless Visual Studio integration, powerful ADO.NET architecture, and full ORM support, including Entity Framework Core. Try it free for 30 days and see why thousands of developers trust dotConnect for SQLite in mission-critical environments.

Unprecedented industry turbulence: leading through change ft. Pat Kua, author of LevelUp newsletter

Engineering leadership has never been more challenging, or more critical. In this episode, Rob sits down with Pat Kua, seasoned technology leader, author of three books including Building Evolutionary Architectures, and creator of the popular Level Up newsletter for technical leaders.

Introducing 400G Ports

Discover why 400G is growing in popularity across industries, and how you can deploy it with Megaport. The Megaport team is excited to introduce 400G ports. This addition to our network offering gives enterprises, cloud builders, and service providers access to ultra-high bandwidth on demand, making it easier than ever to interconnect clouds, data centers, and services at massive scale. If your IT team is already using 100G bandwidth, the need for 400G might creep up more quickly than you think.

Colocated vs Dedicated vs Remote Servers: How to Choose the Right Hosting for Your Business Projects

Businesses have three main options for hosting servers off-premises: colocation, dedicated server hosting, and cloud (remote) hosting. In colocated hosting (colo), the customer owns the server hardware and simply rents space, power and network connectivity in a third-party data center. In this model, "you bring or ship your servers" to the provider's facility and lease rack space, power and bandwidth (liquidweb.com). By contrast, dedicated hosting involves leasing an entire physical server from a provider.

Windows Security Event Collection for Microsoft Sentinel with Datastream

Collecting Windows Security Events has always been a necessary but difficult job. Traditional methods depend on third-party collectors that must be installed, configured, and constantly maintained. They break, they lag behind updates, and they create unnecessary operational work. At the same time, they often flood Microsoft Sentinel with redundant or irrelevant data, driving up costs and slowing down investigations.

Enhancing Banking Operations with AirDroid Business

This comprehensive Mobile Device Management solution ensures your banking devices operate efficiently and securely. With features like Kiosk Single App Mode, automatic lock time adjustments, and real-time device monitoring, AirDroid Business empowers banks to provide seamless service to their clients.

From Ticket Chasers to Trailblazers: Building a Proactive IT Culture

You know the drill. The service desk queue refreshes, another tsunami of tickets floods in, and your team gears up for another deep, deep dive. Resolution times are king, SLA compliance is queen, and the more tickets you churn through, the more you’re celebrated as a hero of the help desk. Does this sound familiar to you? That’s ticket-centric culture for you.

FinOps For Claude: Your Strategy For Managing Claude API And Anthropic Costs At Scale

Anthropic’s Claude is one of the most powerful and developer-friendly large language models (LLMs) available. But as usage grows, so does cost. Here’s the reality: A single unoptimized development loop or unmonitored QA job can multiply costs 10x overnight. Most teams experimenting with Claude lack the visibility and guardrails needed to prevent runaway costs, especially once usage moves from R&D into production.

You don't have to live with outages and late nights

Outages don’t have to be part of your life and engineers don’t have to burn out being a hero. Spread out your effort and build reliability without the drama. Transcript: You should be great at dealing with outages, but your customers don't care. There's no medals here. No one should have incentive to be paged. There's nothing good about being in a war room for 10 days or in the holiday season in 12 hour shifts around the clock just in case something happens.

What is Database Monitoring

Database monitoring transforms from a reactive troubleshooting exercise into a proactive optimization strategy when you have the right tools and approaches in place. This blog shares practical ways to choose monitoring solutions, set up observability for different database platforms, and design workflows that scale in modern distributed systems.

We need to talk about HFC-227ea

Data centres often hold the fluorinated gas (f-gas) HFC-227ea, traded under the name FM-200, as an emergency measure to stop fires in data halls and technical space without harming electrical equipment. Robust fire prevention helps avoid devastating human, operational, financial, and environmental consequences, but there’s a problem with this particular gas – if released, it has an immense global warming potential.

Why open source might be the best move for ISVs

Why do ISVs choose to partner with Canonical? Canonical’s promise to deliver secure, open source software empowers Independent Software Vendors (ISVs) to focus on building and improving their products every day. We help vendors meet regulatory compliance, reduce their environmental footprint through containerization, and stay ready for the future.

Automating Connectivity: The Future of Digital Infrastructure

Connectivity powers our world, but the way it’s bought and sold is being transformed. In this episode of Uplink, Michael Reid talks with Ben Edmond, founder and CEO of Connectbase, about building a global marketplace for digital infrastructure. With 4.2 million quotes processed each month and 1.9 trillion rows of connectivity data, Connectbase is revolutionizing how providers and enterprises access real-time intelligence.

How to get fast, easy insights with the Gremlin MCP Server

Chaos Engineering and reliability testing give you visibility into the actual reliability of your services by simulating real-world failure conditions. But what if you could dig into the testing and results data using AI to quickly uncover new insights? That’s the logic behind the Gremlin MCP Server. Released as part of Reliability Intelligence, the Gremlin MCP Server allows you to bring your LLM of choice to explore your Gremlin data and find opportunities to get more out of Gremlin.

Proxmox vs Cycle: Toolkit or Platform?

If you've ever run a homelab, chances are you've tried Proxmox. Its mix of open-source accessibility, strong VM support, and lightweight containers has made it popular among enthusiasts and small IT teams alike. Beyond hobby projects, Proxmox has also found adoption in organizations that value cost efficiency or want to avoid locking themselves into VMware's catalog. That adoption has seen some positive movement in the wake of Broadcom's changes to VMware's licensing and support model.

What is Incident Escalation

When incidents strike, your on-call engineer jumps in first. They assess the issue, triage it, and try to resolve it. But sometimes, they can’t solve the problem or aren’t available. That’s when escalation policies step in to find the right backup. In this guide, I’ve explained how escalation policies work, why every team needs them, and how you can set up one. Also, I’ve included ready-to-use templates to help you get started fast.

12 Cloud Cost Optimization Examples For Your Cost Journey

Organizations face increasingly complex cloud environments — from hybrid clouds to multi-cloud deployments — where costs can quickly spiral without real-time visibility and intelligent controls. This is why setting clear goals for cloud cost optimization is necessary to keep your organization proactive. The key to success lies not just in setting goals, however, but in ensuring those goals are clear, realistic, and supported by continuous measurement and actionable insights.

SharePoint Archiving Best Practices for Compliance

SharePoint Online has become the backbone of document management for many organizations. From project files to legal contracts, HR records to financial reports, it holds critical business data that grows relentlessly. But as usage increases, so do two unavoidable challenges: rising storage costs and compliance obligations. The dilemma? Simply deleting files may reduce storage bills, but it risks non-compliance. Retention policies may satisfy regulators, but they don’t stop your storage from exploding in cost.

Introducing AppJet.ai : a GitHub-native AI that codes full-stack from prompt to deploy

If you have ever tried any vibe-coding tool on the market, you know they are mostly supercharged NodeJS code editors, full of pre-made components and often unable to really understand your existing code base. We created AppJet for a simple reason: AI is now at a maturity point where it is realistic to use it as a real coding companion.

Introducing "Resolved by Timer"

Today, we are introducing Resolved by Timer. It is a timer you can set on your incidents. When the timer runs out, the incident resolves on its own. Not all incidents need manual attention. Sometimes they just sit on dashboards, adding noise long after they have stopped mattering. And when that happens, Spike also treats them as “open incidents,” which can end up suppressing new alerts if the same problem re-triggers later. Resolved by Timer solves both problems.

Console Connect expands in Thailand enabling 13 new data centre locations

Thailand is emerging as one of Asia’s most important digital hubs. Fuelled by large-scale investments in data centres, cloud regions, and network infrastructure, the country is becoming a central link between Southeast Asia and global markets. Recognising this, Console Connect, PCCW Global’s Network-as-a-Service platform, has expanded its footprint by enabling 13 new data centre locations across Thailand, bringing the total to 16 nationwide.

Running Ad-hoc Operations with Puppet Enterprise: Tasks and Plans

Discover how Puppet Enterprise tasks and plans transform your infrastructure management by enabling powerful ad-hoc operations that go beyond traditional desired state management. In this brief overview, learn when and how to leverage these essential tools for running single commands or complex command sequences across your entire infrastructure.

All the As-a-services, Compared

Head spinning with all the “aaS” acronyms floating around these days? Our complete glossary will bring you up to speed. We’re watching the world go as-a-service in real time, and it has made for some heated Reddit threads about how our existence is being monetized at every turn. It’s difficult to refute that individual consumers should have the option to pay a one-off fee for software, platforms, and media as opposed to paying for temporary access.

Secure Streamlit app deployment with AWS Cognito, Streamlit, and CircleCI

As you develop internal tools or public-facing data applications, implementing authentication mechanisms becomes essential. Without authentication, you risk exposing sensitive information or allowing unauthorized access. Fortunately, integrating secure user access does not have to be complex. AWS Cognito provides a straightforward way to handle authentication, user management, and access control across multiple identity providers.

What is Dynamic SQL in SQL Server?

Dynamic SQL in SQL Server is built for scenarios where queries can’t be fully defined in advance. It’s the method of choice when structure depends on user input, variable schemas, or runtime conditions, cases where static SQL falls short. However, without proper structure, this flexibility introduces security and maintenance challenges. To make it work at scale, you need a disciplined approach.

Shift Left on Performance Testing - Without Killing Developer Velocity

Traditional performance testing often comes late in the delivery cycle, typically just before release. By then, performance issues are usually quite expensive to fix, can delay deployments, and frustrate development velocity. A Shift Left testing approach addresses this by integrating performance testing early in the development cycle so issues surface while they’re still easy and cheap to fix.

Claude Pricing: A 2025 Guide To Anthropic AI Costs

When OpenAI surged into the spotlight with ChatGPT, not everyone inside the company agreed on the path forward. In 2021, a group of senior researchers broke away. They had concerns about safety, transparency, and the direction of AI development. They went on to found Anthropic. And their answer to ChatGPT was Claude. Anthropic’s mission is for openness now. Yet, Claude’s pricing can feel as mysterious as the model weights behind the scenes.

Rethink Cloud Finance: From Cost Control To Strategic Growth

Cloud costs keep rising, and most companies are struggling to contain them. That’s where today’s finance teams can step up their game, not only as a professional opportunity but as a leading protagonist on the cloud cost optimization stage. A bit of background first: Global public cloud spending is projected by Gartner to exceed $720 billion in 2025. That’s up from nearly $600 billion in 2024. And a lot of that is sheer, unmitigated waste.

Monitor Apple Silicon GPU on macOS with macmon + Hosted Graphite

Your Mac’s GPU is a massively parallel processor that handles anything from animating the UI to heavy lifting in video editors, 3D tools, games, and on-device machine learning models. Think Final Cut Pro exports, Blender renders, Stable Diffusion, WebGPU demos, or shader builds in Xcode: all tasks that make heavy use of the GPU.

Failover and cloud aren't enough for reliability

Amin Momin of @CapgeminiGlobal talks about how reliability takes dedicated effort beyond just using the cloud and setting up failover. Full transcript: There are two misconceptions about reliability. One is people only think failover is reliability. Just doing the failover, that will be enough from the reliability point of view. That's the first one. And the second one: we are deployed into the cloud, so it is the service provider's responsibility to provide the reliability.

Best Tool for Composing Git Commits in your IDE, Commit Composer in GitLens 17.4

In GitLens 17.2 we introduced Commit Composer as an early preview of a set of AI-powered tools to help you craft cleaner, more meaningful commits. With GitLens 17.4, Commit Composer has leveled up. Based on your feedback, it’s now a fully interactive drafting experience that lets you compose commits in a single click, and puts you in control of your commit history.

5 DevOps Team Structures (Plus Actionable Strategies for Automation, Monitoring & Culture Change)

An effective DevOps team is about creating the right structure, culture, and processes that enable collaboration across traditionally siloed departments. The right DevOps team structure can dramatically improve software delivery speed, reliability, and overall customer satisfaction. But what exactly makes a great DevOps team? And how can you build one that works for your organization?

Implementing a Zero Ticket Operations Maturity Model

You bought the chatbot. You built the self-service portal. You shifted left like the IT operations and engineering best practices told you to. And yet, tickets keep coming, alerts keep escalating, and your team keeps firefighting. Why is this craziness still happening? Because shifting left, while helpful, is a singular tactic that is often mistaken for total transformation. It’s important to IT operations and engineering, but it’s not the full arsenal.

5 Signs Your Network Operations Need an Upgrade

Network operations form the foundation of how businesses function in today's connected world. Every service, tool, and application depends on the network working smoothly. When network operations fall behind, the problems show up quickly. Employees face disruptions, customers lose patience, and the business as a whole struggles to keep up with modern demands. The challenge is that many teams keep patching small issues without realizing the system itself has outgrown its usefulness.

AWS Reserved Instances 101: The Complete Guide

With 240 distinct services, ranging from compute to storage to networking and content delivery — each offered at different price points — choosing the right AWS service requires meticulous consideration. By default, AWS services are available on-demand and you pay a monthly bill for services used. However, the on-demand pricing model can get expensive if you use a lot of services and deploy a fleet of instances.

Incident Response for DevOps, SREs, and IT Teams

That 3 AM alert is never fun. Your heart races as you try to figure out what broke this time, and how fast you can fix it. But with an incident response in place, that panic turns into a calm, step-by-step fix. It helps you handle everything, from a server crash to a security breach, in an organized way. In this guide, I’ll walk you through what exactly an incident response is, why you need it, its key components, and how to build one.

Visualize Logs Alongside Metrics: Complete Observability for Slow PostgreSQL Queries

When latency creeps into your app, metrics tell you that performance regressed, but logs tell you why. PostgreSQL’s slow-query logging gives you the exact statement, duration, user, and database, which is perfect for hunting down missing indexes, inefficient filters, or N+1 patterns.

Real-time OS examples: use cases across industries

In sectors where precision and predictability are non-negotiable, timing is everything. Whether coordinating robotic arms on a factory floor, maintaining ultra-reliable latency in telecom networks, or ensuring an automotive braking system responds instantly, the success of these systems depends on meeting strict timing deadlines.

OpenTelemetry API vs SDK: Understanding the Architecture

When you're instrumenting applications with OpenTelemetry, you'll encounter two core components: the API and the SDK. The API defines what telemetry data looks like and how it is created, while the SDK handles how that data is processed and exported. Understanding this split helps you build more maintainable observability and avoid tight coupling between your business logic and telemetry infrastructure.
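The split described above can be illustrated with a minimal, standard-library-only sketch. It does not use OpenTelemetry's real classes: `Tracer`, `NoOpTracer`, `InMemoryTracer`, and `handle_checkout` are all hypothetical names, chosen only to show business code depending on an API-style interface while an SDK-style implementation decides what happens to the data.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Tracer(Protocol):
    """API side (hypothetical): the only interface business code sees."""

    def record_span(self, name: str, duration_ms: float) -> None: ...


class NoOpTracer:
    """What an API-only setup might do: accept calls and discard them."""

    def record_span(self, name: str, duration_ms: float) -> None:
        pass


@dataclass
class InMemoryTracer:
    """SDK side (hypothetical): decides how spans are processed and exported.

    Here "exporting" just means appending to a list.
    """

    exported: List[Tuple[str, float]] = field(default_factory=list)

    def record_span(self, name: str, duration_ms: float) -> None:
        self.exported.append((name, duration_ms))


def handle_checkout(tracer: Tracer) -> str:
    # Business logic calls only the API-level interface, so it never
    # needs to know which implementation is plugged in.
    tracer.record_span("checkout", 12.5)
    return "ok"


sdk_tracer = InMemoryTracer()
result_noop = handle_checkout(NoOpTracer())   # telemetry silently dropped
result_sdk = handle_checkout(sdk_tracer)      # telemetry captured
```

Because `handle_checkout` depends only on the interface, swapping the no-op implementation for a capturing one requires no change to the business code, which is the decoupling the teaser refers to.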

Design Concept: User-Created AI Agents with External Tool Support

Here's an early look at user-created AI agents with external tool support in Mattermost — designed to integrate AI into daily workflows while maintaining governance and security. We’d love your feedback on this design. Contact the Fast Futures team at fastfutures@mattermost.com.

Open Source Data Lakehouse Architecture with Spark and Kyuubi: Engineering Deep Dive

This webinar gives a detailed exploration of an open source data lakehouse architecture and how we implement it at Canonical. Watch to discover how Spark’s scalable processing engine and Kyuubi’s user-friendly SQL gateway enable efficient, secure, and high-performance analytics on unified data sets. Let’s dig deeper into how this combination simplifies big data storage, interactive analytics, and ETL – all through a single, streamlined open source lakehouse architecture.

What is Database Change Management (DCM)?

Database change management is the foundation for building a stable, secure, and high-performing application. In today’s fast-paced technological landscape, where agile and DevOps are the go-to for developing database applications, rapid releases and continuous iteration are the norms. But with frequent deployments comes the risk of untracked database changes.

A complete security view for every Ubuntu LTS VM on Azure

Azure’s Update Manager now shows missing Ubuntu Pro updates for all Ubuntu Long-Term Support (LTS) releases: 18.04, 20.04, 22.04 and 24.04. The feature was first introduced for only 18.04 during its move to Expanded Security Maintenance. With this addition, Azure highlights where Ubuntu LTS instances would benefit from Expanded Security Maintenance updates if the administrator attaches an Ubuntu Pro license, even for instances running more recent Ubuntu releases.

Top AI Prompts for Engineering Leaders using the Cortex MCP

AI assistants have transformed how developers work. And now coupled with the Cortex MCP that connects AI assistants directly to live service data, ownership records, and organizational standards, developers can get accurate, context-rich answers about their services and standards right in their IDE. But what about engineering leaders?! Your opportunities with AI assistants extend far beyond code generation.

Fix issues faster with Recommended Remediations

You’ve successfully run a Fault Injection test and uncovered a new failure mode before it impacted customers. And the failure could have taken down your whole system if it had happened in production. Now what? Since this is a potential P1 outage, you absolutely need to address the issue, but that’s going to take some time as you dig through the service to track down the problem. Unfortunately, this is a common conflict.

True reliability takes the whole team

Reliability takes the whole team working together. Full transcript: If you really want to get good at measuring your reliability, then you have to work together as a team. Once your software engineer organization has decided, "We're gonna test these applications to make sure that they have redundancy, availability, resilience." Just stick to that framework that you come up with as a team.

The Complete SaaS Unit Economics Guide (2025 Edition)

Measuring and monitoring unit economics can help your SaaS brand make informed business and engineering decisions. But how do you get that data, and what exactly are SaaS unit economics? We’ll cover exactly what SaaS unit economics are, metrics you should monitor, how to calculate your unit economics, and the tools you can use to be successful.

Self-Service Query UI for Logs in Azure Data Explorer (ADX)

This video focuses on how to create a self-service user interface (UI) for querying logs using Azure Data Explorer (ADX) and the Business Activity Monitoring (BAM) module. Perfect for developers and business users aiming to gain actionable operational insights from log data with simple visualizations and monitoring.

IT Alerting: Everything You Need to Know

Behind every reliable service is a team of people watching for problems. But they don’t stare at screens all day. They rely on IT alerting systems. An IT alerting system tells you when something is wrong. It finds problems fast, so your team can fix them before your business or customers are affected. This article will explain everything you need to know about IT alerting. You’ll learn what it is, why you need it, how to set it up, and which tools work best.

Operational Challenges in Hybrid Physical-Digital Environments

Creating an ideal digital workplace environment may require some creativity. Hybrid cloud structures are excellent solutions for scaling companies with increased consumer demands. They also pose operational challenges that teams can overcome with the right strategies. Planning for those moments will prepare business leaders and their teams for the best possible outcomes.
Sponsored Post

Atlassian Bitbucket Monitoring on Microsoft SCOM

As part of a customer project, we developed a custom Bitbucket Management Pack for Microsoft System Center Operations Manager (SCOM). This tailored solution enables IT operations teams to monitor key performance and health metrics of Bitbucket environments, ensuring planning and bug-tracking platforms remain available and performant. With this Use Case paper, we aim to share our knowledge with the SCOM community, highlighting the possibilities of advanced monitoring on Microsoft SCOM and helping teams improve their day-to-day tasks.

Vector Database Explained: Architecture, Use Cases & Examples

Vector databases are rapidly becoming the cornerstone of modern AI applications, and for good reason. If you are familiar with AI technologies like ChatGPT, you have already seen what vectors can do. When you ask ChatGPT a question, such as “What’s the weather like today?”, the AI first converts your question into a vector, that is, a series of numbers that capture the intent and context of your sentence, so that it can provide an accurate answer. The cool part?
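The idea of turning a question into a vector and comparing it with stored vectors can be sketched in a few lines. This is a toy illustration only: the `embed` function below is a hypothetical stand-in that counts words from a tiny made-up vocabulary, whereas real systems use a learned embedding model, and the in-memory `docs` dictionary stands in for a vector database.

```python
import math
import re

# Hypothetical five-term vocabulary; real embeddings have hundreds or
# thousands of learned dimensions rather than hand-picked words.
VOCAB = ["weather", "today", "rain", "stock", "price"]


def embed(text):
    """Toy embedding: count how often each vocabulary term appears."""
    words = re.findall(r"[a-z']+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = embed("What's the weather like today?")

# Stand-in for a vector database: document id -> stored vector.
docs = {
    "forecast": embed("Rain expected today, weather alert"),
    "finance": embed("Stock price falls"),
}

# Nearest-neighbour lookup: pick the stored vector most similar to the query.
best = max(docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]))
```

Here the weather question lands closest to the "forecast" entry because their vectors share non-zero dimensions, while the "finance" entry shares none.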

What A Great FinOps Onboarding Looks Like In 2025

I’ve seen firsthand how persona-centric FinOps creates realized savings through synergy. I’m a Certified AWS Solutions Architect, FinOps Engineer, and Customer Success leader who’s had the joy of turning cloud confusion into clarity. I’ve added a customer story below — but hold up, we’ve got onboarding optimizing to do.

What is Bitbucket? | Atlassian

Bitbucket is better than ever. With code and CI/CD on the Atlassian cloud platform, AI is not just a vibe — it’s how developers break through the friction in their daily workflows, and focus on the work that matters. About Atlassian: Behind every great human achievement, there is a team. From medicine and space travel to disaster response and pizza deliveries, we help teams all over the planet advance humanity through the power of software. Our mission is to help unleash the potential of every team.

Simple Talks Podcast | S3, Episode 1 - Coffee chat with Mike Bowers

To kick off Season 3, host Steve Jones is joined by Mike Bowers, Chief Architect at data technology company FairCom. The conversation includes an overview of Mike’s background, the work he does at FairCom, plenty of SQL talk, and advice on getting into the industry.

Buy vs Build: The Technical Reality of On-Prem, Hybrid, and Cloud

Most conversations about buy vs build turn into budget debates. But engineers know that the deeper question is: what exactly are we signing up to run, and who is going to run it? The operating model you choose defines which layers of the stack you own, what skills your team needs, and how you spend your nights on-call. This article reframes the decision around the work itself, not just the invoice.

The Influencer Making Network Engineering Cool Again

What happens when a social media obsession turns into one of the most unconventional and impactful careers in tech? In this episode of Uplink, Alexis Bertholf, Global Technical Evangelist at Megaport, explores how she’s making network engineering cool again, and why connectivity is the oxygen that cloud and AI can’t live without.

dotConnect for PostgreSQL: Advanced .NET Data Access with Full ORM Support

Supercharge your .NET applications with dotConnect for PostgreSQL, a fast, feature-rich ADO.NET data provider tailored for modern PostgreSQL development. In this video, explore how dotConnect for PostgreSQL simplifies data connectivity, enhances developer workflows, and empowers scalable, high-performance applications within the .NET ecosystem. From SaaS platforms to data-driven enterprise systems, dotConnect for PostgreSQL offers the speed, reliability, and flexibility your applications need.

APM Logs: How to Get Started for Faster Debugging

When application performance monitoring detects a spike in latency or error rates, the immediate challenge is determining the underlying cause. APM logs address this by correlating performance metrics with the specific log events that occurred at the same time. Instead of switching between monitoring dashboards and manually searching through log files, APM log correlation consolidates both views.
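As a rough illustration of the correlation idea described above, the sketch below selects the log events that fall within a time window around a detected spike. The `LogEvent` type, the `logs_near_spike` helper, and the 30-second window are all hypothetical names and values chosen for this example; they are not taken from any particular APM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogEvent:
    # Hypothetical minimal log record used only for this sketch.
    timestamp: datetime
    level: str
    message: str


def logs_near_spike(events, spike_time, window_seconds=30):
    """Return log events within +/- window_seconds of the spike, oldest first."""
    window = timedelta(seconds=window_seconds)
    nearby = [e for e in events if abs(e.timestamp - spike_time) <= window]
    return sorted(nearby, key=lambda e: e.timestamp)


# Assumed sample data: a latency spike detected at 12:00:00.
spike = datetime(2025, 1, 1, 12, 0, 0)
events = [
    LogEvent(datetime(2025, 1, 1, 11, 50, 0), "INFO", "deploy finished"),
    LogEvent(datetime(2025, 1, 1, 11, 59, 45), "ERROR", "db connection pool exhausted"),
    LogEvent(datetime(2025, 1, 1, 12, 0, 10), "WARN", "request timed out"),
]

correlated = logs_near_spike(events, spike)
for e in correlated:
    print(e.timestamp.isoformat(), e.level, e.message)
```

Real tools typically correlate on trace or request identifiers as well as time; a plain time window is only the simplest form of the idea.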

Unlock the Power of Self-Service with the ServiceNow Puppet Spoke

Discover how the ServiceNow Puppet Spoke integration streamlines automation, eliminates bottlenecks, and empowers users with self-service capabilities - all without needing expertise in Puppet Enterprise! This demo showcases the integration's potential to transform IT workflows by combining the power of ServiceNow and Puppet.

Don't Just Monitor SLAs - Validate Them Automatically

Service level agreements (SLAs) are the contractual backbone between customers and technology vendors, outlining expected service availability, performance metrics, and remedies like service credits when service providers fail to meet agreed-upon service levels. This service agreement assures both the technical quality as well as the service quality of the services provided, and underpins the value perspective of the client.
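To make the idea of checking an availability commitment concrete, here is a minimal sketch that computes measured availability from downtime and compares it with a target. The function names, the 30-day period, and the 99.9% target are assumptions for illustration only; actual SLAs define their own measurement windows, exclusions, and credit rules.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Measured availability over the period, as a percentage."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_met(total_minutes, downtime_minutes, target_percent):
    """True when measured availability meets or exceeds the target."""
    return availability_percent(total_minutes, downtime_minutes) >= target_percent


# Assumed example: a 30-day period with a hypothetical 99.9% target,
# which allows roughly 43.2 minutes of downtime.
MINUTES_IN_30_DAYS = 30 * 24 * 60

print(sla_met(MINUTES_IN_30_DAYS, 40, 99.9))  # 40 minutes of downtime
print(sla_met(MINUTES_IN_30_DAYS, 50, 99.9))  # 50 minutes of downtime
```

Running a check like this on a schedule against measured downtime is one simple way to turn an SLA from a document into something that is validated continuously.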

Impact review: Scribe under the microscope

In December 2024 we launched Scribe to help responders never miss a detail from their incident calls. By automatically transcribing calls and highlighting key information, Scribe eliminates manual note-taking, reduces time spent getting up to speed, and preserves valuable context for post-incident analysis. The feature quickly gained popularity among our customers, but with success came an influx of requests for bug fixes, extra functionality, and wider call platform support.

Black Hat USA 2025 recap

They say what happens in Vegas stays in Vegas—but this year, we couldn’t keep the latest in cybersecurity to ourselves. Though it wasn’t our first time attending Black Hat USA (we’re no strangers to the neon lights and desert heat), our anticipation was high when we landed at LAS. We couldn’t wait to get to the show, connect with security professionals, learn more about where the industry is headed, and put our own solutions to the test.

Better Automation. Easier Management. More Resilient IT. | Perforce Puppet

With Puppet, the power of IT automation empowers you. Too many companies use patchwork solutions for configuration management and IT automation, leading to unmanageable complexity and huge security risks. IT operators are on-call day and night to address security breaches, and toil for weeks manually provisioning servers. But no one would expect you to wash 10,000 dishes by hand — so why are IT operators expected to configure 10,000 servers manually?

Pulseway vs. NinjaOne: Why schools chose Pulseway

For a one-person IT team at a growing school, every minute counts. At American Heritage Charter School in Idaho, USA, IT professional Josh Siqueiros needed a solution that was more than just a monitoring tool. He needed a partner that could centralize his operations, save him time and provide rock-solid support. Josh ultimately chose Pulseway over NinjaOne for four key reasons that directly addressed his unique challenges. One of the biggest pain points for any IT professional is onboarding new devices.

A Detailed Guide to Azure Kubernetes Service Monitoring

Azure Kubernetes Service (AKS) continuously generates a high volume of telemetry, ranging from node-level CPU and memory usage to request latencies and error rates within individual pods and services. Without a structured monitoring strategy, this flood of metrics can easily become noise, leaving teams blind to early warning signs. Effective monitoring in AKS is about identifying the right signals, correlating them across layers, and acting before they impact application performance or cluster stability.

Your Apps Are Green. Your Infrastructure Is Dying.

Launch Week Day 3: Introducing Discover Infrastructure. Your dashboard looks perfect. APIs responding in 80ms, background jobs processing smoothly, error rates at 0.02%. Everything's green. Then production breaks. "Why is checkout so slow?" "The payment service keeps timing out!" You run kubectl get pods and discover payment-service pods restarting every 3 minutes due to OOM kills. Then you check your database host: CPU at 98% because someone forgot the new ML training job runs there too.

Nginx Logs & Performance Monitoring with Loki and Telegraf | MetricFire

When a web service slows down or errors spike, metrics can tell you what changed (active connections rise, error rate increases), but the root cause can sometimes be found in your logs (which IPs are hammering POST endpoints, 4XX/5XX occurrences). Put the two together and you get the full observability picture: time-series metric trends to spot incidents, and line-level details to fix them fast.

Discover Infrastructure: Kubernetes & Hosts - Launch Week / Day 03

Stop debugging infrastructure issues across multiple dashboards. See how Last9's Discover Infrastructure monitors K8s pods and traditional hosts together—with resource analysis, pod-level debugging, and AI that correlates app problems to infrastructure root causes. One setup (K8s + host monitoring) → Complete infrastructure visibility that connects to your services and jobs. No more blind spots between application performance and underlying resources.

Frontline Reliability: Protecting User Journeys with SLOs with Shery Brauner (Razor, ex-Zalando)

What does it really take to move from firefighting incidents to building reliability at scale? In this episode of Humans of Reliability, Shery Brauner (Razor, ex-Zalando) shares her unique journey from frontend and backend engineering to leading site reliability practices. She explains why protecting the user journey is the key to effective incident management, how SLOs cut through noisy alerts, and why observability must come first.

10 Best Kubernetes Alternatives In 2025 (By Category)

Containers and microservices are revolutionizing how distributed applications are built, run, and optimized. They enable apps to be highly scalable. You can also isolate some areas for updates and patches without shutting down the entire application or service. Yet, managing containers and microservices at scale can be tricky. That’s where a container management platform like Kubernetes comes in – or, as you’ll see below, where the top Kubernetes alternatives shine.

Spectrum Delivers Bare-Metal RPC Infrastructure for Next-Gen Blockchain Operations

In today's fast-evolving web3 environment, infrastructure plays a decisive role in how decentralized applications (dApps) perform and scale. Spectrum, a global Remote Procedure Call (RPC) provider, is meeting this challenge head-on with a bare-metal infrastructure that spans continents and supports over one billion daily RPC requests across more than 175 blockchain networks.

How to Set Up and Manage LTO Tape Backup Systems That Last

Building a dependable LTO tape backup system starts with more than just the right hardware. From physical setup to ongoing tape management, each step plays a direct role in long-term data protection. Skipping planning or using mismatched components can lead to wasted time, damaged media, and incomplete backups. Hardware, software, labelling, storage, and tape rotation: This guide has you covered. It's all here, explained simply. We'll also look at what's usually missed. Testing. Without routine validation, your backups can fail silently. Keep your system running smoothly!

Part Two - Event Intelligence vs. AIOps: Key Differences, When to Use Each and Why

The IT environments of large enterprises have become so complex that operational teams have turned to two solution categories in particular to help them improve visibility, speed up incident response, automate, and make more effective decisions.

Reliability upholds your promise to users

Consistent systems are reliable systems, according to Ganesh Seetharaman, Managing Director at @Deloitte. Full transcript: Strong reliability is demonstrated when systems consistently work as expected even during peak demand or unexpected events. When issues do happen, they are resolved quickly and transparently so users experience minimal disruption. Reliability also means data integrity. No matter how much stress the system is under, information needs to be accurate and secure.

Transforming Snapshots with Templates

Ever wished you could save and reuse your API traffic transformations? With Speedscale's new snapshot templates feature, you can! In this demo, we'll walk you through how to: Find and filter API traffic to create a snapshot. Transform sensitive data, like authentication tokens, with templates. Apply those saved templates to new snapshots instantly, saving you time and ensuring consistency. Whether you're looking to automate your testing or handle sensitive data more efficiently, our new templates feature makes it easier than ever to manage your API snapshots.

Why Your IT Automation Tool Needs to Work Both Doors... Or Get Out of the Way

You read that tagline correctly, and if you work in IT, you know that it’s true. Even in a time when vendors insist that more tools automatically equate to a more unified digital environment, IT automation tools handle a sliver of what IT actually needs. They focus on either user-driven requests or machine-generated incidents... but not both. That lack of cross-functionality is unacceptable when business success hinges on speed, scale, and seamless experience.

The Top AI Models And Trends Shaping SaaS in 2025

Two years ago, a “state-of-the-art” AI model could write decent copy or summarize a meeting transcript. Today, the top AI models can generate working code, analyze video in real time, and reason through complex scenarios. For SaaS teams, these changes represent a strategic crossroads. Choose the right model and you unlock new revenue streams, slash time-to-market, and wow your users.

What is Real User Monitoring

Real User Monitoring (RUM) measures how real users interact with your application in production. Unlike synthetic monitoring, which relies on scripted tests, RUM collects data from actual sessions. This means performance is observed across different devices, networks, and usage patterns. The result is a clear view of how the application behaves under real conditions, where latency is introduced, which features take longer to load, and at what points users drop off.

To Bitbucket from Jenkins: Enhancing Developer Experience

Atlassian’s Bitbucket Cloud has tightly integrated CI/CD capabilities via its Bitbucket Pipelines feature set. However, some of our Bitbucket Cloud and Bitbucket Data Center customers still use Jenkins for CI/CD. In this blog, I present a practical walkthrough of the benefits of Bitbucket Pipelines over a tool like Jenkins in the context of two key stats from our recent State of DevEx 2025 report.

FrogML SDK: the Gateway to Model Governance

Data-driven decisions are critical. And to support high-stakes decision-making – from fraud detection in credit card transactions to demand forecasting in retail – organizations are increasingly relying on complex models. According to McKinsey, 78% of organizations report using AI in at least one business function, highlighting just how embedded AI and ML models have become in operational and strategic decision-making.

Puppet Enterprise Installations with the Administration Module (PEADM)

In this real-time walkthrough, learn how to install and upgrade Puppet Enterprise using the Puppet Enterprise Administration Module (PEADM). This module helps you with admin and maintenance tasks and really helps you ensure that installations are consistently executed each and every time. Follow along from configuration to execution as Tony Green runs a monolithic installation example and provides an overview of the resources and options available to you.

Your APIs Are Green. Your Background Jobs Are Dying.

Launch Week Day 2: Introducing Discover Jobs. Your dashboard looks perfect. APIs responding in 80ms. Error rates at 0.02%. Kubernetes pods healthy. Everything's green. Then Slack explodes: "Why didn't my invoice generate?" "Where's my password reset email?" "The data export I requested yesterday is still processing?" You check your job queue. Sidekiq dashboard shows 47,000 jobs processed today. Redis looks fine. Workers are running. But somehow, your business logic is silently falling apart.

Best of both worlds: relaxAI API brings sovereignty and affordability to OpenAI

The UK’s Competition and Markets Authority (CMA) recently published its final verdict on the state of the cloud industry. While the tone may have softened since its initial findings, the conclusion was still damning: hyperscalers like AWS and Microsoft continue to unfairly dominate the cloud market through opaque, inflated pricing and technical lock-in strategies.

How to Build a Strategic Roadmap for Site Reliability Engineering Implementation

Getting your site reliability engineering solutions in place can seriously boost how your systems perform. But implementing site reliability engineering (SRE) isn't a simple flip of a switch; it's a process. If you want to keep your systems running smoothly, with minimal downtime and top-notch performance, you need a solid, strategic plan. This roadmap should guide you step-by-step, from setting clear goals to constantly improving your processes.

Zero Trust Architecture Needs Zero Guesswork

The Zero Trust model has fundamentally shifted how organizations secure their applications and infrastructure. Instead of assuming anything inside your network is safe, the Zero Trust security model requires continuous verification of every identity, every device, and every access request across the entire trust model, forcing users and devices to prove that they can access what they are trying to access.

Mastering Cloud Governance: Build A Strategy That Works

One of the biggest benefits of the cloud is that it gives engineering teams the freedom to deploy and iterate applications quickly. Unlike traditional IT environments where engineers require a series of approvals before embarking on projects, in the cloud, engineers can choose from several managed services and deploy them at the click of a button. This means your team can innovate faster and respond quickly to market demands.

Stop Asking What AI Costs, Ask If It Is Worth It

AI is surging into products. And the invoices are exploding with it. The key question is no longer, “How much did we spend?” It’s now: “Was it worth it?” That shift, from totals to value, is at the heart of FinOps. The FinOps community defines the practice as bringing financial accountability to the cloud, so teams make tradeoffs with clear business context. In plain English, measure value per dollar, then optimize the system and not just the bill.

How to Spot More Threats in Less Time Using AI

Can AI really help security teams build better threat models? Microsoft's Senior Gaming Security Architect, Audrey Long, breaks down the strengths and limits of AI in threat modeling, shows how she uses Azure OpenAI for attack tree automation, and reveals why human review still matters. Includes practical examples and live demos. Git Blog: gitkraken.com/blog.

Built for simplicity and scalability: why organizations choose Redgate Flyway

Redgate Flyway is renowned for its low-friction, SQL-first approach. Our customer stories share the wins organizations worldwide have seen with Redgate Flyway. And when it comes to community endorsement for Flyway OSS, the GitHub and Docker statistics speak for themselves: 9k+ GitHub stars and 50M Docker downloads.

Console Connect expands in Africa's biggest cloud hub

South Africa has cemented itself as Africa’s biggest cloud hub, with Johannesburg emerging as a key centre for cloud connectivity in the country. It has seen significant investment from the three major hyperscalers - AWS, Microsoft Azure, and Google Cloud - all of which have a local presence in Johannesburg. This makes the city a strategic launchpad for cloud services, AI innovation, and digital transformation across the African region.

The 15 Best DevOps Monitoring Tools for Lightning-Fast Incident Response

When incidents strike, every second counts. The difference between a minor hiccup and a major outage often comes down to how quickly your team detects and responds to issues. That's why choosing the best DevOps monitoring tools for incident response can make or break your operational excellence. Modern DevOps teams need more than just basic uptime checks.

The Service Discovery Problem Every Developer Knows (But Pretends Doesn't Exist)

Launch Week Day 1: Introducing Discover Services. Picture this: It's 2 AM, alerts are firing, and you're staring at a dashboard trying to figure out which service is causing the cascade of failures. Your service map is a six-month-old Miro board, and you have no idea what's actually talking to what in production right now. If you've been there, you're not alone. In fast-moving teams, new services get deployed faster than you can track them.

Hybrid Logic Apps & Azure Migration with Harold Campos

Lex is joined by Harold Campos from Microsoft to discuss the latest advancements in Azure integration. The conversation explores the newly announced Hybrid Logic Apps and its role in enabling seamless connectivity across cloud and on-premises environments. Harold shares insights on migration strategies, common challenges enterprises face, and how these updates simplify complex integration scenarios.

How Experiment Analysis uncovers the cause behind failures

Chaos Engineering has proven itself to be incredibly effective at tracking down failure modes, remediating reliability issues, and preventing risks before they happen. Unfortunately, it can also come with a steep adoption curve. In order to get the most out of Fault Injection testing, a practitioner needs to have a deep knowledge of the service, its expected behavior, and the code behind it. Ultimately, the rewards are worth the time.

RKE2: Enterprise Kubernetes Made Simple & Secure!

Still wasting weeks on complex configs? Meet your new secret weapon: RKE2! Prasun Das from @Infosys reveals how you can go from zero to a hardened Kubernetes cluster in minutes: upstream Kubernetes, ready for production, and one-step CIS hardening with no 200-page manuals. Copy. Start. Done. That easy. Why work harder when you can work smarter? Get speed, security & enterprise power without the grind.

Cortex MCP set up

Learn how to set up the Cortex MCP in under 5 minutes. The MCP integrates directly into your IDE, giving instant access to Cortex data without leaving your coding environment. It reduces context switching by enabling natural questions about services and teams, and streamlines workflows with real-time data from Cortex, Jira, GitHub, and more.

Amazon SageMaker Pricing Guide: 2025 Costs (And Savings)

Amazon SageMaker makes it easy to prepare data for machine learning (ML) and then train, deploy, and modify ML models. SageMaker is a fully managed service that automates much of the ML lifecycle. So, if you want a single partner to help you through all stages of your Artificial Intelligence (AI) lifecycle, SageMaker might be the answer. Perhaps more important for this post is the promise that Amazon SageMaker can reduce your machine learning model costs. But does SageMaker pricing reflect this?

Tips and prompts for developers using the Cortex MCP

AI coding assistants are already transforming how developers work, helping them write code faster, answer tough questions, and automate repetitive tasks. It’s exciting, it’s powerful… and it’s just the beginning. Cortex MCP connects your AI assistant directly to your live service data, ownership, and organizational standards so it can give accurate, context-rich answers right in your IDE.

AI Cost Optimization At Scale: How One CloudZero Customer Manages Spend Across 50+ LLMs

AI adoption isn’t just accelerating, it’s compounding. From GPT-5 to Claude to Llama and beyond, engineering teams are integrating diverse LLMs across products, experiments, and services. And finance teams are now grappling with a new kind of cloud complexity: token-based economics and volatile inference costs, often spread across multi-model, multi-cloud, and multi-region architectures. The modern FinOps stack needs to keep up. CloudZero was built for this moment.

Visualize Logs Alongside Metrics: A Complete Guide for Monitoring Slow MySQL Queries

When a service slows down, metrics will tell you that it’s happening but logs tell you why. For MySQL, slow queries can be a silent performance killer, gradually chewing through resources until users start complaining. By enabling MySQL’s slow query log and forwarding it to Loki (via Promtail), you can visualize query-level details right alongside your metrics on Grafana dashboards. This makes it easy to correlate what is slow (metrics) with what is causing the slowdown (logs).

Practicing What I Preach, Just At Scale

I’ve spent most of my career building and optimizing cloud, on-prem, and data platforms for growing companies. It’s been an amazing journey so far. Through it all, FinOps has become more than just a methodology for me (Fred FinOps didn’t just come from my love of the Flintstones, though I do appreciate a good cartoon). It’s a community, a discipline, a tribe I’ve come to call home. Lately, some tough questions have kept me up at night. These challenges got me thinking.

How engineers can improve creativity ft. Corey Latislaw of Trainline

Engineering leadership isn't just about technical execution—it's about unlocking the creative potential that drives individual and team success. CircleCI CTO Rob Zuber sits down with Corey Latislaw, Head of Engineering at Trainline and executive coaching expert, to explore how creativity transforms both careers and team dynamics.

QA Testing in 2025: Revolutionize Your Workflow with Preview Environments

Software quality assurance has changed dramatically over the past few years. Today, the velocity of software development demands more than traditional staging and shared QA environments. Releases are expected to be faster, integration cycles shorter, and quality standards higher. These pressures have inspired a growing interest in preview environments—ephemeral, production-like spaces spun up on demand for testing code changes in isolation.

Mike Long and DORA Community Discussion - Software Delivery Governance

Manual governance in regulated industries is like steering a ship with last year’s map. Approvals, ticket queues, and after-the-fact evidence collection slow delivery and increase risk. By the time an audit arrives, teams are scrambling to prove they followed the process. Watch Kosli’s Mike join Nathen Harvey at DORA to unpack why this happens — and what continuous, automated governance can do to fix it.

Using Claude to power up your onboarding

I joined incident.io about ten weeks ago, having been in my previous role for four and a half years. Being a new starter was an unusual feeling for me, and there's been a huge amount to learn; but by lunch on my second day (!) I had started shipping value to our customers. A large part of hitting the ground running has been having a colleague alongside me, who I can pester with questions, who doesn’t get offended when I write in all capitals, and often praises me for being absolutely right!

Zero-downtime deployment with Flagsmith and CircleCI

As developers, we continually strive to improve our software. This often means rolling out new software features at a rapid pace. However, deploying new features to production is not without risk. From no real production testing to limited rollback options, traditional deployment can quickly become frustrating. The worst issues, though, usually stem from one thing: buggy features making their way into the hands of users.

Reliability is when customers aren't impacted

Ultimately, a system is reliable when customers and engineers can count on it. Full transcript: When I get to hear stories like, "Hey, we just had our holiday sales event kick off and everything went smoothly and I didn't have to wake up in the middle of the night," that is really the true definition of reliability: these people that are constantly hands-on keyboard, in charge of making sure that people like myself and like you aren't impacted when we're going to, for example, buy a new pair of sneakers, or we're going to get some sort of limited edition release that's coming out, right?

Cycle Joins the Vultr Cloud Alliance

Cycle.io is proud to announce that we have officially joined the Vultr Cloud Alliance, a curated ecosystem of best-in-class cloud providers and services. The alliance is built around the opportunity to provide developers, startups, and enterprises more choice, flexibility, and performance without the overhead of traditional orchestration tools.

Platform Team Toolkit demo

Platform teams face an impossible choice: rigid standardization that slows developers down, or operational chaos that creates security gaps. CircleCI's new Platform Team Toolkit eliminates this tradeoff by delivering self-service developer experiences with built-in governance. Perfect for platform engineers, DevOps teams, and engineering leaders who need to scale software delivery without sacrificing speed or safety.

LTS vs. upgrades: which future are you building for?

How should businesses decide between sticking to an LTS release or moving to a continuous upgrade model? In this episode, we explore the trade-offs, from stability and security to innovation and agility, and why flexibility in your upgrade policy is key to long-term success. We break down when LTS makes sense, when frequent upgrades deliver the most value, and how to balance both to keep your business secure, stable, and ready for what’s next.

Fiber Paths and Failsafes: Why Your Network Design Matters

Redundancy isn’t just a buzzword – it’s the design principle keeping modern AI and cloud applications online. In this Uplink episode, Kevin Schlosser, Interconnection Product Manager at NTT Global Data Centers, explains how resilient infrastructure is engineered to expect failure but remain operational. We explore diverse entry points and fiber path management; AI-driven bandwidth growth (100G standard, 400G emerging); cooling innovations for intense compute workloads; and why providers without their own fiber may offer the most resilient paths.
Sponsored Post

Traffic Replay: Production Without Production Risk

The software and product life cycle is fraught with pitfalls and tradeoffs. While testing applications under production-like load is critical to ensuring the reliability, performance, and security of your data storage and software services, you need to do this testing without actually affecting the production data and systems. In essence, you have to pull off the impossible - be as close to production as you can without actually being production.

Amazon Kinesis Pricing Explained: A 2025 Guide

Kinesis is an Amazon Web Services (AWS) product that collects, processes, and analyzes streaming data in real-time. It can process streaming video, audio, IoT data, application logs, and other data as it arrives from thousands of unique sources, unlike technologies like Hadoop, which utilize batch processing (waiting for a complete dataset to arrive before processing and analyzing it).

Site Reliability Engineering vs DevOps: Which Approach Fits Your Organization?

Choosing between Site Reliability Engineering (SRE) and DevOps can feel like picking between two similar but distinct philosophies. Both aim to improve software delivery and system reliability, but they take different paths to get there. Understanding these differences helps you make an informed decision about which approach aligns best with your organization's goals, culture, and technical needs.

Startup GPU Hacks: Max Performance, Min Cost

Running a startup means every resource counts, especially when it comes to expensive GPUs. In this video, we break down proven GPU hacks to help you get maximum AI performance at minimum cost. Learn how to choose the right hardware, use pre-trained models, leverage quantization, adopt CPU-efficient frameworks, and make the most of affordable GPU providers. Whether you’re prototyping, training, or deploying AI models, these strategies will help you deliver big results without a big budget.

Pulseway's New AI-Powered Workflows: The Next Evolution in IT Automation

Efficiency has never been so vital for IT departments and MSPs—it’s a necessity. Endpoints need constant patching, security threats evolve daily, and service requests never stop coming. For many IT teams, the biggest challenge isn’t solving complex problems—it’s finding the time to do it all. That’s why Pulseway’s new AI-powered workflow generator is a breakthrough for IT operations.

Stop Trying To Cut Cloud Costs, Start Trying To Price AI Correctly

Most SaaS companies aren’t spending too much on AI. They’re just completely screwing up how they price it. You feel the budget pressure. The OpenAI and Anthropic bills keep climbing. Finance is starting to twitch. So the instinct is to cut. Trim back experiments. Cap usage. Beg your team to “optimize.” You can’t cost-cut your way out of a pricing failure though. And most of the time, that’s all this is — a pricing failure.

How HireVue Turned Cloud Cost Chaos Into A Competitive Edge

When you’re a global leader in AI-assisted hiring, speed matters. Not just in matching candidates to jobs, but in making the engineering and financial decisions that keep your platform running efficiently. For HireVue, fragmented infrastructure, manual processes, and sprawling spreadsheets turned cloud cost management into a time-consuming spelunking expedition.

Migrate to Bitbucket Pipelines from Jenkins

In this video, we briefly discuss developer experience before diving into the details of setting up and running a simple CI/CD system using Jenkins and AWS EC2. Using tightly integrated, cloud-native products like Bitbucket Pipelines provides a better developer experience than using self-hosted on-prem tools like Jenkins. With self-hosted on-prem, developers spend less time building software and solving problems for their customers and more time maintaining their on-prem software.

What is Data Center Interconnect (DCI)? A Complete Guide

Data centers have become the beating heart of digital business. Everything from financial transactions to cloud-based collaboration tools depends on the seamless movement of information between these high-powered facilities. As demand for bandwidth grows and companies stretch their networks across cities, states, and continents, the ability to connect data centers securely and efficiently has taken on new urgency. But linking data centers isn’t as simple as laying fiber between buildings.

15+ Best Docker Alternatives For Containers And Beyond

Although container-related technology existed before 2013, Docker revolutionized and propelled it into the mainstream. Using Docker, developers could automatically create containers from application source code, share libraries, and reuse containers. Docker enables you to track container image versions, roll back to an earlier iteration, and track who built a specific one. You can even upload only the deltas between two versions.

Signals: Using Jira + Git Activity to Automatically Flag At-Risk Work

Jira issue status is often...wrong. Or at least misleading. If you're a project, product, or engineering manager, you need something more reliable to understand if work is going to land in time (and what needs your attention). Using Signals, available in Git Integration for Jira Advanced Edition, you get an activity-based view of what's actually going on with Jira issues.

RITA: The Chatbot Alternative That Doesn't Waste Your Time

After years of overhyped pilots and half-baked “AI” assistants, IT leaders are increasingly skeptical of chatbot vendors pitching the same old logic trees that are disguised by prettier interfaces. These leaders don’t want another script engine in Slack; they want fewer tickets, faster resolution, and better employee experiences. That’s where RITA comes in. RITA isn’t just a chatbot alternative.

Why Sustainable Cloud Starts With The Bottom Line - Not Before

If you want to align green awareness with bottom-line impact, start by looking at your cloud waste. Not just as a budget problem, but also as wasted energy, because that’s exactly what it is. AI, especially, is a mounting factor. Deloitte’s Tech Trends 2025 report highlights the growing energy demands of large AI models, warning that electricity use in data centers could soon rival that of entire nations like Sweden or Germany.

Discover Puppet Enterprise Health & Status Checks

Enhance your Puppet Enterprise installation with powerful health and status check tools. This video dives into the PE Status Check module, showing the steps for how you can inspect and monitor your infrastructure, troubleshoot issues, and optimize performance effortlessly. See how to get data about your installed packages, review operational status, and extract JSON-based outputs that integrate seamlessly into alerting systems. Perfect for Puppet administrators and IT professionals aiming to enhance system reliability.

Discover Agentic AI in Resolve: RITA, Jarvis & the Future of IT Automation

Unlock the next generation of IT automation with Resolve’s groundbreaking agentic AI features. In this video, we introduce powerful new capabilities designed to help IT teams accelerate resolutions, streamline workflow creation, and bring control to enterprise AI strategies within the Resolve platform.

Learning English with AI: A Productivity Booster for Global Operations Teams

Global infrastructure runs on YAML, shells, dashboards, and, whether we like it or not, English. Every runbook, vendor knowledge-base article, and incident bridge seems to flow through that single linguistic channel. When words fail, so does uptime. The irony is obvious: modern operations can orchestrate fleets of containers across continents, yet a simple misinterpreted phrase can freeze the entire pipeline.

What Is ANSI SQL and Why You Should Use It

Every major RDBMS insists it’s SQL-compliant, but under the hood, proprietary quirks break portability and slow down development. That’s where ANSI SQL steps in—not just as a standard, but as a survival strategy for teams juggling multiple platforms. By adopting ANSI SQL, you’re not just writing cleaner, more portable code—you’re unlocking freedom.

Stop Guessing with OAuth: Understanding CI/CD

OAuth 2.0 is the leading open authorization framework that enables secure delegated access to protected resources. From traditional web apps and browser-based apps to native apps and desktop applications, OAuth allows client apps to obtain access on a user’s behalf without exposing login credentials, enabling third-party applications, custom data flows, and powerful user experiences. However, while OAuth is secure, it’s not always fast.

Ready, steady, goa: our API setup

At incident.io, speed is essential. Our product is growing faster than ever; in scope, range of features and the number of people contributing to it. In the early days, when you’re a small startup with just a few hundred endpoints, a basic API setup gets you by. But as things scale, you need to make creating endpoints easy, fast, and reliable.

Introducing FlexCore AI: Your Sovereign Private Cloud for AI Workloads

Since launching Civo AI, we have been working on creating a secure, scalable, and easy-to-manage private AI solution. We are excited to announce that we have officially launched FlexCore AI Private Cloud, a sovereign AI cloud solution designed for businesses demanding data sovereignty without sacrificing innovation. Deploy your AI-ready private cloud by contacting our team today.

How To Use Alloy and Hosted Graphite's Loki to Store and Visualize Logs

In a modern DevOps environment, having just metrics or just logs is like trying to navigate with half a map because you’re missing important context that makes decisions faster and smarter. Metrics tell you what is happening (CPU spikes, request rates, failed logins) but logs tell you why it’s happening, with the timestamps to prove it.

What Is a Telemetry Pipeline and Why It Matters in Modern IT

A practical guide for IT professionals, DevOps, security teams, platform engineers, and anyone who’s dealing with logs. In contemporary distributed systems, telemetry data—logs, metrics, traces, and events—serves as the primary mechanism for understanding internal system behavior. However, as system complexity increases, so does the volume and heterogeneity of telemetry.

How Streaming, AI, and Network Demand Are Reshaping Rural Middle Mile Networks

Rural America is experiencing a dramatic surge in network demand driven by high-bandwidth applications like 4K video streaming, real-time sports content, and AI workloads. As broadband competition and digital transformation accelerate, service providers must rethink middle-mile network architecture to be scalable, technology-agnostic, and service-aware.

Using DCIM to Consolidate and Drive Down Colo Costs

As colocation demand surges, space is becoming increasingly scarce and costly. According to CBRE, the average asking rate in primary wholesale colocation markets for a 250–500 kW requirement has climbed 12.6% year-over-year to a record $184.06 per kW/month, while vacancy rates have dropped to a record-low 1.9%. With vacancy rates low and power costs rising, doing more with less in your data center is essential.

Reliability Intelligence: your reliability expert

For the last decade, Gremlin has helped Fortune 500 organizations with critical uptime requirements proactively uncover reliability risks and prevent costly outages. We started with Chaos Engineering, then built Reliability Management to help teams standardize and scale their testing efforts. Today, we take another leap forward with the release of Reliability Intelligence. Reliability Intelligence draws on Gremlin expertise with each test to show you what happened and recommend remediation.

Insights from Azure Logic Apps Product Team

This episode is a spontaneous yet insightful conversation with Rohitha from the Microsoft Azure Logic Apps product team! Here's what you'll learn: 00:00:54 - What’s new with Azure Logic Apps 00:02:35 - Behind-the-scenes of building Microsoft workflows 00:07:52 - Real-world use cases and developer tips 00:19:39 - Rohitha's experience working on one of Azure’s most powerful automation tools.

Top 7 Application Performance Monitoring Tools

Your application is under constant pressure to deliver low latency and high reliability, and a smooth user experience isn’t optional. When performance drops, every second matters. Application Performance Monitoring (APM) gives you the visibility to spot issues before your users feel the impact. It also helps you understand what’s happening inside your stack, so you can track resource usage, pinpoint bottlenecks, and keep things running at peak performance.

Introducing Megaport IPsec Tunnels

Protect your network traffic from cloud, edge, or branch with Megaport’s new IPsec add-on for Megaport Cloud Router. If you’re managing a network across public cloud, private cloud, branch offices, disaster recovery sites, or remote endpoints, you’ve probably asked the question: How can I secure this entire environment without adding more hardware or complexity?

A CISO's guide to Application Security best practices

When most people think about the most important ingredients of software, Application Security (AppSec) is unlikely to be at the top of the list… but it should be. Without AppSec, you face severe risks of data breaches, massive fines, enraged users, and severe financial losses.

Product Klip: Komodor's Advanced Cost Optimization Capabilities

This Product Klip covers Komodor's cost optimization features, highlighting how the platform helps users reduce Kubernetes spending while maintaining operational stability. Before activation, Komodor provides simulations of potential savings; for activated clusters, it shows CPU and memory usage before and after Komodor's bin packing, along with the resulting dollar savings. Komodor enhances existing autoscalers rather than replacing them, unlocking up to 40% in additional savings.

#047 - Securing the Software Supply Chain and Kubernetes with Dustin Kirkland (Chainguard)

Meet Dustin Kirkland, VP of Engineering at Chainguard. Dustin shares his fascinating 26-year journey in the tech industry, from IBM and two stints at Canonical to roles at Google (working on GKE), Apex, and Goldman Sachs, eventually leading him back to engineering at Chainguard.

Why MikroTik VPS Is a Smart Choice for Network Monitoring and Management

Managing complex, distributed networks is no longer optional; it's essential for business success. Such networks often span remote offices and IoT deployments, and managing them without the right toolkit is a heavy burden, because uptime, security, and scalability all have to be maintained without overspending. If you buy a MikroTik VPS, you may be surprised at how easily these headache-inducing tasks can be handled, thanks to the features this technology offers.

Implementing IT Access Management: A DevOps Operations Guide for Streamlined Security Integration

Effective IT access management in DevOps requires implementing automated controls that scale with development workflows while maintaining security principles throughout the entire software lifecycle.

GPT-OOS: A Secure Step Forward, But Not a Free Pass

The release of OpenAI’s new open-source model, GPT-OOS, has sparked a wave of excitement across the AI community. And rightly so. For organizations that want the benefits of generative AI without sending data out to the web, this is a compelling option. Running locally, GPT-OOS offers a level of privacy, control, and cost-efficiency that’s hard to ignore. It’s fast, lean and at least in its early benchmarks, surprisingly capable in coding, math, and STEM-heavy workloads.

Data Sovereignty Is Everyone's Problem

Data sovereignty isn’t just a niche consideration anymore – it’s a central requirement in everything from cloud computing and analytics to software development. The environment of 2025 is significantly different from that of 2015, and even more so from 2005. What was once a patchwork of guidance documents, data privacy laws, and local regulations has given way to massive EU-wide regulations, multinational frameworks, and a greater focus by users on digital identity.

The riskiest thing you can do is not measure your risk

Hiring good engineers is important, but it’s not enough to prevent outages. You need to measure and track your risk to get real results. Full transcript: My name's Jeff Nickoloff. I'm a principal engineer here at Gremlin. What I hear non-technical functions talk about is really they are much happier to sort of lean on their great engineers. "Oh, we've got a great engineering culture. We don't have reliability issues because we hire the best people."

High Score: Megaport Hits 1,000 Locations

To celebrate one of our biggest milestones so far, we reflect on the journey we've taken to get here alongside our incredible partners and customers. Megaport has just hit a milestone that has been over a decade in the making: 1,000 Megaport-enabled locations worldwide. This achievement is more than just a nice, round number – it means Megaport is now available in 10% of all data centers globally. In this industry, that’s a massive deal.

Visualizing Logs Alongside Metrics: A Practical Use Case

Security threats aren’t always loud and don’t always crash systems or trigger alarms. Sometimes they creep in quietly as a steady stream of unauthorized login attempts, slow brute-force probes, or unknown IPs scanning your server for vulnerabilities. These behaviors often show up in logs before they surface in metrics, but if you're only watching logs or only tracking metrics, you're missing part of the story.

Getting closer to space with Canonical #ubuntu #space #shorts

@EuropeanSpaceAgency is scaling to support more missions than ever. Canonical makes it possible with open source infrastructure built for space. Watch the full video to see how we're helping ESA automate, scale, and future-proof its operations. Subscribe for more tech stories from space.

A local fix just spreads the problem

“You fixed a bug in QA - great! But did that fix go into version control and get tested and deployed everywhere? If not, you just created drift, and more problems down the line.” - Peter Kruis, Microsoft SQL Engineer at Monin. Fixing a bug in the environment where it appears feels like progress, but without a proper process, it creates fragility everywhere else.

Break it early to ship it safely

“We want developers to break things - just not for the customers. If all our tests are green, I get nervous that we’re not testing deep enough.” - Naga Santhosh Reddy Vootukuri, Principal Software Eng. Manager, Microsoft Azure SQL. Naga Santhosh, Sunny to most, leads a team that ships changes to Azure SQL databases worldwide. Those deployments must be fast, frequent, and invisible to customers. That kind of reliability doesn’t come from playing it safe during development.

Navigating the Growth of Digital Infrastructure in Brazil with Carlos Eduardo Sedeh

What does it take to build a telecom network that actually listens? In this episode of Uplink, Carlos Eduardo Sedeh, CEO of SAMM (formerly Megatelecom), joins host Michael Reid to explore how a flat-fee dial-up service launched in 1999 laid the groundwork for a customer-first telecom strategy that continues to reshape Brazil’s enterprise connectivity landscape.

Unlocking Growth for Northern FinTechs

FinTech is more than just a fast-growing industry; it can drive economic prosperity and improve quality of life, not just in the traditional financial hub of London, but across the whole of the UK. The sector’s success, especially outside the capital, is critical for regional growth and a more balanced national economy.

Flyway and Optimized Partition Management for Oracle Databases

I’m pleased to announce that Flyway continues to improve its support for Oracle databases with every release, focusing on performance, reliability, and developer efficiency. The most recent updates to Flyway Desktop and Flyway CLI specifically target a long-standing challenge for Oracle users: unnecessary table rebuilds when migrating to partitions or when managing partitions.

Cloud Services Investigation: What the CMA's findings mean for the cloud industry

In 2022, the Office of Communications (Ofcom) conducted a market study into the UK cloud industry to investigate whether the UK’s cloud market was working well, and if any regulatory intervention was needed¹. Ofcom concluded that the market was dominated by AWS and Microsoft, and that competition was limited, and referred the matter to the Competition and Markets Authority (CMA) for further investigation².

A Guide to Agentic Orchestration

A lot of IT organizations (I daresay most IT organizations) aren’t short on bots; they’re short on direction. You've got GenAI copilots here, virtual assistants there, and workflow bots running in many different tools. Maybe one automates diagnostics while another resets passwords. But none of them talk. None of them collaborate. None of them understand what the others are doing. That’s not automation; that’s entropy.

How To Run Monthly Cloud Cost Meetings For AI Teams

If you’ve ever stared at your cloud bill and thought, “How on earth did this get so crazy?” — you’re not alone. Especially when AI workloads come into play, those GPU costs can feel like a runaway train. The good news? It doesn’t have to be that way. The magic happens when you’ve got someone from every team that cares about smart growth (FinOps, AI/ML, product, engineering, whatever) all in one room, looking at the same set of numbers.

Best Practices for End-to-End Testing in 2025

End-to-end (E2E) testing is a critical practice in today’s software development, ensuring that entire applications work seamlessly from the user’s perspective. With the growing complexity of web applications – from large monoliths to distributed microservices – thorough E2E testing has become essential for quality assurance.

Compliance Requirements for Financial Services

We're taking a look at how you can secure your business's future with cloud services and how security threats can seriously impact your business. Moving to the cloud has changed the way organisations handle infrastructure, but for highly regulated sectors like finance, it’s never been a straightforward leap. It comes with serious scrutiny, and a long list of requirements to meet before any workloads can safely be moved off-prem.

The Impending SaaS Crisis: How AI Is Disrupting SaaS - And How You Can Prepare

At CloudZero’s most recent company retreat, we held an investor panel, where representatives from four of the VC firms investing in CloudZero fielded questions from our team. Unsurprisingly, a good deal of the conversation revolved around AI. A standout moment from this panel came when one investor described a vibe coding session he’d done about a month prior. “Vibe coding,” for the uninitiated, means using AI to build an application without writing any actual code yourself.

What craft means for Canonical

Last month Jon Seager (our Vice President for Ubuntu Engineering) wrote about crafting software: Multiple Canonical products have craft in their names: Snapcraft, Charmcraft, Rockcraft (and there are others in the works). Our craft products are tools for making software, for the software craftsperson. To be a maker of tools comes with responsibilities – when you decide what tools should be like, you are also deciding how people should work.

How to Automate Password Resets with Resolve #itautomation #ai #aiautomation #agenticai

Tired of password reset tickets? Meet RITA, Resolve’s AI IT agent. It understands natural language, verifies identity, and resets passwords instantly. No tickets. No delays. This is Zero Ticket IT in action. Watch Resolve’s Derek Pascarella show how it works.

Weaponized AI vs. AI Driven Security Posture Management: Why the Battle Starts in Misconfigurations

On August 5, 2025, at Black Hat 2025 in Las Vegas, Abnormal AI officially launched its Security Posture Management for Microsoft 365. This release marks a critical turning point. In an era where attackers weaponize AI to uncover and exploit misconfigured cloud environments at machine speed, reactive security simply can’t keep pace. Threat actors are now leveraging automated AI to scan systems, identify configuration drift, escalate privileges, and deploy zero-day exploits in seconds.

Automating Network Diagrams for A Complete View of All Active and Passive Components

Accurately tracking how data center devices are connected—across switches, patch panels, structured cabling, and more—is essential for efficient data center operations. But for many teams, documentation still lives in static diagrams or outdated spreadsheets, requiring extensive manual effort. This is time-consuming and leads to inaccuracies that can cause delays in planning or troubleshooting and unnecessary risk. Sunbird DCIM changes that.

IP Optical Middle Mile Network Architectures for Rural America

In addressing the burgeoning demand for broadband connectivity in rural America, a robust and innovative IP Optical Network Architecture is essential. The architecture must incorporate a best-in-class multi-layer design optimized for middle-mile functionality, integrating both voice and security dimensions. A pivotal requirement is to decouple the last mile from the middle mile, ensuring that the last-mile solutions can remain agnostic to various technologies while still benefiting from a unified middle-mile infrastructure.

Avoid the Chaos Engineering bottleneck

Chaos Engineering is great, but by itself it can create bottlenecks that limit your reliability journey. FULL TRANSCRIPT: One of the things we've learned while building Gremlin and being the first Chaos Engineering tool to market is with all the greatness that comes with this approach, we've learned some of the downfalls, some of the drawbacks. And one of those is how you scale this practice.

Log Format Standards: JSON, XML, and Key-Value Explained

Your log format defines how your application records events. The structure you choose shapes how logs get parsed, indexed, and queried. It affects how quickly you can debug issues, build alerts, or control storage usage. In this guide, we'll take a look at the log formats developers typically use, the essential fields to include, and what trade-offs to consider before locking down a format for your system.
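As a rough illustration of the three formats named in the title, the sketch below renders one made-up event as JSON, key-value pairs, and XML using only the Python standard library. The field names (`timestamp`, `level`, `service`, `message`) and the helper function names are assumptions chosen for illustration, not fields or APIs prescribed by the guide.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical log event; the field names are illustrative assumptions.
EVENT = {
    "timestamp": "2025-08-05T12:00:00Z",
    "level": "ERROR",
    "service": "checkout",
    "message": "payment declined",
}


def to_json(event: dict) -> str:
    """Render the event as a compact single-line JSON object."""
    return json.dumps(event, separators=(",", ":"))


def to_key_value(event: dict) -> str:
    """Render the event as space-separated key=value pairs.

    Values containing spaces are wrapped in double quotes so a parser
    can still split the line on unquoted spaces.
    """
    parts = []
    for key, value in event.items():
        text = str(value)
        if " " in text:
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def to_xml(event: dict) -> str:
    """Render the event as a <log> element with one child per field."""
    root = ET.Element("log")
    for key, value in event.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(EVENT))
    print(to_key_value(EVENT))
    print(to_xml(EVENT))
```

Printing all three side by side makes the trade-off visible: JSON and XML carry their structure with them and round-trip through a standard parser, while the key-value line is the shortest but relies on a quoting convention the reader has to know.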

GitKraken Desktop 11.3: AI-Powered Commit Cleanup Without the Chaos

TL;DR GitKraken’s new Commit Composer is the smartest way to turn messy commit history into clean, structured stories, no command-line gymnastics required. This AI-powered tool rewrites your commit sequence with intent and clarity, saving you time and reducing review headaches. Oh, and it makes you look like a Git legend. Ready to see it in action? Check out the YouTube tutorial below.

Securing Your Business's Future with Cloud Services

We're taking a look at how you can secure your business's future with cloud services and how security threats can seriously impact your business. Security concerns for cloud-based services often start small, with a missed update, a misconfigured setting, or an overlooked access point. But in practice, those small gaps can open into much wider vulnerabilities, affecting not just data but the ability to operate.

How server-side tagging benefits complex operational systems

What was the biggest pain in your childhood when doing puzzles? Of course, the worst is when you need to somehow finish a plain-black segment of 300-400 pieces. Lost puzzle pieces proudly occupy second place. Well, if you have chosen to work in the digital marketing or e-commerce industry, nothing changes. You will still suffer from huge black spots and tiny pieces of missing data. The bigger the company you work for and the more data you handle, the more those information gaps will affect you.

Optimizing Legacy ML Systems with Real-World DevOps Practices

We chose to feature this article because it reflects exactly what OpsMatters stands for: practitioners solving real problems with practical DevOps thinking. When we came across Ashish's detailed breakdown of his experience modernizing a complex ML environment, it stood out for its clarity and actionable insights. We reached out to him to learn more about the work behind this case study, and with his permission, we are sharing it here so the broader community can benefit from these lessons in observability, cost optimization, and real-world DevOps execution.

Introduction to End-to-End Testing: Everything You Need to Know in 2025

End-to-end (E2E) testing is a crucial software testing methodology that ensures an application works flawlessly from start to finish. In today’s fast-paced development cycles (think Agile and DevOps), E2E testing helps teams validate entire user workflows – from the user interface on the front end, through any APIs or services, down to databases or external integrations – exactly as a real user would experience them.

Speeding up AI Coding Assistants using Deterministic Feedback

AI coding assistants are transforming the way developers approach software development by automating routine tasks and enhancing code quality. These tools leverage artificial intelligence and machine learning to provide real-time code suggestions, auto-complete functions, and even debug existing code, making the development process faster and more accurate. Modern AI coding assistants integrate seamlessly with a wide range of programming languages and frameworks, including Java, Python, and C++.

Breaking through the Senior Engineer ceiling

You’ve made it to Senior engineer. Now what? You’re now staring at the next level, Staff typically, sometimes Principal, or whatever your company calls it. The path feels murky. Your manager gives you feedback like “show more technical leadership” or “think bigger picture”, but what does that actually mean day-to-day? I’ve been there. I’ve also been on the other side, helping engineers grow through whatever explicit (or implicit) levels a company has.

GitKraken Desktop 11.3: This is either a brilliant or terrible idea

AI-powered commits? Rewriting your Git history? We know, it sounds unhinged. But in GitKraken Desktop 11.3, Commit Composer makes it possible to take messy WIP or past commits and turn them into a clean, logical story. In this video, we’ll show you how Commit Composer works, when to use it, and what it means for your workflow. Plus, we cover other new features in 11.3. Git Blog: gitkraken.com/blog.

Patterns for safe and efficient cache purging in CI/CD pipelines

"There are only two hard things in Computer Science: cache invalidation and naming things." - Phil Karlton. In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the environmental impact (regarding energy usage) of repeated builds.

Introduction to Puppet Security Compliance Enforcement

Take enterprise compliance to the next level with Puppet Security Compliance Enforcement. This video demonstrates how to shift from reactive to proactive compliance, automating security standards across Puppet Enterprise and Puppet Core environments. Discover how to implement CIS benchmarks with ease, achieve consistent compliance, and reduce risk across your Linux and Windows infrastructure with tailored configurations.

The Innovation vs. Control Syndrome: Unlocking Enterprise AI's Full Potential

From optimizing supply chains to personalizing customer experiences, artificial intelligence and machine learning models are no longer statistics-based revenue initiatives; they’re foundational to modern business strategy. Organizations are pouring resources into developing and deploying AI, driven by the promise of unprecedented efficiency, insight, and competitive advantage. Yet, beneath this surging wave of innovation lies a growing tension: the Innovation vs. Control Syndrome.

Vibe coding with the incident.io API

Many, many years ago, I was a computer science major at the University of Illinois, hoping someday I’d be able to write code for a living. I started my career in QA hoping to learn the ins and outs of software development. But it turns out I wasn’t very good at coding. I was just good enough to get a role as a sales engineer, where all I had to do was write code that could hold together for 30 minutes in a demo.

CFO Cloud Cost Metrics: Key KPIs To Track

Cloud services have become an indispensable resource for businesses seeking agility, scalability, and innovation. However, with this increased reliance on the cloud comes the challenge of managing and optimizing costs effectively. For Chief Financial Officers (CFOs), understanding and tracking cloud cost metrics is crucial to maintaining financial health and ensuring strategic investments yield the desired returns.

PagerDuty vs. Spike: Which Tool is Better for Alerting in 2025

If you’re stuck choosing between PagerDuty vs. Spike for alerting, you’re in the right place. I wrote this blog post to help you make a clear choice. To do this, I signed up for both tools and ran a full, hands-on comparison to see which one performs better in real-world scenarios. This detailed analysis will show you the key differences, declare a clear winner based on a 25-point scoring system, and give you the confidence to pick the right tool for your team. Let’s get started.

AI in IT: Great! Now What the Hell Do You Do with It?

The demand for organizations to minimize downtime, swiftly address issues, and proactively manage the infrastructure has never been greater. But how can teams be expected to meet that challenge with legacy tools and approaches? Enter Zero Ticket IT, a transformative approach where AI-driven automation eliminates traditional ticket bottlenecks, empowering IT teams to focus on innovation and strategy. But how do you know if your team is truly ready for this transformative leap?

Console Connect Ecosystem Update August 2025

In this ecosystem update, we share details of 11 new data centre locations now available on the Console Connect platform, along with new global on-ramps across the "big three" cloud providers. Across the U.S., we’ve expanded our footprint in New Jersey, Florida, Utah, and Ohio, giving you access to more local data centres with ultra low-latency connectivity.

PostgreSQL Performance: Faster Queries and Better Throughput

A PostgreSQL setup that performed well with 10,000 users starts to show strain at 100,000. Queries that once returned in under 50ms now take over 2 seconds. The connection pool regularly hits its limit during peak usage, leading to timeouts and degraded performance. This blog focuses on practical ways to reduce query latency by 50–80% and increase throughput for high-concurrency environments.

Top Tools to Connect Your Analytics and Data Visualization Platforms with Zoho CRM

Zoho CRM integration is a game-changer for businesses that want to convert raw customer data into powerful, crystal-clear insights. In an era where every click, call, and customer interaction generates valuable data, the ability to instantly connect your CRM with analytics and data visualization platforms can make the difference between thriving and merely surviving. Imagine having a live, visual pulse on your sales pipeline, customer behaviors, and revenue forecast all in one place.

SQL Server Transaction Log Analysis

Transaction logs play a crucial role in database management, especially in SQL Server databases. These logs meticulously record every transaction and the corresponding changes made to the database. Such detailed tracking is indispensable for identifying issues and restoring the database to a specific state and time if any failures occur. In SQL Server, every database includes a transaction log, a feature that is always enabled and cannot be turned off.

What are Application Metrics?

Application metrics are structured, quantifiable signals that reflect how your software behaves in production. They capture key aspects of performance, response times, error rates, throughput, and resource usage, giving you a real-time view into the health of your system. Tracking the right metrics helps detect regressions early, surface latent issues before they impact users, and guide optimization decisions based on hard data, not guesswork.
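To make the idea concrete, here is a minimal sketch that derives three of the signals mentioned above (throughput, error rate, and a latency percentile) from a batch of request records. The `Request` type, the `summarize` function, the choice of HTTP 5xx as "error", and the nearest-rank p95 are all assumptions made for illustration; a real system would normally get these from its metrics library rather than compute them by hand.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    status: int


def summarize(requests: list[Request], window_seconds: float) -> dict:
    """Compute throughput, error rate, and p95 latency for one time window."""
    if not requests:
        raise ValueError("no requests in window")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    count = len(requests)
    durations = sorted(r.duration_ms for r in requests)
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    # Nearest-rank percentile: rank = ceil(0.95 * count), in integer arithmetic.
    rank = (95 * count + 99) // 100

    return {
        "throughput_rps": count / window_seconds,
        "error_rate": errors / count,
        "p95_latency_ms": durations[rank - 1],
    }


if __name__ == "__main__":
    # 19 successful requests of 10..190 ms plus one slow failure.
    sample = [Request(10.0 * i, 200) for i in range(1, 20)]
    sample.append(Request(500.0, 503))
    print(summarize(sample, window_seconds=10.0))
```

Note how the single 500 ms failure shows up in the error rate but not in the p95 latency: different metrics surface different problems, which is why they are usually tracked together.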

You're Writing Code Wrong: Start Telling Better Stories with Git

What if your Git history could read like a great novel? In this talk from GitKon, Jason Gates (Senior Staff at Sandia National Labs) makes the case that software is storytelling...and Git is your medium. With references from The Hobbit to The Stormlight Archive, he shows how commit structure, messaging, and PR flow aren’t just best practices, they’re tools to help your team (and future you) understand what really happened.

EMEA Rundeck by PagerDuty Meetup - July 2025

Join us for an informal 1-hour virtual event where the open-source Rundeck by PagerDuty community comes together to share automation stories and use cases. Whether you're new to Rundeck or looking to elevate your automation game, this meetup is packed with valuable takeaways for everyone! Host: Martin Van Son, Automation Specialist & Strategic Solution Advisor at PagerDuty. Sessions: New OSS Dashboards & Enterprise ROI Plugin; Creating Rundeck Plugins with Claude Code.

AMER Rundeck by PagerDuty Meetup - July 2025

Join us for an informal 1-hour virtual event where the open-source Rundeck by PagerDuty community comes together to share automation stories and use cases. Whether you're new to Rundeck or looking to elevate your automation game, this meetup is packed with valuable takeaways for everyone! Host: Forrest Evans, Director, Product Management at PagerDuty. Session: Rundeck by PagerDuty: A Swiss Army Knife of Automation.

Save Hours on Troubleshooting with Automated Investigations

How many times has your team stared at a dashboard, pointed to a spike, and asked a question that charts alone can’t answer? “What was the real impact of that deployment?” “Why are our Kubernetes pods in the us-east-1 cluster suddenly crashing?” “Are we wasting money on overprovisioned servers?” Answering these questions is the real work of operations and SRE.

Tutorial: How to Remediate Vulnerabilities with Puppet Enterprise Advanced Patching

Vulnerabilities are being exploited faster than ever. VulnCheck, a company that specializes in vulnerability intelligence, found that in Q1 2025, 28.3% of vulnerabilities were exploited within 1 day of CVE disclosure. Keeping your systems up to date is more important than ever. The reality is that many security teams run scans and then export the results to giant spreadsheets, which are “tossed over the wall” to the Operations team with little context.

How to Block Apps on Android Business Devices?

Are you an IT administrator looking for an efficient way to manage company-owned Android devices? This video provides a step-by-step guide on how to block apps on Android devices to boost employee productivity and maintain security. In a business environment, a clear app usage policy is essential for compliance and focus. We'll show you how to easily set up an App Blocklist using the AirDroid Business MDM solution.

IPAM Site Mapping: Give Your Subnets a Home

Without site context, your 5-minute fix becomes a 30-minute hunt through spreadsheets and Slack channels while users wait. This isn’t just inconvenient—it’s expensive. Every minute of downtime costs your business, and every minute spent playing IP detective is a minute not spent solving the actual problem. As networks scale across cloud, hybrid, and on-premises environments, this lack of infrastructure context creates real operational pain for your team.

Cut Compute Costs Up To 90% With Azure Spot Instances

When cloud costs spike, compute is often the culprit. Using Azure Spot Instances could cut your compute costs by up to 90%. But Spot VMs come with trade-offs, including unpredictable evictions and capacity constraints. And that makes them tricky to use without the right strategy and visibility. In this guide, we will share how to make them work for you.

We built an MCP server so Claude can access your incidents

"Show me all critical incidents from the last week." "Create an incident for the payment API being down." "What was the root cause of that database incident last Tuesday?" If you've ever wished you could just ask Claude (or any MCP client) to handle incident management tasks instead of context-switching between chat and your incident management dashboard, you're going to like what we built.

Product Klip: Istio Developer Dashboard

Troubleshooting issues in a complex service mesh environment, such as traffic failures or authorization problems, often requires the expertise of an SRE or DevOps professional. However, Komodor simplifies this process. Komodor provides developers with the necessary visibility to diagnose service mesh issues on their own. It helps developers easily identify blocked connections and understand the root cause without having to review logs or configuration files.

Netdata Now Troubleshoots Your Alerts for You

The 2 AM pager alert. For anyone in Ops, SRE, or IT administration, those words trigger a familiar sense of dread. An alert has fired. Is it a real fire, or another false alarm waking you from a dead sleep? The pressure is on. Every minute of downtime costs money and reputation, but troubleshooting a complex system when you’re sleep-deprived is a Herculean task.

AI Agent Is Hitting Your APIs - Are You Ready?

It’s no longer theoretical – artificial intelligence has left research labs and entered production systems, generating a new breed of consumers – autonomous and intelligent agents. These autonomous AI agents are increasingly interacting with real-world APIs (application programming interfaces), which are sets of protocols and tools for building and integrating software applications.

Building your AI infra, our tips

Modular architecture: Decouple compute from storage so each can scale independently. This makes it easier to adapt to growing or shifting workloads over time.
Future-ready hardware: Select GPUs and CPUs not just for current workloads but with an eye on scalability, including support for newer accelerator types.
Scalable design: Ensure the system allows seamless addition of compute nodes or storage without a full redesign.

Running AI without blowing up your storage

Storage is often underestimated: In infrastructure discussions, compute and networking get most of the attention, while storage is treated as secondary. For AI workloads, that can be a costly oversight.
Data throughput for specialized hardware: AI infrastructure powered by GPUs can process massive volumes of data at unprecedented speeds. This puts immense pressure on the storage system to keep up.
Scale-out performance: An on-prem, scale-out, software-defined storage setup allows you to meet high performance demands, grow capacity as needed, and stay in control of infrastructure costs.

Bridging the Gap: 3 Practical Strategies to Align Security and Operations in DevOps

The gap between security operations and IT operations poses significant risk. It’s increasingly clear that DevOps leaders, IT managers, and enterprise teams face an uphill battle to manage growing threat complexity, endless patches, and compliance requirements while operating in silos. Bridging this gap is essential to effectively manage risks and enhance operational efficiency.

Securing the Invisible: Why Ambient AI Needs Next-Gen Security

If, like me, you’re continuously striving to keep pace with the ever-evolving world of artificial intelligence, you’re probably hearing a lot about how Ambient AI is poised to dominate discussions and developments throughout the second half of 2025. Ambient AI refers to artificial intelligence systems that operate unobtrusively in the background of our daily environments, constantly sensing, analyzing, and responding to various inputs without explicit human interaction.

Librato on Heroku is Going Away and Hosted Graphite Is the Better Next Step

Librato (a SolarWinds product) is being sunset in summer 2025, and that directly affects Heroku teams who’ve relied on the Librato add-on for “good enough” visibility into dynos, routers, and Postgres. If you’re in that group, you’ll need a replacement monitoring add-on that keeps you covered on Heroku and lets you grow beyond it without re-architecting how you ship metrics.

The strategic art of build vs. buy in software delivery ft. Tara Hernandez of MongoDB

Rob Zuber sits down with Tara Hernandez, VP of Developer Productivity at MongoDB and former Netscape engineer who helped create early continuous integration systems, to explore strategic frameworks for build vs. buy decisions in modern software delivery.

Jaeger Monitoring: Essential Metrics and Alerting for Production Tracing Systems

Your Jaeger setup is running. Traces are coming in, and the UI is helping you spot slow services or debug broken flows. But just like any part of your observability stack, Jaeger needs some basic monitoring to stay reliable. If the collector starts queueing spans or the agent runs out of buffer, it can lead to dropped traces, sometimes without any obvious sign in the UI. This blog focuses on the operational side of Jaeger.