Latest Posts

Improve Consistency Across Signals with OTel Semantic Conventions

Jul 8, 2025 By Anjali Udasi In Last9

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start:

- Which database?
- Does the query match the delay?
- Why doesn’t this align with the connection pool metrics?

Each tool uses a different label: db.name, database, or sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.
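To make the label mismatch concrete, here is a minimal sketch of mapping differently named attributes onto one shared key before correlating signals. The alias table, the choice of db.name as the canonical key, and the sample records are all assumptions for illustration, not something taken from the post or from any library.

```python
# Hypothetical alias table: keys seen in different tools -> one shared key.
ALIASES = {"database": "db.name", "db": "db.name"}


def normalize(attrs):
    """Return a copy of attrs with known aliases renamed to the shared key."""
    return {ALIASES.get(key, key): value for key, value in attrs.items()}


# Made-up records from two signals that describe the same database.
log_record = {"database": "orders", "msg": "slow query"}
span_attrs = {"db.name": "orders", "duration_ms": 5000}

# After normalization both signals can be joined on the same key.
same_db = normalize(log_record)["db.name"] == normalize(span_attrs)["db.name"]
```

Once every signal carries the same key, a join across logs, metrics, and traces becomes a plain equality check instead of a guessing game.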

How Replicas Work in Kubernetes

Jul 8, 2025 By Faiz Shaikh In Last9

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. Whether you're running a stateless API or a background worker, understanding how replicas work directly impacts your application's reliability and performance. This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.

Instrument LangChain and LangGraph Apps with OpenTelemetry

Jul 7, 2025 By Anjali Udasi In Last9

In our previous blog, we talked about how LangChain and LangGraph help structure your agent’s behavior. But structure isn’t the same as visibility. This one’s about fixing that. Not with more logs. Not with generic dashboards. You need to see what your agent did, step by step, tool by tool, so you can understand how a simple query turned into a long, expensive run.

Prometheus Group By Label: Advanced Aggregation Techniques for Monitoring

Jul 7, 2025 By Faiz Shaikh In Last9

Your Prometheus dashboard shows 847 CPU metrics. The alert fired—but is the problem in us-east or us-west? You're trying to rule out whether that new feature caused a latency spike, but the sheer number of time series isn’t helping. Grouping can make this manageable. By organizing metrics by shared label values, you can quickly spot which service or region is behaving differently, without digging through every metric.
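As a rough illustration of what grouping by a label does, the sketch below sums sample values per distinct label value, similar in spirit to a PromQL `sum by (region)` aggregation. The `sum_by` helper and the sample data are hypothetical and only model the idea; they are not Prometheus code.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Sum values per distinct value of `label`.

    Samples missing the label are grouped under the empty string.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Made-up samples as (labels, value) pairs.
samples = [
    ({"region": "us-east", "service": "api"}, 0.75),
    ({"region": "us-east", "service": "worker"}, 0.25),
    ({"region": "us-west", "service": "api"}, 0.5),
]

by_region = sum_by(samples, "region")
```

Three series collapse into two totals, one per region, which is the reduction that makes a region-level comparison readable.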

Docker Status Unhealthy: What It Means and How to Fix It

Jul 4, 2025 By Faiz Shaikh In Last9

If your container shows Status: unhealthy, Docker's health check is failing. The container is still running, but something inside, usually your app, isn’t responding as expected. This doesn’t always mean a crash. It just means Docker can’t verify the app is working. Here’s how to debug the issue and restore the container to a healthy state.

LangChain Observability: From Zero to Production in 10 Minutes

Jul 3, 2025 By Anjali Udasi In Last9

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith. With this setup, you’ll be able to.

LangChain & LangGraph: The Frameworks Powering Production AI Agents

Jul 2, 2025 By Anjali Udasi In Last9

Your AI agent worked flawlessly in development, with fast responses, clean tool use, and nothing out of place. Then it hit production. A simple "What's our pricing?" query triggered six API calls, took 8 seconds, and returned the wrong answer. No errors. No stack traces. Unlike traditional systems, AI agents don't crash; they drift. They make poor decisions quietly, and your monitoring says everything's fine.

How to Run Elasticsearch on Kubernetes

Jul 2, 2025 By Anjali Udasi In Last9

Elasticsearch stands as one of the most robust open-source search engines available today. Built on Apache Lucene, it handles complex search operations, real-time analytics, and large-scale data processing with impressive speed and accuracy. Kubernetes has transformed how we deploy and manage containerized applications. This orchestration platform automates deployment, scaling, and operations of application containers across clusters of hosts.

Logging in Docker Swarm: Visibility Across Distributed Services

Jul 1, 2025 By Faiz Shaikh In Last9

Docker Swarm's logging model shifts from individual container logs to service-level aggregation. The docker service logs command batch-retrieves the logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.

How to Write Logs to a File in Go

Jul 1, 2025 By Anjali Udasi In Last9

When your Go application moves beyond development, you need structured logging that persists. Writing logs to files gives you the control and reliability that stdout can't match, especially when you're debugging production issues or need to meet compliance requirements. This blog walks through the practical approaches, from Go's standard library to structured logging with popular packages.
