The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Networking Field Day 35: Selector AI and the Workings of an LLM

Where a function maps arguments to an output, an LLM works in the other direction: it takes an output and imputes, or infers, a function and its arguments. We first consider how this process works within Selector for an English phrase converted to a query. We then step through the design of Selector's LLM, which relies on a base LLM trained on English phrases and their SQL translations, then fine-tuned on-premises with customer-specific entities. In this way, each Selector deployment relies on an LLM tailored to the customer at hand.

Networking Field Day 35: Solving the Query Problem with Selector AI

Selector translates English phrases to SQL queries through the use of an LLM. Each SQL query includes the table, or data set to be searched, along with filters, or conditions which prune the search results. We walk through a number of SQL queries and sample search results, before considering the LLM-based translation of a sample English phrase processed by Selector.
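
As a rough illustration of the target of that translation, the sketch below models a query as a table plus a set of filters and renders it to SQL. It does not show Selector's implementation or any LLM; the `Query` class, the `device_health` table, its columns, and the sample phrase are all hypothetical, and Python's built-in `sqlite3` stands in for whatever data store is actually queried.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical structured form of a translated phrase: a table plus filters."""
    table: str
    filters: dict = field(default_factory=dict)

    def to_sql(self):
        # Table and column names are trusted in this sketch; values are bound as parameters.
        where = " AND ".join(f"{col} = ?" for col in self.filters)
        sql = f"SELECT * FROM {self.table}"
        if where:
            sql += f" WHERE {where}"
        return sql, list(self.filters.values())


def run(conn, query):
    """Execute the rendered SQL and return all matching rows."""
    sql, params = query.to_sql()
    return conn.execute(sql, params).fetchall()


# Hypothetical data set to search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_health (device TEXT, site TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO device_health VALUES (?, ?, ?)",
    [("r1", "sjc", "down"), ("r2", "sjc", "up"), ("r3", "nyc", "down")],
)

# What a phrase such as "show devices that are down in sjc" might translate to.
query = Query(table="device_health", filters={"site": "sjc", "status": "down"})
sql, params = query.to_sql()
rows = run(conn, query)
print(sql, params, rows)
```

The table selects the data set, and each filter adds one condition that prunes the result rows; a query with no filters returns the whole table.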

Networking Field Day 35: Selector AI Introduction with Debashis Mohanty

Selector's customer base includes 50 deployments across service providers as well as large enterprises in retail, media distribution, colocation services, and multi-cloud networking services. These customers aim to correlate events across their network, applications, and infrastructure; eliminate the need for human intervention in RCA and remediation; and democratize access to insights using conversational natural language interfaces. Selector delivers on these outcomes, while accelerating incident remediation through smart, actionable alerting and a GenAI-based conversational interface.

How to Create an Incident Communication Plan in 2024

No matter how robust your IT systems are, every business faces incidents at some point. Incidents can include degraded performance, poor response time, service disruptions, outages, and security incidents such as data breaches. This is why it’s key for businesses to have an incident communication plan that keeps all affected parties informed about the status of services. These parties include DevOps teams, affected accounts, investors, customers, and media outlets.

Learnings from ServiceNow's Proactive Response to a Network Breakdown

ServiceNow is undoubtedly one of the leading players in the fields of IT service management (ITSM), IT operations management (ITOM), and IT business management (ITBM). When it experiences an outage or service interruption, the impact reaches thousands. The indirect and induced impacts have a multiplier effect on the larger IT ecosystem. Think about it. If a workflow is disrupted because of an outage, the ripple effects spread widely.

Elevate Your Database Performance: The Power of Custom Query Monitoring With DX UIM

In today's data-driven world, while new storage solutions and data lakes continue to emerge, many companies still use traditional databases with specific needs for tracking activities. Custom queries, tailored to particular applications or use cases, are crucial for identifying performance bottlenecks, slow-running queries, and resource-intensive operations.

Strategies for Efficient Log Management in Large-Scale Kubernetes Clusters

Aliaksandr Valialkin, VictoriaMetrics CTO, presents "Strategies for Efficient Log Management in Large-Scale Kubernetes Clusters" at FrOSCon. Large Kubernetes clusters can generate significant volumes of logs, especially when housing thousands of running pods. This may demand substantial CPU, RAM, disk IO, and disk space for storing and querying large log volumes. In this talk, we will look into different strategies of storing those logs in Elasticsearch, Grafana Loki, and VictoriaLogs and examine how we can save 10x or more on infrastructure costs.