Operations | Monitoring | ITSM | DevOps | Cloud

November 2023

Sponsored Post

Buyer Beware! Three Challenges with Elasticsearch and OpenSearch

Elasticsearch and OpenSearch are powerful enterprise search and analytics engines that have become popular in the world of data management and telemetry analysis. Their ability to swiftly search, analyze, and visualize data has made them indispensable for businesses and organizations. However, in this blog, we will explore a few key challenges faced by companies using Elasticsearch and OpenSearch, shedding light on important considerations when selecting the right tool for your needs.

Sponsored Post

A leap forward in cloud migration: Avantra Cloud Edition for RISE with SAP

RISE with SAP bundles transformation services, ERP software, partner expertise guides, and business analytics to facilitate cloud migration for ERP processes and data. It lets users follow a clean core strategy for ERP optimization and innovate their business processes on a larger scale. Businesses can also enjoy real-time data viewing and AI-backed user interactions. The no-code and low-code development tools further reduce operational and maintenance costs, allowing swift business process customization.

Explore Azure Integration Environments and Business Process Tracking in Public Preview

As we know, Microsoft Ignite is one of the premier conferences that developers and IT enthusiasts look forward to every year, and it just concluded, leaving the global audience excited about the new capabilities they can leverage while building innovative solutions. This edition was remarkable, with more than 100 announcements covering a range of topics from Windows and Microsoft 365 to Azure.

Azure Savings Plan vs Reserved Instances

In the world of cloud computing, Azure is a leading player, offering a wide range of services for businesses of all sizes. As organizations continue to migrate their workloads to the cloud, cost management becomes a critical aspect of their strategy. Two key options for optimizing costs on Azure are Azure Savings Plans and Reserved Instances (RIs). In this blog, we will explore the differences between these two cost-saving mechanisms and help you decide which one is right for your organization.

What is Cardinality? Cardinality Metrics for Monitoring and Observability

The transition to cloud-native architectures has led to an explosion in metrics data, both in volume and cardinality. This necessitates the development of monitoring systems capable of managing large-scale, high-cardinality data to achieve effective observability in these environments. In this blog post, we’ll explore the important role of cardinality in monitoring and observability.
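As a rough illustration of what cardinality means for metrics, the sketch below counts the distinct label combinations (time series) seen for each metric name. The metric names, labels, and the `series_cardinality` helper are hypothetical and do not come from the linked post.

```python
from collections import defaultdict


def series_cardinality(samples):
    """Count the distinct label sets (time series) seen per metric name."""
    series = defaultdict(set)
    for name, labels in samples:
        # A frozenset of label pairs identifies one unique time series.
        series[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in series.items()}


# Hypothetical samples: (metric name, labels)
samples = [
    ("http_requests_total", {"method": "GET", "pod": "web-1"}),
    ("http_requests_total", {"method": "GET", "pod": "web-2"}),
    ("http_requests_total", {"method": "POST", "pod": "web-1"}),
    ("http_requests_total", {"method": "GET", "pod": "web-1"}),  # repeat, not a new series
    ("cpu_seconds_total", {"pod": "web-1"}),
]

cardinality = series_cardinality(samples)
print(cardinality)
```

Each new label value, such as an additional pod name, adds another series, which is why short-lived cloud-native workloads tend to push cardinality up.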

Metrics to Monitor for AWS (ELB) Elastic Load Balancing

Amazon Elastic Load Balancing (ELB) allows websites and web services to serve more requests from users by adding more servers based on need. There are several challenges to operating load balancers, as discussed in a previous blog post: Microservices Load Balancing: Navigating the Waves of Modern Architecture. An unhealthy ELB can cause your website to go offline or slow to a crawl.

Istio Roadmap, Ambient Mesh, and the Service Mesh Landscape: KubeCon 2023 Updates

In the dynamic landscape of microservices and cloud-native architectures, the role of service meshes has become increasingly crucial. These programmable frameworks empower users to seamlessly connect, secure, and observe their microservices, relieving them of the complexities associated with these critical tasks within their applications. Istio, a leading service mesh project, has been at the forefront of this evolution since its inception in 2017.

Multi-Cluster Observability Part 2: Developing The Right Strategy

This is the second of a three-part blog series. Prior to reading this, be sure to check out Part 1. Benefiting from multi-cluster setups requires familiarity with common variations. In your Kubernetes journey, it's highly likely that you'll encounter the need to manage multiple clusters simultaneously.

What's New at Kentik, Episode 1

Welcome to the inaugural episode of "What's New at Kentik?" hosted by our technology evangelist, Leon Adato. Learn about the latest advancements and offerings from Kentik, the network observability company, revolutionizing how you manage your digital infrastructure. In this episode, Leon unveils complete Azure observability for Kentik Cloud, introduces the groundbreaking Kentik Kube for Kubernetes visibility, and discusses how to tackle DDoS attacks and spoof traffic with Kentik Protect.

10 Things to Consider before Multicasting Your Observability Data

This article was originally published in APM Digest. Multicasting in this context refers to the process of directing data streams to two or more destinations. This might look like sending the same telemetry data to both an on-premises storage system and a cloud-based observability platform concurrently. The two principal benefits of this strategy are cost savings and service redundancy.
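The idea of sending the same telemetry to several destinations can be sketched in a few lines. This is a minimal illustration only: the `multicast` helper and the two in-memory lists standing in for an on-premises store and a cloud platform are hypothetical, and the delivery here is sequential rather than concurrent.

```python
from typing import Callable, Iterable

Sink = Callable[[dict], None]


def multicast(events: Iterable[dict], sinks: list[Sink]) -> int:
    """Send every event to every sink and return the number of events handled."""
    handled = 0
    for event in events:
        for sink in sinks:
            # Give each destination its own copy so one sink cannot
            # mutate the event that another sink receives.
            sink(dict(event))
        handled += 1
    return handled


# Hypothetical destinations, modeled as plain lists.
on_prem_store: list[dict] = []
cloud_platform: list[dict] = []

events = [{"msg": "login ok"}, {"msg": "disk 91% full"}]
count = multicast(events, [on_prem_store.append, cloud_platform.append])
print(count, on_prem_store, cloud_platform)
```

A real pipeline would also need to decide what happens when one destination is slow or unavailable, which this sketch leaves out.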

Introducing CoTerm, your collaborative terminal for pair programming and debugging

For too long, engineers have had to piece together an unwieldy combination of tools to collaboratively debug and resolve incidents while pair programming in real time. These activities normally require developers to work individually through a terminal, but the patchwork solutions that allow teams to work together in terminals all have significant drawbacks.

10 Best StatusHub Alternatives For Incident Communication and Monitoring in 2023

Before we dive into the best StatusHub alternatives, let’s quickly recap the tool’s capabilities. In short, StatusHub is an IT incident communication tool. As indicated by its name, StatusHub is focused on creating and managing status pages. Users get to leverage their connected hub of status pages to communicate system statuses, incidents, and maintenance updates to different audiences, customers, and stakeholders.

ML and APM: The Role of Machine Learning in Full Lifecycle Application Performance Monitoring

The advent of Machine Learning (ML) has unlocked new possibilities in various domains, including full lifecycle Application Performance Monitoring (APM). Maintaining peak performance and seamless user experiences poses significant challenges with the diversity of modern applications. So where and how do ML and APM fit together? Traditional monitoring methods are often reactive, resolving concerns after the problem has already affected the application’s performance.

Mastering Cloud Monitoring: Ensuring Uninterrupted Performance in the Digital Era

In an era defined by rapid technological advancements, the cloud has emerged as a transformative force, revolutionizing the way businesses store, manage, and process their data. According to industry reports, 90% of large organizations have already implemented multi-cloud environments. In 2020, the global cloud market was valued at approximately $371.4 billion, and it’s not showing any sign of slowing down.

Workplace Experience: See. Diagnose. Fix

Learn how Workplace Experience, powered by Nexthink Infinity, delivers unparalleled visibility across all environments so you can see, diagnose, and fix digital workplace issues to continuously optimize productivity and cost in a fast-changing IT landscape. With Workplace Experience, you can lower MTTR and drive digital transformation while reducing cost, saving time, improving sustainability, increasing employee productivity and more.

Grafana vs. Power BI

Grafana vs. Power BI – This is often a confusing decision for people wanting to select a data visualization and analytics tool with visually appealing and interactive insights. In this article, we highlight the features and benefits of both tools, Grafana and Power BI, bringing out their key differences to help you make a better decision. If you're interested in trying it out for yourself, sign up for our free trial.

Is Cloud Infrastructure Right For You?

Cloud infrastructure has been around for many years now and has been proven to optimize operations, reduce costs, and boost developer efficiency. Thanks to these benefits, organizations of all sizes in different industries are considering a move to cloud infrastructure if they haven’t made one already. This article will explore the world of cloud infrastructure, helping you determine if it is the right fit for your organization.

Syslog-NG: The Sandbox That Taught Me to Appreciate Cribl Even More

Recently, we launched a new Sandbox focused on handling syslog at scale with Cribl. The marketing messaging behind the Sandbox has been done a couple of times already; therefore, I wanted to let y’all see what we as Cribl Technical Marketing Engineers (TMEs) actually do in our daily lives. I’ll try to keep it engaging, with tales of danger and subterfuge, but I can only take so much artistic license. What’s in a Sandbox and how the Sandbox platform functions (i.e.

Splunk SOAR 6.2 Introduces New Automation Features, Workload Migration, and Firewall Integrations

The Splunk team is proud to announce the release of Splunk SOAR 6.2 (Security Orchestration Automation and Response). We’ve been hard at work developing the latest and greatest features for this update, several of which have come from requests and suggestions from our users over on Splunk Ideas.

Paving the way for modern search workflows and generative AI apps

Elastic’s innovative investments to support an open ecosystem and a simpler developer experience. In this blog, we want to share the investments that Elastic® is making to simplify your experience as you build AI applications. We know that developers have to stay nimble in today’s fast-evolving AI environment. Yet, common challenges make building generative AI applications needlessly rigid and complicated. To name just a few.

Best Practices for Implementing Microsoft Teams QoS

Microsoft Teams has emerged as a vital communication and collaboration platform for modern workplaces. As organizations increasingly rely on Teams to conduct meetings, share files, and collaborate on projects, ensuring a seamless user experience becomes paramount. To achieve this, the implementation of effective Quality of Service (QoS) measures is crucial.

DevAlert 2.0 Now Available

DevAlert 2.0, which is now immediately available from Percepio, is a major upgrade to our edge observability platform. The upgrade provides much improved diagnostic capabilities, including core dumps for Arm Cortex-M devices. This allows remote inspection of crashes, errors or security anomalies in full detail, including the function call stack, parameters and variables and with source code display.

Grafana Agent v0.38 release: new OpenTelemetry components, configuration improvements, and more

Grafana Agent v0.38 has hit the digital shelves just before the holiday season! 🧑‍🎄 The elves over at Grafana Labs have been quietly working on Grafana Agent, with more than 50 updates for all SREs and developers to use — no matter if you’re on the naughty or nice list. This includes new features, improvements, bug fixes, and significant ease-of-use changes.

AVD Monitoring for MSPs (Managed Service Providers)

AVD (Azure Virtual Desktop) has grown in popularity and, for many reasons, many organizations are choosing to consume AVD via a specialist Managed Service Provider (MSP). For the MSPs offering AVD, monitoring and troubleshooting AVD, especially for multi-tenant environments, is challenging and involves significant, costly and skilled effort using native tools such as Azure Insights or Azure Monitor.

A Guide to Predictive Maintenance & Machine Learning

Various economic pressures on businesses have created a focus on new and innovative ways to manage operational costs. At the same time, businesses are looking at using IT to help manage overall business costs and increase income—for example, by supporting remote working, and in many cases, enabling e-commerce to replace closed retail outlets.

Elevate Your IT Service Offering with an Official Icinga Partnership

In the dynamic landscape of IT services, staying ahead of the curve is not just a strategy but a necessity. For IT service providers already consulting clients on monitoring solutions, taking the next step to engage in an official partnership with Icinga can be a transformative move. Let’s explore the reasons why such a partnership can elevate your service offering and provide substantial benefits for both your business and your clients.

Challenges with Traditional SCA Tools

Application security testing tools are designed to ensure that applications are put through rigorous security assessments to identify security flaws within the application and its code. Even though applications are tested thoroughly (in static and dynamic ways), attackers always seem to find new ways of compromising them.

Top 12 Best Practices for Network Monitoring

Network monitoring is a critical component in network management, ensuring that all systems and services are running smoothly. Monitoring a network involves continuously surveilling network components to detect any potential issues or performance bottlenecks that may arise. Doing so guarantees that the network’s performance remains at its peak, providing users with a seamless experience.

Monitor Amazon S3 Express One Zone with Datadog

Amazon Simple Storage Service (S3) now offers a high-performance storage class, S3 Express One Zone, that delivers consistent single-digit millisecond data access for your most latency-sensitive applications. Designed for your most frequently accessed datasets, S3 Express One Zone replicates and stores your data within a single AWS Availability Zone, scales to process millions of requests per minute, and uses hardware and software optimized for low latency.

Introducing Uptimia's Domain Monitoring Tool: Never Miss a Domain Expiration Again!

In the realm of online management, staying vigilant about the expiration date of domain names is paramount. Uptimia's Domain Monitoring Tool emerges as a solution to streamline this process, providing timely alerts to prevent the oversight of crucial domain renewal deadlines.

An Introduction to Profiling in Node.js

CPU-bound tasks can grind your Node.js applications to a standstill, frustrating users and driving them away. You must master the art of profiling your application to pinpoint bottlenecks, and implement effective strategies to resolve any issues. In this guide, we'll explore various techniques to help you identify and fix the root causes of your Node.js performance issues. Let's get started!

How Generative AI Makes Observability Accessible for Everyone

We are pleased to share a sneak peek of Query Assistant, our latest innovation that bridges the world of declarative querying with Generative AI. Leveraging our large language models (LLMs), Coralogix’s Query Assistant translates your natural language request for insights into data queries. This delivers deep visibility into all your data for everyone in your organization.

Observability Is About Confidence

Observability is important to understand what’s happening in production. But carving out the time to add instrumentation to a codebase is daunting, and often treated as a separate task to writing features. This means that we end up instrumenting for observability long after a feature has shipped, usually when there’s a problem with it and we’ve lost all context. What if we instead treated observability similarly to how we treat tests?

Azure Cosmos DB Monitoring

Azure Cosmos DB, offered by Microsoft as an integral part of the Azure cloud platform, is a multi-model database service with a global distribution. It efficiently partitions data for the creation of exceptionally scalable applications. Being a fully managed solution, Azure Cosmos DB relieves you of the responsibilities associated with database administration. It takes care of management tasks like updates and patches automatically.

What's IT Monitoring? IT Systems Monitoring Explained

Whether in the cloud or on-premises, visibility into the inner workings of our IT services and infrastructure is an essential ingredient of a well-functioning IT system. The drive for digital transformation as a core strategic objective for most modern enterprises has meant that ensuring IT systems are working well, secure and delivering value for money is a critical endeavor.

Logit.io Unveils Exciting Enhancements: Integrating OpenSearch 2.10.0

We're thrilled to share an exciting update from Logit.io. As part of our ongoing commitment to providing cutting-edge observability solutions to our users, we've integrated OpenSearch 2.10.0 into our platform, bringing a host of advanced features to enhance your experience. Let's dive into what's new and how these changes can benefit your observability workflows.

How to Create a Network Assessment Report with This Template

As an MSP, your role is to ensure your client’s network operates without a hitch. Checking the health of their network and sharing your findings in a network assessment report helps you identify potential vulnerabilities so you can fix them sooner rather than later. Creating these reports is a critical part of an MSP’s role, but it’s also a tedious one. That’s why we’ve created a network assessment report template.

Anomaly Detection for Time Series Data: Techniques and Models

Welcome to the third chapter of the handbook on Anomaly Detection for Time Series Data! This series of blog posts aims to provide an in-depth look into the fundamentals of anomaly detection and root cause analysis. It will also address the challenges posed by the time-series characteristics of the data and demystify technical jargon by breaking it down into easily understandable language.

What is Zero Trust and How IT Infrastructure Monitoring (ITIM) Makes it Happen

When the concept of Zero Trust emerged in 2010, it marked a sea change in how IT and network security are handled. The term, invented by Forrester Research analyst John Kindervag, is loosely based on the “never trust, always verify” motto. So why is this a sea change? Before 2010, IT focused on perimeter defenses and the concept of DMZs — areas of the network they deemed safe based on the protection they implemented.

Sysdig Achieves the Amazon EKS Ready Designation

Today Sysdig has been recognized for achieving the Amazon Elastic Kubernetes Service (Amazon EKS) Ready designation from Amazon Web Services (AWS). This specialization recognizes that the Sysdig cloud-native application protection platform (CNAPP) is validated by AWS Partner Solutions Architects to integrate with Amazon EKS and Amazon EKS Anywhere. Amazon EKS Ready Partners like Sysdig offer AWS customers the ability to customize the Kubernetes solution to fit their business needs.

How to Monitor MongoDB Metrics with OpenTelemetry

For high-throughput systems that focus on gathering continuous data or have heavy read-only traffic, NoSQL databases came as a blessing. NoSQL databases, due to the unstructured nature of their data, allow relatively faster inserts as well as reads compared to relational databases. One such database that’s quite popular today is MongoDB. In this article, we focus on how to extract metrics from MongoDB and ship them to SigNoz using the OpenTelemetry Collector.

Memcached Metrics Monitoring with OpenTelemetry

Let's dive deep into the realm of Memcached, where we'll uncover the power of monitoring with OpenTelemetry and SigNoz. This isn't just about caching data; it's about watching over Memcached like a vigilant guardian, ensuring it performs at its best, and optimizing your application's speed. In this tutorial, you will install OpenTelemetry Collector to collect Memcached metrics that should be monitored for performance and then send the collected data to SigNoz for visualization and alerts.

Streamline your CD pipeline for Cisco Cloud Observability

How can you leverage a monitoring-as-code mechanism to initiate new workload monitoring, or to create new visualizations? In this demo, see how Cisco AppDynamics can integrate with Flux CD (Continuous Delivery)—a GitOps Kubernetes operator tool that offers a simple and efficient interface to synchronize manifests within CD workflows from GitHub repositories. See how easy it is to upgrade existing software with just a few lines of code such as when instrumenting new workloads with the OpenTelemetry Agent or customizing a Grafana dashboard.

The First 48 Hours of Ransomware Incident Response On-Demand Webinar

The first 48 hours of incident response are the most critical. We will explain a few important steps that need to be taken to mitigate the impact on service availability, information systems integrity and data confidentiality. Cyber resilience is also covered by individual national regulations and directives. In this on-demand webinar, we’ll take a closer look at it and explain why the principles of Network Detection and Response should be a crucial part of implementing technical measures for regulated entities.

How to Perform A Network Stability Test: A Kickass Guide for Network Admins

Let's talk about the unsung heroes of the digital world – network administrators. For network administrators, ensuring that a network remains robust, resilient, and responsive is not just a priority – it's a mission-critical task. The ability to identify potential bottlenecks, optimize performance, and proactively address vulnerabilities is a skill that sets apart seasoned professionals from the rest.

Make your end-to-end tests more stable with Playwright's user-first selectors

Learn in this video how Playwright's user-first locators like getByRole help to write more stable tests and improve your product user experience. Use nth() or filter() to select elements by their semantic surrounding elements, and avoid relying on implementation details to test your sites.

Top Status Page Providers

Today, businesses are heavily reliant on online systems and platforms. But how would you communicate with users if you encounter downtime or an incident? This is where a status page comes in. A status page acts as a real-time communication hub, providing updates on system status, incidents, and maintenance activities. What’s even better is that you can find tons of free, self-hosted, and open-source status page systems to choose from online.

How to calculate the difference of a value over time with InfluxDB and Grafana

Learning about the past helps us understand the present, and even predict the future. So, whether you are monitoring CPU usage or how long your IoT device was powered on and then off, at some point, you might want to know the difference of a value over time. InfluxDB is an open source database for storing and retrieving time series data. Thanks to its own query languages, Flux and InfluxQL, it provides different and powerful ways to analyze data.
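To make "the difference of a value over time" concrete, here is a small sketch in plain Python, not Flux or InfluxQL. It assumes the data arrives as (timestamp, value) pairs sorted by time; the `differences` helper and the sample readings are hypothetical.

```python
def differences(points):
    """Return (timestamp, delta) pairs, where delta is the change in value
    since the previous point. The first point has no predecessor, so it
    produces no output."""
    return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(points, points[1:])]


# Hypothetical CPU usage readings: (seconds since start, percent used)
cpu_usage = [(0, 40.0), (10, 55.0), (20, 50.0)]

deltas = differences(cpu_usage)
print(deltas)
```

A positive delta means the value rose since the previous reading, and a negative one means it fell.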

Cloud Observer: Subsea Cable Maintenance Impacts Cloud Connectivity

In this edition of the Cloud Observer, we dig into the impacts of recent submarine cable maintenance on intercontinental cloud connectivity and call for greater transparency from the submarine cable industry about incidents that cause critical cables to be taken out of service.

Govern your infrastructure resources with the Datadog Resource Catalog

As an administrator of an expanding, highly distributed infrastructure, you may be responsible for overseeing thousands of on-premise and cloud resources from multiple providers—governed under dozens of accounts by a complex nest of RBAC rules. To query all these resources for purposes such as compliance audits and access management, you may be required to write custom scripts and painstakingly sift through data across disparate tools.

Announcing Service Map: Troubleshoot With Context and Confidence

Logz.io is excited to announce Service Map, a new way to visualize the data flow, dependencies, and critical performance metrics throughout your microservices architecture, which makes it easy to gather critical troubleshooting context as you investigate production issues.

Using Honeycomb for LLM Application Development

Ever since we launched Query Assistant last June, we’ve learned a lot about working with—and improving—Large Language Models (LLMs) in production with Honeycomb. Today, we’re sharing those techniques so that you can use them to achieve better outputs from your own LLM applications. The techniques in this blog are a new Honeycomb use case. You can use them today. For free. With Honeycomb.

Enhanced Linux Visibility with Sumo Logic

In the continually evolving digital landscape, the importance of effective and efficient logging cannot be overstated. When we journey into the realm of Linux, this rings particularly true. Today, we'll delve into why Linux logging is vital, the challenges customers commonly encounter with it, and how Sumo Logic has emerged as a market leader in providing unparalleled SIEM solutions.

Coralogix named as AWS Rising Star Partner for 2023

Amazon Web Services, Inc. (AWS), an Amazon.com company, today announced the 2023 AWS Partner Award winners, recognizing leaders around the globe playing a key role helping customers drive innovation and build solutions on AWS. Announced during re:Invent 2023, AWS Partner Awards recognize our Top Partners of the Year and Rising Star Partners of the Year, whose business models have embraced specialization, innovation, and collaboration over the past year.

Using the Cribl Redux Stats Pack

Cribl’s internal metrics are very handy for seeing what Cribl is doing. And while there are many data points related to input vs output volumes, sometimes you need more control over what you’re tracking. This pack allows you to route arbitrarily defined traffic through a stats tracker to capture changes in event count and volume. Perhaps you are onboarding a new host, or trialing a new Pipeline.

Gartner IOCS replay: Achieving unified observability with data mesh

The single pane of glass is perhaps the most enduring and elusive goal of enterprise IT operations teams. When we polled our customers a couple of years ago, out of 184 respondents, 99% of them rated it as important to their business, with 64% indicating “extremely important”. The shared dream is to have: But unfortunately, the single pane of glass has become a bit of a myth.

On-premise vs SaaS: Which Network Monitoring Tool You Should Choose?

On-premise or SaaS network monitoring tool: which one to choose? This question has been bugging many businesses for quite a few years now. And as much as I hate to say it, the answer to this burning question is: it depends. Yes, it depends on several factors. In this blog, we will list all those factors that you should keep in mind before making this decision.

Kibana vs Grafana: Battle Of the Dashboards

Every day, we generate tons of data, so much so that we cannot analyze it manually anymore. Visualization is a process by which the generated data is broken down into smaller packages. Visualization relies extensively on observability and the triumvirate of Logs, Metrics and Traces. When it comes to visualizations and dashboards, Kibana and Grafana are two prominent names in the market. What makes them stand apart from their peers is their ability to produce top-notch analytics charts.

AI Explainer: The Dirty Little Secret About ChatGPT

ChatGPT, developed by OpenAI and launched in November 2022, isn’t the only large language model that has received lots of attention lately, but it’s by far the most widely known. A previous blog post that listed a glossary of AI terms included this brief definition: You may have read over the past year that GPT-4 (the paid version of ChatGPT) has been able to pass many difficult exams. Here are just a few.

Monitor and improve your CI/CD on AWS CodePipeline with Datadog CI Visibility

CI/CD services such as AWS CodePipeline enable developers to automate and accelerate the process of building, testing, and deploying code. But with the speed, scale, and complexity of the modern software development life cycle, even small performance regressions or increases in failure rates in your CI system can quickly snowball, slowing or even halting releases and causing cost overruns.

Enhance your troubleshooting workflow with Container Images in Datadog Container Monitoring

Containers are powerful tools for scaling and deploying your applications, but with so many components pulled from different sources, there’s a greater potential for issues within them to go undetected. As a result, you need to monitor every layer of your containerized environments for vulnerabilities and performance problems—from your application to your container images.

Build custom monitoring and remediation tools with the Datadog App Builder

When you’re responding to an issue with your application in the heat of on-call, you need reliable, well-maintained tooling that’s painless to use. Otherwise, the time you’ll spend combing through monitoring data for context, connecting to hosts and other infrastructure resources, and pivoting between consoles for various managed services can add up quickly and slow your response.

Enhance your visibility into OTel-instrumented apps in AWS Lambda

Enabling auto-instrumentation for your Lambda functions provides detailed insights into the performance and security of your serverless applications. Developers often also use custom instrumentation to fine-tune visibility and further tailor telemetry to their business needs. However, different teams within your organization might use a variety of instrumentation libraries, and achieving more granular visibility can come at the expense of data portability and interoperability.

Lightning-fast troubleshooting for AWS: How to find the root cause fast with Sumo Logic

It’s time to stop firefighting. With Sumo Logic’s AWS Observability, companies like Snoop have been able to simplify data collection, achieve unified visibility across AWS accounts and regions and leverage machine learning to troubleshoot — fast. This re:Invent, we’re excited to showcase how our capabilities for AWS have evolved.

Using the Cribl API Part II: The Replay

Our previous post was all about dipping your toes into the wonderful world of API interaction. By leveraging Cribl’s API you can automate many parts of your event pipeline management and tasks. So we got that goin’ for us. Which is nice. One of the common use cases for the API I hear about is kicking off data collection automatically. Use cases include: Cribl gives you the tools to collect data when you want, from where you want, and to where you want.

Simplify Kubernetes with Cribl Edge on EKS Add-on

Let’s be honest, Kubernetes (K8s) has never been the easiest tech to work with. As a seasoned Kubernetes professional, I find myself constantly looking for ways to set up collecting data from my clusters, only to find out that there is a new, more complicated way to get the data I’m looking for.

Azure Logic Apps Pricing

It is vital to understand the price of Azure Logic Apps for multiple reasons, regardless of whether you are a developer overseeing their implementation or a business wishing to use the platform for your operations. Understanding the pricing model facilitates efficient budget planning and allocation for your tasks. You can avoid unforeseen expenses and make more accurate cost estimates.

Jaeger vs Zipkin: The Complete Comparison Guide

Jaeger and Zipkin are among the most commonly used open-source distributed tracing systems for monitoring and troubleshooting the performance of microservice-based applications. They both supply users with insight into the flow of requests through various components of a system, which can be used to find latency bottlenecks, errors, and performance problems in the system.

The Benefits of Cloud Computing for Small Businesses: Cost Savings, Flexibility, and Security

At one time, tech advancements were a major drain on small businesses. Housing your software and digital assets in-house once created a financial nightmare many emerging companies couldn't handle. On top of that, threats to security and a complete lack of flexibility made establishing a foothold challenging. That was especially true when going up against established industry heavy hitters.

What's the Difference between AIOps and Observability?

In the ever-evolving world of IT, keeping an eye on application, service and system performance and addressing issues in real time is crucial both to an organization’s customer experience and to its overall success. Two terms and approaches that have gained significant attention in recent years are AIOps and observability. While they both relate to improving IT monitoring and management, they serve distinct roles in enhancing operational efficiency.

Uni Updates Episode 4: AppDynamics University is moving to Cisco U

AppDynamics University's self-paced and instructor-led training is moving to Cisco U., Cisco’s world-class learning experience platform, and will be available after December 15, 2023. What does this mean for you? If you have a Standard AppDynamics University subscription, self-paced training will be available in Cisco U. Free. If you have a Premium or Multi-User University subscription, self-paced and instructor-led training will be available in Cisco U. Essentials. In all cases, you will need to create a Cisco U. account using your AppDynamics University account email address.

Monitoring Microsoft SQL Server login audit events in Graylog

Among the most important events you should be monitoring on your network are failed and successful logon events. What comes to most people’s minds when they think of authentication auditing is OS-level login events, but you should be logging all authentication events regardless of application or platform. Not only should we monitor these events across our network, but we should also normalize this data so that we can correlate events between these platforms.

What is Zero Trust and How IT Infrastructure Monitoring (ITIM) Makes it Happen

When the concept of Zero Trust emerged in 2010, it marked a sea change in how IT and network security are handled. The term, invented by Forrester Research analyst John Kindervag, is loosely based on the “never trust, always verify” motto. So why is this a sea change? Before 2010, IT focused on perimeter defenses and the concept of DMZs — areas of the network they deemed safe based on the protection they implemented.

Grafana Tutorial - Annotations

Grafana is a tool that helps users identify and fix performance issues by allowing them to monitor and analyze their database. Grafana is famous for making great graphs and visualizations, with tons of different functionalities. This Grafana tutorial is about one of these functionalities: Annotations. Grafana annotations are for users who want to make notes directly onto the graphs in their dashboards. There are various reasons a user might want to do this.

Network Monitoring Best Practices to Elevate Your Admin Game

Just as car owners rely on indicators to track their vehicle's health, you as a network administrator depend on monitoring systems to keep an eye on the network's vital signs. It involves inspecting and addressing various network components, from devices like routers to critical assets like applications and services, user experience and service providers. In the connectivity world where network reliability is non-negotiable, you must not settle for mediocrity.

A Simplified Guide to Kubernetes Monitoring

The open-source Kubernetes platform has become the de facto standard for deploying, managing, and scaling containerized services and workloads. In fact, 83% of DevOps teams are using Kubernetes to deploy containerized applications in production, taking advantage of its workload orchestration and automation capabilities to optimize the software development process and reduce web server provisioning costs.

IIS - A Go-To Web Hosting Solution for Windows

Internet Information Services (IIS) stands out as the primary web hosting solution for Windows-based servers. With a heritage deeply rooted in Microsoft's commitment to providing robust and reliable web server software, IIS has evolved into the top choice for hosting websites and web applications. The inherent compatibility between IIS and the Windows operating system sets it apart from other web server options, enabling it to fully leverage the strengths of the Windows environment.

How to use flow mode for Grafana Agent with Matt Durham (Grafana Office Hours #21)

Senior Software Engineer Matt Durham shows us how to use flow mode for Grafana Agent. Flow mode is a new and better way to install and configure Grafana Agent than the older "static mode". Among other things, flow mode's modularity makes it easier to build more complex workflows like traditional data pipelines and allows for more use cases than collecting and processing telemetry.

Transitioning from lz4net to K4os.Compression.LZ4

At Raygun, we’re processing billions of events per month for our customers, so it’s well worthwhile looking for the most efficient data storage solutions. Way back when we started out, we chose lz4net to store data, which served our purposes well for many years. As we grew, though, we realized this was getting expensive, and was starting to undermine our business model. This post is focused on how we made the switch to the K4os.Compression.LZ4 rewrite, attaining significant performance gains.

The role of observability in incident response

Observability has brought a new approach to IT infrastructure management, easing the workload on IT admins across the world and bringing more accuracy and efficiency. One of the clear beneficiaries of this evolution in IT infrastructure management is incident response. Incident response is the systematic process of identifying, analyzing, and mitigating security threats, breaches, or operational issues to minimize their impact on the continuity of business operations.

How To Monitor Cisco Catalyst Metrics with Grafana

In today's interconnected world, the reliability of network infrastructure, especially Cisco Catalyst switches, plays a key role for businesses. To ensure these critical components perform optimally, it's vital to monitor their metrics effectively. This article will show you how to effortlessly monitor Cisco Catalyst metrics with Grafana, a top-tier monitoring and visualization platform.

How to measure engineering team health

At SquaredUp, we have been debating how the senior leadership team can monitor the ‘health’ of our engineering teams. To do this, we decided to create a dashboard that could represent this for a team – but first, we needed to figure out what to measure. Our goal was to better understand our teams to inform which actions to take to support them and make them a happier and more productive bunch.

How to Monitor MySQL Metrics with OpenTelemetry

Database monitoring is an important part of running a high-volume or high-traffic system, because database performance drastically impacts the application's response times. In this tutorial, you will install OpenTelemetry Collector to collect MySQL metrics and then send the collected data to SigNoz for monitoring and visualization. If you want to jump straight into implementation, start with the prerequisites section.

How to Collect .NET Application Logs with OpenTelemetry

In the realm of modern software development, achieving true observability is paramount for understanding application behavior and performance. This demonstration focuses on a .NET application that harnesses the capabilities of OpenTelemetry to seamlessly integrate logging and tracing functionalities. OpenTelemetry, a key player in the Cloud Native Computing Foundation, provides a unified framework for comprehensive observability.

How to Monitor Prometheus Metrics with OpenTelemetry Collector?

OpenTelemetry provides a component called OpenTelemetry Collector, which can be used to collect data from multiple sources. Prometheus is a popular, widely adopted metrics monitoring tool. If you’re using Prometheus SDKs to generate metrics, you can collect them via OpenTelemetry Collector and send them to a backend of your choice.

How to Monitor Apache Server Metrics with OpenTelemetry

Monitoring Apache web server metrics ensures your web server performs efficiently, securely, and reliably. In this tutorial, you will configure OpenTelemetry Collector to collect Apache metrics and send them to SigNoz for monitoring and visualization. If you want to jump straight into implementation, start with the prerequisites section.

Advancing Cloud Monitoring: Benefits of Synthetic Monitoring

The cloud changed how businesses work, making things more flexible and adaptable. But keeping track of app performance from a user’s point of view in this new setup is tough. Legacy tools tend to not give developers an understanding of their users' perspective. That's where synthetic monitoring comes in. It's a strong way to focus on users and fix the problems that legacy tools miss.

Document Azure Environment to monitor Azure usage

As cloud adoption expands, the number of subscriptions and provisioned resources grows day by day. Regular analysis and assessment are essential to monitor usage, keep cloud resource spending efficient, and minimize waste. These assessments can be made in many ways, but all of them need some level of technical competence and regular maintenance. Not all stakeholders will have access to these tools or the need to check through them.

Micro Lesson: Monitoring and Troubleshooting with AWS Observability Solution

This video introduces Sumo Logic's AWS Observability solution, which is an all-in-one approach to give visibility into the important elements of the cloud infrastructure and assist in troubleshooting complex issues. This video further describes the features of the observability solution such as pre-built dashboards, prepackaged log searches, and the out-of-the-box alerts that help in monitoring and troubleshooting.

The Future of Network Monitoring: AIOPS Trends to Watch in 2024

Network monitoring is the process of continuously scrutinizing a computer network for failures or deficiencies to ensure availability and performance. AIOps (Artificial Intelligence for IT Operations) is an umbrella term for the integration of artificial intelligence (AI) technologies into network operations. It involves using machine learning, analytics, and big data to automate and enhance IT operations.

Optimize Monitoring with Time-based Threshold Profiles

Learn how to enhance your monitoring efficiency by configuring Time-based Threshold Profiles in Site24x7. In this video, we'll guide you through the process of setting up thresholds based on business hour segments, ensuring you receive alerts when you need them the most. Optimize alert frequencies and receive notifications only when it matters. Happy monitoring!

Large Enterprise Cuts Elasticsearch and SIEM Costs by 40% with Observo.ai

A large, global Data Management and AI software company with over 5,000 customers across more than 100 countries had seen unprecedented growth (more than 30% year over year) in telemetry data from their multi-cloud infrastructure being sent to the Elasticsearch Observability and SIEM Platform. The growth of this data contributed to a multi-million dollar price tag for Elasticsearch.

Observo.ai Enables Global E-Commerce Giant to Slash Splunk Costs by 50%

A Global 1000 E-commerce company struggled with the rapid growth in telemetry data that their security team analyzes with Splunk, Grafana, and other Observability tools in the cloud. Specifically, the increase in VPC Flow log and Firewall log volumes caused a spike in Splunk costs on certain data sets and triggered daily indexing limit overage fees. As this deluge of data began piling up in block storage within their Splunk index, the team saw corresponding spikes in storage costs.

Top tips: Four compelling use cases for AI in FinTech

Top tips is a weekly column where we highlight what’s trending in the tech world today and list out ways to explore these trends. This week, we’re examining four use cases for AI in the ever-growing FinTech sector. The FinTech sector has transformed the discussion around the financial services industry from top to bottom.

Troubleshooting End-User Issues With a DEM Tool

In the last decade, businesses have made massive investments in the digital economy with the goal of increasing operational efficiency and improving their customer or end-user experience. However, it isn’t rare for businesses to incur losses due to poor page load speed, failed transactions, or website errors. This is why businesses need to track end-user experience in real time and resolve issues quickly.

How to create log sinks

Are you wondering how you can route your Google Cloud logs to your desired destination? Then check out this video, where we introduce you to log sinks, which can be used to route logs to various supported destinations, and walk you through how they work and the list of supported destinations to which logs can be routed. It covers the different use cases and scenarios where log sinks can be very useful. We’ll also demonstrate how to create and configure an aggregated log sink that sends all VPC flow logs to BigQuery.

Virtual Customer Workshop 2023

Last week, we were very excited to host our first ever SquaredUp Virtual Customer Workshop! We’d received so much positive feedback on the intimate in-person workshop we ran in the UK last month that we wanted to extend the invitation worldwide – by recreating the experience virtually. Using the digital conferencing app Gather Town, we were pleased to welcome 23 SquaredUp customers from 12 countries to our virtual “SquaredUp Town”.

Comparison of Cron Monitoring Services (November 2023)

In this post I’m comparing cron monitoring features of four services: Cronitor, Healthchecks.io, Uptime Robot, Sentry. How I picked the services for comparison: I searched for “cron monitoring” on Google and picked the top results in their order of appearance. Disclaimer: I run Healthchecks.io, so I’m a biased source. I’ve tried to get the facts right, but choosing what features to compare, and what differences to highlight, is of course subjective.

Best Practices for Monitoring Network Usage & Maximizing Performance

As organizations continue to expand their digital footprint, the demand for a robust and efficient network infrastructure has surged. To navigate the complexities of today's networks, IT professionals must not only be vigilant in monitoring network usage but also adept at implementing best practices to ensure peak performance.

Key Value Parser Delivers Useful Information Fast

Parsers make it easier to dig deep into your data to get every byte of useful information you need to support the business. They tell Graylog how to decode the log messages that come in from a source, which is anything in your infrastructure that generates log messages (e.g., a router, switch, web firewall, security device, Linux server, Windows server, an application, telephone system and so on).
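
To make the idea of decoding a key-value log message concrete, here is a minimal sketch in Python. It is not Graylog's parser: the helper name `parse_key_values`, the regular expression, and the sample log line are all assumptions chosen for illustration.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a run of non-space characters.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_key_values(message: str) -> dict[str, str]:
    """Extract key=value pairs from a log line into a dict (hypothetical helper)."""
    fields = {}
    for key, value in _PAIR.findall(message):
        # Strip the surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


# Sample line invented for this sketch.
line = 'action=deny src=10.0.0.5 dst=192.168.1.20 msg="port scan detected"'
print(parse_key_values(line))
```

Once a message is split into named fields like this, each field can be searched, filtered, or aggregated on its own instead of being matched as free text.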

Proactive Network Monitoring: A Key to Network Reliability

Proactive network monitoring is no less than a superpower. Just as the Spidey sense alerts Spider-Man to danger, proactive network monitoring alerts you to trouble on your network. This is why every business, big or small, is investing in proactive monitoring to optimize their systems and resolve security issues in real time. In this blog, we will share some of the key benefits of proactive network monitoring along with the top challenges and best practices that can help your business in the long run.

How to Evaluate Your Current Network Management Solution

Is your network management solution still up to the task? With remote work, cloud adoption, IoT, and more pushing networks to evolve rapidly, reliance on an outdated system risks performance issues, security gaps, insufficient visibility, and increased downtime. But when is it time to upgrade or replace your existing platform?

Nexthink Amplify & the Power of Unified Information

In the world of IT support, where time is always ticking and employee satisfaction is on the line, removing inefficiencies in ticket resolution can be nothing short of a game-changer. One major source of inefficiency in nearly every service desk operation relates to the process of obtaining and confirming information about users and devices. Comprehensive and unified information about a device and/or a user typically isn’t immediately available, and gathering it is a major burden.

Grafana Dashboard Tutorial: How to Get Started

Grafana is an open-source web application for visualizing data. You can query your data, create visuals, and receive alerts to better understand what you have. Some people think of Grafana as a Kubernetes-only tool, but in reality, it’s simply a data visualization tool that became popular within the Kubernetes ecosystem, especially when combined with Prometheus. In this post, I’ll focus on a very specific part of Grafana: the dashboards.

Best practices for writing clean, maintainable JavaScript

The world’s biggest language comes with a huge collection of conventions and guidelines from the community’s collective wisdom. Following JavaScript best practices can help achieve faster page loads and better performance, improve code readability, and make maintenance and debugging easier. Carefully crafted code can also prevent errors and security issues, especially if it’s complemented with real-time diagnostic tools such as JavaScript error monitoring.

Sponsored Post

Unveiling Azure Monitor SCOM Managed Instance

In the ever-evolving landscape of IT operations, the introduction of Azure Monitor SCOM Managed Instance (SCOM MI) stands as a testament to Microsoft's commitment to providing advanced monitoring solutions. This article is a comprehensive exploration of SCOM MI, its journey to General Availability (GA), anticipated features, the seamless integration of NiCE Management Packs, and the positive impact it promises on the world of IT operations.

Top 6 reasons for website downtime and ways to prevent it

What is website downtime? Website downtime is a period during which a website is unavailable to its users, making them unable to carry out desired functions such as making purchases, accessing information, or ordering food. In a fast-paced digital landscape where users expect instant access, any delay or unavailability can have significant consequences for a website's business and reputation.

New AWS Policy Update: No Resale of Discounted Reserved Instances Starting January 15, 2024

New AWS rules will significantly affect how businesses handle their AWS operations. Starting January 15, 2024, AWS will no longer permit the resale of discounted Reserved Instances (RIs) bought from the Amazon EC2 Reserved Instance Marketplace. This update aligns with AWS’s service terms, particularly Section 5.5, which explicitly prohibits the resale of discounted RIs.

Understanding Log Levels

In this video, we will discuss what log levels are, how to use them in your application, and how to monitor your logs with Sematext. We break down the intricacies of log levels, guiding you through their significance and practical implementation. Elevate your DevOps game with insights on proactive issue detection and rapid problem resolution. With a centralized logging solution like Sematext Cloud, you can enhance collaboration, minimize downtime, and boost overall system performance.

Visualize AWS Step Functions with the State Machine Map

AWS Step Functions allows you to coordinate activity from hundreds of services—including AWS Lambda, Amazon EKS, and Amazon API Gateway—to build and orchestrate serverless workflows. With Step Functions, you organize work into workflows known as state machines, in which each state defines a task or decision and specifies the next state in the workflow.
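
As a rough illustration of how each state names the next one, the sketch below walks a small workflow written in the shape of an Amazon States Language definition. The state names, the `$.valid` field, and the `trace` function are hypothetical, and the toy interpreter only handles the handful of fields shown; it is not how Step Functions itself executes workflows.

```python
# A hypothetical order workflow shaped like an Amazon States Language definition.
definition = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {"Type": "Task", "Next": "IsValid"},
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ChargeCard"}
            ],
            "Default": "RejectOrder",
        },
        "ChargeCard": {"Type": "Task", "End": True},
        "RejectOrder": {"Type": "Fail"},
    },
}


def trace(definition: dict, data: dict) -> list[str]:
    """Return the state names visited for the given input (toy interpreter)."""
    visited = []
    name = definition["StartAt"]
    while name is not None:
        visited.append(name)
        state = definition["States"][name]
        if state["Type"] == "Choice":
            # Take the first matching rule, otherwise fall back to the default state.
            name = state.get("Default")
            for rule in state["Choices"]:
                field = rule["Variable"].removeprefix("$.")
                if data.get(field) == rule["BooleanEquals"]:
                    name = rule["Next"]
                    break
        elif state["Type"] in ("Succeed", "Fail") or state.get("End"):
            name = None  # terminal state: the workflow stops here
        else:
            name = state["Next"]
    return visited


print(trace(definition, {"valid": True}))   # ['ValidateOrder', 'IsValid', 'ChargeCard']
print(trace(definition, {"valid": False}))  # ['ValidateOrder', 'IsValid', 'RejectOrder']
```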

Monitor Amazon Bedrock with Datadog

Amazon Bedrock is a fully managed service that offers foundation models (FMs) built by leading AI companies, such as AI21 Labs, Meta, and Amazon, along with other tools for building generative AI applications. After enabling access to validation and training data stored in Amazon S3, customers can fine-tune their FMs to invoke tasks such as text generation, content creation, and chatbot Q&A—without provisioning or managing any infrastructure.

Solving Complexity Challenges with Kubernetes 360

Here at Logz.io, we realize Kubernetes is the most common infrastructure component that organizations are running on to keep their applications going. In return, we’ve made a big investment to support Kubernetes properly and give customers the tools they need to investigate and troubleshoot any issues that arise.

Elastic Observability monitors metrics for Google Cloud in just minutes

Developers and SREs choose to host their applications on Google Cloud Platform (GCP) for its reliability, speed, and ease of use. On Google Cloud, development teams are finding additional value in migrating to Kubernetes on GKE, leveraging the latest serverless options like Cloud Run, and improving traditional, tiered applications with managed services. Elastic Observability offers 16 out-of-the-box integrations for Google Cloud services with more on the way.

New in Grafana roles: Manage user permissions better with 'No basic role'

Since we introduced role-based access control (RBAC) in Grafana 9.0, users — and later, service accounts — have been required to have an assigned role that includes a basic set of permissions. This sometimes led organizations to create users and service accounts that had more permissions than necessary. As a result, Grafana administrators had to make additional adjustments to users’ permissions on a case-by-case basis.

Icinga 2 API with Let's Encrypt certificates, just for fun

In our community forum Michael already outlined the possibility to operate Icinga 2 with an external certification authority, not the one Icinga 2 generates by itself. Thomas, one of our NETWAYS colleagues, reported his experience in that field: in short, it’s easy to mess up lots of things and hard to debug them. And I absolutely agree. At the moment I’m reading TLS Mastery from Michael W. Lucas.

5 Tips For Consumers To Shop Safely This Black Friday

While it makes for bleak reading, the frenzy of sales and online shopping activity surrounding Black Friday means this pre-holiday season is a key period for cybercriminals. And each year we see an increase in cyberattacks during what should be a feel-good time. With more consumers expected to be shopping online this year, the opportunity for fraudulent behaviour is rife. But that doesn’t mean we have to surrender to the risks of poor website security.

Turkeys, Tech, and Table Settings: A Humorous Guide to IT Security at Thanksgiving Dinner

Let’s set the table a bit. As you know, in the U.S., Thanksgiving is coming up. And recently I had a conversation with my 83-year-old mother about Thanksgiving. Of course, we came across the inevitable parallels between Thanksgiving dinner and network security! That’s what you would be thinking when talking about Thanksgiving dinner with someone right? Before we dive into the feast, let me set the table.

Defensive Instrumentation Benefits Everyone

A lot of reasoning in content is predicated on the audience being in a modern, psychologically safe, agile sort of environment. It’s aspirational, so folks who aren’t in those environments may feel like the path there includes doing “the new thing” or using “the new tool.” If you write software and your employer hasn’t caught up to all the newest, best ways to work, I hope this pragmatic post helps you sleep better at night.

Effortlessly monitor AWS services in Grafana Cloud

Bringing AWS service metrics and logs into a single pane of glass helps engineers get holistic visibility into their infrastructure. Analyze 60+ AWS services across your individual accounts and regions without the toil of configuring data and building dashboards from scratch. Sign up for a free Grafana Cloud account today and unlock the potential of distributed tracing in your performance testing workflow.

Optimizing SD-WAN Monitoring Efficiency for MSPs: SATLX Deploys Obkio for Distributed Continent-Wide Network Monitoring

In today's rapidly evolving digital landscape, organizations across industries increasingly rely on Software-Defined Wide Area Networks (SD-WANs) to streamline their network infrastructure and enhance connectivity between geographically dispersed locations. As the demand for efficient and reliable network performance continues to grow, Managed Service Providers (MSPs) face the challenge of effectively monitoring and managing these complex SD-WAN environments.

Azure Key Vault Certificate Expiration Monitoring

Azure Key Vault is a cloud-based service provided by Microsoft Azure for managing and safeguarding cryptographic keys, secrets, and digital certificates used by cloud applications and services. It offers a secure and compliant way to store, manage, and control access to these sensitive pieces of information. Key Vault is designed to help protect the confidentiality and integrity of data, enhance the security of applications, and simplify key management tasks.

Significance of Website Monitoring for Educational Institutes

We have come a long way from dusty chalkboards and lightbulb-powered overhead projectors. The modern-day educational landscape has dramatically shifted to digital platforms. Students, educators, and staff are dependent on web portals and online services to facilitate learning and administrative tasks. However, a lingering question persists: How resilient and reliable are these platforms?

Are your Holiday experience SLOs in place?

If your digital service relies on holiday traffic, now is the time to check your Service Level Objectives (SLOs). SLOs are the performance objectives you set for your service to meet its Service Level Agreements. Even if they’re internal and not guaranteed to customers, SLOs can be very helpful in measuring the quality of service you’re providing.
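
One way to make an SLO tangible is to turn it into an error budget, meaning the amount of downtime the objective still allows. The short sketch below does that arithmetic; the function name and the 30-day window are assumptions for illustration, not something the post prescribes.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


# A 99.9% availability target over 30 days leaves roughly 43 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))  # 43.2
```

Checking how much of that budget is left before a traffic peak gives a simple signal for whether risky changes should wait.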

Multi-Cluster Observability Part 1: Building A Foundation

In the world of modern Kubernetes, things have come a long way from the days of a single cluster handling one app. Now, it's common to see setups that span multiple clusters across different clouds. Initially, managing those clusters was a complicated operation with many moving parts. Using tools such as SUSE Rancher, Red Hat OpenShift or AWS EKS made managing multiple clusters somewhat easier.

Building scalable OSS observability with Mimir, Loki, Tempo, and Pyroscope | ObservabilityCON 2023

In this video, we cover the latest and greatest news about the scalability and performance of the open source telemetry backends that make up the Grafana LGTM Stack: Grafana Mimir for Prometheus metrics, Grafana Loki for logs, and Grafana Tempo for traces.

Promoting & Protecting the Patient Experience: How HBL ICT Put People at the Center of their IT Strategy

Patient care is the core of the mission for the HBL ICT team. Serving more than 1.2 million adults and children in Hertfordshire, West Essex, Bedfordshire and East England, they provide a range of community health services and hold the highest standards for their care. Achieving this high quality of care requires a dedication to patient experience – and a dedication to the experience that their clinical staff have with technology.

The App Attention Index 2023: Beware the Application Generation

The latest App Attention Index published by Cisco AppDynamics reveals how younger consumers, “The Application Generation,” are punishing brands whose applications and digital services fail to perform. Across the world, attitudes and behaviors toward applications and digital services are evolving at speed.

Anodot vs. CloudZero: Who's the Optimal Platform for FinOps in the Cloud?

Our solution (Anodot) and CloudZero are popular third-party choices for businesses that manage cloud costs and are adopting a FinOps approach to make their cloud operations more efficient. That’s why a platform that can effectively support a FinOps model has become necessary for optimizing cloud functionality. Let’s analyze and dive deeper into who offers the best solutions, technology, and support to take your FinOps culture in the cloud to the next level.

Do you need an OpenTelemetry Collector?

When you use OpenTelemetry SDKs to collect logs, metrics, and traces from infrastructure or an application, you’ll find many references to people using Grafana Agent and OpenTelemetry Collector. They start with an application or infrastructure that sends telemetry, and that data is sent to a collector, which then sends it to a backend like Grafana that may perform many functions, including visualization.

Monitor the state of your Tailscale private network with Datadog

Tailscale is a modern remote access solution that allows customers to easily scale, segment, and manage a private network as their business grows. It enables encrypted point-to-point connections using the open source WireGuard protocol, so that devices on your private network can only communicate with each other.

C-suite insights on the transformative power of generative AI

Generative AI is revolutionizing the way businesses operate, from improving operational resilience to mitigating security risks and enhancing customer experiences. In a recent roundup of c-suite insights from three IT leaders — Matt Minetola, CIO, Mandy Andress, CISO, and Rick Laner, chief customer officer — we gain a comprehensive understanding of how generative AI is being used to improve business outcomes across organizations.

10 Top Help Desk and IT Asset Management Software for K-12

In 2023, K-12 schools are becoming more dependent on technology than ever before to enhance students’ learning experiences. The use of devices in teaching has seen a significant rise. For example, 42% of teachers used technology daily in 2016, but in 2019 this number rose to 67%. Given this surge in tech usage, IT administrators in K-12 schools face the challenge of managing these assets and providing support.

Splunk Edge Hub: Physical Data, Sensing and Monitoring on the Edge

Splunk Edge Hub device is a multi-component solution that includes a hardware device coupled with the Splunk platform and solutions that our partners build on top of both. It is a powerful tool that can help collect, distribute and act on data from edge devices and sensors, making it easier to capture and act on data that can be difficult to access physically or digitally.

Kubecon 2023: Code, Culture, Community, and Kubernetes

Kubecon 2023 was more than just another conference to check off my list. It marked my first chance to work in the booth with my incredible Kentik colleagues. It let me dive deep into the code, community, and culture of Kubernetes. It was a moment when members of an underrepresented group met face-to-face and experienced an event previously not an option.

A Closer Look at AlertBot's Email Reports

Here at AlertBot, we know that our customers don’t want to get bogged down with mountains of raw information about their websites and related processes. Instead, they want clear, organized, and reliable intelligence that tells them: what happened recently, what’s happening now, what’s likely to happen in the near future — and what they can do about it. That’s where email reports enter the story.

Validate JSON files against schema in Azure DevOps build

JSON files have become part of our daily lives. We use JSON files for all sorts of tasks like settings, defining database schemas, and much more. The other day I found out that invalid JSON files had been pushed to one of our repositories. So, I decided to include JSON file validation as part of our build on Azure DevOps. In this post, I'll share the solution. I'm sure you can think of a scenario where invalid JSON files either do not parse as valid syntax or don't conform to the intended format.
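As a rough illustration of the syntax half of that check, the sketch below walks a directory and reports every JSON file that fails to parse. It uses only the Python standard library; the function name `find_invalid_json` is hypothetical and this is not the post's own solution. Checking files against a schema would need a separate schema validator.

```python
import json
from pathlib import Path


def find_invalid_json(root: str) -> dict[str, str]:
    """Return {file path: error message} for each *.json file under root that does not parse."""
    failures = {}
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except ValueError as exc:
            # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            failures[str(path)] = str(exc)
    return failures
```

A build step could call this function on the repository root and fail the build whenever the returned dictionary is not empty.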

How SpyCloud Architected Its Cribl Stream Deployment

In this livestream, I talked to Ryan Saunders, Manager of Security Operations at SpyCloud, about how he used the Cribl Reference Architecture to build a scalable deployment. He explained how this approach enabled SpyCloud to grow alongside its evolving needs without requiring significant rework. The reference architecture also facilitated a repeatable data-onboarding process, reducing administrative time and allowing the team to focus on critical security and data analysis tasks.

Optimizing VPC Flow Logs - Part 1

Amazon Web Services (AWS) VPC Flow Logs is a feature designed to capture and provide information about the IP traffic that flows to and from network interfaces within your Virtual Private Cloud (VPC). This data can be published to various destinations, including AWS CloudWatch Logs, AWS S3, or AWS Kinesis Data Firehose. Flow logs serve several important purposes, such as diagnosing security group rule issues, monitoring incoming and outgoing traffic, and determining traffic directions.
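To make the record structure concrete, here is a minimal sketch that splits one flow log line into named fields. It assumes the default (version 2) space-separated format; custom formats use a different field list. The names `FIELDS` and `parse_flow_log_record` are hypothetical, and values are kept as strings because some records carry `-` in place of a number.

```python
# Field order of the default (version 2) VPC flow log format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_flow_log_record(line: str) -> dict[str, str]:
    """Split one default-format flow log line into a field-name -> value mapping."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


record = parse_flow_log_record(
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(record["srcaddr"], record["dstport"], record["action"])
```

Filtering parsed records on `action` is one simple way to start looking at rejected traffic when diagnosing security group rules.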

Exoprise Digital Workplace Insight 2023

In the constantly shifting sands of the IT landscape, keeping pace with the rapid technological advancements and fluctuating economic conditions is a challenge that IT professionals face daily. The Exoprise Digital Workplace Insight 2023 Survey provides a crucial barometer for these changes, offering a comprehensive view of the industry through the lens of those who navigate its complexities every day.

Network Overhead, Latency with Secure Access Service Edge (SASE)

Digital Experience Monitoring (DEM) has become an area of focus for Secure Access Service Edge (SASE) vendors. As businesses adopt SASE or security-as-a-service technology for compliance and security, they must consider the overall employee digital experience. SASE architectures add network overhead and impact performance, response times, and latency. In this article, we will delve into.

Make the leap to Exchange Online with seamless migration

As businesses make their transition to the cloud, administrators find themselves with the task of migrating their on-premises Exchange environments to Exchange Online. This migration journey includes multiple challenges, like preserving data integrity, addressing security concerns, ensuring minimal downtime, and delivering seamless user experiences.

Active vs. Passive Monitoring: What's The Difference?

Today, it’s perfectly normal for businesses to continuously monitor software applications and IT infrastructure to ensure uninterrupted customer service. Active and passive monitoring are the two popular methods enterprises use for infrastructure and application performance monitoring (APM). As the names indicate, these two approaches to monitoring are very different.

The Leading Jaeger Dashboard Examples

Unlocking the full potential of observability and tracing in modern software ecosystems has become imperative for businesses striving to deliver improved reliability and user experience. In this comprehensive roundup, we will dive into the world of Jaeger-incorporated observability and tracing dashboards, offering a curated selection of the best use cases that empower DevOps teams, engineers, and developers to gain unparalleled insights into the inner workings of their applications.

What is CI/CD observability, and how are we paving the way for more observable pipelines?

Observability isn’t just about watching for errors or monitoring for basic health signals. Instead, it goes deeper so you can understand the “why” behind the behaviors within your system. CI/CD observability plays a key part in that. It’s about gaining an in-depth view of the entire pipeline of your continuous integration and deployment systems — looking at every code check-in, every test, every build, and every deployment.

From Cloud to AI: The Evolution of IT Infrastructure

Zenoss Chief Product Officer Trent Fitz was recently featured on the "Tech Talks Daily" podcast, hosted by Neil Hughes. Trent is a pioneer in the realm of AIOps with over two decades of experience in artificial intelligence, cloud computing and cybersecurity. His expertise in cloud computing has significantly influenced his work with AIOps, especially in managing the increasing complexity of cloud-based infrastructure.

Track Frontend JavaScript exceptions with Playwright fixtures

Frankly, end-to-end testing and synthetic monitoring are challenging in today’s JavaScript-heavy world. There’s so much asynchronous JavaScript code running in modern applications that getting tests stable can be a real headscratcher. That’s why many teams rely on testing mission-critical features and treat “testing the rest” as a nice to have. It’s a typical cost-effort decision.

Secure and monitor infrastructure networking with Buoyant Enterprise for Linkerd in the Datadog Marketplace

As organizations adopt Kubernetes, they face gaps in security, reliability, and observability such as unencrypted communication, lack of multi-cluster support, and missing reliability features like circuit breaking. Buoyant Cloud is the dashboarding and automated monitoring component of Buoyant Enterprise for Linkerd, which helps organizations secure and monitor communication between Kubernetes workloads.

Centrally govern and remotely manage Datadog Agents at scale with Fleet Automation

As customers scale to thousands of hosts and deploy increasingly complex applications, it can be difficult to ensure that every host is configured to give you the visibility you need to monitor your infrastructure and applications. To ensure visibility across a growing number of hosts, you need to know that your observability strategy is implemented uniformly across your entire fleet of Datadog Agents installed on these hosts.

5 Simple Steps to Reduce Your AWS S3 Bill

Understanding your AWS S3 billing is crucial to effectively manage and reduce your costs. Charges in AWS S3 are primarily based on three factors: the amount of data you store, the number of requests you make, and data transfer fees. Storage costs are calculated per gigabyte (GB) stored, which are tiered depending on the total size of your data. Requests costs are incurred with each put, get, or list operation on your objects, with prices varying based on the type of request.
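The three factors can be combined into a back-of-the-envelope estimate, sketched below. The rates are hypothetical placeholders rather than actual AWS prices, the function name is an assumption, and the sketch ignores storage tiers and per-request-type pricing for simplicity.

```python
# Hypothetical flat rates for illustration only; these are NOT actual AWS prices.
STORAGE_PER_GB_MONTH = 0.02   # cost per GB stored per month
REQUEST_PER_1000 = 0.005      # cost per 1,000 requests
TRANSFER_PER_GB = 0.09        # cost per GB transferred out


def estimate_monthly_s3_cost(storage_gb: float, request_count: int, transfer_out_gb: float) -> dict[str, float]:
    """Break a monthly estimate into its storage, request, and transfer parts."""
    storage = storage_gb * STORAGE_PER_GB_MONTH
    requests = request_count / 1000 * REQUEST_PER_1000
    transfer = transfer_out_gb * TRANSFER_PER_GB
    return {
        "storage": storage,
        "requests": requests,
        "transfer": transfer,
        "total": storage + requests + transfer,
    }


cost = estimate_monthly_s3_cost(storage_gb=100, request_count=10_000, transfer_out_gb=10)
print({name: round(value, 2) for name, value in cost.items()})
```

Seeing the parts side by side shows which factor dominates a bill before deciding where to optimize.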

Application Observability and Beyla Demo | ObservabilityCON 2023

In cloud native environments, finding and resolving issues across services and between application and infrastructure dependencies can be challenging. In this recording, we provide demos on Grafana Cloud’s latest capabilities for correlating application and infrastructure observability: Application Observability and Beyla — both generally available. You will hear how Grafana unifies and contextualizes service relationships and application and infrastructure dependencies to help you resolve problems faster.

User-centered observability: load testing, real user monitoring & synthetics | ObservabilityCON 2023

Understanding your end users’ experience with your applications and services is critical, and there are a variety of tools to help. But there are also a number of different use cases: During development or in production? Simulate user behavior or monitor real user behavior? What should you use and when? This recorded session explores when and how to apply load testing, synthetic monitoring, and real user monitoring to gain insights into the end user experience of your critical applications.

Grafana SLO Demo: Prioritize critical resources with SLO-driven IRM | ObservabilityCON 2023

A majority of respondents in our Observability Survey said they were using SLOs or moving in that direction. For good reason: By highlighting the most critical error budget burndown, service level objectives (SLOs) can help you prioritize performance issues based on business impact. In this recording, Josh Abreu Mesa and Reem Tariq walk through how Grafana Cloud’s integrated SLO and Incident Response Management (IRM) capabilities can help you identify the most important issues and resolve them quickly.

How Pipedrive switched its observability stack to OpenTelemetry & LGTM | ObservabilityCON 2023

The cloud-based CRM company Pipedrive has been relentlessly modernising its observability stack, first adopting Grafana visualisation and Grafana Mimir for Prometheus metrics, then recently completing a migration of its distributed tracing from a third-party SaaS provider to OpenTelemetry and Grafana Tempo, and its logging stack from Graylog to Grafana Loki. Along the way, the team developed its own in-house library to include OpenTelemetry in its roughly 750 microservices.

Manage metrics & logging costs with Grafana Cloud + Log Volume Explorer demo | ObservabilityCON

Are your SRE and platform teams under pressure to ingest fewer metrics and logs in the name of cost savings? Reducing costs does not have to mean reduced observability. This recording walks through the cost management features in Grafana Cloud that allow you to analyze, attribute, monitor, and optimize your metrics and logs usage – and lower costs – without compromising your observability strategy.

Chasing the Rainbow: Towards Unified Service Metrics

As Zendesk migrated from a monolithic application to an ecosystem of hundreds of services, its need for fully unified and standardized observability became a chief concern. In this talk, Senior Principal Engineer Daniel Schierbeck shares how adopting a service mesh has helped Zendesk teams manage its growing number of services while standardizing its observability. He also explains how Zendesk’s approach to monitoring service interactions has evolved as it adopted Datadog metrics and Datadog APM.

Master Your SaaS Discovery Process Using Auvik

SaaS discovery is really easy if you’re an end user. You can probably find a product that meets your needs with a Google search and a credit card. However, SaaS discovery from an IT management and governance perspective is a whole different beast. In the past, there haven’t been a lot of easy ways to detect application usage in the browser or restrict user activity without creating hyper-restrictive internet usage policies. Enter SaaS discovery platforms.

The Ultimate Network Monitoring Solution Buyer's Guide

When was the last time your network threw a surprise party – and by party, we mean a full-blown connectivity catastrophe? If you're thinking, "Not on my watch," well, kudos to you! But here’s the reality check – it's not a matter of 'if' but 'when'. With more than 30 years of collective experience in the industry, our team knows that network issues come knocking sooner rather than later, and usually when you least expect it. So, how ready are you for the pitfalls hidden in the future?

Azure Integration Account Certificate Expiration Monitoring

The Azure Integration Account is part of the Logic Apps Enterprise Integration Pack (EIP) and serves as a secure, easily manageable, and scalable repository for the integration artifacts that you create. You can create and store agreements, certificates, maps, partners, and schemas in your integration account and refer them seamlessly across all your Logic Apps. This streamlined approach facilitates the swift and hassle-free creation of B2B processes using Logic Apps.

What is Bandwidth Monitoring: Benefits, Efficiency and Best Practices

Bandwidth Monitoring is essential for the smooth flow of data and business success in the digital landscape. Many businesses, individuals, and data centers use networks to share information or services with each other. At times, customers may experience poor results from unreliable connections, and these problems can also be signs of more serious network problems.

Traditional Network Monitoring vs. AIOps Network Monitoring: A Comparative Analysis

The digital shift has particularly influenced the world of network management, where traditional monitoring is gradually giving way to AI operations (AIOps) solutions. This change illustrates a clear shift towards automated and predictive IT operations management. While traditional systems have their place, particularly in smaller or less complex environments, AIOps represents the future of network monitoring, offering the ability to anticipate and prevent issues.

OpenTelemetry Java Tutorial | Auto-Instrument Java App with OpenTelemetry

OpenTelemetry stands at the forefront of modern observability practices, revolutionizing how developers gain insights into their applications' performance and behavior. As a powerful distributed tracing framework, it empowers engineers to effortlessly instrument their applications, providing comprehensive visibility into the intricacies of microservices architectures. This tutorial discusses how OpenTelemetry can be used to get insights from a Java application.

Licensing and the Future of Open Source | Day 5 | Sentry Launch Week

Today we’re taking another step by relicensing both Sentry and Codecov under a new license we’ve written called the Functional Source License (FSL). FSL is an evolution of BSL that deepens our commitment to balancing user freedom and developer sustainability. We think it is a compelling option for Open Source-minded SaaS companies such as ourselves who wish to grant freedom without harmful free-riding.

Generative AI & Enterprise IT: Overhyped or Radically Under Estimated?

Join Cribl’s Jackie McGuire and Ed Bailey as they discuss AI's current and future state. They will discuss the many challenges and vast promise of this way to increase productivity and solve problems. Jackie and Ed will also comment on SolarWinds’ response to the SEC charges alleging SolarWinds and its CISO defrauded investors by repeatedly misleading them about its cybersecurity posture. Please join us for a great conversation.

NiCE VMware Management Pack 5.7

In the dynamic realm of virtualization management, the latest release of the NiCE VMware Management Pack is bringing new, much-desired monitoring features to the SCOM admin table. Packed with a slew of powerful features and enhancements, this release promises to enhance the monitoring capabilities for Microsoft System Center Operations Manager (SCOM) and Azure Monitor SCOM Managed Instance.

Five worthy reads: The future of tech is clean

Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. In this edition, we are exploring the emerging market for climate technology, examining its significance, and addressing why a successful path forward lies in embracing clean, green, and planet-friendly solutions for both startups and established companies. Let’s dive right in.

Performance optimization techniques in time series databases: function caching

Relabeling is an important feature that allows users to modify metadata (labels) of scraped metrics before they ever make it to the database. As an example, some of your scrape targets may generate metric labels with underscores (_), and some of your targets may generate labels with hyphens (-). Relabeling allows you to make this consistent, making database queries easier to write.
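The hyphen-versus-underscore example can be sketched in plain Python as below. This only illustrates the idea of rewriting labels before storage; it is not the relabeling configuration syntax of any particular database, and `normalize_label_names` is a hypothetical helper.

```python
def normalize_label_names(labels: dict[str, str]) -> dict[str, str]:
    """Rewrite hyphens in label names to underscores; label values are left untouched."""
    # If two names collapse to the same key, the later one in iteration order wins.
    return {name.replace("-", "_"): value for name, value in labels.items()}


scraped = {"instance-name": "web-1", "job_name": "api"}
print(normalize_label_names(scraped))
```

With every target reporting `instance_name`, a single query can select across all of them instead of matching two spellings.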

OpenTelemetry Operator Complete Guide [OTel Collector + Auto-Instrumentation Demo]

Manually deploying and managing OpenTelemetry components in a Kubernetes environment can be a complex and time-consuming task. It involves creating various Kubernetes resources, setting up configurations, and ensuring the components are properly integrated with the applications.

What Is Observability? Key Components and Best Practices

Software systems are increasingly complex. Applications can no longer simply be understood by examining their source code or relying on traditional monitoring methods. The interplay of distributed architectures, microservices, cloud-native environments, and massive data flows requires an increasingly critical approach: observability.

How the LGTM Stack changed the observability culture at Wise Payments

The observability team at Wise Payments – Europe’s leader in cross-border money transfers – had long provided the company’s developers access to a multitude of tools. But as costs and complexity increased, Ibukun Itimi, Engineering Lead for Observability and Andrew Brown, Reliability Squad Lead, saw an opportunity to change not only the tools they were using, but also the observability culture.

Data lakes vs data warehouses explained

In the era of big data, choosing the right data storage solution is crucial for organizations to harness the power of their data. Understanding the differences and benefits of data lakes and data warehouses can help businesses make informed decisions on which option best suits their needs. In this blog post, we will explore data lakes and data warehouses, their architecture, and their key features, enabling you to make the right choice for your organization.

How OpsRamp Can Monitor Your Enterprise Applications

Application tracing may be getting all the hype these days and rightfully so, as monolithic Java and .NET applications give way to microservices-based applications in modern IT environments. Distributed tracing provides visibility into the flow of requests between the microservices that make up these applications, helping you to spot performance and network connectivity issues.

Announcing the Splunk Add-on for OpenTelemetry Collector

The Splunk Add-on for OpenTelemetry Collector is a variation of the Splunk Distribution of the OpenTelemetry Collector that simplifies metrics and traces data collection, configuration and management. Since it is an add-on, users can deploy it alongside Universal Forwarders using tools like Deployment Server to start collecting high-fidelity metrics and traces from 1000s of their hosts easily. We’re happy to announce that the Add-On is now generally available in Splunkbase.

Deployment Frequency (DF) Explained

Technical teams use various metrics and indicators to track performance and success. For DevOps teams, among the most important metrics is deployment frequency. Deployment frequency can help you evaluate the software delivery performance of teams that develop software and apps. In this article, I’ll look at how to use this metric to calculate deployment rate, why it matters, and best practices for improving your deployment rate and setting your DevOps team up for success.
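One simple way to turn deployment history into a rate is to count deployments inside a reporting window and divide by the window length in weeks. The sketch below assumes that definition; the function name and the sample dates are hypothetical, and teams may prefer a different window or unit.

```python
from datetime import date


def deployments_per_week(deploy_dates: list[date], period_start: date, period_end: date) -> float:
    """Average number of deployments per week within an inclusive date range."""
    days = (period_end - period_start).days + 1
    if days <= 0:
        raise ValueError("period_end must not be earlier than period_start")
    # Only count deployments that fall inside the reporting window.
    count = sum(1 for d in deploy_dates if period_start <= d <= period_end)
    return count / (days / 7)


deploys = [date(2023, 11, 2), date(2023, 11, 6), date(2023, 11, 9), date(2023, 11, 13)]
rate = deployments_per_week(deploys, date(2023, 11, 1), date(2023, 11, 14))
print(rate)
```

Tracking the same figure over successive windows shows whether the deployment rate is trending up or down.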

Introducing the Functional Source License: Freedom without Free-riding

Sentry started life in 2008 as an unlicensed, 71-line Django plugin. The next year we began publishing it under BSD-3, and ten years later we switched to the Business Source License (BSL or BUSL). Last year we purchased Codecov, and a few months ago we published it under BSL/BUSL as well. That led to some vigorous debate because of our use of the term “Open Source” to describe Codecov, from which emerged this helpful suggestion from Adam Jacob, co-founder of Chef.

Microsoft Ignite 2023: Azure Monitor SCOM Managed Instance now available

Today, at Microsoft Ignite, the release of Azure Monitor SCOM Managed Instance (SCOM MI) was announced. In recent years, the IT industry has increasingly gravitated towards cloud-based solutions. Microsoft, as a leader of innovation, has been at the forefront of this shift, by consistently developing the best possible cloud alternatives for their products.

Best practices to scale and modernize your observability strategy

ObservabilityCON 2023 took place in London this week, showcasing all the latest and greatest trends in open source observability. Following the opening keynote, the event featured a range of breakout sessions — led by both Grafana Labs experts and members of the Grafana OSS community — that explored observability best practices and lessons learned.

Strengthen operational resilience with Service Chain Mapping. Watch our 60 second overview.

Watch this short video to learn how Interlink’s Service Chain Mapping solution transforms the ability of banking and finance organizations to address regulatory demands, manage operational risk, and avoid technology failures that could disrupt key customer journeys.

No Signal, No Problem: Troubleshooting Network Disconnection

Welcome, Network Admins and IT Pros! In the fast-paced world of digital infrastructure, network hiccups are more than just inconvenient—they're hurdles that demand swift resolution. In this guide, we're cutting to the chase. Designed for the practical minds that keep the digital gears turning, this blog is your go-to resource for tackling network disconnections head-on.

Mastering Firewall Logs - Part 1

A firewall is a network security device or software that is used to monitor and control incoming and outgoing network traffic based on predetermined security rules. Firewall logs contain valuable information about network and security events. These logs are essential for security and infrastructure monitoring for enterprises. While this data is critical to securing enterprise networks, it is also one of the most voluminous data types security teams use to monitor and secure their networks.

5 Elasticsearch Disadvantages You Should Know

Since its initial release in 2010, Elasticsearch has grown into the most popular enterprise search engine with use cases that range from web crawling and website search to application performance monitoring and security log analytics. But despite its widespread adoption and success, Elasticsearch does have some notable disadvantages that you should consider - especially if you’re envisioning a high-scale deployment with a large amount of daily ingestion.

SDK & Integration Updates: Sentry for every platform, framework, and tool

It seems like these days there’s a new exciting framework or dev tool launched every week. The challenge is that even if you’re ready and able to use new products, your existing tooling might not be up to the task; it could be months or years before your developer tools add support for the burgeoning platforms you want to use.

Power Automate Monitoring

Power Automate is a cloud-based service that Microsoft provides as part of Power Platform. It allows users to create automated workflows integrating various applications and services. It is designed for a wide range of users, from business professionals looking to automate daily tasks to developers who can create more complex automation. A trigger initiates a Flow in Power Automate followed by predefined actions. Each execution of the Flow creates a run.

Does Tracealyzer fit into my project?

Every developer wants universally applicable tools for their embedded development. However, hardware (processor type) and software (RTOS) architecture can limit the choice, making it a decision for the second-best rather than the best tool – even if you are in the mainstream. This is one of the reasons why Software Development Kits (SDKs) are becoming increasingly popular – even more so if they are easy to use and deploy.

Implementing QoS for VoIP for Exceptional VoIP Call Quality

Businesses and individuals alike have embraced the flexibility and cost-effectiveness of VoIP, but ensuring a seamless and high-quality experience requires more than just a reliable internet connection. Enter Quality of Service (QoS), a crucial framework that governs and prioritizes network traffic to guarantee optimal performance for real-time applications like VoIP.

SDK & Integration Updates: Every platform, framework, and tool | Sentry Launch Week | November 2023

Today, we’re announcing new capabilities for emerging web development platforms and frameworks – like Bun, Deno, Next.js, and Remix – as well as improvements to our Mobile project onboarding flow. We’re also expanding our ecosystem reach with new APIs to make it easy to connect Sentry to any tool, key improvements to our existing integrations, and launching four new integrations: Opsgenie, Discord, Cloudflare Workers, and LaunchDarkly.

The future of Sumo Logic begins at the atomic level of logs

This time of year, complete with Thanksgiving, re:Invent and December holidays around the world, ends up feeling like a natural moment to pause, reflect, and plan for what’s ahead. This is especially true this year, as it also marks my half-year anniversary as CEO of Sumo Logic. I have a strong sense of why I joined, what I’ve learned since leading the incredible team of Sumos, and where I see us going in the future.

AI Explainer: What Are Neural Networks?

In a previous blog post, which was a glossary of terms related to artificial intelligence, I included this brief definition of "neural networks": Let’s go a bit deeper on that. Neural networks are a class of artificial intelligence (AI) and machine learning models inspired by the structure and functioning of the human brain. They are a subset of AI techniques that have gained significant popularity due to their ability to learn and make decisions from data.

How We Handle Upgrades at AppSignal

At AppSignal, our pricing revolves around the number of requests we process for a customer and the number of buckets of logging data we store. After their free trial, customers are offered the most fitting plan for them based on their usage in the previous 30 days. About nine years ago, we noticed that many customers were slowly growing their number of requests, but we kept charging them for the plan they started on.

Infrastructure Management & Lifecycle Explained

IT infrastructure must meet enterprise needs for effective service delivery while also providing value for money. This is a critical undertaking. Massive data growth, increased complexity of hybrid cloud environments, and emphasis on digital-first strategies are just some of the challenges. This requires an advanced approach to how infrastructure is configured and controlled — infrastructure management.

The Grafana OpenTelemetry Distribution for Java: Optimized for Application Observability

The OpenTelemetry project provides many different components and instrumentations that support different languages and telemetry signals. However, new users often find it hard to pick the right ones and configure them properly for their specific use cases. For this reason, OpenTelemetry defines the concept of a distribution, which is a tailored and customized version of OpenTelemetry components. Here at Grafana Labs, we are all-in on OpenTelemetry.

The Grafana OpenTelemetry Distribution for .NET: Optimized for Application Observability

The OpenTelemetry project provides many different components and instrumentations that support different languages and telemetry signals. However, new users often find it hard to pick the right ones and configure them properly for their specific use cases. For this reason, OpenTelemetry defines the concept of a distribution, which is a tailored and customized version of OpenTelemetry components. Here at Grafana Labs, we are all-in on OpenTelemetry.

Aggregating Logs From Microservices-Best Practices

Depending on where you are on your journey with microservices, you may have noticed visibility into the system can be a bit tricky at times. Well, there’s good news. Not knowing what’s going on in the system is a solvable problem. One of the first things you can do is get your logs in order. And one of the best ways of doing so is aggregating your logs into a single logging service.

Proactive monitoring with automated alerts

E-commerce DevOps teams can use AppDynamics to monitor the health and performance of their applications, receiving alerts on issues BEFORE significant business impact. Devs can use AppDynamics to provide automatic email and text notifications about such issues. In this demo, see how you can harness a custom email alert notification to view anomalous synthetic transaction events within the Browser App Dashboard, drill into the Health Rules violation page, and link to a custom dashboard to troubleshoot an unexpected increase in synthetic end-user response time for shopping cart activity.

OpenTelemetry Webinars - The Open Agent Management Protocol

Open Agent Management Protocol (OpAMP) is the emerging open standard to manage a fleet of telemetry agents at scale. Join Nica and Srikanth as we discuss recent updates to the standard and how you can remotely manage the OpenTelemetry collector with OpAMP. SigNoz helps developers monitor applications and troubleshoot problems in their deployed applications. SigNoz uses distributed tracing to gain visibility into your software stack.

OpenTelemetry Webinars - The Trace API

Join Nica and Srikanth to talk in detail about the OpenTelemetry Trace API. We'll talk about adding spans, events, attributes and other extra info, whether it's really possible to replace logs with traces, and more. More about SigNoz: SigNoz - Monitor your applications and troubleshoot problems in your deployed applications, an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator.

Transform Your Kubernetes Troubleshooting With Powerful Data Correlations

As developers and engineers, we're all too familiar with the manual labor of connecting disparate data points—metrics, logs, events and resource status. We're also familiar with a continual need to simplify and unify these elements into a seamless troubleshooting experience. This on-demand webinar looks at how a data correlation approach provides a holistic view of complex system interactions and can move you from issue awareness to full resolution without juggling different tools or performing mental gymnastics. All on a single dashboard!

Introducing Responsive Pipelines from Mezmo

The ability to swiftly resolve incidents is central to SREs responsible for a service's reliability and its users' satisfaction. Mezmo has recognized this need and, at Kubecon, unveiled an innovative solution: Mezmo Responsive Pipelines. Responsive Pipelines enable users to pre-configure a Pipeline to respond automatically in the case of an incident.

Recapping KubeCon North America 2023

If you missed KubeCon North America 2023 in Chicago, or you were there and spent more time in the “hallway tracks,” you may have missed some of the big news that came out of the show. We covered the big happenings in the open source cloud native and observability realm in the latest episode of OpenObservability Talks!

Reach new heights in business excellence with full-stack observability

Organizations are constantly looking to grow and expand, which requires establishing strong foundations, especially for the IT infrastructure. The challenge in achieving this is to consistently push the limits of the IT infrastructure to deliver more business excellence. To ensure success, management operations should be fine-tuned, and this often requires improving tool sets, skillsets, and personnel.

Incident communication best practices for an elevated user experience

Downtime is unavoidable, and incidents happen. Organizations need to be rapid and transparent in communicating incidents with their customers. Lack of timely communication can jeopardize the entire incident management process and increase user frustration. This guide provides rich insights into what incident communication is, why it's important, and best practices for effective incident management. What is an incident, and why is incident communication important?

Icinga Monitoring is the trusted "source of truth" for Scandinavian company NTE

We are proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

The Challenges of Collecting Runtime Data

Collecting data in real-time plays a crucial role in securing, monitoring, and troubleshooting applications. This real-time data, often referred to as ‘runtime data,’ provides unique insights into the application’s behavior, which aren’t available through other collection techniques. But the tricky part is that collecting runtime data comes with challenges.

How a Modern Integration MESH Changes our Roles and Responsibilities

There was a time not too long ago, before the cloud was a part of every enterprise technology conversation, when integration work was considered the purview of a specific architecture and engineering group. If messages failed to send, or services failed to respond, application stakeholders would create a trouble ticket for the integration team to address. In some ways, this separation of labor was effective enough at the time.

Managing Cisco Switch Logs with Kiwi Syslog Server

Network management, particularly the effective handling of system logs, is crucial in maintaining a high-performance and secure IT infrastructure. Log files, or simply logs, are generated by network devices such as switches and routers, serving as valuable resources to understand the intricacies of network performance, spot anomalies, and even comply with regulatory requirements.

Alerts Should Work for You, Not the Other Way Around

The entire reason we have monitoring is to understand what users are experiencing with an application. Full stop. If the user experience is impacted, sound the alarm and get people out of bed if necessary. All the other telemetry can be used to understand the details of the impact. But lower-level data points no longer have to be the trigger point for alerts.

Dashboarding Azure Monitor SCOM MI in SquaredUp

Big news! Microsoft have just dropped Azure Monitor SCOM Managed Instance (SCOM MI), their cloud-based alternative to SCOM. It’s fully Microsoft managed, and so it promises to take the headache out of deploying, scaling, and managing your SCOM Management Groups. Read Microsoft’s announcement blog to learn all about it.

From Data Deluge to Strategic Advantage: Cribl and Elastic Chart the Future of Flexible Data Management and Operationalization

In an era where industry standards are as dynamic as the data they govern, Cribl’s core value of putting ‘Customers First, Always’ drives us to stay ahead of the curve. It’s with immense pride and excitement that we announce our strategic partnership with Elastic. This alliance isn’t just a meeting of minds; it’s a bold stride towards a future where flexibility in data management isn’t just a luxury – it’s the standard.

Unlocking Open Telemetry for Golang

Open Telemetry (OTel) is an open source observability framework that has garnered significant attention for its powerful capabilities in monitoring metrics, logs and traces. It is second only to Kubernetes in the CNCF velocity chart, with contributions from major players in the cloud industry, and has a growing community helping build out a thriving ecosystem.

What is DevOps? Grafana for Beginners Ep.2

As a beginner in DevOps, you probably have come across multiple definitions of DevOps and countless things that fall under the DevOps umbrella. So you have a basic idea of what DevOps is but are you able to explain it to another newbie like yourself? Join Lisa Jung, a senior developer advocate at Grafana, to learn about DevOps in the simplest terms possible. Subscribe to the Grafana for Beginners series to delve deeper into concepts like observability, DevOps, and how Grafana can be used to observe your system as a part of your DevOps Practice!

Set and scale service level objectives in Grafana Cloud: Introducing Grafana SLO

When we began offering Grafana Cloud Metrics, we set a service level agreement (SLA) for 99.5% of requests to be completed within a few seconds. So we built an alert that would go off if more than 0.5% of requests were slower than a couple of seconds within a five-minute moving window. Sounds reasonable, right?
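
As a rough illustration of the alert described above, the Python sketch below checks whether more than 0.5% of requests in a trailing five-minute window were slow. It is not Grafana's implementation. The 2.0-second threshold is an assumption standing in for "a couple of seconds", and the function names and synthetic request data are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed values: the blurb only says "a couple of seconds", so 2.0 s is a guess.
SLOW_AFTER_SECONDS = 2.0
ERROR_BUDGET = 0.005            # 0.5% of requests may be slow
WINDOW = timedelta(minutes=5)   # five-minute moving window


def slow_fraction(requests, now):
    """Fraction of requests in the trailing window that were slow.

    `requests` is an iterable of (timestamp, latency_seconds) pairs.
    """
    recent = [latency for ts, latency in requests if now - WINDOW <= ts <= now]
    if not recent:
        return 0.0
    slow = sum(1 for latency in recent if latency > SLOW_AFTER_SECONDS)
    return slow / len(recent)


def should_alert(requests, now):
    """Fire only when strictly more than the budgeted share is slow."""
    return slow_fraction(requests, now) > ERROR_BUDGET


def make_requests(now, total, slow):
    """Hypothetical helper: `total` requests inside the window, `slow` of them slow."""
    return [
        (now - timedelta(seconds=i % 300), 3.0 if i < slow else 0.2)
        for i in range(total)
    ]


if __name__ == "__main__":
    now = datetime(2023, 11, 1, 12, 0, 0)
    print(should_alert(make_requests(now, 1000, 6), now))  # 0.6% slow
    print(should_alert(make_requests(now, 1000, 5), now))  # exactly 0.5% slow
```

Even a small burst of slow requests inside a short window is enough to trip a check like this.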

My First Kubecon - Tales of the K8's community, DE&I, sustainability, and OTel

I went to my first Kubecon ever this last week. If you’re not familiar with Kubecon, it is a convention centered on Kubernetes, a Cloud Native Computing Foundation (CNCF) open source project. With this being my first Kubecon ever, it was an adventure all around building community, education, kindness, and of course, a love for Kubernetes technology.

How To Save Money On Your Observability Costs

In today's digital age, the complexity and scope of dynamic system architectures are expanding at an unprecedented rate. As a result, IT teams find themselves grappling with the challenge of monitoring and addressing conditions across multi-cloud environments. With the increasing complexities, IT operations, DevOps, and SRE teams are searching for enhanced observability within these multifaceted computing environments.

Alerts Don't Suck: YOUR Alerts Suck!

Join Leon Adato, Kentik's Principal Technical Evangelist, for, "Alerts Don't Suck: YOUR Alerts Suck." In this engaging talk, Leon shares a personal anecdote about generating a staggering 772 tickets - twice - in just 15 minutes, setting the stage for an enlightening exploration of alert management. Leon discusses the common misconceptions and pitfalls of alerts in network observability, distinguishing between effective and ineffective alert strategies. He demystifies the concepts of monitoring versus observability, offering practical advice on creating alerts that genuinely add value and drive action. Whether you're an IT professional, network engineer, or anyone interested in improving your alerting philosophy, this talk is packed with actionable insights, humor, and real-world examples. Leon's hard-won advice might just transform your approach to alerting and optimizing your monitoring systems.

EU Data Residency & How We Built It | Sentry Launch Week | November 2023

Like many of our customers, Sentry takes privacy and data sovereignty seriously. One of Sentry’s values is to be for “Every Developer,” and we can’t do that with a US-only data solution. That’s why, starting next month, all Sentry organizations will be able to choose where their Sentry data is hosted – either in the US or the EU.

Live Render Log Monitoring with Papertrail

Cloud platforms like Render have made developers’ lives easier by handling many of the underlying infrastructure concerns. You can deploy web services, spin up databases, and schedule cron jobs without ever setting up a server manually. However, this convenience comes with a challenge: Accessing logs across these disparate services takes time and effort. To overcome this challenge, many developers centralize their logs with a log management service.

Optimize Core Web Vitals for SaaS and Custom Apps

A set of metrics known as Core Web Vitals have become key indicators of website performance and user satisfaction. Monitoring and optimizing these metrics for web pages can be challenging. Today, we learn how to use synthetic and real-user monitoring to measure, analyze, and improve Core Web Vitals. Delivering a smooth user experience plays a pivotal role in determining website application success.

Identify & Solve Issues Faster with Session Replay | Sentry Launch Week | November 2023

We know, we’re Sentry, the error and performance monitoring platform, and we catch production issues. But some broken experiences simply won’t throw an exception. So we built a way to detect when your users are slamming their keys on the keyboard in frustration, and to even let them contact you directly when that doesn’t go their way.

CMMC v2 Compliance with EventSentry

A quick overview of how EventSentry can help organizations become CMMC v2 compliant. EventSentry features actionable dashboards and reports to become and STAY compliant. But EventSentry goes beyond compliance - the monitoring and security features of EventSentry can be leveraged by any compliance framework. The result is a network that's compliant AND secure.

Digging into the Optus Outage

Last week a major internet outage took out one of Australia’s biggest telecoms. In a statement out yesterday, Optus blames the hours-long outage, which left millions of Aussies without telephone and internet, on a route leak from a sibling company. In this post, we discuss the outage and how it compares to the historic outage suffered by Canadian telecom Rogers in July 2022.

From Oops to Ops: SLOs Get Budget Rate Alerts

As someone living the Honeycomb ops life for a while, SLOs have been the bread and butter of our most critical and useful alerting. However, they had severe, long-standing limitations. In this post, I will describe these limitations, and how our brand new feature, budget rate alerts, addresses them. We usually don’t have SREs writing product announcements, but I’m so excited about this one that I said, “Screw it, I’m doing it!”

Kubecon North America 2023 event recap

As autumn graced the vibrant city of Chicago, I had the distinct opportunity to immerse myself in the heart of innovation and camaraderie at the CNCF’s Kubecon North America conference. Over the span of four remarkable days, from Nov 6-9, I was fortunate enough to walk alongside the many enthusiasts, contributors and organizers of open source and cloud native communities.

How To Investigate a Reported Problem

Getting to the root cause of a problem in cloud-native environments requires engineers to navigate through immense complexity within a distributed system. Oftentimes, you didn’t write the code and you lack the background and context to quickly understand what’s going on when a problem occurs. The stakes are even higher when a problem is reported - meaning it’s already started to impact the business and the executives and your customers are not pleased.

Java Application Monitoring - How IT Ops can Diagnose Memory Leaks at Scale

Many server-side applications are written in Java and often process tens of millions of requests per day. Key applications in various domains like finance, healthcare, insurance and education are often Java-based. When these applications slow down or fail, they affect the user experience and in turn, reduce business revenue. Behind many web forms or form-like GUIs there will often be a Java application.

The Future of Operations: AI-powered Internet Performance Monitoring

At Catchpoint, our philosophy is that AI should not be adopted simply for the sake of AI itself. Instead, it should be embraced when it proves to be the most effective solution for addressing a particular business challenge. While the world is currently in the fervor of the oncoming AI revolution, our industry-leading IPM platform has quietly harnessed the potential of Artificial Intelligence for years.

Driving Efficiency and Advanced Insight - Explore Virtana's Latest Innovations

Virtana is proud to announce a series of new capabilities focused on empowering our customers with advanced AI-driven capabilities, enhanced user interfaces, and deeper integrations, all aimed at optimizing application and infrastructure observability. Let’s dive into these innovations and see how they revolutionize the way IT professionals interact with their environments.

ObservabilityCON 2023 - Opening Keynote (Live)

👋 Coming to you live from London, Grafana ObservabilityCON 2023's keynote introduces the latest developments in the open and composable LGTM (Loki, Grafana, Tempo, Mimir) observability stack AND many exciting announcements! Our keynote features CEO/Co-founder Raj Dutt, CTO Tom Wilkie, and members of the Grafana Labs engineering team.

Managing observability spend with Grafana Cloud's Cost Management Hub

Learn how Grafana Cloud helps analyze, manage and optimize observability spend from a central location called the cost management hub. The move to cloud-native architectures like K8s and Prometheus has caused an unprecedented increase in telemetry data that has resulted in observability bills skyrocketing. With Grafana Cloud and the central cost management hub, you will be able to answer any cost-related question with the tools to inspect, attribute, optimize and monitor your observability spend.

How to map log volume to teams with Grafana Cloud's Log Volume Explorer | Demo

Investigate the source of high log volumes in Grafana Cloud by leveraging log labels to understand which teams or applications are responsible for log usage. In this video, see how to use the Log Volume Explorer with a point-and-click user interface and explore log volumes by using any combination of labels associated with the logging data. Slice and dice the data as you choose to see log volume broken down by teams, applications, clusters, or cloud region.

2023 ONUG Fall Panel Discussion: Building Integrated Solutions for Network and Security

Full-Stack Analytics explores the essence of implementing cross-layer, multi-domain analytics, emphasizing the need for an integrated platform that combines endpoint and network security telemetry for holistic threat detection. In this session, we explore the transformative role of AI and machine learning in bolstering real-time intelligence across multiple layers and domains, enhancing predictive analytics and anomaly detection.

Cisco Cloud Observability on AWS: Deploying is easy with the AppDynamics add-on for Amazon EKS Blueprints with Terraform

Quickly deploy the Cisco AppDynamics Kubernetes® and App Service Monitoring solution for cloud native application observability using Helm charts and the Amazon EKS Blueprints for Terraform module. In this blog, I’ll show you how to do it in just minutes.

Not Every Problem is an Error: Introducing Rage and Dead Clicks + New User Feedback Reports

I know, we’re Sentry, the error and performance monitoring platform, and we catch production issues. But as you (hopefully) saw during our Launch Week announcement, some broken experiences simply won’t throw an exception. So we built a way to detect when your users are slamming their keys on the keyboard in frustration, and to even let them contact you directly when that doesn’t go their way.

Manage log volumes, metrics cardinality, monthly bills: Explore Grafana Cloud cost management tools

As more organizations adopt observability at massive scale, they have also been grappling with rising costs. Over the past 12 months, we have been working on different solutions to help our users better understand and manage their observability stack, not to mention the bills that come with scaling it.

Grafana Beyla 1.0 release: zero-code instrumentation for application telemetry using eBPF

Just two months after introducing the public preview of Grafana Beyla, we are excited to announce the general availability of the open source project with the release of Grafana Beyla 1.0 at ObservabilityCON 2023 today. We’ve worked hard in the last two months to stabilize, stress test, and refine the features that were part of the public preview of this open source eBPF auto-instrumentation tool.

How Asserts.ai will make it even easier for Grafana Cloud users to understand their observability data

At Grafana Labs, our mission has always been to help our users and customers understand the behavior of their applications and services. Over the past two years, the biggest needs we’ve heard from our customers have been to make it easier to understand their observability data, to extend observability into the application layer, and to get deeper, contextualized analytics.

Announcing Application Observability in Grafana Cloud, with native support for OpenTelemetry and Prometheus

The Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics) offers the freedom and flexibility for monitoring application performance. But we’ve also heard from many of our users and customers that you need a solution that makes it easier and faster to get started with application monitoring.

A Guide to Effective Network Load Testing & Load Balancing

When it comes to network management, there are two challenges that are ever-present; ensuring optimal network performance and maintaining uninterrupted network connectivity. Network admins are the unsung heroes, diligently managing the digital highways that connect the modern world. To maintain the delicate balance between seamless user experience and network reliability, two crucial practices come to the forefront: Network Load Testing and Load Balancing.

5 Best Frontend Error Monitoring Tools

You have so many options for frontend error monitoring today, and they all do slightly different things. We looked at everyone and did a breakdown of the most important features for frontend, the problems developers run into, end user reviews, and pricing structures to see how the best vendors stack up.

Why Every SRE and DevOps Beginner Needs a Status Page

So, you’ve ventured into the world of Site Reliability Engineering (SRE) and DevOps. Exciting, isn’t it? Yet, amidst the code, deployments, and system architectures, there’s a silent hero that often goes unnoticed — the humble status page. In this dynamic environment, it’s not just about keeping systems up but communicating effectively when they aren’t. That’s where Uptime.com steps in, like a superhero in a world of mortal website monitoring services.

Data-driven insights with AIOps: Making informed IT decisions

Information technology (IT) departments are always juggling multiple tasks in the fast-paced world of modern business, from maintaining the network infrastructure to guaranteeing data security and enhancing system performance. Even the most seasoned IT professionals can get overwhelmed by the sheer volume of data created and the rising complexity of IT systems. Artificial intelligence for IT operations (AIOps) can revolutionize the way businesses manage their IT environments in this situation.

Performance Monitoring for Every Developer | Sentry Launch Week | November 2023

Extracting relevant insights from your performance monitoring tool can be frustrating. You often get back more data than you need, making it difficult to connect that data back to the code you wrote. Sentry’s Performance monitoring product lets you cut through the noise by detecting real problems, then quickly takes you to the exact line of code responsible. The outcome: Less noise, more actionable results.

Uncovering Business Insights from Logs

In the world of modern business, data drives decision-making. Every interaction, every transaction, and every click generates a series of data in the form of logs. These logs, often seen as plain text records, have the potential to unlock valuable business insights when analyzed correctly. In How to Create Log-Based Metrics to Improve Application Observability, we described the process of creating log-based metrics to improve application observability using Sematext Cloud.

What is Multicloud? An Introduction

Simply defined, multicloud (or multi-cloud) describes a computing environment that relies on multiple SaaS or cloud services for different workloads within a single architecture. In a multicloud approach, organizations may use public cloud providers such as Amazon Web Services (AWS) for infrastructure, Microsoft Azure for platform, and Google Cloud Platform for development.

The Internet of Medical Things (IoMT): A Brief Introduction

The Internet of Medical Things (IoMT), a subset of Internet of Things (IoT) technologies, comprises inter-networked devices and applications used in medical and healthcare information technology applications. IoMT devices connect patients, doctors and medical devices — including hospital equipment, diagnostic gear, and wearable technology — by transmitting information over a secure network.

Benefits of a Mobile-Friendly Website

Aside from being a hub of information, your website should guide customers through the purchase journey. But you can't just have any website. You need a mobile-friendly website. Why? Essentially, a mobile-friendly website makes your business more accessible to potential customers. In this post, you'll learn five reasons your website designer should ensure mobile optimization. But first.

Datadog acquires Actiondesk

Datadog customers have an abundance of observability data at their fingertips. Using this data effectively requires having the right visualizations and analysis tools. For some teams, the powerful functionality of spreadsheets is critical to their ability to make data-driven forecasting and business decisions. That’s why we are pleased to announce that Actiondesk—a spreadsheet-powered connection to your live data—is joining Datadog.

Enhancing System Security with Advanced Logging and Auditing in Linux

Linux is a powerful operating system that has become a staple in the world of computing. With its open-source nature and versatility, it has gained popularity among individuals and organizations alike. However, as with any operating system, there is a need for robust logging and auditing capabilities. This is where the concept of "Advanced Logging and Auditing in Linux" comes into play. In simple terms, logging and auditing are methods of recording and analyzing system activity.

Grafana k6 for Beginners: Why observability needs testing

Having observability and monitoring solutions is a great way to gain insights into your applications' health, behavior, and performance. However, it doesn’t prevent incidents. Observability needs a partner, and this is where Grafana k6 can help you! In this video, Marie Cruz, a Developer Advocate at Grafana Labs, explores what Grafana k6 is, why it's the missing puzzle piece in your Grafana stack, and how to get started.

Performance Monitoring for Every Developer: Web Vitals & Function Regression Issues

Extracting relevant insights from your performance monitoring tool can be frustrating. You often get back more data than you need, making it difficult to connect that data back to the code you wrote. Sentry’s Performance monitoring product lets you cut through the noise by detecting real problems, then quickly takes you to the exact line of code responsible. The outcome: Less noise, more actionable results.

Azure Monitoring: What it is and why you need it

Even before the push to the cloud, your company was a Microsoft shop. From workstations to servers, you’ve invested heavily in the Microsoft ecosystem because it gave your business all the technologies necessary for success. As part of your organization’s digital transformation strategy, Azure offered the easiest onboarding experience.

Grabbing the Datadog by the Tail

Datadog is a monitoring and analytics tool for information technology (IT) and DevOps teams that can be used to determine performance metrics as well as event monitoring for infrastructure and cloud services. The software can monitor services such as servers, databases, tools, and applications. Cribl Stream makes it easy to move data from anywhere, to anywhere. We take the saying to heart, and we also allow you to send our Cribl application metrics anywhere.

Building Dashboard and Dashboard Inputs in Cribl Search

This video demonstrates how to create “inputs” to Cribl Search dashboards. An Input is a control widget that we can add to our Dashboards to control how they execute. They allow the user to supply a range of inputs to customize one or many of the Searches in each of the panels on a given dashboard.

How to Create Log-Based Metrics to Improve Application Observability

As a Site Reliability Engineer (SRE) or DevOps professional, you are well aware of the importance of observability in ensuring the smooth functioning and performance of your applications. Observing and monitoring your applications can help you identify and resolve issues in real-time, resulting in increased reliability and improved user experience. Logs play a crucial role in this process as they provide detailed information about the activity and behavior of your applications.
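
To make the idea of a log-based metric concrete, here is a minimal Python sketch that turns raw log lines into an "errors per minute" count. It is an illustration only and does not reflect how Sematext Cloud works. The log format (an ISO-style timestamp, then a level, then the message) and the function name are assumptions.

```python
from collections import Counter


def errors_per_minute(lines):
    """Count ERROR-level log lines per minute.

    Assumes each line looks like "2023-11-01T12:00:05 ERROR message text".
    Lines that do not match this shape are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        timestamp, level = parts[0], parts[1]
        if level == "ERROR":
            # Truncate "YYYY-MM-DDTHH:MM:SS" to minute resolution.
            counts[timestamp[:16]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        "2023-11-01T12:00:05 INFO request served",
        "2023-11-01T12:00:17 ERROR upstream timeout",
        "2023-11-01T12:00:48 ERROR upstream timeout",
        "2023-11-01T12:01:02 ERROR database unavailable",
    ]
    print(errors_per_minute(sample))
```

A count like this can be charted or alerted on like any other metric, while the original log lines remain available for detail.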

Analyzing GCP Costs with BigQuery: A How-To Guide

Effective Google Cloud Platform (GCP) cost management is an essential aspect of cloud administration, ensuring that resources are used optimally without overspending. Utilizing BigQuery for GCP cost analysis offers a comprehensive solution to understand and control your cloud expenses. The integration of GCP billing data into BigQuery allows for real-time analysis and detailed insights into your cloud spend.

Selecting Observability and Security Solutions in Compliance with RBI: Fintech Challenges

Fintech, an abbreviation for financial technology, encompasses many firms and technologies that employ innovation and tech to enhance and automate financial services and operations. Their goal is to enhance the efficiency, accessibility, and user-friendliness of financial services. Fintech entities span numerous sectors within the financial industry, such as online payments, lending, digital banking, investing, insurance, and more, all aimed at streamlining financial processes.

Understanding Internal Status Pages

In today's fast-paced business environment, it's crucial for companies to monitor and address system health issues immediately. Internal status pages are tools designed for this purpose. They display up-to-date information about the company's internal and external systems and services, proving valuable not only for IT professionals but also for the overall efficiency and response capacity of a business.

Understanding Your Cloud Bill: A Beginner's Guide

Embarking on the journey of understanding your cloud bill can initially seem daunting, but grasping a few essential concepts and familiarizing yourself with common terminology can significantly demystify the process. At its core, cloud billing is the method by which cloud service providers charge for the resources and services that your business consumes.

Load testing on Kubernetes with k6 Private Load Zones (Grafana Office Hours #19)

This week, we're talking about how you can do load testing on Kubernetes with k6 Private Load Zones, a new feature on Grafana Cloud k6 that leverages the k6 Kubernetes operator to allow you to run distributed load tests against applications behind a firewall. Here to discuss this new feature are Senior Software Engineer Olha Yevtushenko, Product Manager Daniel González Lopes, Developer Advocate Paul Balogh, and Senior Developer Advocate Nicole van der Hoeven.

Distributed Tracing: Your Ultimate Guide

When all your IT systems, your apps and software, and your people are spread out, you need a way to see what’s happening in all these minute and separate interactions. That’s exactly what distributed tracing does. Distributed tracing is a way of tracking requests in applications and how those requests move from users and frontend devices through to backend services and databases.

What Do Developers Need to Know About Kubernetes, Anyway?

Stop me if you’ve heard this one before: you just pushed and deployed your latest change to production, and it’s rolling out to your Kubernetes cluster. You sip your coffee as you wrap up some documentation when a ping in the ops channel catches your eye—a sales engineer is complaining that the demo environment is slow. Probably nothing to worry about, not like your changes had anything to do with that… but, minutes later, more alerts start to fire off.

A story about HTTP status codes and why you should read documentation

Since 2020, I’ve been working on an Express (Node.js framework) application to power viewer interactions and events that happen whilst I’m streaming live coding on Twitch — my Twitch bot. Since using Sentry for error monitoring and crashes using the Sentry Node SDK, I’ve already squashed quite a few bugs that were entirely a result of my own terrible code.

Introducing the Notification API

You'll often hear us saying "everyone loves a dashboard", and that's most certainly true, but nobody loves staring at a screen all day waiting for something to happen. Real magic is when your awesome dashboard comes to you, where you need it, when you need it. Over the last few months we've introduced a bunch of powerful features to make "taking action" as simple as possible... Monitors let you define the health of your data so you can see at a glance if something isn't right.

Germain UX's Platform Now Available on Salesforce AppExchange: A Game-Changer for Salesforce Users!

We are absolutely delighted to announce that the Salesforce Partner Team has given their enthusiastic approval for Germain UX to be publicly listed on the illustrious Salesforce AppExchange platform. Following an extensive and rigorous review process that spanned an impressive 14 months, we are honored to become part of the exclusive league of top-tier applications featured on the AppExchange. This marks a momentous achievement for Germain UX, and we are eager to share this significant milestone with you.

Azure Event Grid Monitoring and Alerting

Azure Event Grid is a versatile and event-driven service within the Microsoft Azure cloud ecosystem, designed to enable seamless event handling and routing across various Azure services and external sources. This powerful service plays a pivotal role in facilitating event-driven architectures, streamlining the development and management of cloud-based applications and systems.

What is OpenTelemetry? A Comprehensive Guide

In today’s expeditious, highly distributed software landscape, achieving true observability is no simple task. As you strive to understand how your applications and services perform and behave, you face multiple challenges. Moreover, you need to instrument applications and services that generate data effectively, have a reliable means to transmit it, and, most importantly, find a way to visualize and derive insights from it.

6 Reasons Your Data Lake Isn't Working Out

Since the data lake concept emerged more than a decade ago, data lakes have been pitched as the solution to many of the woes surrounding traditional data management solutions, like databases and data warehouses. Data lakes, we have been told, are more scalable, better able to accommodate widely varying types of data, cheaper to build and so on. Much of that is true, at least theoretically.

Proactive error management: Collaborate effectively and work smarter with tags

Talking to many of our customers with different needs and use cases, one particular issue comes up all the time. When I’m seeing so many error groups in my app and so many error notifications in my inbox every day, it’s easy to end up feeling overwhelmed. I want a more proactive system to alert me to which errors need attention and when, so that I can stop getting buried. Does this hit home?

How to Reduce MTTR: A Complete Guide

Organizations striving to improve their operational efficiencies must know how to reduce MTTR as it plays a key role in today’s fiercely competitive business landscape. Customer satisfaction is a top priority for most businesses and late response to their queries or issues can have a negative impact. To track the response and resolution time, businesses measure their MTTR score. MTTR is a key metric that gives insight as to how much time an organization takes to resolve an incident or issue.
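
As a simple illustration of the metric, the Python sketch below computes MTTR as the mean time between an incident being opened and being resolved, which matches the description above. The incident data and function name are hypothetical, and some teams define MTTR differently (for example, as mean time to repair or to recover).

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average resolution time over (opened, resolved) datetime pairs."""
    durations = [resolved - opened for opened, resolved in incidents]
    if not durations:
        raise ValueError("need at least one resolved incident")
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    incidents = [
        (datetime(2023, 11, 1, 9, 0), datetime(2023, 11, 1, 9, 30)),    # 30 minutes
        (datetime(2023, 11, 2, 14, 0), datetime(2023, 11, 2, 15, 30)),  # 90 minutes
    ]
    print(mean_time_to_resolve(incidents))
```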

AI Explainer: What Are Reinforcement Learning 'Rewards'?

In a previous blog post, which was a glossary of terms related to artificial intelligence, I included this brief definition of "reinforcement learning": I expect this definition would prompt many to ask, "What rewards can you give a machine learning agent?" A gold star? Praise? No, the short answer is: numerical values. In reinforcement learning, rewards are crucial for training agents to make decisions that maximize their performance in a given environment.
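
To make "rewards are numerical values" concrete, here is a minimal sketch of how a sequence of per-step rewards can be folded into a single discounted return, the quantity an agent is commonly trained to maximize. The reward values and the discount factor `gamma` are illustrative assumptions, not taken from the post.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: a small penalty per step, then a large reward at the goal.
episode_rewards = [-1.0, -1.0, 10.0]
episode_return = discounted_return(episode_rewards, gamma=0.9)
print(round(episode_return, 2))
```

With `gamma` below 1, rewards that arrive sooner count for more than the same reward arriving later, which nudges the agent toward reaching the goal quickly.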

Enrich Kubernetes with New Deployment Tracking Capability

When things go wrong, we’d all love the ability to go back in time, return things to the way they were, and fix whatever issues pop up at the start so they never happen in the first place. This is no different when maintaining complex microservices-based architectures. With any complex system, things are bound to go wrong from time to time.

The Importance of Microservices

What are microservices? Microservices are a software approach that creates applications as a loose coupling of specific services or functions, rather than as a single, “monolithic” program. A microservice architecture increases the speed and reliability with which large, complex applications are delivered. What makes a service a microservice? Microservices are defined not by how they’re coded, but by how they fit into a broader system or solution.

How to Use Tags to Speed Up Troubleshooting

Maybe as a kid, you pretended to have a magic wand. You would say something like, “Show me the answer to this long division question” then wave your magic wand and wait for the answer. Sadly, mine never seemed to work – for math questions or to make magical snacks appear. Now, imagine if you had a magic wand for your application stack where you could ask it a question about your data and it would give you immediate insights.

How Grafana Labs switched to Karpenter to reduce costs and complexities in Amazon EKS

At Grafana Labs we meet our users where they are. We run our services in every major cloud provider, so they can have what they need, where they need it. But of course, different providers offer different services — and different challenges. When we first landed on AWS in 2022 and began using Amazon Elastic Kubernetes Service (Amazon EKS), we went with Cluster Autoscaler (CA) as our autoscaling tool of choice.

Unleashing the Dynamic Duo: Supercharging Productivity with Liquit Workspace and eG Innovations Monitoring Solution!

A guest blog from Donny van der Linde, Pre-Sales Consultant at eG Innovations’ partner Liquit covering how to leverage eG Enterprise’s monitoring in combination with Liquit Workspace technologies to build powerful proactive contextual access workflows to support automated application delivery. An example using eG Enterprise’s user experience metrics to trigger remediation via Liquit Smart Icons is given.

Building Dashboard and Dashboard Inputs in Cribl Search

This blog demonstrates how to create “inputs” to Cribl Search dashboards. An Input is a control widget that we can add to our Dashboards to control how they execute. They allow the user to supply a range of inputs to customize one or many of the Searches in each of the panels on a given dashboard. Currently, there are four types of inputs: a time picker, a dropdown, a string, and a number. This blog shows how to create all four types of Inputs on a dashboard using built-in sample data.

PX5 Announces Tracealyzer Support for PX5 RTOS

A little more than a month ago, we released the free Tracealyzer SDK – a toolkit that allows other embedded software vendors to integrate Tracealyzer recording in their own software. At that time, the development team at PX5 in California were already hard at work combining Tracealyzer with their PX5 RTOS, and yesterday they released the integration. Built with Percepio’s SDK, in just a few weeks.

More is More - A Case for Dynamic Observability

Dynamic observability is the concept that the amount of data collected should scale based on signals from your environment. Elastic infrastructure is not a new concept. Much of the internet is powered by services that provision more resources based on signals derived from metrics like cpu load, memory utilization and queue depth. If we can use tools to right size our infrastructure, why can’t we also use tools to right size the amount of data we collect?
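
One way to picture the idea is a sampling rate that rises when the environment signals trouble. The sketch below is a hypothetical policy, not Cribl's implementation: the `choose_sample_rate` function, its thresholds, and the choice of error rate as the signal are all assumptions made for illustration.

```python
def choose_sample_rate(error_rate, base_rate=0.05, max_rate=1.0, error_threshold=0.01):
    """Return the fraction of events to keep, given the current error rate."""
    if error_rate <= error_threshold:
        # Healthy system: keep only a small baseline sample.
        return base_rate
    # Scale collection with how far the error rate exceeds the threshold, capped at max_rate.
    scaled = base_rate * (error_rate / error_threshold)
    return min(max_rate, scaled)


for observed_error_rate in (0.005, 0.02, 0.5):
    rate = choose_sample_rate(observed_error_rate)
    print(f"error rate {observed_error_rate:.3f} -> keep {rate:.0%} of events")
```

The same shape works with other signals mentioned above, such as CPU load or queue depth, in place of the error rate.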

What is SLA? How to Handle SLA Breaches?

Service Level Agreements (SLAs) are foundational contracts that define the expectations and commitments between service providers and their customers. These agreements outline the quality, performance, and availability of services, setting the stage for a harmonious relationship. However, as service environments grow increasingly complex, the risk of SLA breaches looms ever larger.

LM Co-Pilot: Your AI Co-Pilot for the Magical Streamlining of IT and Cloud Operations

LogicMonitor’s Generative Intelligence Solution for IT Teams. Cutting-edge generative technologies have revolutionized our industry, paving the way for fresh and innovative approaches to deliver interactive and actionable experiences. At LogicMonitor, we firmly believe in leveraging these generative techniques across our platform, offering a uniquely dynamic support system for various aspects of our end-user experience.

Resolve issues faster with Grafana Cloud Application Observability

Grafana Cloud Application Observability provides an out-of-the-box experience to monitor application performance and minimize MTTR. With its native support of the open standards OpenTelemetry and Prometheus, Application Observability unifies signals across the full stack, accelerating root cause analysis while removing proprietary formats and vendor lock-in. Watch this demo of how to use Application Observability in Grafana Cloud.

Zero-code application observability with Grafana Beyla and eBPF: demo

The eBPF-based OSS auto-instrumentation tool Grafana Beyla makes it easier to get started with application observability. Beyla provides RED (Rate, Errors, Duration) metrics through OpenTelemetry or Prometheus for your existing web services, whichever language they are written in. You don’t need to change any line of application code or configuration; you only need to deploy Beyla on the same host as the service that you want to monitor. Collecting monitoring data with the eBPF auto-instrumentation tool has very low overhead, and allows you to capture data about your runtime, which is impossible with manual code instrumentation. Watch this in-depth demo of how to use Grafana Beyla to get started with application observability.

Control Prometheus cardinality and metrics cost with Adaptive Metrics

Adaptive Metrics is a cost management feature in Grafana Cloud that helps enterprises control Prometheus cardinality and reduce their observability spend by identifying and eliminating unused metrics. Grafana Cloud customers using Adaptive Metrics see a 20-50% reduction in their observability bill.

How Mercado Libre scales its AWS microservices without losing visibility

Learn how Mercado Libre acts more quickly, strategically, and proactively thanks to Datadog’s centralized platform and context-rich alerting. Mercado Libre hosts the largest online commerce and payments ecosystem in Latin America, which means thousands of dollars can be lost if some of their critical applications stop working for even 1 minute. Senior Technical Manager Juliano Martins and software expert Marcelo Quadros share a few reasons why they chose Datadog as their observability platform of choice for their AWS environment: the power of our infrastructure monitoring solution, extensive range of integrations, strong reputation in the market, and more.

Monitoring 101 for React Developers by Sarah Guthals & Lazar Nikolov | React Advanced 2023 Workshop

If finding errors in your frontend project is like searching for a needle in a code haystack, then Sentry error monitoring can be your metal detector. Learn the basics of error monitoring with Sentry. Whether you are running React, Angular, Vue, or just “vanilla” JavaScript, see how Sentry can help you find the who, what, when and where behind errors in your frontend project. This workshop took place live on Oct 16, 2023 at React Advanced London.

Router Monitoring with Grafana

Routers are essential for connecting devices. Routers decide where the internet goes and how fast it should be. Because they play such a crucial role, it's vital to keep an eye on them and make sure they're doing their job well. This act of keeping an eye on routers is what we call "router monitoring." Monitoring routers isn't just about checking how the routers are doing; it's about making sure the whole network works well.

2023 ONUG Fall Keynote - Learn How To Expand Visibility Beyond the Network Edge & Triage Faster

In this 2023 ONUG Fall Keynote session, you will learn how successful enterprises trust Broadcom to help them gain visibility into the error domain beyond their network edge to immediately reveal their operational innocence and triage faster. broadcom.com/netops.

What is Docker Network Host?

Docker is a platform as a service for deploying applications in Docker containers. Containers are software "packages" that bundle together an application's source code with its libraries, configurations, and dependencies, helping software run more consistently and reliably on different machines. To start using Docker containers, you need to be familiar with Docker networking. Below, we'll answer the question: "What is a Docker network host?".

Grafana panel titles: Why we changed from center to left-aligned

As Grafana evolved over the years, so did our panel headers. In our quest for improvement, we continually added design options that created more comprehensive panels, but also an increasingly complex interface. It was a process of continual adaptation without a roadmap — which, though well-intentioned, began to result in unforeseen challenges.

Resolving VPN Issues Without Manual Intervention: Qualcomm Incorporated and Nexthink Flow

VPN issues are easily some of the most common digital workplace problems to plague end user computing (EUC) teams. When the VPN crashes or falls out of compliance, it can have a disproportionate impact on employee experience. Monitoring and managing VPN performance is a top priority for many of our customers – including Qualcomm Incorporated. Qualcomm had a known VPN issue taking place in their environment impacting 90% of their workforce.

As APIs grow in strategic importance to banks, focus turns to modern API monitoring tools

Banks are putting a fresh set of eyes on how they are using APIs to drive better business outcomes and deliver more value to their customers. This is a relatively new departure toward adopting digital transformation of key operations. Financial organizations are traditionally known for favoring conservative business models that often resist modernizing complex legacy systems or rapid change in product and service offerings. This has been changing rapidly as APIs become more prevalent.

Monitoring and Diagnosing AVD

In this overview video, we'll be walking you through monitoring and diagnosing Azure Virtual Desktop utilizing CloudReady Synthetics and Service Watch. With a large number of organizations moving towards virtual desktops for end users, it is critical to have the right monitoring in place. Performance issues and outages can greatly impact the end user productivity and cause frustration due to a poor user experience.

Azure Service Bus Monitoring and Alerting

Microsoft Azure offers a cloud-based messaging service called Azure Service Bus. The goal is to streamline communication between disparate components or applications, regardless of whether they are distributed across multiple environments and locations or operating on the same Azure platform. With Azure Service Bus, you can develop distributed systems that are scalable, reliable, and decoupled.

Simplify OpenTelemetry Pipelines with Headers Setter

In telemetry jargon, a pipeline is a directed acyclic graph (DAG) of nodes that carry emitted signals from an application to a backend. In an OpenTelemetry Collector, a pipeline is a set of receivers that collect signals, run them through processors, and then emit them through configured exporters. This blog post hopes to simplify both types of pipelines by using an OpenTelemetry extension called the Headers Setter.
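
The receiver, processor, exporter flow described above can be sketched as a toy model. This is not the OpenTelemetry Collector's API and it does not model the Headers Setter extension; the `Pipeline` class, the sample signals, and the `add_environment` processor are hypothetical names used only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Signal = dict  # toy stand-in for a span, metric, or log record


@dataclass
class Pipeline:
    receivers: List[Callable[[], Iterable[Signal]]]
    processors: List[Callable[[Signal], Signal]]
    exporters: List[Callable[[Signal], None]]

    def run(self):
        """Pull signals from every receiver, process them in order, then export."""
        for receive in self.receivers:
            for signal in receive():
                for process in self.processors:
                    signal = process(signal)
                for export in self.exporters:
                    export(signal)


def demo_receiver():
    # Hypothetical signals that an application might emit.
    yield {"name": "GET /cart", "duration_ms": 42}
    yield {"name": "GET /checkout", "duration_ms": 180}


def add_environment(signal):
    # Processor: return a copy of the signal with an extra attribute.
    return {**signal, "env": "staging"}


exported = []
pipeline = Pipeline(
    receivers=[demo_receiver],
    processors=[add_environment],
    exporters=[exported.append],
)
pipeline.run()
print(exported)
```

In a real Collector these stages are declared in configuration rather than wired up in code; the sketch only mirrors the ordering of the stages.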

SCOM meets ServiceNow with the Opslogix ServiceNow Incident Connector

With the Opslogix ServiceNow Incident Connector for SCOM, you get a bi-directional data sync between SCOM Alerts and ServiceNow Incidents and Tickets. Most organizations rely on a number of different tools to keep their operations running smoothly. Getting these tools to integrate seamlessly with each other is an essential factor in streamlining your operations, but sometimes, additional connectors might be needed in order to do so optimally.

Observability Shifts Right

Observability first emerged as a focal point of interest in the DevOps community in the 2017 time frame. Aware that business was demanding highly adaptable digital environments, DevOps professionals realised that high adaptability required a new approach to IT architecture. Whereas historically, digital stacks were monolithic or, at best, coarsely grained, the new stacks would have to be highly modular, dynamic, ephemeral at the component level, and spread over multiple cloud-based services.

How to Quickly Find What's Broken in Your Complex, Cloud Environment

With the rapid adoption of cloud, distributed systems and microservices are standard, resulting in increasingly complex environments. Once straightforward troubleshooting workflows have become chaotic, frustrating, and time-consuming. When something breaks, multiple teams are called to the table to prove they’re “not it”; each with their singular view of the problem.

Introducing Three Powerful Commands in Cribl Search: .show objects, .show queries, and .cancel

Empty spaces, what are we searching for? Abandon queries, but do you know the score? On and on, Does anybody know what we are looking for? … Inspired by “The Show Must Go On”, Queen. Since we launched Cribl Search back in late 2022, we’ve been hard at work on adding features and functionality that continue to empower data engineers to do more with their data without needing to collect it first.

Officially Worldwide: Cribl.Cloud and Cribl Search are now available in EMEA!

At Cribl, we give the people what they want. And what they want is to keep their data close to their sources and destinations. The less data has to travel, the better — lower latency and fewer security risks. This commitment to data locality is even more pronounced among our valued customers in the EMEA region, who are enthusiastically embracing cloud-first strategies.

ScienceLogic Secures the TrustRadius Best of Award in AIOps: A Triumph of Value, Features, and Relationships!

At ScienceLogic, we’ve always believed in the power of innovation and the importance of customer satisfaction. We are excited to announce that we have been honored with the TrustRadius Best of Award in the AIOps category for 2023. This is a testament to our dedication to providing exceptional value, top-notch features, and building enduring relationships with our customers.

ITIM and the Public Sector: How Network Monitoring Rises to the Challenge

Government agencies and public sector organizations are a tantalizing hacker target. Cybercriminals go after public sector organizations because they hold confidential, often classified, information – the exact data state-sponsored and other criminal groups salivate over. The Cybersecurity and Infrastructure Security Agency, or CISA, along with the United States Computer Emergency Readiness Team, or CERT, have warned public sector IT of key threats.

Formalize your organization's best practices with custom Scorecards in Datadog

The Datadog Service Catalog is a centralized hub of information around the performance, reliability, security, efficiency, and ownership of your distributed services. By using the Service Catalog, teams can eliminate knowledge silos and realize seamless DevSecOps workflows.

Troubleshooting Container Network Latency in Kubernetes with Kentik Kube

Kentik Kube brings network observability to Kubernetes. In this Kentik Kube product demo, we navigate a real-time scenario of troubleshooting high latency within a Kubernetes cluster. The Kentik Kube map offers a visualization of our environments, complete with automated alerts and the ability to correlate performance metrics directly to affected pods.

Custom Container Network Monitoring and Alerting in Kubernetes with Kentik Kube

Discover the power of proactive network monitoring in Kubernetes with Kentik Kube. This demo highlights the critical importance of custom dashboards and alerts in maintaining optimal container performance and service availability. We take you through creating a tailored alert for a checkout service within Kentik Kube. From selecting services to diving deep into performance metrics via the Kentik Data Explorer, we show you how Kentik Kube makes it easy to set up policies that monitor and alert you to Kubernetes network issues as they arise.

Tackling Staffing, Funding, and Data Challenges Head-On with TAQA

Join Ed Bailey and TAQA Group's Andrew Ochse as they discuss the diverse services that TAQA offers, look at the challenges with scaling and staffing, and explore in great detail the solutions to classic problems such as insufficient funding, poor data quality, and slow connections linking global sites to their Security Operations Center (SOC).

Retrace Keynote: Observability (metrics, traces, logs) - Take back control of your data

In the keynote recorded during our recent user group session NUGGETS 2023, Sanjeev Mittal, GM of Retrace, talks about how organizations are rapidly moving their infrastructure to the cloud and how their costs and complexity are increasing. Our customers are looking for Observability solutions that give them control of data and costs.

Unveiling Endpoint Central MSP Cloud: Your complete SaaS client endpoint management solution

We’re thrilled to announce the release of Endpoint Central MSP Cloud, a game-changing solution for MSPs like you. In today’s tech-driven world, managing and securing client endpoints is a critical aspect of the SLAs between MSPs and their clients. That’s why we’re excited to introduce the cloud version of Endpoint Central MSP. With the cloud version of Endpoint Central MSP, you’ll experience a new level of endpoint management efficiency.

The pivotal role of network automation software in your network environment

Today, automation has shifted from being a luxury to an absolute necessity, especially within the realm of networks. This shift is driven by the rapid evolution of network technologies, compliance standards, and evolving business needs, placing an increasingly daunting burden on network administrators to manage these tasks manually.

Achieve Optimal Performance with Nginx and Micro-Caching

The word "cache" is used in many different contexts. Typically, a cache is a small chunk of data stored locally for quick access. In this case, it’s about computer caches, i.e., the data stored in memory. Micro-caching is a technique where a programmer uses this cached data to do some quick computations, which can lead to some amazing solutions. In micro-caching, dynamically generated content is cached for only a brief period, in contrast to longer-lived full-page caching.
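
The core idea, caching dynamic content for a very short time, can be sketched independently of Nginx. The `MicroCache` class below is a hypothetical in-memory illustration, not Nginx configuration or the article's code; the one-second TTL and the injectable `clock` are assumptions chosen to keep the example easy to follow.

```python
import time


class MicroCache:
    """Cache computed values for a very short time-to-live (TTL)."""

    def __init__(self, ttl_seconds=1.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve the cached copy
        value = compute()  # missing or expired: regenerate and store
        self._entries[key] = (now, value)
        return value


# Use a fake clock so the example does not need to sleep.
fake_now = [0.0]
cache = MicroCache(ttl_seconds=1.0, clock=lambda: fake_now[0])
calls = []


def render_page():
    calls.append(1)
    return f"page v{len(calls)}"


first = cache.get_or_compute("/home", render_page)
fake_now[0] = 0.5
second = cache.get_or_compute("/home", render_page)  # within TTL, served from cache
fake_now[0] = 1.5
third = cache.get_or_compute("/home", render_page)   # TTL expired, regenerated
print(first, second, third)  # page v1 page v1 page v2
```

Even a one-second TTL means that a burst of requests for the same page triggers a single render, which is where the performance benefit comes from.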

A Data Engineers Journey to Modernizing with Cribl

Terry Mulligan (Notre Dame’s biggest fan) is a Splunk consultant with Discovered Intelligence, a data intelligence services and solutions provider that specializes in data observability and security platforms. He shares what Cribl has brought to the table for his organization and his clients, and how it’s changed their processes and the role of the Splunk data engineer.

Setting up better logging in Azure Functions

We have been using Azure Functions for years. Being able to easily deploy and run code on both Azure App Services and real serverless has been a killer feature for all of our asynchronous jobs and services. Unfortunately, the logging approach provided as part of the default template is not ideal. In this post, I'll introduce you to the first steps we take in all of our existing and new function apps to improve logging. A quick note about the Azure Functions runtime.

Troubleshooting Azure Virtual Desktop (AVD) Issues through Logon and Beyond

Today, I’ll be covering troubleshooting Azure Virtual Desktop (AVD) issues. I’ll cover the common causes of problems beyond logon and how you can monitor and troubleshoot to identify the root causes of issues and how to resolve them. For information on troubleshooting logon problems and slow logons, please see my previous article: Troubleshoot Slow Azure Virtual Desktop (AVD) Logons.

Why public sector needs AI-powered observability: Cost savings, ROI, and analyst efficiency

Elastic Observability customers saw 243% ROI and $1.2 million in savings over 3 years. For government and education organizations around the world, facilitating an efficient, reliable customer experience is essential when providing critical services and building trust with stakeholders. As technology infrastructure expands and the IT landscape becomes a complex mix of private cloud, public cloud, and air-gapped environments, the ability to see across all systems and data is challenging yet critical.

Elasticsearch and LangChain collaborate on production-ready RAG templates

For the past few months, we’ve been working closely with the LangChain team as they made progress on launching LangServe and LangChain Templates! LangChain Templates is a set of reference architectures to build production-ready generative AI applications. You can read more about the launch here.

Troubleshooting Kubernetes Network Performance Issues with Kentik Kube

Discover the power of Kentik Kube, the solution for unparalleled network insight into Kubernetes workloads. In this video, Phil Gervasi walks us through Kentik Kube’s ability to track and visualize container traffic across data centers, public clouds, and the internet. Using the interactive Kube map, Phil demonstrates how to troubleshoot a network latency issue affecting an online shopping application, highlighting the tool's capacity for in-depth analysis of pod performance and node-to-node traffic. Learn how to create custom dashboards for custom oversight, set up proactive alerts for performance issues, and enforce security and compliance across your Kubernetes infrastructure.

Elastic Observability ES|QL Demo

Elevate Your Data Game with Elastic Observability and ES|QL! Discover the future of data querying with Elastic’s groundbreaking new feature: ES|QL! In this video you'll dive deep into how ES|QL revolutionizes the way you interact with complex, distributed data, ensuring seamless and efficient data analysis. Who Is This For? Whether you are a data analyst eager to optimize your query writing skills, or a business leader looking to democratize data insights across your organization, this video is tailor-made for you!

Level 1 NOC WiFi/LAN/WAN Correlation and Triage

The latest release of Network Management from Broadcom, version 23.3, extends assurance to cloud-based Wi-Fi architectures with global correlation to your LAN and WAN infrastructure for a complete view of network health across vendors, protocols and user experiences. Today, most NOCs still monitor Wi-Fi as if it is a wired service with no comprehension of its radio integrity or user movements. As a continuation of the "Unified NetOps" capability, v23.3 will enable any NOC to monitor wireless networks using the standard workflows for alarm management, ticketing, and triage, but with the added Level 2 NOC awareness of radio frequency metrics, noise level, interference, and wireless user demographics and movements.

Performance optimization techniques in time series databases: strings interning

VictoriaMetrics is an open-source time-series database (TSDB) written in Go, and I’ve had the pleasure of working on it for the past couple of years. TSDBs have stringent performance requirements, and building VictoriaMetrics has taught me a thing or two about optimization. In this blog post, I’ll share some of the performance tips I’ve learned during my time at VictoriaMetrics.
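
String interning, the technique named in the title, keeps one shared copy of each distinct string so that repeated values (such as metric label names) do not occupy memory many times over. VictoriaMetrics is written in Go and its implementation is not shown in this excerpt; the sketch below instead uses Python's built-in `sys.intern` purely to illustrate the concept, with hypothetical label values.

```python
import sys


def intern_labels(labels):
    """Replace each string with its canonical interned copy."""
    return [sys.intern(label) for label in labels]


# Build equal strings at runtime, as happens when parsing incoming samples.
raw_labels = ["".join(["host", "-", "01"]) for _ in range(3)]
interned = intern_labels(raw_labels)

# All entries are equal in value, and after interning they are the same object.
print(all(label == "host-01" for label in interned))
print(all(label is interned[0] for label in interned))
```

Besides saving memory, interned strings can be compared by identity, which is cheaper than comparing them character by character.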

Saga Design System: shaping the future of user experiences at Grafana Labs

At Grafana Labs, we want to empower our fellow Grafanistas and the community to get the most out of the Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics). As part of this effort, we recently launched a new Grafana developer portal. And now, we’re pleased to announce the launch of the Saga Design System, which establishes a shared visual language for all of Grafana Labs’ offerings.

Mobile iOS Error Simulator

Ready to explore Rollbar without the coding hassle? The Rollbar Error Simulator iOS app is the ultimate solution for carefree error testing, designed for users without coding experience. Seamlessly connecting to your Rollbar account, this user-friendly app lets you simulate errors effortlessly with just a single tap on a button. No technical expertise is needed! Just create a new account, opt for the Error Simulator experience, and you'll be guided.

Nine tips for building an effective digital resilience strategy

Is your business ready to not only withstand but also thrive during digital disruptions? Today's business landscape heavily relies on digital technologies and online services. Digital resilience has become a critical concept to ensure business continuity and safeguard data.

Quantifying the value of AI-powered observability

Organizations saw a 243% ROI and $1.2 million in savings over three years. In today’s complex and distributed IT environments, traditional monitoring falls short. Legacy tools often provide limited visibility across an organization’s tech stack and often at a high cost, resulting in selective monitoring. Many companies are therefore realizing the need for true, affordable end-to-end observability, which eliminates blind spots and improves visibility across their ecosystem.

Demystifying Network Availability: What It Is & How to Improve It

In the high-stakes world of network admins, where the flow of data and communication is the lifeblood of modern organizations, the concept of network availability stands as a pillar of paramount importance. For network administrators, it's not just a buzzword; it's a critical measure that directly impacts the success and efficiency of the networks they manage.

9 Tips for a Successful Transition to Cloud-Based ITSM

Today, organizations of all sizes heavily depend on Information Technology Service Management (ITSM) solutions to elevate operational efficiency and customer satisfaction. The advent of cloud technology has presented a jaw-dropping opportunity for numerous businesses, given the benefits that cloud-based ITSM solutions provide.

VMware Horizon Monitoring

In the world of virtualization, VMware Horizon is a name that every expert is familiar with. This powerful platform has revolutionized the way employees access their desktops and applications, creating an unparalleled digital employee experience (DEX). With features like Virtual Desktop Infrastructure (VDI) and Desktop as a Service (DaaS), VMware Horizon has become the go-to solution for organizations looking to streamline their IT infrastructure.

We've Levelled up Our Top Monitoring Features

We've improved our core features to help you debug issues more efficiently and effectively. This article will walk you through the changes we've made to our Error Tracking, Performance Monitoring, Anomaly Detection, and Log Incidents features that enable you to gain the insights you need to dive deep into issues quicker than ever before.

How AppSignal Got Its (Domain) Name

In late 2012, we were getting somewhere with this unnamed side project that was supposed to replace our overpriced New Relic subscription. If we wanted to tell the world about it, it needed a name. This is a brief history of how we came up with the name, and how we landed appsignal.com. Besides being co-founder, I was also “the name guy” at 80beans, the consultancy where AppSignal was born; you know, the person that comes up with fun names and finds out the .com has been squatted.

Best Practices for Using Git in Your Cribl Workflows

In this conversation, Sanjay Shrestha, Principal Detection Engineer at Bayer, and Raanan Dagan, Principal Sales Engineer from Cribl, talk about the integration of Git in Cribl Stream. They discuss how to manage configuration files and pipelines as code, simplifying their deployment. They also share a demo and give best practices for optimizing your GitOps workflow. In the 10+ years that Bayer has worked with Splunk, they’ve gone from processing just 80 GB/day to more than 13 TB/day.

AI Explainer: What Are Generative Adversarial Networks?

I previously posted a blog that was a glossary of terms related to artificial intelligence. It included this brief definition of "generative AI": I expect for someone learning about AI, it's frustrating to read definitions of terms that include other terms you may not understand. In this case, generative adversarial networks — GANs — is probably a new term for many. This post will explain what GANs are for that reason — and also because they’re super cool.

How we upgraded to MySQL 8 in Grafana Cloud

Starting around June this year, we upgraded our Grafana databases in Grafana Cloud from MySQL 5.7 to MySQL 8, due to MySQL 5.7 reaching end-of-life in October. This project involved tens of thousands of customer databases across dozens of MySQL database servers, multiple cloud providers, and many Kubernetes clusters.

How we manage incidents at Datadog

Incidents put systems and organizations to the test. They pose particular challenges at scale: in complex distributed environments overseen by many different teams, managing incidents requires extensive structure and planning. But incidents, by definition, break structures and foil plans. As a result, they demand carefully orchestrated yet highly flexible forms of response. This post will provide a look into how we manage incidents at Datadog. We’ll cover our entire process.

What's the difference between API Latency and API Response Time?

Your app’s networking directly affects the user experience of your app. Imagine having to wait a few seconds for the page to load. Or even worse, imagine waiting for a few seconds every time you perform an action. It would be infuriating! Before you go on a fixing adventure, it’s a good idea to understand what causes that waiting time. So let’s do that!

SolarWinds Kiwi Syslog Server Overview

SolarWinds® Kiwi Syslog® Server is an affordable on-premises solution designed to help you manage syslog messages, SNMP traps, and Windows event logs. It centralizes and simplifies log message management across network devices and servers. Kiwi Syslog Server lets you collect, filter, alert, react to, and forward syslog messages and SNMP traps, and it helps you adhere to regulatory compliance. Learn how to simplify syslog and SNMP trap management with SolarWinds Kiwi Syslog Server.

Kubernetes Clusters: Everything You Need To Know

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It allows you to create and manage clusters of machines, called Kubernetes clusters, to run your applications in a scalable and highly available manner. Kubernetes clusters provide a distributed and scalable platform for running containerized workloads.

AIOps in DevOps: Advantages, Obstacles, and Best Strategies

AIOps for DevOps. This is a phrase that you’ll hear a lot more often in the coming years. Leveraging AIOps is becoming an emerging practice for DevOps. And why not? After all, AIOps offers so many benefits for DevOps, making it a must-have for every business. But what exactly are these benefits? To know more, keep reading this blog where we will discuss everything about AIOps for DevOps including its.

Using Cribl Edge to Collect Metrics from Prometheus Targets in Kubernetes

We continue our exploration of the fascinating world of Kubernetes, logs, and metrics. In our previous installment, we delved into the intricate tale of Cribl Edge and its role in unraveling the mysteries of logging and metrics in Kubernetes environments with the Cribl Edge native sources for Kubernetes Metrics and Logs. Today, we’re picking up where we left off, shining a spotlight on a new and powerful tool that has the potential to demystify this complex ecosystem further.

What is Application Performance Monitoring (APM)?

In today's digital landscape, where everything is moving at breakneck speed, the performance of your applications can either make or break your digital game. We've all been there – the frustration of a slow website or an unresponsive app. That's where Application Performance Monitoring, or APM, swoops in to save the day. APM is like your app's personal fitness trainer, helping you keep it in peak condition.

Kubernetes Scheduler - A Comprehensive Guide

In the world of Kubernetes, where applications are encapsulated within containers and seamlessly distributed across diverse clusters of computers, the enigmatic Kubernetes scheduler takes centre stage. Think of it as the orchestra conductor of your Kubernetes cluster, orchestrating a symphony of resources to ensure seamless operations. This unassuming yet powerful component leverages a sophisticated algorithm to perform the intricate dance of optimizing resource allocation.

Monitoring Redis Performance Metrics

Redis, as an in-memory data store, excels at providing high-speed data access and manipulation. However, without effective monitoring, the potential advantages of Redis can be compromised due to performance bottlenecks, scalability issues, and resource constraints. By closely scrutinizing key metrics, Redis monitoring allows you to proactively detect and address potential problems, ensuring the stability, reliability, and high-performance operation of your Redis environment.

Combining frontend and backend performance with John Hill (Grafana Office Hours #18)

In this episode of Grafana Office Hours, Developer Advocates Marie Cruz and Nicole van der Hoeven speak with John Hill, a Web UI Test Engineer and Grafana k6 champion, to talk about how the Grafana and k6 ecosystems can be used to ensure performance in mission-critical applications like NASA’s Open MCT.

Understanding the Netdata Methodology: A Different Take on Monitoring

In the dynamic landscape of modern infrastructure and multi-cloud environments, observing and understanding system performance requires a new breed of tools, ones that keep pace with the 'living' nature of modern infrastructure. This is the inflection point at which Netdata steps in, aiming to bring a fresh perspective to monitoring.

SEC Charges on SolarWinds: A Wake-Up Call for Cybersecurity and Risk Management

Cribl’s Ed Bailey and Jackie McGuire look into the recent SEC fraud charges leveled against SolarWinds and its CISO, concerning alleged fraud and internal control failures tied to known cybersecurity risks and vulnerabilities. These charges carry long-term implications for corporate handling of cybersecurity and risk management. Tune into the live stream for an engaging conversation, and come prepared with your questions and insights on the future of cybersecurity.

Streamlining configuration management: Unleashing the power of OpManager's module

Ensuring uninterrupted network availability is paramount for any organization relying on multiple network devices for their day-to-day operations. Two crucial metrics that govern network availability are performance and fault management. By implementing ManageEngine OpManager as your fault and performance monitoring tool, you gain invaluable insights into device health, CPU performance, and more.

Crossed 15K+ GitHub Stars, Simplified Logs Parsing with Pipelines & Trending on Hacker News - SigNal 30

Welcome to the 30th edition of our monthly product newsletter - SigNal 30! Last month, our GitHub repo crossed 15k GitHub stars, which is a great milestone for our open-source project and for our team. We also shipped the much-awaited logs pipeline that will make logs parsing a much better experience for our users. We also shipped other improvements to the product, hosted OpenTelemetry meetups and webinars, and much more.

Bringing it all together: Speed, performance, and efficiency in InfluxDB 3.0

For most of the past year, we here at InfluxData focused on shipping the latest version of InfluxDB. To date, we launched three commercial products (InfluxDB Cloud Serverless, InfluxDB Cloud Dedicated, and InfluxDB Clustered), with more open source options on the way. All the while, we claimed that this latest version of InfluxDB surpasses anything we built before.

Demystifying Cloud and Cloud-Native Observability

In the ever-evolving and fast-changing landscape of cloud computing and modern software development, achieving 360-degree visibility into your critical business services, applications and infrastructure is essential. This is where observability comes into play. Observability, especially in a cloud-based or cloud-native environment, has become a critical aspect of maintaining and optimizing complex systems and services.

Momentum: Announcing 268 Million Downloads & 320% Growth in 2023

We’re happy to announce a landmark 320% growth in 2023! VictoriaMetrics, our open source time series database and monitoring solution, already hit 268 million downloads this year (still counting), and received close to 13,000 stars on GitHub.

6 Best Practices for Tuning Network Monitoring Alerts

Network monitoring and alerting provide the foundation for efficient IT operations and cyber resilience. By keeping track of the status and performance of network infrastructure and applications, network monitoring tools can automatically generate alerts when defined thresholds are exceeded or specific events occur. These network monitoring alerts allow IT teams to detect outages, performance degradation, and potential security incidents so they can respond swiftly to minimize disruption.

Server Monitoring with Graphite

Server monitoring is a crucial technique to learn these days to work efficiently with servers. It helps optimize the performance of a server and diagnose issues productively. One useful tool is Graphite, which monitors a server’s performance and graphs the results so you can gain valuable insights into your server. You can explore MetricFire’s Hosted Graphite service today by signing up for a free trial or booking a demo session.

Top 10 Container Monitoring Tools

Containerization has significantly improved the way we deploy and manage applications over the past years. It has enabled agility, scalability, and efficiency in modern software development. However, the dynamic nature of containers requires robust monitoring solutions to ensure optimal performance and reliability. In this article, we will discuss the top 10 container monitoring tools that are essential for anyone navigating the containerized landscape.

Mastering Prometheus Exporters: Techniques and Best Practices

If you’re into monitoring, Prometheus is probably an essential part of your stack. Thanks to its expressive query language (PromQL), scalability, and configurable data format, it remains one of the most popular tools for data collection. Paired with Prometheus exporters, the tool can adapt to a variety of surroundings, which is one of its strongest points.

Netdata Best Practices: Optimizing Your Monitoring Setup

Effective system monitoring is non-negotiable in today's complex IT environments. Netdata offers real-time performance and health monitoring with precision and granularity. But the key to harnessing its full potential lies in the optimization of your setup. Let’s ensure you are not just collecting data, but doing it in the most optimal way while gaining actionable insights from it. The starting point for optimization is a robust setup.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

Quick Demo of Logs Pipelines in SigNoz

A log pipeline allows you to preprocess your logs for enrichment, transformation, and attribute extraction before they get indexed. Here's a quick demo of using the Logs pipeline feature in SigNoz to parse Nginx logs. More about SigNoz: SigNoz lets you monitor your applications and troubleshoot problems in your deployed applications. It is an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator.

System Operators: Unlock Log Management Mastery with systemd-journal and Netdata

System operators know the drill: as the complexity of systems scales, so does the deluge of logs. Traditionally, taming this relentless tide demands a concoction of costly tools and laborious configurations—until now. The dynamic duo of systemd-journal and Netdata is revolutionizing log management, turning what was once a Herculean task into a streamlined, powerful, and surprisingly straightforward process.

Understanding Cybersecurity Insurance Requirements and How Network Visibility Can Help

From supply chain cyber attacks to ransomware, there is plenty of risk facing modern digital businesses. Cyber insurance can help mitigate that risk, but the complexity of cybersecurity insurance requirements can often create a catch-22 for businesses.

Explore the Power of Link Monitoring for Seamless Network Performance! Infraon

Welcome to our latest video! If you're a network admin striving for flawless data flow within your systems, you've come to the right place. In this video, we delve deep into the world of Link Monitoring and its incredible potential to revolutionize your network's performance.

Application Observability on RKE2 With SUSE Rancher and StackState

Please join Jeroen van Erp, StackState's Product Manager, while he shows you how you can achieve full observability of your SUSE Rancher managed clusters. He'll demonstrate StackState's Kubernetes troubleshooting capabilities for development teams. You can easily manage your Rancher clusters and gain visibility into all your Kubernetes resources by installing the StackState agent from the SUSE Rancher marketplace. Jeroen will walk you through the service overview, service dependency map and powerful troubleshooting features that StackState offers.

Sponsored Post

Taking down (and restoring) the Raygun ingestion API

In a world where Software as a Service (SaaS) products are integral to daily life, maintaining uninterrupted service for end-users is paramount. However, stuff happens. When it does, our most valuable response (other than restoring service ASAP) is to review the series of events that led up to the incident and learn from them. On August 25th, 2023, at 7:02 AM NZT, Raygun experienced a significant incident that impacted our API ingestion cluster, leading to an outage lasting approximately 1 hour and 15 minutes. While this wasn't fun for anyone involved, this incident did prove to be a valuable learning experience, shedding light on the importance of infrastructure management and resilience.

EventSentry v5.1: Anomaly Detection / Permission Inventory / Training Courses & More!

We’re extremely excited to announce the availability of EventSentry v5.1, which will detect threats and suspicious behavior more effectively – while also providing users with additional reports and dashboards for CMMC and TISAX compliance. The usability of EventSentry was also improved across the board, making it easier to use, manage and maintain EventSentry on a day-to-day basis. We also released 60+ training videos to help you get started and take EventSentry to the next level.

Monitoring Single Page Applications with Synthetics and Browser-based RUM

Businesses today are increasingly dependent on Single Page Applications (SPAs) for better user experiences. In a Single Page App, a user loads one web document and the application then updates different parts of the page with background requests. This is opposed to the more traditional Multi-page Applications (MPAs), where each click loads a different web document. Like the way you’re (hopefully) reading different pages on this web server.

Enhance your cloud security with MITRE ATT&CK and Sumo Logic Cloud SIEM

As cloud applications and services gain prominence amongst organizations, adversaries are evolving their toolset to target these cloud networks. The surge in remote work and teleconferencing presents unprecedented opportunities for nefarious activities. Enter the MITRE ATT&CK Framework, also known as a MITRE ATT&CK Matrix—a treasure trove for defending cloud infrastructure and on-premises infrastructure against the newest adversary tactics, techniques, and procedures (TTPs).

AI Explainer: What Is Data Cleaning?

In a previous blog post, which was a glossary of terms related to artificial intelligence, I included this brief definition of "data preprocessing": It is common for people familiar with these matters to talk about not having clean data. When dealing with AI for whatever your needs are, clean data is crucial for the quality of results. Garbage in, garbage out, as they say. So, let’s dive into what it means to have clean data.

From isolation to integration: Why siloed IT teams should leverage full-stack observability

Discover how full-stack observability brings siloed teams together for greater productivity, efficiency and profitability. When application entities become increasingly distributed, so does the data they hold — and that’s a huge challenge for organizations managing and governing expanding complex application environments.

Helios Runtime for AppSec: The missing link in application security

Modern development teams increasingly rely on open-source packages to rapidly build and deploy applications. In fact, most, if not all, applications consist of far more open-source and 3rd-party code than the code that’s written by their dev teams.

Analyzing the Traffic Patterns of Broadband Subscriber Behavior

Broadband subscriber behavior analysis is the process of collecting and analyzing data on how broadband subscribers use the internet. This data can be used to gain insights into subscriber needs and preferences, as well as to identify potential problems with the broadband service.

What is AIOps? AIOps Explained

What is AIOps? Simply put, AIOps uses big data, analytics and machine learning to automate and improve IT operations (ITOps). AI is particularly important in ITOps functions such as anomaly detection and event correlation, as it has the ability to analyze large volumes of network and machine data to find patterns, identify the cause of existing problems and find ways to forecast and prevent future issues.

What Is OpenTelemetry? A Complete Introduction

What is OpenTelemetry? Simply put, OpenTelemetry is an open source observability framework. It offers vendor-agnostic or vendor-neutral APIs, software development kits (SDKs) and other tools for collecting telemetry data from cloud-native applications and their supporting infrastructure to understand their performance and health. Managing performance in today’s complex, distributed environment is extremely difficult.

Mastering Kubernetes Node Management with the `kubectl cordon` Command

For many developers and engineers, Kubernetes is the de facto choice for container orchestration. That’s primarily because of its efficiency in handling and scaling container workloads. However, the complexity of managing nodes in a Kubernetes cluster can cause recurring headaches for even the most experienced and skilled IT teams. This is where `kubectl cordon` comes into play.

Monitor your OpenAI usage with Grafana Cloud

In the ever-changing field of artificial intelligence, OpenAI is consistently seen as a leader in innovation. Its AI models, starting with GPT-3 and now with GPT-4, are already used extensively in software development and content creation, and they’re expected to usher in entire sets of new systems in the future.

Trouble Finding Citrix Expertise? The Answer is Technology

It is all in the numbers. And the numbers are not changing anytime soon. Citrix sites support over 400,000 customers. On average, an organization has 2-3 total FTEs managing and monitoring their Citrix environment. If you take those numbers, there is demand for roughly 1.2 million experts who need to be able to interact with Citrix in a knowledgeable way. Now, on the supply side: we know there are 62 named Citrix Technology Professionals (CTPs) every year who are highly certified.

Top 6 Kubernetes Monitoring Tools

Kubernetes has become the de facto standard for container orchestration, enabling organizations to deploy, scale, and manage containerized applications with ease. However, ensuring the reliability, performance, and security of applications running in a Kubernetes environment requires robust monitoring and observability solutions.

What is VMware Tanzu? And Why does Tanzu Matter?

Here in the Benelux region, we have recently been seeing increased interest in our capabilities to monitor and automate root-cause diagnostics for VMware Tanzu and other containerized / K8s technologies. Tanzu monitoring is one of the VMware capabilities we’ll be demoing at VMware Explore in Barcelona (6 – 9 November 2023), and we expect to see a lot of interest in it.

Upcoming changes to Netdata Cloud plans

For full details on the plans and prices, check out the FAQ section of the pricing page. We continue to look for ways to make infrastructure monitoring a reality for teams and organizations of all sizes. For users who want to get started with monitoring relatively small infrastructure with no investment, such as home labs, students, and nonprofit organizations, Netdata offers the Free Community Plan.

[Webinar] End-to-end Azure observability: The complete essentials of Azure monitoring

As your business expands, you need to scale your infrastructure accordingly. And with the complexity of modern cloud infrastructures, it's crucial to have a comprehensive observability strategy in place. Discover ways to achieve operational excellence throughout your Azure infrastructure with our webinar.

How Uber Freight Powers Intelligent Logistics with Datadog

Thiyagarajan Anandan, Uber Freight, shares how he and his team have created a center of excellence for monitoring and DevOps culture. Uber Freight, a division of Uber, delivers an end-to-end enterprise suite of Relational Logistics to advance supply chains and move the world’s goods. With more than 1,000 shippers across $18B freight under management (FUM), it’s critical for Uber Freight to provide 99.99% uptime for its shippers and customers. Since migrating to the Datadog platform, Uber Freight has for the first time unlocked the full breadth and depth of its systems, thereby significantly decreasing MTTR/MTTD and delivering an improved customer experience.

Okta evolving situation: Am I impacted?

Cybersecurity is never boring. In recent months, we’ve seen major cyberattacks on Las Vegas casinos, and expanded SEC cybersecurity disclosure rules are top of mind. Is it any wonder we consistently recommend taking a proactive approach to secure your environment with a defense-in-depth strategy and appropriate monitoring? News outlets reported the recent compromise at the identity and access management (IAM) firm, Okta.

How To Recover a Cribl Stream Instance Without GitOps/GitHub

When Cribl Stream becomes the center of your data universe, your individual settings, routes, pipelines, and packs become a critical aspect of your work. What happens if you lose access to the UI? If you are on a licensed version of Cribl Stream, backing up the work you have done in Sources, Destinations, Routes, Pipelines, and Packs is easily done using the GitOps remote repo.

What is IT Asset Management (ITAM)?

Organizations collect technologies like kids collecting baseball cards. As a company’s IT strategy matures, it adds new technologies to supplement previously existing ones, just like kids add new rookie cards to their collections of classics. While kids can leave their baseball cards randomly piled in a shoebox, organizations need to carefully identify and track their IT assets so that they can appropriately manage digital performance and cybersecurity.

Logic App Best Practices, Tips, and Tricks: #37 How to get distinct values from an array?

Welcome again to another Logic Apps Best Practices, Tips, and Tricks. In my previous blog posts, I discussed some of the essential best practices you should follow while working with Azure Logic Apps. Today, I will speak about another helpful best practice, tip, and trick that you must consider while designing your business processes (Logic Apps): How do we get distinct values from an array or repeating structure?

The Role of Generative AI and Large Language Models in IT Operations

Artificial intelligence, particularly generative AI and large language models, has changed how we approach IT operations, cybersecurity, and observability. And though we can point to measurable benefits and outcomes from applying LLMs to ITOps, there is also a lot of speculation to deal with. Phillip Gervasi, Director of Technical Evangelism at Kentik, and Christoph Pfister, Chief Product Officer at Kentik, discuss what generative AI and LLMs are, how they can be used to improve IT operations, and what the future might hold.

Experience seamless Meraki network monitoring with OpManager

According to a study by Zippia, 70% of organizations already have a digital transformation policy in place or are currently working on one. This reflects the growing significance of modernization in IT. With new-age techniques such as SD-WAN, organizations can deliver an enhanced end-user experience, ensure reliable QoS, enhance their network security, optimize their network performance, and reduce their overhead costs.

Sponsored Post

Service Watch and Desktop Virtualization

This article covers the benefits and instructions for deploying Service Watch Desktop for end-user devices and Desktop Virtualization platforms such as Azure Virtual Desktop, Windows 365, or Amazon Workspaces. The use of Virtual Desktop Infrastructure (VDI) is on the rise because of remote work, an increased need for security, and outsourcing. The rise of these environments brings the need to proactively monitor and troubleshoot remote virtual desktops - there are lots more moving parts, for sure.

Observability for Sustainability

For the past 20 years, the various stakeholder communities that together constitute the IT industry have attempted to address sustainability. The original efforts grew out of the realisation that even as far back as 2005, the hardware and software that underlay the digital world were responsible for approximately 5% of overall energy consumption and that both the percentage and absolute amounts of energy required were growing in the double digits.

Introducing Honeycomb for Kubernetes: Bridging the Divide Between Applications and Infrastructure

In our continuous journey to support teams grappling with the complexities of Kubernetes environments, we’re thrilled to announce the launch of Honeycomb for Kubernetes, a dedicated solution designed to bridge the growing divide between infrastructure/platform teams and application developers. This is available to all plans (including Free!) at no additional cost.

What is the Purpose of Syslog Monitoring in Enterprise Software Companies?

Baseball fans know that the game's many in-game statistics and actions require someone to keep them as records. From a player's overall performance at-bat to a game's final score at the bottom of the ninth, dozens (possibly hundreds) of different statistics are tracked throughout a season. In Major League Baseball, these records are essential for the team owner, front office workers and coaches to figure out strategies on the diamond or how to distribute fair pay.

How WOW! Modernized Legacy Infrastructure Monitoring with InfluxDB and Kafka

With over 500,000 residential, business, and wholesale customers across multiple markets in the United States, WideOpenWest (WOW!) is one of the United States’ largest broadband providers. They aim to connect homes and businesses to the world with fast and reliable internet, TV, and phone services.

Most Popular Status Page Providers Used by Top SaaS Vendors

For SaaS companies and cloud vendors, a status page isn’t just a tool — it’s a testament to their commitment to users. But with numerous status page providers in the market, which ones are the top SaaS companies gravitating towards? It comes as no surprise that the most popular status page provider in 2023 is Statuspage.io offered by Atlassian. We expected it at StatusGator when we began our analysis. However, our curiosity didn’t stop there.

What is Observability? Grafana for Beginners Ep. 1

When you are getting started with observability, the jargon and concepts used to explain observability may go straight over your head. Let’s take out the complexity and talk about observability in the simplest terms possible. Join Lisa Jung, a senior developer advocate at Grafana, to get your learning on with the Grafana for Beginners series. You will learn about concepts such as observability and DevOps and how Grafana can be used to observe your system as a part of your DevOps practice.

Ask Me Anything WhatsUp Gold and Flowmon Integration

Watch the Ask Me Anything: WhatsUp Gold and Flowmon Integration webinar where you will learn how to leverage Flowmon’s NPMD/NDR within WhatsUp Gold to view details about your traffic analysis through the same interface you use to monitor your infrastructure. We’ll discuss the benefits of the integration and how fewer tools means better MTTR, more efficiency, and better/faster diagnosis for your business.

Grafana Tempo 2.3 release: faster trace queries, TraceQL upgrades

Grafana Tempo 2.3 has been unleashed upon the world, bringing with it the latest iteration of the vParquet backend! Tempo 2.3 has a little bit of everything, but the headline item here is vParquet3 and new features that improve search speeds. Watch the video above for all the details, or continue reading to get a quick overview of the latest updates in Tempo. If you’re looking for something more in-depth, don’t hesitate to jump into the changelog or our Grafana Tempo 2.3 release notes.

APM vs Tracing vs Observability

Application Performance Monitoring (APM), tracing, and observability are fundamental software development and system management approaches. Each of these three concepts uniquely ensures that your applications operate efficiently, smoothly, and reliably. Your organisation has more than likely already adopted one of these approaches, or even two, potentially all three.

What is Infrastructure Monitoring?

Infrastructure Monitoring can be a powerful tool for engineers to analyze, visualize and comprehend if a backend is affecting users, by collecting health and performance data from containers, servers, databases, virtual machines, and other backend components in a tech stack. Within this article, we will outline what Infrastructure Monitoring is, how it works, what Infrastructure Monitoring as a Service is, and some benefits of the solution.