Operations | Monitoring | ITSM | DevOps | Cloud

February 2023

6 ways to supercharge mobile app performance

With over 7 billion mobile users worldwide, there’s almost one device for every person on the planet. Not surprisingly, the most popular apps are dominated by social media, messaging and entertainment platforms. But consumers are also shopping and managing finances via mobile devices. And while most users are accustomed to waiting a few seconds for a web application response, mobile users are less forgiving and expect an instant reaction to their swipes and taps.

Implementing Distributed Tracing in a Java application

Monitoring and troubleshooting distributed systems like those built with microservices is challenging. Traditional monitoring tools struggle with distributed systems as they were made for a single component. Distributed tracing solves this problem by tracking a transaction across components. In this article, we will implement distributed tracing for a Java Spring Boot application with three microservices.

Webinar Recap: Taming Data Complexity at Scale

As a Senior Product Manager at Mezmo, I understand the challenges businesses face in managing data complexity and the higher costs that come with it. The explosion of data in the digital age has made it difficult for IT operations teams to control this data and deliver it across teams to serve a range of use cases, from troubleshooting issues in development to responding quickly to security threats and beyond.

Announcing the General Availability of Playwright Test Support

Back in October of 2022 we unveiled the beta of @playwright/test. We’re happy to announce that Playwright Test (PWT) is now generally available! We’ve worked hard to make Checkly the best way to run your Playwright tests, and we’ve also decided to make Playwright, which is experiencing a surge in usage and popularity, the default and recommended web testing framework to use with Checkly.

The SRE Report 2023: Forecasts and the Current Economy

As questions and challenges loom over the tech industry and the larger economy, now is a perfect time for us to take a step back and learn from the past. As reliability engineers, we regularly use Service Level Objectives (SLOs) to understand the performance, reliability, and trends of our systems to help inform and prioritize our decision making.

Ping Management Pack compatibility with SCOM 2022

The Opslogix Ping Management Pack is a powerful Management Pack designed to monitor the availability and performance of network devices and services. It is a must-have Management Pack for SCOM, which is used by many IT professionals to monitor their infrastructure. With the release of SCOM 2022, many users are wondering whether the Opslogix Ping Management Pack is compatible with the new version.

Optimize Kubernetes workload resourcing with StormForge and Datadog

StormForge Optimize Live is a machine learning-powered performance and resource optimization solution for Kubernetes workloads. Optimize Live ingests and analyzes production observability data and recommends specific actions to optimize CPU and memory utilization. You can take these actions manually or set them to occur automatically, making it easier to maintain a high level of application performance while minimizing cloud costs.

How to Monitor Redis with Prometheus

The current popularity of Redis is well deserved; it’s one of the best caching engines available and it addresses numerous use cases – including distributed locking, geospatial indexing, rate limiting, and more. Redis is so widely used today that many major cloud providers, including the Big 3, offer it as one of their managed services. In this article, we’ll look at how to monitor Redis performance using Prometheus, the similarly popular open-source monitoring system.

Choosing the Right AWS Messaging Service for Your Application

With the dawn of microservices and serverless, event-driven architectures have become the way to go when building a new system in the cloud. This approach has allowed for greater scalability, as the system can easily adapt and respond to changes in traffic or demand without having to overhaul the entire architecture. Additionally, the event-driven approach means your application is mainly concerned with routing event data to the right services.

Kubernetes Liveness Probes: A Practical Guide

Have you ever wondered how you can help Kubernetes manage your pods in the most efficient way? Kubernetes can do a decent job “out of the box,” but it can be optimized just like any other system. One such optimization in the Kubernetes world is introducing liveness probes, and in this post, you’ll learn everything about them.

Azure Monitor Pros and Cons

Enterprise use of Microsoft’s Azure Cloud Services is expanding at an unprecedented rate as cloud computing usage expands. However, enterprises that have recently switched to Azure continue to have serious concerns about monitoring their applications efficiently. As the business grows, it is essential to understand both how it is functioning and how healthy it is.

Exploring Your Network Data With Kentik Data Explorer

A cornerstone of network observability is the ability to ask any question of your network. That means having an unbound capacity to explore the tremendous amount and variety of network telemetry you collect. It means seeing trends and patterns from a macro level, but it also means getting very granular to pursue any line of analysis of your data. Collecting information from flow records, SNMP, streaming telemetry, BGP, eBPF, and so on is indeed very important.

Grafana 9.4 release: Easy data source setup, custom panels, Grafana Alerting updates, and more

Grafana 9.4 is here! With the latest Grafana release, we’re introducing a wealth of new features and improvements that make getting started with Grafana even easier and that take your visualizations and observability best practices to the next level. In addition to enabling TraceQL, the new query language for distributed tracing in Grafana Tempo 2.0, for all Grafana Cloud users, the Grafana 9.4 release comes with a fresh round of features.

10 Ways to Optimize your Azure cost

In modern times, building and publishing an application has become very easy with cloud-based deployment. Users don’t need to worry about infrastructure-related challenges like availability, reliability, scalability, etc. The cloud providers are responsible for keeping the deployment flow simple and intact. The downside of these many advantages is the high cost incurred for them.

2023 is When More FinOps Practices will Shift Left and Cost Optimization around Logging will Get Central Stage

Effective troubleshooting and resolution of critical production issues require DevOps and R&D teams to utilize logging and observability. However, selecting the right logging solution can be challenging, given the wide range of available options and associated costs. Additionally, the strategy for logging usage should be tailored to the needs of different personas and use cases, such as DevOps engineers versus developers.

The 2023 Network IT Management Report Part 3: Extending Your Reach

This is the third in a four-part series focusing on the findings from our 2023 annual Field Report for IT Management. We surveyed 4500 IT professionals from internal IT teams and MSPs across North America to gauge where their organizations are heading from a network management perspective. In part three, we’ll discuss how organizations are thinking about extending the reach of their visibility and control over their networks with Wi-Fi and SaaS management.

The Ultimate Guide to SNMP

Simple Network Management Protocol (SNMP) is a basic network protocol designed to collect and report data from network devices connected to IP networks — even if the devices are different hardware and run different software. Most modems, routers, switches, servers, workstations, and printers will support SNMP communication. SNMP messages are transported via UDP on port 161.

Treat Application Performance Like A Feature

I, like many of you no doubt reading this, am an engineer with very strong opinions on how software should work. I am not interested in moving fast and breaking things; I am not interested in changing the world. I am interested in building pleasant, ergonomic software and charging money for it. My company, Buttondown, was born from that ethos.

What is Network Performance Monitoring: The Gandalf of Networks

In today's digital age, network performance monitoring has become a crucial aspect of IT infrastructure management. The ability to monitor and manage network performance is essential for ensuring that networks run efficiently and effectively. Just as Gandalf served as a wise and trusted advisor to the Fellowship of the Ring in J.R.R.

Enabling TLS on a Cribl Leader Node: Step-by-Step Guide

Securing your internal systems with TLS can be a daunting task, even for experienced administrators. However, with the right tools and guidance, the process can be made more manageable. In this blog, we’ll show you how to enable TLS for your internal systems on your Cribl Leader Node. We’ll walk you through the steps, and provide a video tutorial embedded below to help you follow along.

Checkly Introduces Monitoring as Code Workflow, Enabled By a New CLI, to Unite Testing and Monitoring

Checkly introduces a monitoring as code workflow, enabled by a new CLI, to unite testing and monitoring. The company also announced the general availability of Playwright Test and has selected Playwright as its preferred testing and monitoring framework.
Sponsored Post

5 Advanced DevSecOps Techniques to Try in 2023

If you're here, you know the basic DevSecOps practices like incorporating proper encryption techniques and embracing the principle of least privilege. You may be entering the realm of advanced DevSecOps maturity, where you function as a highly efficient, collaborative team, with developers embracing secure coding and automated security testing best practices.

One Technology That Makes Renewable Energy More Efficient

Time series data can provide insight into ways to make energy production and consumption more cost-effective and efficient. The year 2022 saw the impact that world events can have on global energy markets. The most drastic fluctuations affected fossil fuels, which led to greater discussion about the practicalities of renewable energy. Fortunately, the move toward increasing reliance on renewable energy remains a consistent trend.

How To Monitor Server Performance with MetricFire

Servers are a critical component of modern IT infrastructure, and they play a key role in delivering the services and applications that power our digital world. Efficient servers can handle higher workloads and respond to requests more quickly, resulting in faster application response times and improved customer satisfaction. By optimizing server efficiency, businesses can ensure that their servers are operating at their maximum potential, with all resources being utilized to their fullest capacity.

Instrumenting Node.js code with Prometheus custom metrics

Automatic instrumentation is great, but to get the most out of your monitoring you often need to instrument your code. In this article I am going to explain how to instrument a Node.js express app with custom metrics using the Prometheus prom-client package. Although this article specifically addresses Node.js and express, my hope is that the general concepts are applicable to other languages too.

How Siemens Mobility is moving its trains into the future with Grafana Enterprise

Railway passengers may think of trains simply as a way to get from one place to another, but at Siemens Mobility — a rail transportation company dedicated to delivering sustainable, smart transport — they are that and much more. Siemens Mobility works with more than 3,000 partners, and its customers include Eurostar and Trans Pennine Express. In the U.K., Siemens Mobility maintains about 500 train units and logs 65 million passenger miles per year.

Efficiencies, Optimization of Existing Software Usage, Third-Party Dependencies Are Key Trends for 2023

PALO ALTO, Calif., February 27, 2023 (Newswire.com) – Uptime.com, the market leader in uptime and website monitoring, expects the industry to focus on the consumption and usage of products and services to improve efficiency and optimization in 2023.

Citrix & VMware Horizon - End User Experience Monitoring & Troubleshooting Platform

Industry-leading Citrix and VMware Horizon end-user experience monitoring and troubleshooting software, with embedded intelligence and automation, that enables IT pros to anticipate, troubleshoot, and document performance issues regardless of where workloads, applications, or users are located.

Using Cribl Search for Anomaly Detection: Finding Statistical Outliers in Host CPU Busy Percentage

In this video, we'll demonstrate how to use Cribl Search for anomaly detection by finding statistical outliers in host CPU usage. By monitoring the "CPU Busy" metric, we can identify unusual spikes that may indicate malware penetration or high load/limiting conditions on customer-facing hosts. The best part? This simple but powerful analytic is easily adaptable to other metrics, making it a versatile tool for any data-driven organization.

Using Cribl Search for Anomaly Detection: Finding Statistical Outliers in Host CPU Busy Percentage

In this blog post, we’ll demonstrate how to use Cribl Search for anomaly detection by finding statistical outliers in host CPU usage. By monitoring the “CPU Busy” metric, we can identify unusual spikes that may indicate malware penetration or high load/limiting conditions on customer-facing hosts. The best part? This simple but powerful analytic is easily adaptable to other metrics, making it a versatile tool for any data-driven organization.

It had to be said: Teleworking promotes productivity and employee satisfaction

Teleworking, which spread across many companies during the lockdowns caused by the pandemic, has given rise to a new model of labor relations: a system already called “hybrid” because it combines teleworking with conventional physical presence.

Exploring DORA: Why creating a path to resilience maturity is a critical success factor for financial services organisations

DORA (the Digital Operational Resilience Act) recently came into force and will soon impact thousands of financial services organisations across the European Union (EU). In this blog, my colleague Clara Lemaire and I share some insights about the requirements of DORA, as well as how Splunk can support financial services organisations on their resilience journey. Let’s explore DORA!

How to choose and track your security KPIs

There's no denying that Key Performance Indicators (KPIs) can be critical for any security program, and many of us are fully aware of that. Nonetheless, in practice, confusion still remains about what security KPIs are crucial to track and how to choose the right KPIs to measure and improve the robustness of your security program. Here we'll propose a few ideas about how to select and track the right KPIs for your organization.
Sponsored Post

Act now on SAP automation and escape the automation paradox

In the last few weeks, I have had a number of conversations with customers about what we have come to call the Automation Paradox. What happens is that organizations seeking to automate manual processes are so busy that they cannot find the time to invest in automating in the first place, and so they find themselves frozen, unable to move forward. I took some time to talk to customers who got through this, and found a number of themes that may help you get past it.

How CCP Games Used Honeycomb to Modernize and Migrate its Codebase

Imagine a universe in which a massively multiplayer online role-playing game (MMORPG) sets Guinness World Records for the size of its online space battles—and that game is built on 20-year-old code. Well, imagine no more. Welcome to the world of EVE Online, where hundreds of thousands of players interact across 7,800+ star systems and participate in more than one million daily market transactions.

How the All in One Worker Group Fits Into the Cribl Stream Reference Architecture

Join Ed Bailey and Eugene Katz as they go into more detail about the Cribl Stream Reference Architecture, designed to help observability admins achieve faster and more valuable stream deployment. In this live stream discussion, Ed and Eugene will explain guidelines for deploying the all in one worker group. They will also share different use cases and talk about the pros and cons of using the all in one worker group.

6 common issues and how to tackle them with Kubernetes monitoring tools

Kubernetes, one of the most popular container management systems, automates the work necessary for deploying and scaling a containerized application by managing a large number of interdependent microservices. Additionally, it allows for the migration and deployment of containerized applications across different platforms in a multi-cloud environment.

Using Nerdio Manager to Deploy eG Enterprise for AVD Monitoring: A Quick Start Guide

eG Enterprise is designed to be deployed automatically at scale within IaC type workflows and by products such as Nerdio Manager that facilitate automation. Deployment of eG Enterprise can be automated with or without Nerdio Manager. Many of our customers do choose Nerdio Manager to automate other workflows and to create and manage images for Azure Virtual Desktop (AVD) deployments.

US Hospital Saves $1.7M through Onsite Ticket Reduction

In any hospital, IT tickets raised by doctors and nurses are critical because every IT issue they face takes time and energy away from the delivery of care. The longer it takes to resolve a ticket from clinical staff, the greater the potential negative impact that issue can have on patient experience. Yet even with significant investment in support resources, doctors and nurses may still feel their technology issues are not resolved fast enough.

18 Best Practices for Cloud Automation

Everyone is shifting their workloads to the cloud, but one challenge remains: Workloads need to be automated. Whether they’re employing a cloud-native, cloud server, or hybrid model, IT operations teams need to know what, when, and now also where to automate. Speaking at the recent 2022 Automation Virtual Summit, Dave Kellermanns, Global Advisor for Automation, Broadcom Software, explored some lessons learned and best practices for cloud automation. Read on to see some of the highlights.

Autonomously optimize AWS Lambda deployments with Sedai and Datadog

In dynamic production environments, unpredictable traffic loads and frequent code changes can make it difficult for organizations to consistently optimize their cloud infrastructure, resulting in application performance issues, latency, and wasted cloud spend. Teams that manage large-scale cloud infrastructure deployments are often forced to tune their workloads’ configurations using a complicated mesh of script jobs—or worse, manual remediation by on-call engineers prompted by alerts.

OpenTelemetry Nginx Tutorial - Instrument and visualize traces

OpenTelemetry is an open-source standard for instrumenting cloud-native applications for generating different types of telemetry data. A robust observability framework set up using OpenTelemetry can help tremendously while troubleshooting software in production. Nginx is one of the most widely adopted web servers. Most often, nginx is used as a reverse proxy. It serves the frontend or backend applications behind the reverse proxy.

Introducing the XYZ chart: A three-dimensional way to visualize your data in Grafana

This panel is in alpha version and still in development. To use it as is, you need to modify your configuration file and set enable_alpha = true in the panels section. More information can be found on this page. Two-dimensional graphics are the de facto way to visualize data within the observability realm, and Grafana is really good at plotting data this way.

Site Reliability Engineer: Responsibilities, Roles and Salaries

DevOps gained popularity in order to combat siloed workflows, decreased collaboration and a lack of visibility across the software development lifecycle. While establishing a culture of DevOps has helped teams collaborate better and deliver reliable software faster, DevOps teams don’t necessarily have someone specifically dedicated to developing systems that increase site reliability and performance. That’s where a site reliability engineer (SRE) comes into the picture.

How to Analyze Enterprise-Wide Device Battery Health with Nexthink

Hardware is one of the most important, and most expensive, line items in IT’s purview. Constantly refreshing and provisioning hardware takes time and can be a very manual process. And one of the most significant reasons for refreshing hardware is battery life. Monitoring the health status of device batteries is crucial, and determining the health status can help maintain and extend the device lifetime in any environment.

How to track the failures in microservice applications?

Microservices architecture (often shortened to microservices) is an architectural style for developing applications. Microservices allow a large application to be separated into smaller independent parts, each having its own realm of responsibility. To serve a single user request, a microservices-based application can call on many internal microservices to compose its response. It is critical to track failures in microservices so that you can take corrective action and keep business processes running.

How to Perform a Proactive System Cleanup to Improve System Performance

Device performance issues can arise due to insufficient drive space, which may be one of the largest drivers of device issues. These issues can block OS updates and escalate to BSOD, requiring a hard reset of the device. Although these are common issues, the business implications of them at scale cannot be overstated. Employee productivity drops and deadlines are missed. This in turn can lead to the business not meeting its objectives. Don’t let low system space derail your business.

How to Monitor Website Uptime in 2023

An essential element of your business success lies in establishing trust between you and your users. A big part of this is a reliable website that performs and is there when your users need it. We’ll show you how a website uptime monitoring tool can help you achieve excellence online, with all the wide-ranging benefits that encompasses, not least engendering trust between you and your users.

Sponsored Post

Semi-automated NiCE VMware monitoring administration

Scripted Configuration of the NiCE VMware Management Pack provides a semi-automated means of setting up VMware monitoring administration with the help of approved and tested script functions. The NiCE Development Team has put together a new technical white paper for you to use automated administration easily.

SolarWinds named supplier on the Crown Commercial Service's G-Cloud 13 framework to provide secure cloud services

G-Cloud 13 serves as an online catalogue where public sector customers can buy cloud-based computing services, and with SolarWinds as a named supplier, it's never been easier to support public sector organisations with its services.

7 Best Practices for Data Visualization

A look at best practices, no-code and low-code platforms you can use, common visualization types, criteria for good data visualization and more. Organizations regularly generate an overabundance of data that is essential for decision-making. Data visualizations play an important role in helping people understand complex data and observe patterns and trends over a period of time.

Distributed alerting with the Elastic Stack

Modern computing environments and distributed workforces have produced new challenges to traditional information security approaches. Many traditional threat detection and response strategies rely on homogeneous environments, system baselines, and consistent control implementations. These strategies have been built on traditional environment assumptions that may no longer be true in your environment with the evolution of cloud computing, remote work, and modern culture.

Elastic Synthetics Projects: A Git-friendly way to manage your synthetics monitors in Elastic Observability

Elastic has an entirely new Heartbeat/Synthetics workflow superior to the current workflow. If you’re a current user of the Elastic Uptime app, read on to learn about the improved workflow you can use today and should eventually migrate toward.

How to extract label values from Prometheus metrics in Grafana

Prometheus metrics are usually visualized as numeric values on a graph, with the metrics categorized by labels. But what do you do when the numerical value doesn’t matter, and all of the information is in the labels? In that case, you might need to visualize the labels themselves. This scenario can arise because you’re not always in control of how the metrics get reported, but you do often need to visualize what’s there.

FinOps Observability: Monitoring Kubernetes Cost

With the current financial climate, cost reduction is top of mind for everyone. IT is one of the biggest cost centers in organizations, and understanding what drives those costs is critical. Many simply don’t understand the cost of their Kubernetes workloads, or even have observability into basic units of cost. This is where FinOps comes into play, and organizations are beginning to implement those best practice standards to understand their cost.

Prometheus and Kubernetes Metrics Ingestion

Prometheus is one of the most acclaimed solutions for Kubernetes monitoring. There are multiple add-ons and exporters that facilitate the task of pulling Kubernetes metrics. Sysdig Monitor is a cloud-native observability platform that helps businesses with the whole observability lifecycle. It provides simplicity at all times, allowing companies to rapidly pull their Kubernetes and Prometheus metrics without headaches.

What's New in Sysdig - February 2023

What’s New in Sysdig is back again with the February 2023 edition! I am Michael Rudloff, an Enterprise Sales Engineer based in the United Kingdom, and I am very excited to update you with the latest feature releases from Sysdig. This month, Sysdig Secure brings a couple of new features. We have added reports to Risk Spotlight – Risk Spotlight can show you which packages with vulnerabilities are currently in use in a running container across your whole Kubernetes environment.

SumUp Uses Honeycomb to Improve Service Quality and Strengthen Customer Loyalty

Growing pains can be a natural consequence of meteoric success. We were reminded of that in our recent panel discussion with SumUp’s observability engineering lead, Blake Irvin, and senior software engineer Matouš Dzivjak. They shared how SumUp’s rapid growth spurt compelled them to change their resolution process—both logistically and culturally—to ensure a service level quality that reflects their customer obsession.

Deciding Whether to Buy or Build an Observability Pipeline

In today's digital landscape, organizations rely on software applications to meet the demands of their customers. To ensure the performance and reliability of these applications, observability pipelines play a crucial role. These pipelines gather, process, and analyze real-time data on software system behavior, helping organizations detect and solve issues before they become more significant problems. The result is a data-driven decision-making process that provides a competitive edge.

The Russification of Ukrainian IP Registration

Last summer we teamed up with the New York Times to analyze the re-routing of internet service to Kherson, a region in southern Ukraine that was, at the time, under Russian occupation. In my accompanying blog post, I described how that development mirrored what took place following Russia’s annexation of Crimea in 2014.

Large Health System Reduces Critical Application Crashes by 90% via Nexthink Automation

Digital innovations have completely transformed the healthcare industry over the past decade. Today, clinical staff rely on applications to complete nearly every aspect of their day to day duties, from maintaining data privacy compliance to conducting telehealth. That makes it even more important that these critical applications stay online.

Fixing Security's Data Problem: Strategies and Solutions with Cribl and CDW

Cribl's Ed Bailey and CDW's Brenden Morgenthaler discuss a foundational issue with many security programs that lack the right data to detect issues and make fast decisions. Data drives every facet of security and bad data/incomplete data weakens your overall program. Ed and Brenden will discuss common issues and strategies for solving security's data problem.

Economics and Infrastructure Monitoring: Why Netreo is the Right Choice Right Now

Is the U.S. economy in a recession? A recent Forbes article demonstrates how elusive that answer can be. By the general definition that two consecutive quarters of negative gross domestic product (GDP) growth equals a recession, the U.S. entered a recession in the summer of 2022. However, the National Bureau of Economic Research (NBER), which defines U.S. business cycles, uses a different definition.

How to Troubleshoot Packet Loss

If you're experiencing network issues, such as slow Internet speeds, poor audio or video quality, or dropped connections, packet loss could be the culprit. Packet loss is a common network issue that affects networks of all sizes, and its impact on users and businesses can be very frustrating if it’s not dealt with quickly. So we’re teaching you how to identify and troubleshoot packet loss, and causes of packet loss, in your network!

Sentry's Frontend Tests: Migrating from Enzyme to React Testing Library

At Sentry, we practice continuous delivery, which means that code can be released as soon as it’s merged into the main branch. This allows us to iterate quickly on our product, making new features, bug fixes, configuration changes, and experiments available in production as frequently as possible. We merge over 700 pull requests a month.

Practical Introduction to Prometheus Monitoring in 2023

Prometheus is a powerful open-source monitoring system that can collect metrics from various sources and store them in a time-series database. It is widely used in the industry to monitor the health of applications, servers, and other infrastructure components, and to alert on it. In this article, we will provide a practical introduction to Prometheus monitoring and cover the essential concepts and features that you need to know to get started.

How to untangle monitoring noise and leverage observability best practices

Most organizations suffer from some form of alert noise, shares Adam Blau, senior director of product marketing at BigPanda. “Alert noise is only going to increase as organizations support cloud-native applications spanning multiple public and private clouds, including ephemeral deployments and more. It’s not going to get easier for organizations to understand the signal from all those alerts being sent,” Blau said.

Bracing for Impact: Why a Robust Observability Pipeline is Critical for Security Professionals in 2023

2023 is well underway and now more than ever it’s important to stay ahead of data trends and security concerns that are ever mounting. With the cost of catastrophic cyber attacks estimated to be ten times that of all other disasters combined, businesses need to take proactive measures to implement a security data pipeline to protect their data and comply with security and retention requirements.

Announcing the public beta

We are thrilled to announce the public beta of Spectate, a next-gen monitoring and incident management platform. Initially built as an internal tool, Spectate has now been expanded to enable public use. Spectate allows you to monitor your websites, manage incidents and create elegant branded status pages. With Spectate, you can set up custom alerts and notifications that keep you informed about any issues, helping you to identify and address problems quickly with the help of our Incident AI.

Getting Started with Google Cloud Managed Service for Prometheus

As you scale your services in production, you also need to scale your monitoring. Managing Prometheus at scale to handle increasingly large metric volumes can be a huge operational challenge, but Google Cloud Managed Service for Prometheus is here to help. We’re excited to introduce Google Cloud Managed Service for Prometheus and help you scale by removing operational toil!

Track Errors in Your NestJS Application with AppSignal

Many variables — such as your users' device type and configuration, external hosting services, and third-party libraries — can impact an application's performance. Without a performance monitoring system in place, numerous problems can arise. These issues could even mean that users stop using your application.

Monitor User Behavior to Detect Insider Threats

The risk from insider threats has grown massively, with perpetrators frequently getting around organizations' increasingly complex perimeter protections. It is one of the most common ways customer data or industrial and trade secrets leak. This very complex topic includes many types of threats and techniques. Let's discuss how you could detect insider threat activity at a network level.

How to Create and Manage Secrets in Kubernetes

Kubernetes Secrets are a built-in resource type that's used to store sensitive data. This blog teaches you how to work with Secrets in Kubernetes. Kubernetes can do many things, but we usually refer to it as a “container orchestrator.” Orchestrating containers means starting and restarting them when needed, ensuring their configuration matches the declared state, and autoscaling them. But Kubernetes can do much more than that.
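
As a rough illustration of what a Secret looks like, the sketch below builds a minimal manifest in Python. It relies on the fact that values under a Secret's `data` field are base64-encoded (an encoding, not encryption). The function name `build_secret_manifest`, the secret name, and the sample credentials are hypothetical and not taken from the linked article.

```python
import base64
import json


def build_secret_manifest(name: str, values: dict[str, str]) -> dict:
    """Return a Secret manifest whose data values are base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",  # generic key/value secret type
        "data": encoded,
    }


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    manifest = build_secret_manifest(
        "db-credentials", {"username": "admin", "password": "s3cr3t"}
    )
    # The JSON output could be saved to a file and applied with kubectl.
    print(json.dumps(manifest, indent=2))
```

Because base64 is trivially reversible, a manifest like this should be treated as sensitive as the plain-text values it contains.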

Next Level Oracle Monitoring Options Features DeepDive NiCE 2023Q1

Performance, availability, and Oracle observability are the hot topics for the next decade. Get a guided tour on advanced Oracle monitoring based on Microsoft System Center Operations Manager, run by the NiCE Product Experts. The NiCE Oracle Management Pack, in conjunction with one of the big players in IT Operations Management, easily enables end-to-end Oracle observability and thus helps reduce risk and downtime.

5 Common Collaboration Tool Myths Busted by Collaboration Experience

As part of Nexthink’s launch of the new Nexthink Infinity Platform we also launched Collaboration Experience – adding detailed Microsoft Teams and Zoom call telemetry and insights to the comprehensive “See, Diagnose & Fix” capabilities of the platform. After 4 months, we are ingesting telemetry from almost 2 million employee calls and sessions a day – great testimony to the power and scalability of the Nexthink Infinity platform.

General availability of Flutter Agent for Mobile Real User Monitoring

Cisco AppDynamics Flutter Agent for MRUM is a powerful tool that provides comprehensive performance monitoring for mobile applications. As a mobile app developer, staying on top of the latest technologies and tools is essential to building high-performing, user-friendly apps that keep customers engaged and coming back for more. One of the most popular mobile development frameworks on the market today is Flutter, a free and open source framework created by Google.

Monitor any SQL metrics with Netdata (and Pandas)

We recently got this great feedback from a dear user in our Discord: This is great and exactly what we want, a clear problem or improvement we could make to help make that user's monitoring life a little easier. This is also where the beauty of open source comes in, along with being able to build on the shoulders of giants: adding such a feature turned out to be pretty easy by just extending our existing Pandas collector to support SQL queries leveraging its read_sql() capabilities.

How do I fix high latency issues for remote users?

If you’re working remotely or managing remote teams, high latency can be a frustrating issue to deal with. High latency, also known as lag, is the delay between when a user inputs a command and when the server responds. High latency can make remote work slow and inefficient, making it difficult to get work done. Here are some things you can try to fix high latency for a remote user.
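
Before trying any fixes, it helps to put a number on the delay. The sketch below is one simple way to approximate network latency: it times how long a TCP connection takes to open. The function name `tcp_connect_latency_ms` and the example host are assumptions for illustration, and connect time is only a proxy for the full input-to-response delay described above.

```python
import socket
import time


def tcp_connect_latency_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Return the time, in milliseconds, taken to open a TCP connection."""
    start = time.perf_counter()
    # create_connection performs the TCP handshake; the socket is closed on exit.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0


if __name__ == "__main__":
    # Hypothetical target; substitute the server your remote users connect to.
    print(f"{tcp_connect_latency_ms('example.com', 443):.1f} ms")
```

Running the measurement several times from the remote user's machine, and comparing against the same measurement from the office network, gives a baseline for judging whether a change actually helped.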

What is data hygiene, and why is it important?

Data hygiene is a critical but often overlooked concept for businesses of all sizes. Think of any activity that involves the use and storage of data, like customer relationship management (CRM), marketing automation, customer service, or analytics; then add in trends like increasing regulations and customer expectations around privacy. Data hygiene needs to be on the top of your list when you’re managing customer relationships across multiple channels.

Get notified on Azure app registration client secret expiration!

Azure AD app registration identities are used to provide access to specific resources in Azure. We use several app registrations in Azure. For security reasons, many organisations dislike letting users create Azure Active Directory (AAD) applications and Service Principals (SP) themselves. Building a manual process, on the other hand, might be a bottleneck and a time sink. However, you can build an Office365 form for users to request AAD application registration.

Introducing OpenTelemetry Support: Take Action on Your Observability Data

As an open source company that grew out of a side project in 2008 to an application and performance monitoring platform (APM) used by over 3.5 million developers, Sentry is committed to open source and the community of developers maintaining and building in the open. Similarly, we take a public approach to building our software, which is why it’s a natural extension of our values to announce our support for OpenTelemetry (or OTel), the leading open standard for observability.

See how reliability management enhancements expand your SLO value

When we announced the general availability of reliability management in Sept 2022, you saw how crucial this functionality was for the digital customer experience. Unique insights from users helped to improve the experience and usability that we’ve incorporated into our latest release. Now you can use a wide range of features that will help you on your reliability management journey.

Importing your Cloudwatch Metrics into Prometheus

Cloudwatch is the de facto method of consuming logs and metrics from your AWS infrastructure. The problem is, it is not the de facto method of capturing metrics for your applications. This creates two places where observability is stored, and can make it difficult to understand the true state of your system. That’s why it has become common to unify all data into one place, and Prometheus offers an open-source, vendor-agnostic solution to that problem.

AWS Configuration for the Cribl Pack for SentinelOne Cloud Funnel

In the blog titled “Streamline Endpoint Data with Cribl Pack for SentinelOne Cloud Funnel” we dove into the Cloud Funnel data, its relevance in the modern SOC, and how Cribl Stream transforms the data while addressing visibility gaps. We left the AWS-specific details to this blog for those not yet familiar with configuring AWS S3 buckets, SQS Queues, and Identity and Access Management (IAM).

Sponsored Post

Discovering Efficiency Through 2 Steps Synthetic Monitoring for Splunk

You're probably familiar with Splunk. It's one of the most popular big data solutions organisations worldwide use to monitor their systems in real-time. But you may not know that Splunk also offers synthetic monitoring solutions via 2 Steps. 2 Steps Synthetic Monitoring for Splunk is a powerful tool that can help you speed up your application troubleshooting process. Today we'll take a closer look at what it is and how it can benefit your organisation.

4 Key IT Operations Practices for Better Management

Here we go again. If 2022 wasn’t enough, there are new challenges in 2023 staring right at information technology leaders. As interest rates rise and consumer demand slows, companies plan to cut costs and do more with less. But what does all this mean for you? Amid this uncertainty, the IT operations department must adapt well to these changes. Because if they don’t, the business they support will be disadvantaged.

NiCE AIX Management Pack released

From industrial business applications to finance and supercomputers, AIX runs the most demanding mission-critical workloads smoothly. AIX, IBM’s Advanced Interactive eXecutive, is a highly secure and reliable OS, which is why performance- and dependability-driven industries use AIX for their enterprise servers. In fact, the majority of Fortune 500 health care, telecommunications, energy, and financial companies run on AIX.

Everything you need to know about hardware monitoring!

Today’s hybrid network environments have a balance of both distributed networks and implementations of modern technology. But they are not short of one core component: servers. Maintaining server availability and health has always been a necessity when it comes to network management. Preserving network uptime comes down to monitoring and managing the factors contributing to network downtime. One such factor that has a high potential to cause performance anomalies is hardware.

Raygun API Beta is now open to everyone

We’re excited to announce the official Beta launch of the Raygun API, allowing you to extract, manipulate, and visualize insights from your account in innovative ways. This is included with all Raygun plans, and is now readily available to customers of all sizes. We’ve made the decision to release this Beta sooner than anticipated, as we’re eager to receive your early feedback to make sure we’re focussing on the endpoints that provide the most value to your team and business.

Integrate eG Enterprise with Multiple ITSM Tools at the Same Time

Previous versions of eG Enterprise limited the eG Manager integration to a single ITSM system such as ServiceNow ITSM, Autotask, JIRA or others. This limitation was particularly cumbersome for SaaS and MSP (Managed Service Provider) deployments where each tenant/customer may have had their own preferred ITSM system. Our latest release lifts this restriction. ITSM integration can now be done for a specific Organization, Organizational Unit, or even User.

InfluxDB SQL Queries with Python

Recently InfluxData announced SQL support in InfluxDB Cloud, powered by IOx. Users can now use familiar SQL queries to explore and analyze their time series data. The SQL support was introduced along with the usage of Apache Arrow. Apache Arrow is an open source project used as the foundation of InfluxDB’s SQL support. Arrow provides the data representation, storage format, query processing, and network transport layers. Apache Flight SQL provides a method for interacting with Arrow via SQL.

Get the Top 15 Microsoft Teams Alerts to Track Call Quality

To say that IT professionals have a lot on their plates is an understatement and when managing Microsoft Teams, many feel inundated by Microsoft Teams alerts. Since Microsoft Teams is the ubiquitous platform for communication and collaboration in modern workplaces, optimal Teams performance is critical. Microsoft Teams can experience performance issues that can have a significant impact on productivity.

Understand user journeys with AppDynamics Business iQ

Understand where in their journey your users abandon your application. It could be due to latency or other similar types of performance issues, or perhaps when trying to communicate with a 3rd party endpoint, such as a payment processing service. Learn how observing key metrics provides the insights you need to ensure your customers have the best-of-class experience.

Best Practices for Our Custom Dashboards

Custom dashboards are essential tools in the data-driven world for businesses, organizations, and individuals. They provide an interactive and comprehensive representation of data, making it easier to identify trends, patterns, and outliers. Custom dashboards are highly customizable, allowing users to select the most relevant data sources, visualizations, and metrics to meet their specific needs.

How We Manage Incident Response at Honeycomb

When I joined Honeycomb two years ago, we were entering a phase of growth where we could no longer expect to have the time to prevent or fix all issues before things got bad. All the early parts of the system needed to scale, but we would not have the bandwidth to tackle some of them graciously. We’d have to choose some fires to fight, and some to let burn.

Master Kubernetes Monitoring with these Must-Track Metrics

Managing a Kubernetes cluster requires a keen eye for detail and a deep understanding of its complex structure. To ensure smooth operation of your applications and optimal performance, it is vital to monitor a wide range of metrics across the different components of your cluster. In this article, we will discuss key metrics that can be used to monitor both self-managed and cloud-managed Kubernetes environments, helping you to keep your cluster running at its best.

How Wells Fargo modernized its observability stack with Grafana Enterprise and Grafana Cloud

Think of a monitoring tool — any monitoring tool. Got it? Good. Odds are, whatever came to mind was probably being used behind the scenes at Wells Fargo not too long ago. “You name it, and we probably had it at Wells Fargo,” said Senior Software Engineering Manager Nikhilesh Tekwani of the complex web of observability solutions that stretched across the U.S.-based financial institution.

ML-Driven Root Cause Analysis for Seagate Lyve Cloud

Seagate Lyve™ Cloud is a storage-as-a-service platform that delivers S3-compatible object storage with a simple and predictable cost structure. It offers ultra-high durability and scale along with enterprise grade security and uptime. When it comes to the reliability and resiliency of the Lyve Cloud Service, there are simply no compromises.

10 Best Grafana Alternatives

Grafana is a powerful open-source data visualization platform created by Torkel Ödegaard in 2014. With its front-end written in Typescript and a Golang back-end, this data monitoring platform allows users to create and share interactive and dynamic dashboards with custom charts and panels using data from various sources, including InfluxDB, Prometheus, Elasticsearch, and many others. Template variables are also available as dropdown options to create dynamic and reusable dashboards.

Catchpoint Explainer Video

Catchpoint is the Internet Resilience Company™. The top online retailers, Global2000, CDNs, cloud service providers, and xSPs in the world rely on Catchpoint to increase their resilience by catching any issues in the Internet Stack before they impact their business. Catchpoint’s Internet Performance Monitoring (IPM) suite offers synthetics, RUM, performance optimization, high fidelity data and flexible visualizations with advanced analytics.

A Closer Look at AlertBot's Failure Reporting Feature

The year was 1995. Michael Jordan returned to the NBA. Amazon sold its first book. Windows 95 unleashed the era of taskbars, long filenames, and the recycle bin. And when people weren’t dancing the Macarena, they were flocking to see Apollo 13 and hear Tom Hanks utter the phrase that would launch millions of (mostly annoying) impersonations: “Houston, we have a problem.”

What is Retrace Application Performance Monitoring (APM)?

Retrace is so much more than your average Application Performance Monitoring tool; and the more you know about the tools and capabilities at your disposal, the easier it is to keep application performance humming. In this Retrace overview video, Stackify by Netreo Customer Success Lead Kyle Jackson shares valuable insights into how and why Retrace works the way it does, and the powerful monitoring capabilities that help you improve the performance of your applications.

Best Practices for Monitoring Your Wi-Fi Network

The bulk of networked devices are now wireless. Most Wi-Fi manufacturers now offer some form of Wi-Fi monitoring, including cloud-based solutions, and often have great dashboards that require no setup at all. But if you don’t know it’s there, or how to take advantage of it, is it really any help? This quick guide is designed to help your team with best practices when it comes to Wi-Fi monitoring: how to understand and make the most of what is available to you.

Implementing a Cost-aware Cloud Networking Infrastructure

Cloud networking is the IT infrastructure necessary to host or interact with applications and services in public or private clouds, typically via the internet. It’s an umbrella term for the devices and strategies that connect all variations of on-premise, edge, and cloud-based services.

Support for Next.js Middleware and Edge Routes

Third-party JavaScript libraries provide developers with the tools they need to build modern web experiences, and a bit of cheatcode at times to not have to start from scratch. I mean, you don’t want to build an entire monitoring solution, so we help with Sentry’s Next.js SDK that only requires a couple of lines of code.

What Is Packet Loss: The Invisible Enemy of Network Performance

No matter the size or scope of your business, network connectivity and performance are essential to its operations. From video conferencing and VoIP, to cloud-based applications and remote work, businesses rely heavily on the network to stay connected and productive. However, network performance can be severely impacted by packet loss, one of the most common issues that IT professionals face.
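
Packet loss is usually reported as the percentage of sent packets that never arrived. The minimal sketch below shows that arithmetic; the function name `packet_loss_percent` and the sample counts are hypothetical and not taken from the linked article.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return the share of sent packets that were lost, as a percentage."""
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    # Lost packets divided by sent packets, scaled to a percentage.
    return (sent - received) * 100.0 / sent


if __name__ == "__main__":
    # Hypothetical counts, e.g. taken from the summary line of a ping run.
    print(f"{packet_loss_percent(200, 150):.1f}% packet loss")
```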

How to achieve distributed tracing using Application Insights?

In this blog, we will explore how distributed tracing works in Application Insights and how to use it to diagnose the issues in a distributed application. Azure Application Insights is a powerful tool for monitoring and diagnosing application performance issues and supports distributed tracing. It is an extension of Azure Monitor and provides Application Performance Monitoring features out of the box.

Streamline Endpoint Data with the Cribl Pack for SentinelOne Cloud Funnel

Cribl empowers you to take control of your observability, telemetry, and security data. Wherever your data originates from, wherever your data needs to go, and whatever format your data needs to be in, Cribl gives you the freedom and flexibility to make choices instead of compromises. Addressing visibility gaps by ingesting more data sources as the threat surface continues to expand has been a challenge.

Optimize SQL Server performance with Datadog Database Monitoring

Microsoft SQL Server is a popular relational database management system that provides a wide range of performance and reliability features (e.g., AlwaysOn availability groups) to support business-critical applications. As your SQL Server workloads scale and increase in complexity, it can be difficult to monitor all of their components and pinpoint the exact issues that are degrading your databases’ performance.

What I learned running a SaaS for a second year

Two years ago, OnlineOrNot started as a little toy app I built in an afternoon to see what it's like using the Next.js framework, to see if a URL is down from around the world. I gave myself a week to turn that toy into a SaaS people could pay for. It looked like this when it went live: It wasn't ready for real users, but that didn't matter. I had something out there that people could sign up for, tell me what they were expecting, and how OnlineOrNot fell short of their expectations.

OpenTelemetry on AWS, beyond instrumentation and into resource attributes

Instrumenting your code is essential to understanding your system’s performance and diagnosing issues as they arise. Traditionally, this was accomplished using proprietary vendor libraries, causing major lock-in. Enter OpenTelemetry. OpenTelemetry is an open-source project that provides a set of APIs, SDKs, and integrations for instrumenting code.

How Did Slack Perform a Year After the Acquisition by Salesforce

Slack is a popular messaging and communication platform used for work, study, groups, and communities. According to Techjury, more than 10 million people used Slack every day in 2022, and 43% of Fortune 100 businesses are paying for Slack subscriptions. On July 21, 2021, Salesforce completed its purchase of Slack for $27.7 billion and announced plans to make Slack an operating unit under Salesforce.

From 200 to 503: Understanding the Most Common HTTP Status Codes

When browsing the web, you may have come across error messages such as "404 Page Not Found" or "500 Internal Server Error." These error messages are HTTP statuses, which are an essential part of the internet's communication protocol. In this article, we'll cover the most common HTTP statuses and what they mean.
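
As a small illustration, the first digit of a status code identifies its class (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error). The sketch below uses Python's standard `http.HTTPStatus` enum to look up the reason phrase; the helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus

# Status classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not registered in Python's HTTPStatus enum.
        phrase = "Unknown"
    category = STATUS_CLASSES.get(code // 100, "unknown class")
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 404, 500, 503):
        print(describe_status(code))
```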

Introducing the Cribl Stream Reference Architecture

Join Ed Bailey and Eugene Katz as they unveil the first Cribl Stream Reference Architecture, designed to help observability admins achieve faster and more valuable stream deployment. In this live stream discussion, Ed and Eugene will explain the importance of a quality reference architecture in successful software deployment, and guide viewers on how to begin with the Cribl Stream Reference Architecture by first establishing end-state goals. They will also share different use cases and help viewers identify which parts of the reference architecture are applicable to their specific situation.

How to Create a Dashboard in Kibana

Wondering how to create a dashboard in Kibana to visualize and analyze your log data? In this blog post, we’ll provide a step-by-step explanation of how to create a dashboard in Kibana. You’ll learn how to use Kibana to query indexed application and event log data, filter query results to highlight the most critical and actionable information, build Kibana visualizations using your log data, and incorporate those visualizations into a Kibana dashboard.

The Best Graphite Dashboard Examples

Graphite provides time-series metrics in an open-source database. With Graphite dashboards, you can see key performance indicators (KPIs) as well as other metrics visually. Dashboards typically display data as graphs, charts, and tables and can be customized to meet the specific needs of an organization. Using dashboards, organizations can monitor and analyze various aspects of their performance, such as system utilization, application performance, and resource utilization, using web interfaces.

What Is SQL Performance Tuning?

As database administrators and developers, we need to know how to tune SQL queries and databases. Tuning SQL queries and databases is one of the most powerful tools in our arsenal for achieving the best possible performance results. This post will help you understand more about SQL tuning. I’ll start by explaining what SQL performance tuning is. Then, I’ll go over how to conduct SQL performance tuning in MySQL, Microsoft SQL Server, and Oracle.

Monitoring Loadmaster Performance with Flowmon NPMD

The Loadmaster Network Telemetry feature makes it easier than ever to get key insights on your applications into your Flowmon deployment. By creating both cluster-wide and application specific channels you can quickly build NPM dashboards and topologies that surface essential performance and availability metrics broken down by application, client and server.

What is Syslog and how does it work?

When you’re adding or subtracting fractions, you need to make sure that they have a common denominator, a number that allows you to compare values. In the same way, your IT environment needs a common “language” for your event log data. Your environment consists of various devices running different operating systems, software, and firmware.
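
One concrete piece of that common language is the priority value at the start of every syslog message, which packs a facility and a severity into one number (facility × 8 + severity, as defined in the syslog RFCs). The sketch below is an illustration only and is not taken from the linked article; the helper names `encode_pri` and `decode_pri` are hypothetical.

```python
# Severity keywords, indexed by their numeric code (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility: int, severity: int) -> int:
    """Combine a facility (0-23) and a severity (0-7) into a PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in the range 0-23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in the range 0-7")
    return facility * 8 + severity


def decode_pri(pri: int) -> tuple[int, str]:
    """Split a PRI value back into (facility number, severity keyword)."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be in the range 0-191")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


if __name__ == "__main__":
    # Facility 16 is local0 and severity 3 is "err".
    print(f"<{encode_pri(16, 3)}>")
    print(decode_pri(34))
```

Because every device encodes facility and severity the same way, a collector can filter and route messages without knowing which operating system or firmware produced them.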

VictoriaMetrics Long-Term Support (LTS): Commitment, Current and Next LTS Versions

VictoriaMetrics is always improving, with frequent updates adding new features, performance improvements and bug fixes listed at the CHANGELOG page. We usually make at least a single release every month. All the new features and bug fixes go to the latest release. That’s why we recommend periodically upgrading VictoriaMetrics components to the latest available release. But the latest release may also contain bugs in the latest features.

How to Improve SD-WAN Visibility for Businesses

There’s a lot of talk about SD-WAN technology in the networking world, and so many businesses are making the switch to SD-WAN with promises of higher, more reliable performance. But, when they do, many companies lack the SD-WAN network visibility needed to identify performance issues and see if their SD-WAN service is actually performing as promised. Keep reading to learn how to get the SD-WAN visibility your company actually needs.

Iterating Our Way Toward a Service Map

For a long time at Honeycomb, we envisioned using the tracing data you send us to generate a service map. If you’re unfamiliar, a service map is a graph-like visualization of your system architecture that shows all of its components and dependencies. We didn’t want it to be a static service map, though, the kind you’d view once before going “huh, neat” and then never look at again.

AI & Application Performance Monitoring Opportunities & Challenges

In today’s fast-paced world, app performance equals brand reputation. Customers expect apps that are fast, responsive and available 24/7. That’s where Application Performance Monitoring (APM) comes in. The technology enables businesses to ensure the best possible user experience by monitoring and managing the performance of their applications. But as applications become increasingly complex, identifying and resolving performance issues in real-time becomes increasingly difficult.

Business Resilience: How To Build Resilience Strategically, Tactically & Operationally

The ability to continue business operations for the foreseeable future is a key metric from a financial standpoint. But from a risk management perspective, all dimensions of an organization’s strategic and operational framework must be analyzed in order to… The last part relates to business resilience — and it’s what we’re going to explore here. (This article was written by Joseph Nduhiu. See more of Joseph’s contributions to Splunk Learn.)

Logic App Best practices, Tips, and Tricks: #21 Moving or organizing shapes inside Visual Studio Logic App designer

Today I’m going to cover another important entry in the Best practices, Tips, and Tricks series that you need to consider while designing your business processes (Logic Apps): moving or organizing shapes inside the Visual Studio Logic App designer.

Get better IT stack visibility with a monitoring maturity model

In order to better manage complex IT environments and comply with burgeoning regulations, financial services organizations must learn how to proactively manage disruptive internal and external issues and develop policies to prevent them happening again. The reason? The pace of change is rapidly increasing while the size of business-necessary components is shrinking.

5 key functions you need for your Kubernetes monitoring tool

As an open-source container management system, the Kubernetes (commonly referred to as K8s) platform’s constant growth warrants an intricate cluster network, making it challenging to gain system-wide visibility. Even a tiny disruption within the network could collapse the entire operation, resulting in the failure of dependent applications. Businesses that rely on such containerized applications may experience a huge impact on revenue.

Doing more with less in IT, yes this impacts Citrix teams too!

If you work in a technology role, then it won’t be any surprise to you when I say organizations have pivoted from specialist IT job functions, such as Citrix-only roles, to making sure the infrastructure team can support many workloads. The adoption of cloud technologies has also created support for this approach because our infrastructure teams are doing less architecting and more application configuration and user work.

ScienceLogic Product Tour: Boost IT Productivity with Easy-to-Build IT Workflows

ScienceLogic’s SL1 PowerFlow is the heart of our IT workflow automation. SL1 PowerFlow Builder allows operations teams to easily create custom workflows to connect your teams, systems, and tools using low-code, no-code. And now you can see how to build those IT workflows with the SL1 PowerFlow Builder—without scheduling a live demo or free trial. The ScienceLogic Product Tours let you experience first-hand how SL1 can help your organization boost IT productivity with SL1 PowerFlow.

Best Wi-Fi Analyzer Tools - Free and Paid Versions

Every organization today relies on Wi-Fi for its business-critical functions, from small businesses with a dozen employees to international enterprises with locations across the globe. When your wireless network performance is slow and spotty, it can be more than frustrating — poor connectivity can bring your operations to a grinding halt. Using the right Wi-Fi analysis tools, you can analyze, monitor, and secure your wireless network – regardless of your size.

Why Clearco switched to Grafana Alerting, Grafana OnCall, and Grafana Incident

Working with technology means dealing with incidents or outages from time-to-time, so staying on top of problems is essential. Back in the spring of 2022, Clearco, the world’s largest e-commerce investor, had an alerting system set up to catch issues, except they had one problem: Clearco’s Customer Success team would learn of a problem before a notification even went off.

DX UIM 20.4 CU6: What's New and Why You Should Upgrade Now

For DX Unified Infrastructure Management (DX UIM) customers, upgrading to the latest release has significant benefits – and that’s especially true with the latest version, release 20.4 cumulative update 6 (DX UIM 20.4 CU6). It offers a significant number of enhancements and new capabilities. DX UIM 20.4 CU6 provides teams with several advantages, including enhanced accessibility, improved operational efficiency, and richer insights.

Symantec Edge SWG (formerly ProxySG) Performance Monitoring: Gain Full Observability with DX NetOps and AppNeta

For teams running secure web gateways (SWGs), also referred to as proxies, in today’s complex, dynamic network environments, extensive observability is a must have. Symantec offers a range of flexible deployment options for its SWGs, offering support for cloud, edge, and hybrid approaches. This blog explores a Broadcom solution that provides comprehensive observability for the Symantec edge offering, Symantec Edge SWG (formerly ProxySG).

Understand serverless function performance with Cold Start Tracing

Serverless developers are undoubtedly familiar with the challenge of cold starts, which describe spikes in latency caused by new function containers being initialized in response to increasing traffic. Though cold starts are usually rare in production deployments, it’s still important to understand their causes and how to mitigate their impact on your workload.

Best Practices for MongoDB Monitoring with Prometheus

The MongoDB document-oriented database is one of the most popular database tools available today. Developed as an open-source project, MongoDB is highly scalable and can be set up in your environment in just a few simple steps. When running and managing databases, monitoring is a key requirement.

How Hybrid Cloud Adoption Will Drive the Need for AIOps

Push came to shove with disruption and innovation during the pandemic. Enterprises started leveraging new technologies to gain competitive advantage, enter new markets, support employees from home and maintain business continuity. As a result, IT is faced with new challenges that are critical for realizing the value of the digital transformation that started and accelerated during the pandemic. To state the obvious, CIOs must manage an IT infrastructure that’s safe, sound and secure.

Preventing Outages in 2023

The outages span the giants of the Internet and some of the biggest failures of IT resilience we were subject to – from AWS’s trifecta of outages in December 2021 to the October ‘21 outage that took down Facebook, Instagram, WhatsApp, and interrelated services. We also look at some more intermittent outages that you may have missed.

Webinar Recap: Observability Data Orchestration

Today, businesses are generating more data than ever before. However, with this data explosion comes a new set of challenges, including increased complexity, higher costs, and difficulty extracting value. With this in mind, how can organizations effectively manage this data to extract value and solve the challenges of the modern data stack?

The Consolidation of Networking Tasks in Engineering

In recent years, the rapid development of cloud-based networking, network abstractions such as SD-WAN, and controller-based campus networking has meant that basic, day-to-day network operations have become easier for non-network engineers. The result we’re starting to see today is a sort of consolidation of networking tasks, leading to a need for only a small number of highly skilled network engineers to handle the less frequent heavy lifting of advanced design and troubleshooting.

Why Uptime Monitoring Is Essential for Your Business

Website downtime is a serious concern for businesses as it directly impacts the bottom line and can cause significant downstream effects as users turn to alternatives. You and your team can spend hundreds of hours improving your websites, adding new features, and creating great content. If your website goes down at a critical moment, these efforts are wasted and users are left wondering about your business’ ability to function in the digital world.

"I can now sleep at night." How Corevist Achieved Single-Pane-of-Glass Observability

In October 2022, we released SolarWinds® Observability, our cloud-native SaaS observability solution. For companies like Corevist, the solution provided them with the ability to define customized monitoring specifically configured to the Corevist instances within each customer deployment. For them, this was a major game changer.

eG Enterprise rated the No. 1 APM Tool for Customer Experience by SoftwareReviews

The reviews are in! We are thrilled that eG Enterprise has been recognized by SoftwareReviews, a division of Info-Tech Research Group, as a Champion in the 2023 Application Performance Management – Enterprise (APM) Tools Emotional Footprint Buyer’s Guide. Moreover, eG Enterprise APM was ranked in the top spot out of 14 vendor solutions.

You can now add tags to your sites

Some of our users have a lot of sites in their Oh Dear account. A feature often requested is the ability to add a little bit of meta information about each site. To do that, we've introduced the ability to add tags to a site. Tags can be used, for instance, to indicate how important it is to fix a problem with the site immediately. Possible tags could be important or has support contract. Another use case would be to record the technology used, with tags named wordpress, laravel, or js.

Introducing Session Replay from Sentry: Bridge the Gap between Code and UX

You know that annoying bug? The one that doesn’t show up locally? And no matter how many times you try to recreate the environment you can’t reproduce it? You’ve gone through the breadcrumbs, read through the stack trace, and are now playing detective to piece together support tickets to make sure it’s real. To get to the root cause faster - without rolling your head on your keyboard - we built Session Replay, now generally available for all web-based platforms.

Empowering SecOps Admins: Getting the Most Value from CrowdStrike FDR Data with Cribl Stream

Join Ed Bailey and Sidd Shah as they discuss how Cribl Stream can empower Security Operations Admins to make the most of their CrowdStrike FDR data. During the discussion, Ed and Sidd will address the challenges faced by CrowdStrike customers who generate a vast amount of valuable data each day but struggle to leverage it fully due to complexity and size. They will explain how Cribl Stream can help SecOps admins extract the right data for their SIEM, while moving the rest to their Security Data Lake, enabling them to get the maximum value from their data and be cost-effective at the same time.

10 Best Apache Log Analyzers: Free & Paid Tools [2023 Comparison]

Apache is the second most popular web server, after …., with its roots and official release going back as far as 1995. Throughout the years, it gained features, including HTTP/2, caching, and many more, while retaining its most appreciated capabilities: speed, modularity, and great stability. To fully leverage its features, you need to understand the environment, bottlenecks, traffic and user behavior. In that respect, Apache is no different from any other software in your infrastructure.

6 new features coming to ManageEngine CloudSpend in 2023

With the rising popularity of a hybrid cloud strategy, businesses need to keep their spending in check. To provide high availability for their data, more organizations have started to adopt a multi-cloud strategy, and they have choices for cloud platforms. AWS and Azure accounted for the lion’s share of the market in 2022. With their expertise in providing IaaS, this has also often resulted in disparate deployments with respect to cost estimation and control.

Breathing easy with Grafana dashboards and 3D printing

I lead the Grafana Loki project here at Grafana Labs, and I’ve always loved building things professionally and in my personal life, whether we’re talking about metalworking or coding — or, more recently, 3D printing. A couple years ago, I purchased my first 3D printer, a Prusa i3 MK3S+. I use it periodically to build functional items I use around my house in Upstate New York. For example, I recently decided to build a solar radiation shield for my outdoor weather station.

The Cloud Monitoring Journey

Monitoring is not a goal, but a path. Depending on the maturity of your project, it can be labeled in one of these six steps of the cloud monitoring journey. You will find best practices for all of them and examine what companies get from each one. From classic virtual machines to large Kubernetes clusters or even serverless architectures, companies have adopted the cloud as a mainstream way to provide their online services.

Querying InfluxDB IOx Using the New Flight SQL Plugin for Grafana

Grafana has been a staple visualization tool used alongside InfluxDB since its inception. With the release of InfluxDB Cloud powered by IOx, there is now a new way to integrate InfluxDB and Grafana: Flight SQL. Two of our engineers, Brett and Helen, have been working hard to create a new Grafana plugin called Flight SQL. This open-source plugin allows users to perform SQL queries directly against InfluxDB IOx and other storage engines compatible with Apache DataFusion.

Tackling the Security Budget in Times of Economic Uncertainty: IT and Security Leaders Prioritize Cybersecurity

In today’s economic climate, IT and security budget owners are always looking for ways to increase efficiency while controlling costs. With tighter budgets and increasing workloads, organizations have to find ways of stretching their limited resources while making sure investments are paying off.

How to Monitor MPLS Networks

If you manage an enterprise network, then you’ve definitely come across MPLS. Although many businesses rely on MPLS technology for large, high-performing networks, these networks can suffer from problems, such as congestion, that impact user experience. Monitoring MPLS with a network monitoring tool is key to identifying and solving the issues that affect MPLS performance.

Observability vs Monitoring - The difference explained with an example

Observability vs monitoring has been a common topic in DevOps recently. There has been a lot of debate, and I learned a lot from it when I started my observability journey. Most literature on observability is associated with a particular product or shares a textbook definition. In this blog post, I want to give you a practical understanding of observability and the differences between observability and monitoring with different scenarios and examples. We will cover the following topics here.

The Best OpenSearch Dashboard Examples

OpenSearch dashboards are a powerful tool for visualising and exploring data stored in an OpenSearch-compatible data store such as Elasticsearch. With OpenSearch's intuitive interface and advanced analytical tools, this visualisation tool makes it easy to gain insights into your data and monitor and alert upon key metrics. Throughout this article, we'll look at some of the most impressive OpenSearch dashboard examples that showcase its capabilities and versatility.

What is the ImagePullBackoff error in Kubernetes and how to fix it?

Like CrashLoopBackoff, the ImagePullBackoff is not an error but a waiting status you might see in your Kubernetes pods, with the backoff time increasing after every retry. The error itself is "ErrImagePull", and it happens when there are issues pulling the container image to the Kubernetes node. So how do you solve these pull errors? Take a look at our video to get some ideas on how to resolve the various issues!
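
To make "backoff time increasing after every retry" concrete, here is a minimal sketch of a capped exponential backoff schedule. The function name and the 10-second base and 300-second cap are assumptions chosen for illustration; they are not taken from the article, so check the Kubernetes documentation for the values your cluster version actually uses.

```python
def backoff_delays(retries, base=10, cap=300):
    """Return the wait in seconds before each retry.

    The delay doubles after every failed attempt and stops growing at `cap`.
    `base` and `cap` are illustrative assumptions, not values from the article.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]


# Delays before the first seven retries of a failing image pull.
print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

The cap is what keeps a pod that has been failing for hours from waiting ever longer between pull attempts.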

Trace-based testing with Elastic APM and Tracetest

This post was originally published on the Tracetest blog. Want to run trace-based tests with Elastic APM? Today is your lucky day. We're happy to announce that Tracetest now integrates with Elastic Observability APM. Check out this hands-on example of how Tracetest works with Elastic Observability APM and OpenTelemetry! Tracetest is a CNCF project aiming to provide a solution for deep integration and system testing by leveraging the rich data in distributed system traces.

Caching Database Queries in SQLAlchemy - Part 1/2

The database is one of the most critical components here at Rollbar and its performance ripples across most of our SLOs. One of our goals over the last few months has been to remove as much unnecessary or repetitive load from it as possible. Caching queries is the 101 solution whenever you need to keep scaling up while maintaining, or even reducing, database costs (very relevant these days). Here’s an example of how the load from just one query has been drastically reduced thanks to caching.
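
The general idea of query caching can be sketched in a few lines. This is not Rollbar's implementation and it does not use SQLAlchemy's API; `QueryCache`, `run_query`, and `fake_run_query` are hypothetical names, and the cache is a plain in-process dictionary with a time-to-live.

```python
import time


class QueryCache:
    """Tiny in-memory cache keyed by (sql, params) with a time-to-live.

    `run_query` is a hypothetical callable that actually hits the database.
    """

    def __init__(self, run_query, ttl_seconds=60.0, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, result)

    def get(self, sql, params=()):
        key = (sql, tuple(params))
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the database entirely
        result = self._run_query(sql, params)
        self._entries[key] = (now, result)
        return result


calls = []


def fake_run_query(sql, params):
    """Stand-in for a real database call; records every invocation."""
    calls.append((sql, params))
    return [("row", params)]


cache = QueryCache(fake_run_query, ttl_seconds=60.0)
cache.get("SELECT * FROM items WHERE id = ?", (1,))
cache.get("SELECT * FROM items WHERE id = ?", (1,))  # served from the cache
print(len(calls))  # 1
```

A short time-to-live trades a little staleness for fewer round trips; a production setup would also need invalidation on writes and a shared store if several processes serve the same queries.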

Ship high-quality, secure code faster with Datadog Code Analysis

As your engineering teams grow and commit code more frequently, it becomes increasingly difficult to release high-quality, secure code while achieving your desired development velocity. To create smoother developer workflows that ensure high standards for code quality and security, it’s critical for developers to detect and remediate issues earlier in the software development lifecycle, without switching tools or contexts.

Loki vs Prometheus - Differences, Use Cases, and Alternatives

Loki and Prometheus are both open source tools. While Loki is a log aggregation tool, Prometheus is a metrics monitoring tool. Loki’s design is inspired by Prometheus, but for logs. This blog post compares the two common monitoring tools, Loki and Prometheus, to help you understand their key differences. Log management and metrics monitoring are critical aspects of monitoring a software system effectively.

Introducing AppSignal for Hanami

Struggling to keep track of key metrics and potential issues in your Hanami application? AppSignal integrates seamlessly with Hanami 2.0 actions to provide action-level insights, making it easier for you to ensure a smooth-running application. In this blog post, you'll learn about the integration between AppSignal and Hanami and how it solves your performance monitoring challenges.

Using Playwright to Monitor Third-Party Resources That Could Impact User Experience

Today’s web consists of lots of third-party resources. Be it your fonts, transformed and optimized media assets, or analytics and ad scripts, many sites out there include resources that they don’t own. Your website probably has a lot of those dependencies, too! And while relying on third-party resources has downsides for performance and you should self-host your assets when possible, sometimes relying on external files is unavoidable.

Domain Agnostic vs Domain Centric vs Data-Centric AIOps - A Complete Guide for Beginners & Decision Makers

AIOps, or artificial intelligence for IT operations, uses AI and ML technologies alongside big data, data integration and automation to help make IT operations smarter and more predictive. AIOps has come around as a response to a pressing need for optimizing operations and minimizing risks to the IT infrastructure in the modern IT ecosystem.

How to "Live Tail" Kubernetes Logs

DevOps engineers wishing to troubleshoot Kubernetes applications can turn to log messages to pinpoint the cause of errors and their impact on the rest of the cluster. When troubleshooting a running application, engineers need real-time access to logs generated across multiple components. The challenge that engineers face is accessing comprehensive, live streams of Kubernetes log data.

Load testing Grafana k6: Peak, spike, and soak tests

With k6 Cloud, Grafana Labs is in the business of generating load — lots of load, distributed across a cluster of computers. So while our customers care about the systems they load, we care that our system can generate the load that they need and process the test metrics for them in an intuitive, explorable way.

The Storage Supply Chain and Its Effect on Infrastructure Teams

For the past couple of years, no one has been able to escape the effects of supply chain problems throughout their personal and professional lives. According to our recent State of Hybrid Cloud Storage survey, storage and the IT equipment that supports storage systems were no exception, and disruptions created extra work and headaches for those teams.

What is an SAP automation platform? The Pinewood Derby Story

My son, currently nine years old, is an active Cub Scout and we’re coming up on one of the most popular Cub Scout yearly events, the Pinewood Derby. For anyone not familiar with the Pinewood Derby, it is a competition where the scouts are given a small block of wood along with four little plastic wheels and are given the opportunity to design a car that will be raced against other scouts.

Predictions: AI and Automation

Artificial Intelligence (AI) - or more specifically Machine Learning (ML) - and automation were big topics for many of our customers in 2022. Common reasons for the interest in AI and automation were to: increase efficiency, reduce manual processing, minimise human error and - especially for the use of ML - identify ‘unknown unknowns’.

4 Azure Load Balancer Metrics to Monitor

An Azure Load Balancer is a Layer-4 (TCP, UDP) load balancer that provides high availability by distributing incoming traffic among healthy VMs. A load balancer health probe monitors a given port on each VM and only distributes traffic to an operational VM. Azure Load Balancers are frequently used in Azure Virtual Desktop (AVD) deployments. From our work with Azure Load Balancer, we think there are 4 key metrics and events you should proactively monitor and alert on.

Why Should You Care about SaaS Discovery and Management?

Overall, 59.77% of these customers said “All 3 of These Areas” are on the radar! Software as a Service (SaaS) has become a staple in the modern workplace. With the new last mile of the office network, businesses have had to adapt to a hybrid work-from-home model, making SaaS apps more critical than ever. In fact, Gartner predicts that the SaaS industry as a whole will see 16.8% growth in 2023.

Redis Metrics: An Introduction

Redis is a widely used in-memory database. Because it keeps data in memory, it can serve data as a key-value-oriented NoSQL database with performance that is hard to achieve with conventional databases. Since Redis is an in-memory data store, it is crucial to monitor its resource usage.

Logging, Traces, and Metrics: What's the difference?

Several tech giants like Amazon and Netflix have jumped from their monolithic applications to microservices. This has allowed them to expand their business interface tremendously and improve their services. Not only them, but most businesses today are dependent on microservices. Twitter currently has about a thousand such services working together, releasing meaningful outputs.

Svelte vs. React: Which is Better for Performance?

You can hardly even talk about web development without mentioning JavaScript. Because of its popularity, JavaScript has given birth to so many frameworks and libraries that developers can barely keep up with them. This post will discuss two popular JavaScript front-end tools, Svelte and React, and their performance. We’ll compare the two to determine whether Svelte is faster than React and which is the better choice for performance.

Jaeger distributed tracing: Advanced visibility into your app flows

OpenTelemetry (OTel) and distributed tracing can be handy tools when you’re a developer facing what we call app flow blindness – or not being able to see your application flows and microservices components in a distributed cloud environment. In distributed environments, application flows are handled by various services and cloud entities which are generally siloed.

Take control of monitoring and responding to your production Frontend Javascript errors

We are very lucky on the Rollbar Customer Engineering Team because we get to work with many development teams. Each team develops, tests, and deploys their applications in their own way. They have chosen different languages and frameworks to solve their particular problem. We learn from each team that we work with, and share these learnings with our Product Design team.

Harnessing the Power of Nexthink and Qualtrics

Today, enterprise IT teams are being asked to manage thousands of devices, hundreds of applications, dozens of networks, multiple clouds, and so much more. It’s critical for IT to have clear visibility over end user digital experience. Answers to questions such as “Are my fellow employees actively using the applications they have been provisioned, and if so, are they satisfied with the experience?” are vital for driving overall business efficiency.

Visualize real-time mobile app data with the Embrace data source plugin for Grafana

Customers choose Embrace to help mobile development teams build fantastic mobile app experiences. Embrace provides best-in-class mobile insights that help users prioritize, understand, and resolve issues. All of which leads to better development decisions. But in modern software engineering, no team exists in a vacuum.

Beginner's Guide to Prometheus Metrics

Over the past decade, Prometheus has become the most prominent open source monitoring tool in the world, allowing users to quickly and easily collect metrics on their systems and help identify issues in their cloud infrastructure and applications. Prometheus was originally developed by SoundCloud when the company felt their metrics and monitoring solutions weren’t meeting their needs.

Status pages can now be displayed in multiple languages

In addition to performing various checks to monitor your site, Oh Dear also offers beautiful status pages. Status pages can now use multiple languages. Using these status pages, you can inform your audience about the status of your service. Here's the beautiful Oh Dear powered status page of the Laravel team. Some of our users have a global, multi-lingual audience. That's why we have now added support for a status page to be displayed in multiple languages.

Monitor code quality in Datadog with SonarQube

SonarQube is a tool for static code analysis that integrates with your existing CI pipelines to run quality checks on your codebase as it changes. As you develop and release new code, constant monitoring of code quality is crucial to ensure compliance, stability, and security. SonarQube’s Clean-As-You-Code philosophy helps to avoid technical debt by running regular code checks and alerting you to any problems early on.

Exploring Azure Kubernetes Service and its monitoring capabilities

What is Azure Kubernetes Service? Azure Kubernetes Service (AKS) is a fully managed open-source container orchestration service offered by Azure. Provisioning, scaling, and upgrading resources can be done without causing downtime with Azure Kubernetes Service. Container orchestration in Kubernetes enables deployed application components to be isolated in distinct containers that can scale independently. A cluster made up of these containers functions as a microservices-based software product.

Watch: How to pair Grafana Faro and Grafana k6 for frontend observability

Grafana Faro and xk6-browser are both new tools within the Grafana Labs open source ecosystem, but the pairing is already showing a lot of potential in terms of frontend monitoring and performance testing. Faro, which was announced last November, includes a highly configurable SDK that instruments web apps to capture observability signals that can then be correlated with backend and infrastructure data.

How to Use Time-Stamped Data to Reduce Network Downtime

Telecommunication organizations need to ensure they have the necessary resources and technology to maintain service uptime SLAs. Increased regulations and emerging technologies forced telecommunications companies to evolve quickly in recent years. These organizations’ engineers and site reliability engineering (SRE) teams must use technology to improve performance, reliability and service uptime.

10 Best Log Management Tools

Logs are imperative for troubleshooting, performance analysis, health monitoring, and application integrity and security. Log management tools give a clear view of how users interact with apps and systems and provide insight into improving software reliability, increasing productivity, reducing risks, and ultimately improving the user experience. Through log management tools, users can further integrate and enrich all of their logs, making queries quicker and more effective.

DX NetOps Flow Management: Modernized Deployment and Visualization

On any given day, network administrators have to contend with significant challenges. They often struggle with key questions: How do I ensure I’m spotting traffic anomalies? How do I use our resources most efficiently? How can I intelligently plan for network capacity upgrades? By employing network flow monitoring, administrators can gain accurate insights into these topics.

Monitor your Argo CD clusters with Datadog

Argo CD is a declarative continuous delivery tool for Kubernetes hosted by the Cloud Native Computing Foundation (CNCF). Argo CD automates your application deployment by continuously monitoring the live state of your containers and comparing it against the desired state in your Kubernetes manifest files, then pulling changes into your Kubernetes clusters as needed.
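
The comparison of live state against desired state can be illustrated with a small sketch. This is not Argo CD's algorithm or API; `diff_state` and the resource dictionaries are hypothetical, and real tools compare full Kubernetes manifests rather than flat dictionaries.

```python
def diff_state(desired, live):
    """Compare desired and live resources (name -> spec) and list the actions
    needed to make the live state match the desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical example: the manifests want web:2 and a worker, the cluster
# still runs web:1 plus a cron job that is no longer declared.
desired = {
    "web": {"image": "web:2", "replicas": 3},
    "worker": {"image": "worker:1", "replicas": 1},
}
live = {
    "web": {"image": "web:1", "replicas": 3},
    "cron": {"image": "cron:1", "replicas": 1},
}
print(diff_state(desired, live))
# [('update', 'web'), ('create', 'worker'), ('delete', 'cron')]
```

Running such a comparison in a loop and applying the resulting actions is the essence of a reconciliation approach: the declared files stay the single source of truth and drift in the cluster is corrected toward them.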

Introducing Netdata Paid Subscriptions

Read more about Netdata introducing paid subscriptions. All Netdata functionality is and will be available for free forever in the Community Plan. Paid tiers include features targeted for businesses and users who would need to customise their monitoring solution with different levels of user access, other notification mechanisms, etc.

Can ChatGPT speed up software error resolution?

One of the hardest tasks for software engineers is often having to stop what you are doing and look into a software bug (error), find the root cause and fix it quickly. This is hard because you may have never seen the affected code (someone else wrote it), it could be code you wrote a long time ago or just the context switching from what you are working on right now.

It's time for government to move beyond monitoring and into observability

When thinking about holistic end-to-end observability, it can help to start with what you already have. Many government agencies are already strategically ingesting and storing logs — a key component of observability. More than a year and a half after the release of M-21-31, US government agencies continue to work through the logging maturity models outlined in the memorandum.

The best Elasticsearch training and support available.

Sematext offers professional-level consulting, production support, training, and monitoring tools for your Elasticsearch cluster. With over 10 years of experience in the field, Sematext has worked with some of the largest companies in the world to help optimize their Elasticsearch setup. When you work with Sematext, you get expertise that comes straight from the source.

Agent vs. Agentless Monitoring

Network monitoring is a critical aspect of managing and maintaining the performance and security of a network. It includes monitoring and analyzing network traffic, devices, and systems to identify potential issues and ensure that the network operates efficiently and effectively. Network monitoring can help organizations identify and prevent security breaches, identify and troubleshoot performance issues, and ensure compliance with industry regulations and standards.

How Kizen Reduced Production Challenges While Saving 20% in Engineering Hours With Synthetic Monitoring

Using Checkly’s Playwright Test and GitHub sync integration helped Kizen optimize testing and monitoring workflows. Kizen is a no-code, enterprise-grade Predictive Innovation Engine that enables sales, marketing, and operations teams to save time and drive higher revenues and profitability. Their product portfolio includes a flexible customer relationship manager (ƒCRM), operations cloud, automation engine, and a predictive data platform.

Why Culture and Architecture Matter with Data, Part I

We are using data wrong. In today’s data-driven world, we have learned to store data. Our data storage capabilities have grown exponentially over the decades, and everyone can now store petabytes of data. Let’s all collectively pat ourselves on the back. We have won the war on storing data! Congratulations!

Correlate Metrics, Traces, & Logs in a Single View With Circonus Unified Dashboards

As organizations shift to service-centric environments, they are generating substantially more data. This in turn has placed strains on monitoring and observability teams, who now must sift through an abundance of data in order to identify and resolve issues, a challenge exacerbated by the number of various monitoring tools they’ve implemented over the years.

Complete Guide on Docker Logs [All access methods included]

Docker logs play a critical role in the management and maintenance of containerized applications. They provide valuable information about the performance and behavior of containers, allowing developers and administrators to troubleshoot issues, monitor resource usage, and optimize application performance. By capturing and analyzing log data, organizations can improve the reliability, security, and efficiency of their containerized environments.

Releasing Icinga Director Branches v1.3: Enhanced Branching for Better Configuration Management

If you’re an Icinga user, you already know the importance of having a streamlined and efficient configuration management process. With the release of Icinga Director Branches v1.3, this process just got even better! In this latest version, Icinga Director introduces several enhancements to its branching system, making it even easier to manage and deploy your Icinga 2 configurations.

Gathering, Understanding, and Using Traffic Telemetry for Network Observability

Traffic telemetry is the data collected from network devices and used for analysis. With traffic telemetry, engineers can gain real-time visibility into traffic patterns, correlate events, and make predictions of future traffic patterns. As a critical input to a network observability platform, this data can help monitor and optimize network performance, troubleshoot issues, and detect security threats. However, traffic telemetry can be difficult to understand.

The Future of Tech: Exploring AI/ML and ChatGPT

You don’t often see real change, but when you do see it you know it. Artificial Intelligence/Machine Learning toolsets like ChatGPT are finally starting to offer broad capabilities that will benefit a mass audience. These tools are moving out of the domain of data scientists and math nerds and into mass markets with a little bit for everyone. The potential reach is awesome and a little scary.

Experiment: Migrating OpenTracing-based application in Go to use the OpenTelemetry SDK

Jaeger’s HotROD demo has been around for a few years. It was written with OpenTracing-based instrumentation, including a couple of OSS libraries for HTTP and gRPC middleware, and used Jaeger’s native SDK for Go, jaeger-client-go. The latter was deprecated in 2022, so we had a choice to either convert all of the HotROD app’s instrumentation to OpenTelemetry, or try the OpenTracing-bridge, which is a required part of every OpenTelemetry API / SDK.

Inside ObservabilityCon: 'I picked up so much practical information'

I’ve always been wary about vendor events. In my experience, many of them are mostly marketing pitches, with little or no content that is applicable to my use cases. Despite that, last year I decided to convince my manager to let me attend ObservabilityCON 2022 to see what I could learn from it. My hope was that I would be able to get practical knowledge that could be applied as soon as I got back to work. (Spoiler alert: I did!)

How we reduced flaky tests using Grafana, Prometheus, Grafana Loki, and Drone CI

Flaky tests are a problem found in almost every codebase. By definition, a flaky test is a test that both succeeds and fails without any changes to the code. For example, a flaky test may pass when someone runs it locally, but then fail on continuous integration (CI). Another example is that a flaky test may pass on CI, but when someone pushes a commit that hasn’t touched anything related to the flaky test, the test then fails.
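
The definition above suggests a simple way to spot a flaky test: run it several times against unchanged code and see whether the outcomes differ. The sketch below is an illustration only and is not the approach described in the article; `classify_test`, `sometimes_fails`, and `always_passes` are hypothetical names.

```python
def classify_test(test_fn, runs=10):
    """Run test_fn repeatedly against unchanged code.

    Returns "flaky" if the runs produce both passes and failures,
    otherwise "pass" or "fail". Expects runs >= 1.
    """
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass", "fail"}:
        return "flaky"
    return outcomes.pop()


counter = {"n": 0}


def sometimes_fails():
    """Simulated flaky test: fails on every third call."""
    counter["n"] += 1
    assert counter["n"] % 3 != 0


def always_passes():
    assert True


print(classify_test(sometimes_fails), classify_test(always_passes))  # flaky pass
```

Repeated runs only catch flakiness that shows up within the chosen number of attempts, so a test that fails once in a thousand runs can still be classified as passing.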

What is HAProxy, and what is it used for?

In December 2022, the latest version of HAProxy, 2.7.0, was released. This open-source software is both a proxy and a load balancer, and is immensely popular due to the sheer volume of features it provides to help reduce or even avoid downtime and manage web traffic. Website or application downtime is disastrous for businesses. You want to serve as many users as possible, but if you have nothing in place to manage traffic, then your web applications can quickly become overwhelmed and fail.

How MSPs can Capitalize on the Rush for Localization of IT Services

“If you are a managed service provider (MSP) in Switzerland, your customers are very likely Swiss customers, so they speak the local language, and they want their data to be in Switzerland, and they want your service provider to comply with all the Swiss regulations. And the same is true in 190 other countries, especially after last year,”

15 Best Tools to Test and Measure Core Web Vitals [2023 Comparison]

User experience is key to ensuring the success of your website. There are many metrics that help you gauge and improve it, but Core Web Vitals are probably the most important ones. They are a set of real-world, user-centered metrics that quantify key aspects of the user experience. By measuring dimensions of web usability such as load time, interactivity, and the stability of content as it loads, Core Web Vitals help you understand how your website is doing in terms of performance.

InfluxData Closes Series E Round and Raises $81 Million in Capital

Today is an important milestone for InfluxData. I am thrilled to share that we closed our $51 million Series E round, and a $30 million debt facility, raising $81 million in capital. The Series E round was led by our new investors Princeville Capital and Citi Ventures, with participation from our existing investors Battery Ventures, Mayfield Fund, Sapphire Ventures, Trinity Ventures, Norwest Venture Partners, Sorenson Capital, and Harmony Partners.

Guide on Structured Logs [Best Practices included]

Structured logging is the method of having a consistent log format for your application logs so that they can be easily searched and analyzed. Having structured logs allows for more efficient searching, filtering, and aggregation of log data. It enables users to extract meaningful information from log data easily. Logging is an essential aspect of system administration and monitoring. Logging allows you to record information about the application's activity.
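As an illustration of a consistent log format, here is a minimal sketch using Python's standard logging module. The field names, the logger name "checkout", and the "context" convention for extra fields are assumptions chosen for this example, not something prescribed by the guide.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with fixed keys."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields passed via extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


logger = logging.getLogger("checkout")  # hypothetical logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"context": {"order_id": 42, "user": "alice"}})
```

Because every line is a JSON object with the same keys, a log backend can filter on fields such as level or order_id instead of parsing free text.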

IT In Motion: Security, Cloud, and the Future of AI

In this week's episode, we are joined by Jo Peterson, Vice President for Cloud and Security Services at Clarify360, as we discover how AIOps is being utilized for better cloud operations through contextual data ingestion, data-driven decision-making, and proactive, test-driven operations in production. Tune in to gain valuable insights and see where you stand on the AIOps spectrum.

What's new in Avantra 23.1

It only feels like yesterday that I was writing about the new features in Avantra 23.0, but today I’m pleased to announce that Avantra 23.1, our first feature update of Avantra 23, is ready for download. We’ve been working on a number of different areas that I’ll try to summarize here, but I apologize for the length of this post! For those who want a complete list of changes, check out our public release notes.

An Introduction to Application Monitoring

Users prefer an application that runs smoothly and without bugs to one that may have an appealing UI and shiny new features but comes with issues. Application monitoring is critical to the health of your application. With application monitoring, you can stay on top of any errors and ensure your application performs as it should. In this article, we'll cover: Let’s dive straight in!

Prometheus Alertmanager best practices

Have you ever fallen asleep to the sounds of your on-call team in a Zoom call? If you’ve had the misfortune to sympathize with this experience, you likely understand the problem of Alert Fatigue firsthand. During an active incident, it can be exhausting to tease the upstream root cause from downstream noise while you’re context switching between your terminal and your alerts. This is where Alertmanager comes in, providing a way to mitigate each of the problems related to Alert Fatigue.

How Security Engineers Use Observability Pipelines

In data management, numerous roles rely on and regularly use telemetry data. The security engineer is one of these roles. Security engineers are the vigilant sentries, working diligently to identify and address vulnerabilities in the software applications and systems we use and enjoy today. Whether it’s by building an entirely new system or applying current best practices to enhance an existing one, security engineers ensure that your systems and data are always protected.

Data Gravity in Cloud Networks: Achieving Escape Velocity

In an ideal world, organizations can establish a single, citadel-like data center that accumulates data and hosts their applications and all associated services, all while enjoying a customer base that is also geographically close. As this data grows in mass and gravity, it’s okay because all the new services, applications, and customers will continue to be just as close to the data. This is the “have your cake and eat it too” scenario for a scaling business’s IT.

What is WMI Provider Host?

Windows Management Instrumentation (WMI) Provider Host, or WmiPrvSE.exe, is a legitimate and essential component for keeping your computer’s various applications and systems running effectively. This process is part of the Microsoft Windows operating system. Microsoft built WMI management tools into each Windows version starting with NT 3.1.

Democratizing Machine Data & Logs: How Infor saves millions by leveraging Sumo Logic's data-tiering

Infor developed a decentralized governance model for managing the vast Sumo landscape: a landscape with thousands of users, tens of thousands of Collectors, and petabytes of log ingestion. By democratizing log management, we implemented a decentralized governance model for our Sumo account. Consequently, we succeeded in doubling our log ingestion year-over-year, while reducing our log ingestion cost by more than 50%.

Profiling Beta for Python and Node.js

A couple months ago, we launched Profiling in alpha for users on Python and Node.js SDKs — today, we’re moving Profiling for Python and Node.js to beta. Profiling is free to use while in beta — more updates to come when we near GA. Profiling is a critical tool for helping catch performance bottlenecks in your code. Sentry’s profiler gets you down to the exact file/line number in your code that is causing a slow-running query.

Goliath Technologies Launches Intelligent Cloud Monitoring Solution

Philadelphia, PA – February 8, 2023 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software, has today announced the launch of Goliath Cloud Monitor – a new cloud monitoring platform that provides centralized monitoring of AWS and Azure cloud environments from a single location. With 75% of organizations worldwide implementing a multi-cloud strategy, the market demand for effective cloud monitoring solutions has never been greater.

Winston Logger - Full tutorial with a sample Nodejs application

Winston Logger is one of the most popular logging libraries for Node.js. It is designed to be a simple and universal logging library supporting multiple modes of transport. A transport is essentially a storage device for the logs. Each logger can have multiple modes of transport configured at different levels. For example, one may want error logs stored in a database, but all logs output to the console or a local file. Some of the features of Winston logger are.

How Are You Making Storage Placement Decisions-and Does It Matter?

According to Virtana’s recent State of Hybrid Cloud Storage survey, most organizations have a little over half of their storage in the cloud, keeping the rest on premises. But how are they deciding what storage goes where? Is there such a thing as a wrong—or even just sub-optimal—storage placement decision? We dug into the data to answer these questions.

Enforce Quotas on Data Ingestion with Redis

Recently, a customer brought me a challenging use case: They were looking to enforce quotas on their internal customers, i.e. other teams in the organization. The analytics team provides services such as searching and reporting capabilities to those other teams, which subscribe to the services through a chargeback model. Each team that subscribes is supposed to limit its ingestion of data to a quota: a maximum permitted ingest per 24-hour period.
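The quota idea can be sketched in a few lines. This is an in-memory illustration only, not the Redis-based solution from the original post: the class name, the byte-based limits, and the fixed 24-hour window are all assumptions, and a dictionary stands in for the shared counter store.

```python
import time

WINDOW_SECONDS = 24 * 60 * 60  # one quota period, assumed to be a fixed window


class IngestQuota:
    """Track per-team ingest volume per 24-hour window (hypothetical helper)."""

    def __init__(self, limits):
        self.limits = limits  # team -> maximum bytes allowed per window
        self.usage = {}       # (team, window index) -> bytes ingested so far

    def try_ingest(self, team, size_bytes, now=None):
        """Record the ingest and return True, or return False if over quota."""
        now = time.time() if now is None else now
        key = (team, int(now // WINDOW_SECONDS))
        used = self.usage.get(key, 0)
        if used + size_bytes > self.limits[team]:
            return False
        self.usage[key] = used + size_bytes
        return True


quota = IngestQuota({"team-a": 1000})
results = [
    quota.try_ingest("team-a", 600, now=0),                   # fits
    quota.try_ingest("team-a", 600, now=10),                  # would exceed 1000
    quota.try_ingest("team-a", 600, now=WINDOW_SECONDS + 1),  # new window
]
print(results)
```

In a shared deployment the counters would live in a store that every ingest node can reach, which is the role Redis plays in the post's title; the dictionary here only shows the bookkeeping.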

SAST vs. DAST

Neglecting security is a rookie mistake. However, DevOps teams struggle to make it a priority in the quest to be continuously faster. Protecting your app from the ground up is challenging, so you need the right tools to improve your debugging process in development and production. To enhance security testing, developers can use SAST tools, which analyze program source code to identify security vulnerabilities, and DAST tools, which come up in later development phases in a running application.

Cisco introduces full-stack observability enhancement: Business Risk Observability

NOW AVAILABLE through Cisco Secure Application, on the Cisco AppDynamics SaaS platform, new security capabilities combine attack mapping and a business risk score for business transactions to help organizations prioritize responses based on likely impact on the business and users.

Profiling 101: What is profiling?

The performance of your app matters. From ensuring a good user experience to retaining users, performance makes a difference in your app’s success. Using the right tools can make it easier to ensure your code is meeting your performance goals, before you have to switch to a bigger EC2 instance or users start complaining. One of the best tools in a developer’s toolbox for ensuring good performance is profiling.

Is Kubernetes Monitoring Flawed?

Kubernetes has come a long way, but the current state of Kubernetes open source monitoring is in need of improvement. This is in part due to the issues related to an unnecessary volume of data related to that monitoring. For example, a 3-node Kubernetes cluster with Prometheus will ship around 40,000 active series by default. Do we really need all that data?

Connecting OpenTelemetry to AWS Fargate

OpenTelemetry is an open-source observability framework that provides a vendor-neutral and language-agnostic way to collect and analyze telemetry data. This tutorial will show you how to integrate OpenTelemetry with AWS Fargate, a container orchestration service that allows you to run and scale containerized applications without managing the underlying infrastructure.

Root cause log analysis with Elastic Observability and machine learning

With more and more applications moving to the cloud, an increasing amount of telemetry data (logs, metrics, traces) is being collected, which can help improve application performance, operational efficiencies, and business KPIs. However, analyzing this data is extremely tedious and time consuming given the tremendous amounts of data being generated. Traditional methods of alerting and simple pattern matching (visual or simple searching etc) are not sufficient for IT Operations teams and SREs.

AppSignal for Elixir Now Supports Oban

If you're using Oban for managing background jobs in your Elixir application and want to gain a deeper data-driven understanding of how they perform, you've come to the right place. AppSignal for Elixir now automatically instruments Oban, meaning you can now monitor the performance of your background jobs through an AppSignal Magic Dashboard, which gives you detailed information on queue times, processing times, and notifies you of any exceptions.

Get to know TraceQL: A powerful new query language for distributed tracing

At Grafana Labs, we love tracing, which is why we’ve been hard at work on Grafana Tempo, an open source, highly scalable distributed tracing backend. Tempo just had its 2.0 release. In conjunction with that release, we are excited to show off TraceQL — a powerful new query language designed for distributed tracing. In this blog, we’ll provide an overview of why we created TraceQL, how it works, how you can put it to use today, and what we have planned for future iterations.

Release 1.38.0 - DBENGINE v2, Functions, Events, Notifications, Role Based Access, and much more!

The Netdata team is very excited to introduce you to all the new features and improvements in the new version. Highlights: DBENGINE v2, the new open-source database engine for Netdata Agents, offers huge performance, scalability, and stability improvements with a fraction of the memory footprint. FUNCTION: Processes takes Netdata beyond metrics! We added the ability for runtime functions, which can be implemented by any data collection plugin, to offer unlimited visibility into anything, even non-metrics, that can be valuable while troubleshooting.

Running API and Browser Checks Using Terraform, AWS, and Checkly Private Locations

When adding new Checks in Checkly, a number of locations around the world are available from which to check your endpoints. For most use cases this is more than enough to ensure your resources are online. However, these locations are outside of your network and are unable to check on resources deployed more securely inside your private network.

Profiling 101: Why profiling?

This is part 2 of a 3-part series on profiling. If you’re not yet familiar with what profiling is, check out the first part in our series. By this point, you’re probably already convinced that good performance is important for your app’s success. There are many tools available for performance, but profiling in production with a modern profiling tool is one of the easiest and most effective ways to get a full understanding of your app’s performance.

A Snapshot of our IT Ops Predictions for 2023

Today executives and customers expect IT and digital services to be available and performant at all times; compromised availability or performance is no longer tolerable. Think about it; when was the last time a digital service was unavailable and it didn’t make the news or social media? When was the last time you visited a website that was unavailable and you waited for the outage to be over, rather than finding an alternative in the moment?

Communicating Context Across Splunk Products With Splunk Observability Events

When an IT or Security issue impacts a development team’s software, how are they notified? Is your organization still relying on mass emails that lack context and that most engineers have probably already filtered out of their inbox? Communicating between siloed tools and teams can be difficult. How would you like to put IT, Security, legacy processes, and business notifications specific to development teams right into one of their most important tools? Now you can!

The 2023 Network IT Management Report Part 2: Confidence and Automation

This is the second in a four-part series focusing on the findings from our 2023 annual Field Report for IT Management. We surveyed 4500 IT professionals from internal IT teams and MSPs across North America to gauge where their organizations are heading from a network management perspective. In part two, we’ll discuss the confidence IT pros have in their networks and how they can use automation to do more with less. You can read the full 2023 field report and compare your own IT statistics here.

Why You Need a Centralized Approach to Monitoring

With a standard model for monitoring data across the organization, different teams can use a common infrastructure and extract maximum value from it. Monitoring (also sometimes referred to as observability) involves collecting and analyzing data from a source over time to track its health and/or performance. Because change occurs over time, virtually all monitoring data is time series data, meaning it has a timestamp.

Two sides of the same coin: Uniting testing and monitoring with Synthetic Monitoring

Historically, software development and SRE have worked in silos with different cultural perspectives and priorities. The goal of DevOps is to establish common and complementary practices across software development and operations. Sadly, in some organizations true collaboration is rare and we still have a way to go to build effective DevOps partnerships.

In a Toxic Relationship with Your Current Observability Search Tool? There's Other Fish in the Sea

IT tools are similar to romantic relationships. Over time, you tend to fall into the same old dull routines, like Rupert Holmes’s song Escape (The Piña Colada Song). That routine of collect dataset, route, ingest ($$), and then search, repeated again and again, is breaking not only your heart but your budget too.

Advanced filtering capabilities, Logs performance benchmark, and front page of HN - SigNal 21

Welcome to our first monthly product newsletter of 2023, SigNal - 21! Last month, we worked closely with our users to ship some advanced features which will enable our users to take advantage of their observability data more effectively. We were also trending on the front page of Hacker News and got featured as one of the fastest-growing open source startups. Let’s dive in to see what humans at SigNoz were up to in the month of January 2023.

February 2023: Notifications-only sub-users and 30 seconds monitor interval

As we have previously announced: Notifications-only sub-users are here! After receiving your feedback, we realized that full-featured sub-users may not be the ideal option for everyone. This new feature is ideal for our UptimeRobot users who want to alert teammates or clients but don’t want to share access to their account or monitors. You can manage your teammates and their access on the Team page. Here you can add / edit / remove seats with write, read, or notify-only access.

Get the Big Picture: Learn How to Visually Debug Your Systems with Service Map-Now Available in Sandbox

Honeycomb recently announced the launch of Service Map, a new feature that gives users the ability to quickly unravel and make sense of the interconnectivity between services in highly complex and intricate environments.

Dashboard Fridays: Sample Jira Dashboard

These Jira dashboards give a clear overview of your Jira instance and provide more details on the key items over which the engineering team needs oversight, like build status, critical bugs, and costs. Creating Jira dashboards in SquaredUp means Engineering Management doesn’t have the additional work of collating all the detailed Jira data to make sense of it from a high level. It also enables Release Teams to more easily consume data surfaced from all the engineering teams, while still being able to drill into the details of each dashboard as needed.

Release 1.38.0: Dramatic performance and stability improvements, with a smaller agent footprint

We completely reworked our custom-made time series database (dbengine), resulting in stunning improvements to performance, scalability, and stability, while at the same time significantly reducing the agent memory requirements. On production-grade hardware (e.g. 48 threads, 32 GB RAM), Netdata Agent Parents can easily collect 2 million points/second while servicing data queries for 10 million points/second, and running ML training and Health querying 1 million points/second each!

Troubleshooting Teams The Right Way

VIPs almost never open tickets, but you’ll know when they’ve got a problem with Microsoft Teams. Even when the rest of your team logs tickets for troubleshooting Teams issues, there’s a good chance they won’t give you enough information about the problem, particularly if it’s hardware-related. These are traditional challenges for IT departments trying to maintain Teams services. Luckily, we have the solution.

Variable Chaining in Dashboards Demo - SigNoz

With variable chaining, you have more granular control over charts in the Dashboards tab of SigNoz. It lets you filter specific views based on your use cases quickly. More about SigNoz: SigNoz - Monitor your applications and troubleshoot problems in your deployed applications, an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator. SigNoz helps developers monitor applications and troubleshoot problems in their deployed applications. SigNoz uses distributed tracing to gain visibility into your software stack.

Weathering the MS Teams Outage with Zero Impact

While innumerable companies suffered due to the outage, it was business as usual for Nexthink Customers. On January 25th, 2023, millions of unsuspecting employees across the globe were met with an unexpected roadblock during a regular work day. Microsoft Teams, used by over 280 million users worldwide for video meetings, messaging, and collaboration, was officially declared down.

The Developer Obsession With Code Names - 186 Interesting Examples

Code names can be about secrecy, but when it comes to software development, it’s usually not so much about secrecy as it is about the convenience of having a name for a specific version of the software. It can be very practical to have a unique identifier for a project to get everyone on the same page and avoid confusion. It can also be a great way to build excitement and cohesion in a development team. And we want to name our darlings, don’t we?

Exploring Your Network Data with Kentik

In this short video overview, Kentik’s Phil Gervasi explains how the Kentik Data Explorer lets network engineers (including network systems, cloud security and SREs) ask any question about their networks and explore the tremendous amount and variety of network telemetry being collected. Phil talks about the various types of network telemetry you can explore, the many dimensions of that data that can be filtered and analyzed, and gives a quick tour of the Data Explorer interface.

Cribl's Zachary Kilpatrick Awarded 2023 Channel Chief Award from CRN for Second Consecutive Year

The Cribl Partner Program is designed to be a comprehensive solution for organizations looking to grow their customer relationships and revenue streams, while also enabling a fast deployment of observability solutions to serve customers. Our partners receive extensive training, tools, and support to unlock the full potential of observability data for their customers.

Apache Tomcat Logging Configuration: How to View and Analyze Log Files

Apache Tomcat is a Java web server that implements many Java features such as website APIs, JavaServer Pages, and Java Servlets. It’s open-source software widely used in the industry. Tomcat sits on top of your application and is the entry point for reaching your application code. It is crucial to monitor its performance and make sure everything works, get notified when unexpected errors occur, and take action in real time.

Testing Microservices - Trace Based Integration Testing Example

As engineering organizations transitioned from monolith to microservices architectures, they sought to make their development efforts more scalable and manageable. The microservices paradigm promised to increase development speed, reduce MTTR and improve quality while cutting down on maintenance costs. However, in real life, there are inherent quality caveats when it comes to developing microservices.

Our API tokens can now be scoped by site or status page

Oh Dear has an extensive API that powers various powerful integrations. To use the API, you first need to create an API Token in the Oh Dear UI. Previously, such a token could be used to make API calls to any site or status page in your Oh Dear account. We noticed that some of our users are agencies that use Oh Dear to monitor their clients' sites.

Using OOP concepts to write high-performance Java code (2023)

Java is a class-based object-oriented programming (OOP) language built around the concept of objects. OOP concepts are intended to improve code readability and reusability by defining how to structure your Java program efficiently. There are seven core principles of object-oriented programming, as follows.
Sponsored Post

2023 Trends in AIOps, Observability, and ITOps

Hall of Fame professional wrestler Paul "Mr. Wonderful" Orndorff said of his ascent to become one of the WWE's biggest stars of the 1980s that, "I knew where I wanted to go. I had a plan. I don't care what you do in life, you better have a plan." And for ITOps teams making new plans, or adjusting the sails on their existing ones, it's good to have insights that inform those plans.

Correlate Datadog RUM events with traces from OTel-instrumented applications

OpenTelemetry (OTel) is an open source, vendor-neutral observability framework that supplies APIs, SDKs, and tools for the instrumentation of cloud-native applications and services. OTel enables you to collect metrics, logs, and traces from a variety of sources and route them to various backends. By itself, however, it can’t help you analyze this data or correlate telemetry from different parts of your stack.

How to install the Site24x7 APM Insight Node.js agent

This video will walk you through the process of installing the Site24x7 APM Insight Node.js agent. The APM Insight Node.js agent automatically instruments supported frameworks (like Express, Koa, and Hapi) and records interesting events, like HTTP requests, database queries, errors, exceptions, web API calls, and remote calls. This installation method works in both Linux and Windows environments.

How to verify the license key for a .NET application

Learn how to verify the license key when the .NET monitor fails to be created in the Site24x7 web client. About Site24x7: Site24x7 offers unified cloud monitoring for DevOps and ITOps. Monitor the experience of real users accessing websites and applications from desktop and mobile devices. Site24x7's in-depth monitoring capabilities enable DevOps teams to monitor and troubleshoot applications, servers, and network infrastructures (including private and public clouds). End-user experience monitoring is done from over 90 locations across the world and various wireless carriers.

Digging Into the Recent Azure Outage

In the early hours of Wednesday, January 25, Microsoft’s public cloud suffered a major outage that disrupted their cloud-based services and popular applications such as SharePoint, Teams, and Office 365. Microsoft has since blamed the outage on a flawed router command which took down a significant portion of the cloud’s connectivity beginning at 07:09 UTC.

Grafana documentation: A look at the new and improved design

We recently launched a new design for our technical documentation. The goal of the redesign was to make our technical documentation more accessible, modern, and scalable as we grow. In addition to a new look (hello, new typeface and layout!), our updated docs pages reveal the underlying work our team has done to evolve and enhance our technical documentation.

Profiling: Buzzword or Critical Observability Tool? | Snack of the Week

Profiling may seem like the latest buzzword in the monitoring and observability world, but profiling tools have actually been in use for decades. I’m going to quickly explain what profiling is and why modern profilers are getting so much attention lately.

Getting More Web Traffic? How to Prepare Your Site for Large Scale Traffic

It can be challenging to understand the specific factors that can cause traffic to surge on your website. Sometimes there are temporary spikes, usually related to sales or special events, and other times this increase in traffic can be permanent. Website owners should be ready to handle large volumes of visitors as any breakdown, no matter how short-lived, can cause you to lose potential business.

Sponsored Post

How to Choose the Right IT Ops Metrics

Traditionally, we consider IT to be managing and monitoring on-premises network infrastructure, including hardware and software. However, the reality is that most enterprises have accepted and migrated much of their infrastructure to the cloud already. They recognize the benefits of the cloud and that it is here for the long haul. According to the latest study from Deloitte, 90% of organizations have been using cloud services for the last three years, and 79% are hosting workloads with multiple cloud providers. In addition, adopting cloud computing platforms has accelerated significantly in the remote work era.

How to set up Golang application performance monitoring with open source monitoring tool - SigNoz

In this article, learn how to set up application monitoring for Golang apps using an open-source solution, SigNoz. Scalability, Reliability, Maintainability... The list goes on for the benefits of microservices architecture in today's world. But along with these benefits also come the challenges of complexity.

The role of APM and distributed tracing in observability

Application performance management (APM) and distributed tracing are practices that many teams have been using for years to help detect and mitigate performance issues within applications – while the first one was born in the era of big single-host monoliths, the latter is especially useful for distributed applications that use a microservices architecture, in which tracing is critical for pinpointing the source of performance issues.

Supporting Key Business Applications in the Cloud is Challenging: A Real-World Case Study

These days, many IT executives believe that it is easier to deploy applications in the cloud than on-prem. They are also often under the misconception that once an application is hosted in the cloud, it is the responsibility of the cloud service provider to maintain the availability and performance of the application.

6 Real-World Status Page Examples: And What You Can Learn From Them

A status page is the most effective way to stay in touch with your users and quickly inform them about any outages or ongoing maintenance. As explained in our previous article, status pages can offer many benefits such as cost savings and a reduced number of support tickets. Creating a status page can significantly improve your incident management and relationships with your customers.

Cloud Providers Health Report - January 2023

Check our January 2023 health report on the most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. The source of the data is made available by the cloud providers themselves via their status page. We normalize it and use it to generate the report.

Extending Netdata's anomaly detection training window

We have been busy at work under the hood of the Netdata agent to introduce new capabilities that let you extend the "training window" used by Netdata's native anomaly detection capabilities. This blog post will discuss one of these improvements to help you reduce "false positives" by essentially extending the training window by using the new (beautifully named) number of models per dimension configuration parameter.

Cyber Resilience: The Key to Security in an Unpredictable World

Join Ed Bailey and Jackie McGuire as they delve into the topic of cyber resilience and its growing significance in today's digital landscape. In this informative video, you will learn what cyber resilience means, why it's important, and how to manage and improve it in an increasingly unpredictable world. With cyber threats becoming more sophisticated and frequent, cyber resilience has become a critical aspect of protecting personal and business assets. This discussion is perfect for anyone looking to better understand the importance of cyber resilience and how to safeguard against potential threats.

Protect Data from Ransomware with Flowmon & Superna

With the rapid rise of ransomware in recent years, protecting your data is now more crucial than ever. The combination of Flowmon's Anomaly Detection System (Flowmon ADS), which provides early warning of upcoming attacks, and Superna's capability to take proactive snapshots of data is a powerful way to protect your data before any exfiltration attempt.

How Grafana Labs uses and contributes to OpenCost, the open source project for real-time cost monitoring in Kubernetes

While more and more teams are adopting Kubernetes as their standard container orchestration technology, cost insight is lacking. Teams often don’t know how much they’re spending, where in their organization they are spending, or what is driving their infrastructure cost increases. OpenCost helps alleviate this problem by bringing real-time cost monitoring to Kubernetes workloads with a solution that encompasses both an open specification and an open source project.

TL;DR InfluxDB Tech Tips: Downsampling with Flight SQL and AWS Lambda

This tutorial covers how to perform downsampling with the new InfluxDB storage engine, InfluxDB IOx, in InfluxDB Cloud (available on AWS us-east-1 and AWS eu-central-1 starting January 31st) using AWS Lambda. To address key user needs, we built InfluxDB IOx on the Apache ecosystem (Apache Parquet, Apache DataFusion, Apache Arrow, and Apache Flight SQL).
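For readers new to the term, downsampling means replacing many raw points with fewer aggregated ones. The sketch below shows only that aggregation step in plain Python. It does not use Flight SQL, AWS Lambda, or any InfluxDB client, and the function name downsample and its parameters are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (timestamp, value) pairs into fixed-size time windows.

    points: iterable of (unix_timestamp_seconds, value) pairs
    window_seconds: width of each aggregation window in seconds
    Returns a list of (window_start, mean_value) sorted by window_start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        bucket_start = ts - (ts % window_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (125, 9.0)]
per_minute = downsample(raw, window_seconds=60)
```

Here five raw points collapse into three one-minute averages. In a scheduled job, the same aggregation would typically be expressed as a query against the database rather than computed client-side.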

Autocatalytic Adoption: Harnessing Patterns to Promote Honeycomb in Your Organization

When an organization signs up for Honeycomb at the Enterprise account level, part of their support package is an assigned Technical Customer Success Manager. As one of these TCSMs, part of my responsibilities is helping a central observability team develop a strategy to help their colleagues learn how to make use of the product.

The Importance of Uptime for Your Website

Business operations have been revolutionized by the advent of web-computing services. Many organizations now look to decrease or eliminate expenditure, increase efficiency, and maximize profits by moving their processes online because of the unmatched flexibility and ability to scale the cloud affords them. With this sea-change to online, cloud-based operations for businesses has come a new challenge: availability.

Sponsored Post

Complete observability & monitoring of your integration infrastructure

Integration is a fundamental part of any IT infrastructure. It allows organizations to connect different systems and applications together in order to share data and information. As organizations become more complex and interconnected, they need to ensure they have complete observability and monitoring of their integration architecture. This is essential in order to discover, understand and fix any issues that can arise.

Synthetic Monitoring for Azure Virtual Desktop (AVD) Applications

Synthetic Monitoring Platforms should be used for synthetic monitoring of AVD applications as they provide more complete and accurate results. These platforms are designed to work within virtualized environments and can interact with the underlying operating system, allowing for more in-depth testing and diagnosing performance issues.

How 1Password Relies on Checkly for Secure System Health Monitoring for Thousands of Business Customers

1Password uses Checkly to provide transparent, advanced synthetic monitoring to 1Password SCIM bridge customers. 1Password is a leader in human-centric security and privacy, with a solution that’s built from the ground up to enable anyone, no matter the level of technical proficiency, to navigate the digital world without fear or friction when logging in.

SolarWinds Observability: Helping to Accelerate Application Development

Observability is the practice of equipping software and infrastructure with tools capable of gathering actionable data showing not only when an application error or issue occurs but why it occurred. Most traditional monitoring tools gather information passively; observability practices are different. They focus on actively gathering relevant data, especially factors driving operational decisions and actions.

ScienceLogic Recognized in TrustRadius 'Best of' Awards for 2023

At ScienceLogic, we’ve built our customer operations to ensure our customers have an excellent end-to-end customer experience, with a rock-solid plan for improving our customers’ ability to meet and exceed their desired business outcomes. And well, we must be doing something right: ScienceLogic has been recognized in this year’s TrustRadius ‘Best of’ Awards through our customers’ direct feedback.

New in Grafana Tempo 2.0: Apache Parquet as the default storage format, support for TraceQL

Grafana Tempo 2.0 is finally here, and it’s being released with two new important features. It took us longer than we would have liked to get this release going, but it turns out that rewriting your backend AND building a new query language is quite difficult. Thanks to a massive team effort, we are proud to release Tempo with support for TraceQL and with Apache Parquet as the default backend storage format. Read on to get a quick overview of this huge release.

Grafana Agent v0.31 release: new Helm chart, Flow support for Grafana Phlare, and more

Here at Grafana Labs, we aim to create products which integrate well with open standards and are easy to install everywhere. Today, we’re excited to announce Grafana Agent v0.31, which allows you to connect to even more types of observability signals for both scraping and remote writes. And to help you install the Agent more easily, there is now an official Windows Docker image and an official Helm Chart. Here’s a breakdown of the latest features and upgrades in Grafana Agent v0.31.

Network Observability: A Side-By-Side Comparison of WhatsUp Gold NTA & Flowmon

Imagine starting your car in the morning and having your attention captured by a little red check engine light. After expressing frustration in your own unique way, your next objective is to determine why this little light has brought darkness to your morning. Your owner's manual clearly outlines how to operate and routinely maintain your vehicle, but all you know about this little light is that you’ll soon be meeting your local mechanic.

Best Practices for Enriching Network Telemetry to Support Network Observability

Network observability is critical. You need the ability to answer any question about your network—across clouds, on-prem, edge locations, and user devices—quickly and easily. But network observability is not always easy. To be successful, you need to collect network telemetry, and that telemetry needs to be extensive and diverse. And once you have that raw telemetry data, you need to interpret it.

Announcing the General Availability of Our New High-Performance Time Series Engine in InfluxDB Cloud

Back in October 2022, our Founder and CTO Paul Dix announced the limited release of InfluxDB IOx, our new database engine. After several months of beta testing, we’re excited to announce the next phase of our database engine: general availability. As of today, InfluxDB IOx releases to the rest of the world as the new and improved InfluxDB Cloud.

The 5Ws (and 1H) of the New InfluxDB Cloud

Some things are inevitable, like Thanos, paying taxes, and change. While it would be nice to simply snap our fingers and deliver new products, things aren’t so simple in the real world. InfluxDB has been the leading time series database since January 2016. But we’re not content to rest on our laurels. The quest to improve InfluxDB is constant and ongoing. As of today, we’re beginning the rollout of an all-new and improved InfluxDB Cloud powered by IOx.

10 Best RUM Tools for Improving Your User Experience

Having user-friendly programs is paramount for any software company’s success. To gain a better understanding of your users’ experience and enjoyment, it’s vital that you learn how customers interact with your app or website. Real user monitoring (RUM) solutions enable your company to visualize how users interact with your software, helping you learn what works best for your customers so you can thrive against the competition.

Local Variables for NodeJS in Sentry

Stack traces show us exactly where an exception occurred, but you can still be left wondering: What arguments or state caused the exception to occur? If you can reproduce the issue locally with a debugger attached you’ll have access to these local variables, but with Sentry you can identify the exception location without needing to reproduce the issue locally. By including local variables with stack traces, Sentry events become much closer to the full debugging experience.

What Causes Packet Loss: Reasons Why Your Data is Going MIA

Packet loss is a common problem in computer networks that can have a significant impact on network performance and business productivity. When data packets are lost through the network, the data they contain needs to be retransmitted, which can slow down data transfer rates and cause interruptions in real-time applications like video conferencing and voice calls.

Optimizing VPC Flow Logs - Part 2

As cloud deployments scale, Amazon Web Services (AWS) VPC flow logs become an invaluable network visibility and security tool. They are also one of the most voluminous classes of data, making them an expensive choice to add to analytics platforms. With growing infrastructure and traffic, managing these logs presents significant challenges. In part 1 of this series, we took a look at common use cases and problems associated with storing and processing VPC Flow Logs.