March 2022

An Introduction To Using Nastel With Amazon MSK

Apache Kafka is a very popular open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Amazon MSK (Managed Streaming for Kafka) is a fully managed Kafka service that makes using Kafka even easier. But how do you get started with Kafka on AWS?

Monitor your hybrid mobile applications with Datadog

Hybrid mobile applications allow you to incorporate web-based content into your mobile offerings. By embedding webviews inside your iOS or Android app, you can repurpose existing code to build key mobile functionality, such as authentication processing or ad rendering. While hybrid apps can help streamline your development process, they can also make monitoring your system more complex.

Digital experience and digital experience platforms, defined

Usually, people want a seamless experience when they interact with any organization. Whether a B2B or B2C interaction, the expectation is the same. In our world today, it's nearly impossible to interact with an organization without using technology. When a person interacts with an organization via a digital medium, it can be termed digital experience.

TL;DR InfluxDB Tech Tips: Converting InfluxQL Queries to Flux Queries

If you’re a 1.x user of InfluxDB, you’re most likely more familiar with InfluxQL than you are with Flux. Gaining a deep understanding of Flux requires studying several underlying topics; however, you can still use Flux without studying them. In this TL;DR, we’ll convert common InfluxQL queries into Flux and identify patterns between the two languages to help you get started using Flux more easily if you come from an InfluxQL or SQL background.

Heartbeat check and more - is your monitoring still alive?

SIGNL4 is a cloud-based mobile alerting and incident response service. Third-party systems like monitoring tools, control systems or IoT sensors detect abnormalities and transmit events to SIGNL4 over the Internet. What if your systems cannot transmit critical events anymore? That might happen when the Internet is down or when the tool itself has a problem. In this case, SIGNL4 would miss critical events and could not turn them into alert notifications to your IT admins, technicians and experts.
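The general idea behind a heartbeat check can be sketched in a few lines of Python. This is only an illustration of the concept, not SIGNL4's API: the class name `HeartbeatMonitor`, its methods, and the 60-second timeout are all hypothetical. The monitored system calls `beat()` periodically, and the receiving side treats a missing heartbeat as a problem in its own right.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports whether the sender still looks alive.

    Hypothetical helper for illustration only; not part of any vendor API.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock            # injectable so the logic can be tested
        self._last_seen = clock()      # treat creation time as the first beat

    def beat(self):
        """Record that a heartbeat arrived just now."""
        self._last_seen = self._clock()

    def is_alive(self):
        """Return True while the last heartbeat is within the timeout."""
        return (self._clock() - self._last_seen) <= self.timeout_seconds


# Usage: a monitoring tool would call beat() on every successful check-in;
# the alerting side polls is_alive() and raises an alert when it turns False.
monitor = HeartbeatMonitor(timeout_seconds=60)
monitor.beat()
```

The clock is passed in as a parameter so that the timeout logic can be exercised without waiting in real time.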

The NetOps Expert - Episode 5: Broadcom Software and AppNeta - Part 2

Jeremy Rossbach, Head of DX NetOps Product Marketing and Alec Pinkham, Head of AppNeta Product Marketing continue their discussions on the recent acquisition of AppNeta by Broadcom and the reasons why the combination of both network monitoring solutions sets it apart from the industry and why our customers should be excited for the future of their network visibility.

APIs for IT Monitoring Solutions

The majority of monitoring and management solutions used in enterprises provide their customers with APIs (Application Programming Interfaces) and a CLI to facilitate DevOps-type workflows. With IaC (Infrastructure as Code) becoming ubiquitous and the de facto standard, decent APIs have long been a must-have on product evaluation checklists; there are of course a few exceptions – namely products aimed only at SMB (Small and Medium Business), immature startups, or freeware.

Top Infrastructure Monitoring Tools and Best Practices

As more and more organizations adopt cloud-native technologies, the need to align business objectives and end-user experience with IT infrastructure’s availability and performance is ever-growing. This trend requires infrastructure monitoring setups to ensure that all of your systems are active and working together across your cloud environments, host operating systems, storage systems, etc.

AWS vs GCP: Top Cloud Services Logs to Watch and Why

The infrastructure and services running on public cloud platforms like Google Cloud Platform (GCP) and Amazon Web Services (AWS) produce massive volumes of logs every day. An organization’s log data provides details about their entire IT environment in real time, or at any point in time in history.

Flowmon and WhatsUp Gold: Discover application experience issues through single pane of glass

Have you ever experienced user complaints and struggled to find the root cause of the performance degradation? I'm sure every IT operations professional has. Is it the application? Is it the underlying infrastructure? Is it the network? What if you have a single pane of glass that will gather all the relevant metrics and telemetry and display it in an intuitive and easy to understand fashion?

Benefits of Localized Distributed Tracing

Distributed tracing is a household term nowadays – if your house is an IT department! Modern enterprises use cloud-native applications for agility and responsiveness to customer needs. When monitoring cloud-native applications, distributed tracing follows how transactions perform while traversing services or containers in the backend architecture. By definition, we’re describing production applications with requests, methods, database calls and logs that accompany a transaction.

How to get started with OpenTelemetry auto-instrumentation for Java

If you’re new to OpenTelemetry, like I was, you might be wondering how to quickly get started. OpenTelemetry is becoming the gold standard to collect all of your machine data and is changing observability as we know it. Instead of learning multiple technologies to collect all data, you can leverage a single cloud-native framework to complete your observability.

Taming the Complexity of Windows Event Collection with Cribl Stream 3.4

OK, first things first. I have to admit that I am, first and foremost, an old-school UNIX systems administrator. I’m that grizzled sysadmin in your shop who soliloquizes wistfully about managing UUCP for email “back in the day.” Centralizing Logs? Yeah, we had syslog, and saved it all off to compressed files.

Why You Should Monitor BGP and Where to Start 031122

BGP isn’t just for ISPs and hosting providers anymore. As we saw with Facebook’s historic outage, it’s now a necessity for digital enterprises to proactively monitor BGP. In this webinar, Director of Product Management Anil Murty will introduce Kentik’s new proactive BGP monitoring capabilities. Join Anil to learn.

How to automate verification of deployments with Argo Rollouts and Elastic Observability

Shipping complex applications at high velocity leads to increased failures. Longer pipelines, scattered microservices, and more code inherently lead to greater complexity, where small mistakes may cost you big time.

6 Common DynamoDB Issues

DynamoDB, the primary NoSQL database service offered by AWS, is a versatile tool. It’s fast, scales without much effort, and best of all, it’s billed on-demand! These things make DynamoDB the first choice when a datastore is needed for a new project. But as with all technology, it’s not all roses. You can feel a little lost if you’re coming from years of working with relational databases. Your SQL and normalization know-how doesn’t bring you much gain.

New StackPod Episode: Implementing an SRE Practice with Yousef Sedky of Axiom/Hyke

For our latest StackPod episode, we invited Hyke’s DevOps team lead and AWS Cloud architect: Yousef Sedky. Axiom Telecom is one of the largest telephone retailers in the United Arab Emirates and Saudi Arabia and Hyke, its sister company, is a distribution platform for mobile products.

Sentry Points of Presence: How We Built a Distributed Ingestion Infrastructure

Event ingestion is one of the most mission-critical components at Sentry, so it’s only natural that we constantly strive to improve its scalability and efficiency. In this blog post, we want to share our journey of designing and building a distributed ingestion infrastructure, Sentry Points of Presence, that handles billions of events per day and helps thousands of organizations see what actually matters and solve critical issues quickly.

Managing Kafka on AWS with Nastel Navigator

Apache Kafka is a very popular open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Amazon MSK is a fully managed Kafka service that makes using Kafka even easier. But how do you get started with Kafka on AWS? How do you migrate your existing Kafka environment to the cloud? How do you manage many Kafka environments at once?

How to use path wildcards in Graphite Module

The Graphite module of Icingaweb2 is a great tool for rendering graphs from performance data. The latest version, 1.2.0, introduces some bug fixes and new features such as dark and light mode support, many new templates, and support for path wildcards. Support for the wildcard * was already available in the previous version. We have extended this support and added all the wildcards that Graphite offers.

Spark Performance Monitoring using Graphite and Grafana

In this article, we will explore what Apache Spark is, what key metrics you need to track to keep it running, and how to set up a metrics tracking process. We will also cover monitoring tools such as Graphite and Grafana, which make the process of monitoring metrics very easy, as well as how using MetricFire can make running your monitoring exponentially easier. Check out MetricFire for free or book a demo with our team and learn more about all the benefits of using MetricFire solutions.

Announcing Grafana Mimir, the most scalable open source TSDB in the world

Today we’re introducing you to Grafana Mimir, the most scalable, most performant open source time series database in the world. Mimir allows you to scale to 1 billion metrics and beyond, with simplified deployment, high availability, multi-tenancy, durable storage, and blazing fast query performance that is up to 40x faster than Cortex.

Elastic on Elastic - Using Elastic Observability to optimize the performance of detection rules in Elastic Security

Elastic Security’s developer support team has recently seen a surge in reports from customers about sluggish performance in our UI. Our initial inspection of logs for troubleshooting provided some insights, but not enough for a true fix. Luckily, we have Elastic Observability and its APM capabilities to dive in deeper and look under the hood at what was really happening within Elastic Security. And, more importantly, how we could improve its performance for customers.

OpenTelemetry: How Cisco and AppDynamics are contributing to the future of observability

At Cisco and AppDynamics, we believe that OpenTelemetry™ is the future of observability. To that end, we’re working hard alongside the open source community to ensure the telemetry you collect can be leveraged to deliver the world-class business insights you need.

How to set up Prometheus monitoring for your services

When you run applications in production, you need to monitor the infrastructure they run on - and collect important signals about application health like error rates and latency. In this episode of Engineering for Reliability with Google Cloud, Yuri will demonstrate how to instrument your service to expose application-specific telemetry with Prometheus and how to configure Google's managed service for Prometheus to collect those metrics.

Synthetics 101 - Part 2: Protecting and growing revenue with proactive monitoring

In part 1 of our synthetics series, we looked at tracking network performance to drive better business outcomes. Here in part 2 of our series, we’ll dig into the very first and most basic business outcome of using digital experience monitoring (DEM). That is, we’ll look at how to protect and grow revenue by proactively monitoring the health, availability and uptime of your critical applications and services, so you can fix issues before your customers’ experience suffers.

Introducing Symphony: Catchpoint's New User Experience Platform

We at Catchpoint are always striving to help our customers improve their products’ user experience, because we believe that “the experience is the point.” With a motto like that, you can bet we take the usability and effectiveness of our own platform very seriously, and are constantly striving to deliver a world-class user experience in the Catchpoint Portal. With that in mind, we have spent the better part of the last few years redesigning Catchpoint Portal from the ground up.

Android Manifest Placeholders

The Android Manifest file is essential for any Android app. It contains specific information about your app for the Android build tools and Google Play, along with device permissions, app launch information, operating system config and more. Every Android app must have an AndroidManifest.xml file in the directory structure. The Android Manifest usually contains pre-defined or static information which is then used to run the app.

What is a network monitoring system?

Network monitoring is a set of automatic processes that help to detect the status of each element of your network infrastructure. We are talking about routers, switches, access points, specific servers, intermediate network elements, and other related systems or applications (such as web servers, web applications, or database servers). In other words, network monitoring can be understood as taking a look at all the connected elements that are relevant to you or your organization.

Serverless observability with OpenTelemetry and AWS Lambda

Nowadays, microservice architecture is a pattern that helps to innovate quicker by enabling easier scalability, giving language flexibility, improving fault isolation, etc. Systems built this way also bring some downsides. Moving parts, concurrent invocations, and different retry policies can make operating and troubleshooting such systems challenging. Without proper tools, correlating logs with metrics may be difficult. To overcome these challenges, you need observability.

What's New and What You Can Do with WhatsUp Gold 2022

We’re excited, WhatsUp Gold 2022.0 has arrived! As you know, network and system admins are required to monitor everything connected to the network and fix problems before they impact end-users. And, with WhatsUp Gold, you can do just that! Find and fix network problems fast and before anyone even notices.

Data-Driven Enterprise (or Just Talking Like One)?

Who’s faking it? Data-driven decision-making (and ensuring related investment is delivering on its promises) has been elevated in importance. Every enterprise is replacing opinion with data-driven decision-making, correct? A brand new research report from EMA (Enterprise Management Associates) titled, “A Data-Driven Enterprise” looks behind the curtain to separate facts and reality from fiction and aspiration. Here are some of the findings from the latest research on enterprises.

What's next for the Internet of Things?

IoT devices seem to be ubiquitous, but the truth is we’re not nearly there yet. In fact, IoT Analytics continues to predict steady growth as the future for IoT for years to come, with more than 27 billion devices online by 2025. They’re not alone in their bullish IoT predictions either. MarketsAndMarkets projects the global IoT market will more than double from 2021 to 2026, growing from just over $300 billion to over $650 billion.

InfluxData Recognized for Industry Leadership in 2022 Data Breakthrough Awards

InfluxDB wins Best Use of Data for IoT Applications category. SAN FRANCISCO, March 29, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced InfluxDB has been named a winner in the Data Breakthrough Awards for the Best Use of Data for IoT Applications category. Conducted by Data Breakthrough, an independent market intelligence organization, the awards recognize the top companies, technologies and products in the global data technology market today.

What is SecOps?

SecOps is short for Security Operations, a methodology that aims to automate crucial security tasks, with the goal of developing more secure applications. The purpose of SecOps is to minimize security risks during the development process and daily activities. Under a joint SecOps strategy, the security and operations teams work together to maintain a safe environment by identifying and resolving vulnerabilities and other security issues.

Debugging JAXB Production Issues

Java Architecture for XML Binding (AKA JAXB API) is a popular API for marshaling XML data. It’s a framework for mapping between XML documents and Java POJOs (Plain Old Java Objects, AKA regular Java classes) almost seamlessly. The API is very easy to use and many frameworks leverage it to provide their XML support. JAXB 2.0 has gained popularity both in desktop applications (Java SE) and in application server code (Spring Boot, Java EE/Jakarta EE, MicroProfile etc.).

How to Run Java Inside Docker: Best Practices for Building Containerized Web Applications [Tutorial]

Containers are no longer a thing of the future – they are all around us. Companies use them to run everything – from the simplest scripts to large applications. You create a container and run the same thing locally, in the test environment, in QA, and finally in production. A container is a stateless box built with minimal requirements and, unlike a virtual machine, it does not need to virtualize the whole operating system.

Add severity levels to your alert policies in Cloud Monitoring

When you are dealing with a situation that fires a bevy of alerts, do you instinctively know which alerts are the most pressing? Severity levels are an important concept in alerting to aid you and your team in properly assessing which notifications should be prioritized. You can use these levels to focus on the issues deemed most critical for your operations and triage through the noise.

Getting Started with Prometheus and Grafana

The world is moving towards a data-driven society, resulting in businesses gathering more data to leverage for profit. This data is either in structured forms like JSON/CSV/XML or unstructured forms like text written on a notepad or typed texts. Regardless of the data’s formatting, the important thing is the information it contains. Various tools in the market extract data from different agents and platforms that will improve your metric monitoring system.

Video: Top 3 features of the New Relic data source plugin for Grafana Enterprise

Grafana Labs and New Relic have a long history of working together to drive cross-functionality so joint customers can benefit from using Grafana and New Relic together. The New Relic data source plugin for Grafana, which is available to users with a Grafana Cloud account or with a Grafana Enterprise license, is no exception. In this quick tutorial video, we’ll show you how easy it is to configure the New Relic data source plugin in Grafana.

Cribl Edge: Nobody Puts Data in the Corner

Has this ever happened to you: ‘I have too many agents to help me collect data for processing into separate SIEMs. It’s a pain to make any changes to their configuration!’ Or perhaps this one: ‘I have a large Kubernetes deployment, but I just can’t seem to get metrics and logs out of it and into my SIEM or TSDB!’ Fear not, weary administrators, Cribl Edge is here!

A Better Environment for Observability, at Your Service

We’ve made some big changes under the hood at Honeycomb to give you better control over how you put your app data to work: we’ve expanded our core data model with formal Environments and Services! In short, the best observability (o11y) platform in town just got better! Before we dig in, an important note. Existing Honeycomb teams are not impacted by this update. If you’re already a Honeycomb user, congratulations! Your team is now a Honeycomb Classic team.

Bill Marantz of Linode on automation, mentorship, and problem solving | Network AF Episode 14

Bill Marantz joins Avi and Network AF to discuss his love for open source and automation technologies. The two also discuss mentorship and recruiting during rapid stages of company growth, and touch on problem solving in networking without a technical background.

How to Gather Insights From Your Network Traffic Pattern Analysis

What’s your network doing right now? Where is traffic flowing to, and where’s it coming from? Are there bottlenecks you don’t know about? Where’s the next problem going to be? Network traffic pattern analysis answers these questions and more. It’s a way for you to examine how your clients use your networks. You may think you know how heavily your clients utilize each segment and VLAN and where the weak points are. But do you?

The Sentry Ruby SDK now supports Release Health

Developers work tirelessly to publish updates to improve their products and services because, as we all know, a better user experience = happier customers. While shipping updates, features, and improved capabilities can help improve your users’ experience, introducing new code can also introduce new issues, and finding exactly which update caused a release to degrade can be time-consuming and costly.

NiCE VMware Management Pack 5.5 Release | Cross-Datacenter HA Monitoring

Unlock business benefits by monitoring your VMware’s High Availability infrastructure. Secure, protect, and manage your VMware environment and digital workspaces based on advanced analytics. The NiCE VMware Management Pack for Microsoft System Center Operations Manager provides a complete and easy-to-set-up VMware monitoring suite. Auto-Discovery, out-of-the-box metrics, graphs, reports, and enriched alert content help you spot and solve issues before they affect the system.

Shadow DOMs come to light with Uptrends Transaction Monitor

If you build or manage websites that rely heavily on transaction monitoring technology, then you are probably acutely aware that as these applications grow, they often encounter brittleness based on interactions between HTML, CSS, and JavaScript coding. Fortunately, developers have arrived at a solution to circumvent these problems in the form of shadow DOMs.

SigNoz - Open-source alternative to AppDynamics

If you're looking for an open-source alternative to AppDynamics, then you're at the right place. SigNoz is a perfect open-source alternative to AppDynamics. SigNoz provides a unified UI for both metrics and traces with advanced tagging and filtering capabilities. In today's digital economy, more and more companies are shifting to cloud-native and microservice architecture to support global scale and distributed teams.

Obkio Vision Release: Free Visual Traceroute Tool

After a year of hard work, Obkio is proud to announce the release of our brand new free Visual Traceroute tool! Learn about the story behind our new tool and how IT pros can leverage the Obkio Vision visual traceroute tool to identify and troubleshoot network and Internet problems faster than ever before.

Where Will Process Historians Fit in the Modern Industrial Technology Stack?

When Rolls Royce Power Systems recently needed to improve its operational efficiency within its manufacturing plants, it didn’t expand its use of a legacy process historian or purchase historian connectors to export data to its business intelligence systems. Instead, it decided to go with a modern time series database, InfluxDB. Graphite Energy, another customer we featured in our recent IIoT announcement, also chose InfluxDB over the legacy process historian vendors. Why?

400% Increase in User Adoption in 1 Month? How This Company Made It Happen

After three months of no activity, this organization breathed life back into its new self-service portal with a creative outreach campaign. “Fast” and “user adoption” rarely go hand in hand. In fact, 83% of senior executives said their biggest challenge with digital transformation was getting their staff to use the software. But that doesn’t mean we have to accept this as the status quo.

10 Best Examples of Branded Website Status Pages

Status pages are a valuable asset for any website or SaaS business – especially today when outages and downtime have never been more common and uptime expectations have never been higher. Whether your site is down or all systems are perfectly operational, hosted status pages provide external users and internal stakeholders with a single source of truth regarding uptime performance. When done well, status pages are elegant and custom-branded.

Webinar Recap: Launching Cribl Edge

Last week, Cribl launched the latest component of its observability architecture: Cribl Edge. ICYMI, Cribl Edge is a next generation observability data collector that greatly simplifies gathering your metrics, events, and logs. Edge incorporates all of the capabilities of Cribl Stream’s workers, allowing you to route, redact, filter, and enrich data directly from the source. Why is this important?

Celebrate We Will!! Cribl Turns 5 With 300 Employees!!

Today, Cribl is celebrating two significant milestones that are incredibly special to our founders and the entire company. Yesterday, Cribl celebrated its fifth anniversary, a day also shared with Clint’s son’s birthday. While we’re sure there was much celebrating (and cake!), it really earmarked the day our founders decided that building innovative software to help solve technology professionals’ most pressing problems was only going to happen if they were driving it.

UBER's Microservice Architecture

Microservice Architecture is a framework that consists of small, individually deployable services performing different operations. Amazon, Netflix, Twitter, Uber, and many other high-growth companies are now shifting from a monolithic architecture into multiple codebases to form a microservice architecture.

How the Great Firewall of China Affects Performance of Websites Outside of China

The Great Firewall of China, or as it’s officially called, the Golden Shield Project, is an internet censorship project to block people from accessing specific foreign websites. It is the world’s most advanced and extensive Internet censorship program. This project implements multiple techniques and tactics to censor China’s internet and controls the internet gateways to analyze, filter, and manipulate the internet traffic between inside and outside of China.

How to achieve Observability for Microservices-based apps using Distributed Tracing?

Modern digital organizations have rapidly adopted microservices-based architecture for their applications. Microservices-based apps have components designed around business capabilities serving a specific purpose. It enables smaller engineering teams to own specific services that lead to increased productivity. But componentization also leads to complexity. Today’s modern internet-scale businesses have hundreds or thousands of microservices.

Monitor IBM MQ Appliance Metrics

Nastel technical specialist Terry House has written a blog post looking at the IBM MQ appliance and comparing it to the traditional software form of IBM MQ. He explores the additional security and cost benefits and also the additional monitoring and management requirements that are addressed by the Nastel Navigator X i2M platform.

A guide to Apdex score: Calculations, improvements, and more

Apdex scores focus and align the varying perceptions of different teams. If you ask people in your organization to define what “performance” means for the application you’ve developed and deployed, you’re likely to get different answers. An SRE engineer might argue that a performant application has the highest possible uptime. A designer might say a critical dimension of performance is how easily users can get tasks done thanks to a carefully-crafted UI.
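For readers who want the calculation itself: the standard Apdex formula counts requests at or below a target threshold T as satisfied, requests between T and 4T as tolerating (worth half), and everything slower as frustrated. The Python sketch below is a minimal illustration of that formula; the function name `apdex`, the 0.5-second threshold, and the sample response times are assumptions for the example, not taken from the article.

```python
def apdex(response_times, threshold):
    """Compute an Apdex score from response times (same unit as threshold).

    Standard formula: (satisfied + tolerating / 2) / total samples, where
    satisfied means t <= T and tolerating means T < t <= 4T.
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Illustrative samples in seconds: two satisfied, two tolerating, one frustrated.
samples = [0.2, 0.4, 0.9, 1.5, 3.0]
score = apdex(samples, threshold=0.5)
print(score)  # (2 + 2 / 2) / 5 = 0.6
```

A score of 1.0 means every sampled request met the target; a score of 0.0 means every request was slower than 4T.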

Avantra 21.11.4: Monitor Java instances without an SAPControl user

Each Avantra version packs powerful features and enhancements that make it even simpler to work with and integrate - leading to better system quality, resilience and compliance. Avantra 21.11.4 lives up to this expectation by offering the new remote Java monitoring support feature.

Our Approach to Machine Learning

There is a lot of buzz in the world of machine learning (ML) and as a layperson it can be hard to keep up with it all. Therefore, we decided to write down some of our thoughts and musings on how we are approaching ML at Netdata. We’ll touch on the current state of applied ML in industry in general, and zoom in on ML in the monitoring industry.

Mist Clears the Way for Multicloud Observability

A multicloud strategy is a necessity for modern businesses, as the recent AWS outages made clear, but managing this infrastructure remains a huge challenge. Infrastructure management teams have long struggled to juggle diverse technology solutions, policies and services to get access to a point-in-time view of their resources. The result is either waste through overprovisioning or huge overheads for nitpicking manual management and repetitive tasks.

How Distributed Tracing augments the APM experience?

There are standalone distributed tracing tools like Jaeger, and there are APM tools that do not provide distributed tracing capabilities. In this article, we will see how distributed tracing complements an APM tool for a holistic performance monitoring experience. Both APM and distributed tracing are critical tools to understand the performance of your applications. And if your application is facing performance issues impacting customer experience, you need to understand what’s causing it fast.

Frontend Performance Monitoring - Tools and How to Choose the Right One

Monitoring is as important as application development to keep an application running healthy for the best user experience. For this reason, a strong monitoring strategy is essential for your company's success, ensuring that metrics such as constant performance, high availability, and accessibility are never a concern. Many businesses neglect the importance of frontend monitoring for their applications.

Is the Cloud an Experience or a Destination?

In a recent episode of the Cloud Happens podcast, Archana Venkatraman, Associate Research Director in Cloud Data Management at IDC Europe, talks about how the cloud isn’t a destination. It’s a continuum; a journey. In this blog, we explore that idea a bit more and dive into what really encapsulates a cloud experience. How can modern enterprises benefit from their cloud journey to solve the most gnarly data challenges to unlock innovation, enhance security, and drive resilience?

API & HTTP Headers: How to Use Request Headers in API Checks

In previous posts we covered why it’s important to monitor APIs and how to monitor and validate data from APIs. In this post we’ll focus on a simple but key feature that helps Splunk Synthetic Monitoring users create robust checks for availability, response time, and multi-step processes: request headers.

Visibility Anywhere: Key Takeaways from the NetOps Virtual Summit

What do big mountain ascents and modern network operations have in common? You’ll only succeed when you’re learning from experience. This was one among many compelling takeaways that attendees took from our recent NetOps Summit. Centered on the theme “visibility anywhere,” this event featured a number of compelling presentations, including a keynote from Jimmy Chin, the professional climber, photographer, and Academy Award-winning filmmaker.

Prometheus network monitoring: a new open source generation

Prometheus seeks to be a new generation within open source monitoring tools. A different approach with no legacies from the past. For years, many monitoring tools have been linked to Nagios for its architecture and philosophy or directly for being a complete fork (CheckMk, Centreon, OpsView, Icinga, Naemon, Shinken, Vigilo NMS, NetXMS, OP5 and others). Prometheus software, however, is true to the “Open” spirit: if you want to use it, you will have to put together several different parts.

Dashboard Fridays: Sample Salesforce dashboard

In our latest Dashboard Fridays episode we showcase this sample Salesforce metrics dashboard built using SquaredUp. Pulling data from Salesforce, this dashboard provides visibility of new business performance for the entire company. Accessed individually or viewed on an office wallboard – all employees from Engineering, Marketing, Sales and Finance can stay up to date with latest sales performance data.

Monitoring Proxmox VE via Managed VictoriaMetrics and vmagent

In this blog post we’re going to walk you through how to monitor Proxmox VE via Managed VictoriaMetrics and its vmagent, including a step by step guide on how to set up, configure and visualise this environment. Proxmox VE is a complete, open-source server management platform for enterprise virtualization. It tightly integrates the KVM hypervisor and Linux Containers (LXC), software-defined storage and networking functionality, on a single platform.

Monitor your AWS Lambda functions' ephemeral storage usage

AWS Lambda is AWS’s solution for highly portable, serverless computing. With Lambda functions, you can deploy and run business logic code without managing the underlying servers. Today, AWS announced that Lambda customers can now provision up to 10 GB of ephemeral storage for each of their functions, making them well-suited for new, data-intensive workloads—including machine learning inference, large media file processing, financial analysis, and more.

InfluxDB as an IoT Edge Historian: A Crawl/Walk/Run Approach

The question of how to get data into a database is one of the most fundamental aspects of data processing that developers face. Data collection can be challenging enough when you’re dealing with local devices. Adding data from edge devices presents a whole new set of challenges. Yet the exponential increase in IoT edge devices means that companies need proven and reliable ways to collect data from them.

Grok Pattern Examples for Log Parsing

Searching and visualizing logs is next to impossible without log parsing, an underappreciated skill loggers need to read their data. Parsing structures your incoming (unstructured) logs so that there are clear fields and values that the user can search against during investigations, or when setting up dashboards. The most popular log parsing language is Grok. You can use Grok plugins to parse log data in all kinds of log management and analysis tools, including the ELK Stack and Logz.io.
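To illustrate how a Grok expression maps onto named fields, here is a simplified Python sketch. It assumes Grok's `%{SYNTAX:SEMANTIC}` form and expands it into a regular expression with named groups. The three pattern definitions are cut-down stand-ins chosen for this example, not the definitions shipped with the ELK Stack or Logz.io.

```python
import re

# Simplified stand-ins for built-in Grok patterns (assumptions, not the real definitions).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def grok_to_regex(expression):
    """Expand each %{SYNTAX:SEMANTIC} token into a named regex group."""
    def expand(match):
        syntax, semantic = match.group(1), match.group(2)
        return f"(?P<{semantic}>{PATTERNS[syntax]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))

def parse(expression, line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = grok_to_regex(expression).search(line)
    return match.groupdict() if match else None

# Hypothetical log line with a client address, an HTTP method and a byte count.
fields = parse("%{IP:client} %{WORD:method} %{NUMBER:bytes}", "10.0.0.7 GET 512")
```

Every extracted value is a string here; converting `bytes` to a number would be a separate step in this sketch.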

SolarWinds Orion + Squadcast: Alert Routing Made Easy

SolarWinds Orion is a scalable infrastructure monitoring and management platform. It is designed to simplify IT administration for on-premises, hybrid, and software as a service (SaaS) environments, in a single pane of glass. SolarWinds Orion ensures you do not have to struggle with numerous incompatible point monitoring products, as it consolidates the full suite of monitoring capabilities into one platform with cross-stack integrated functionality. Squadcast is an end-to-end incident response tool.

The Best Open Source Logging Tools

Users of open-source log collectors and log monitoring solutions often prefer them for their speed and flexibility, and for their ability to attract talented contributors who are willing to invest time in maintaining technology projects they are passionate about. In this post, we’ll look at some of the best free and open-source logging tools out there today.

Troubleshooting SCOM 2022 Teams Integration

When we were setting up SCOM 2022 Teams integration in our environment, these are some of the issues and fixes we discovered. For the full setup guide for SCOM 2022, or if you are looking for integrations for older versions of SCOM (2012 R2+) or more functionality, such as bi-directional sync, check out our blog on ‘How does SCOM 2022’s native Microsoft Teams Integration Work?’

New StackPod Episode: Best Practices for AWS Observability With Russell Foster of StackState

We’re excited to share that we are celebrating our tenth podcast episode! For this episode, we invited Russell Foster. As a DevOps engineer at StackState, Russell is responsible for making sure our SaaS product runs smoothly on AWS. Over the years, Russell has worked at both startups and more mature companies, where his responsibilities ranged from keeping things up and running in cloud environments to making sure hybrid and on-premise environments remain stable and reliable.

Parameterizing GitLab CI/CD?

While doing packaging for Icinga, I noticed we have a lot of YAML files describing GitLab pipelines doing very similar jobs. The same build job across different operating systems. That’s wasteful behaviour, which leads to a bigger workload when it comes to modifying these jobs. Tasks like adding new versions and especially adding new operating systems become tedious. What I’m looking for is a way to have interchangeable values for our building jobs.

G2 awards Sematext as high performer in Spring 2022 Reports

At Sematext, we are dedicated to making troubleshooting easier for ops teams. When we started to receive positive reviews from our customers around the globe, we knew we were doing something right. Even as our userbase grew across multiple industries, we continued to get positive feedback. We even received a few awards along the way. In this post, we’re delighted to announce that Sematext Cloud is featured in the G2 Spring 2022 Reports under Monitoring Software Solutions category as.

Full trace retention search comes to Grafana Cloud

This week we have turned on full trace retention search (beta) in Grafana Cloud Traces. This feature was also introduced in the recent release of Grafana Tempo v1.3. Previously, if you brought up your Grafana Cloud Traces data source, you were greeted with this message: This message simply warned the user that the Grafana Tempo search was calibrated for recent traces only, regardless of the selected time window.

Citrix Workspace Monitoring and Management

This is another customer blog highlighting common customer scenarios that we typically see while working with large enterprise customers. There are many use cases in regard to a typical Citrix Digital Workspace environment where the endpoint is managed by a different team, outsourced to an external organization, or even unmanaged when employees work from home on a non-corporate device.

Ask Miss O11y: Do I Need Observability If My Stack Is Boring?

“Observability came out of microservices and cloud-native, right? If you have a simpler architecture, does o11y matter?” This question came up during recent office hours. Yeah, sort of. On both counts: yeah, it sorta came out of microservices and cloud native, and yeah sorta, you need it with a simpler architecture (though perhaps not as desperately as you otherwise might).

Quickly identify client side errors and latency, with Splunk RUM

Splunk RUM (Real User Monitoring) helps software engineers quickly identify and isolate the most glaring client side performance issues, from errors to latency. Easily measure performance across your entire production environment, to specific services or applications, and seamlessly click into backend service performance.

How Many Servers Do I Need for Your Monitoring Solution?

What kind of questions do you ask when you look at infrastructure monitoring tools? You probably start with some of the more important ones: All of the above are good questions and definitely should be asked. After all, if you acquire a solution that only monitors half of your network devices, has an incredibly complicated and difficult to use UI or doesn’t move you from reactive to proactive, the benefits are limited.

Distributed Tracing 101

The boom of digital commerce is making all businesses take a closer look at how they deliver great customer experiences. To stay competitive, businesses today are using cloud-native architectures, because the cloud-native applications they produce deploy quickly and better support the continuous improvement cycles of agile methodology. Behind the practicality of keeping online customers happy are distributed cloud environments that business applications use for each customer interaction.

Welcome to 10GB of tmp storage with Lambda

Every Lambda function comes with 512MB of ephemeral storage in the shape of a /tmp directory. This storage space can be reused across multiple invocations for the same instance of a Lambda function. Each instance of a function has its own /tmp directory and data is not shared amongst different instances of a function.
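One common use of this reuse is caching. The sketch below is a hypothetical Python handler that writes a computed value to the temporary directory on its first invocation and reads it back on later invocations of the same instance. The file name and the `expensive_load` helper are illustrative assumptions, and `tempfile.gettempdir()` is used so the sketch also runs outside Lambda.

```python
import json
import os
import tempfile

# On Lambda the temporary directory is /tmp; gettempdir() keeps the sketch portable.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "example_cache.json")

def expensive_load():
    """Hypothetical stand-in for a slow download or computation."""
    return {"model_version": 1}

def handler(event, context):
    """Reuse the cached file when this instance has already created it."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return {"source": "cache", "data": json.load(f)}
    data = expensive_load()
    with open(CACHE_PATH, "w") as f:
        json.dump(data, f)
    return {"source": "fresh", "data": data}
```

Because each instance has its own /tmp directory, a newly started instance takes the "fresh" branch again; the cache only helps repeat invocations that land on the same instance.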

Monitoring GraphQL APIs with OpenTelemetry

GraphQL is a query language for APIs developed by Facebook in 2012. It was then open-sourced in 2015. GraphQL enables frontend developers or consumers of APIs to request the exact data that they need, with no over-fetching or under-fetching. In this article, we will learn how to monitor GraphQL APIs with the open-source APM tool, SigNoz. GraphQL has become a popular alternative to REST because of its ease of use. It enables developers to pull data from multiple data sources in a single API call.

Virteva Customer Spotlight

Virteva is a multi-award winning ServiceNow Elite Sales and Services Partner that helps organizations create and maintain a competitive advantage through innovative workflow design, white glove support and technology that enhances effectiveness and resilience in a rapidly changing landscape of work. Virteva has over 2,500 implementations and 20+ years of Enterprise Service Management experience.

How to Integrate The Things Stack with InfluxDB Cloud in Minutes

In this demo, Samantha Wang shows how incredibly easy it is to integrate The Things Stack with InfluxDB Cloud, the managed time series platform. All you have to do is use the Things Network InfluxDB Template to ingest your Things data and view monitoring dashboards. From there you’ll be able to perform analytics and set up alerts & notifications. Watch to see how you can do it all in minutes.

Building An Agent From First Principles

Yesterday, we officially announced Cribl Edge, a next-generation observability agent. You can find more about its features here. In this post, I am going to walk you through the journey of incepting and building this new product. Our most important core value at Cribl is “Customers First, Always.” and that involves actively listening and being on the lookout for any pains our customers might be experiencing.

Improve Observability in Your CI/CD Pipeline

The most basic component of automated software development is a CI/CD pipeline. While the term "pipeline" has been used to describe a wide range of computer science concepts, we use it at CircleCI and throughout the DevOps industry to refer to the vast range of behaviors and activities that are involved in continuous integration (CI).

SCOM 2022 - New Delegated Admin Role Customisation Function

Possibly the most anticipated feature of SCOM 2022 is Delegated Administrators! It’s not the shiniest of features but it has long been a pain point for SCOM admins, who have only ever been able to use three pre-defined roles to grant to users (plus read only). In SCOM 2022 you now have the ability to create Delegated Administrators, which overcomes some of the pain points previously felt with rigid roles and permissions.

Netreo Further Strengthens Security Posture, Earning Veracode Verified Team Recognition for Entire Product Line

Huntington Beach, Calif. – March 23, 2022 – Netreo, the award-winning provider of IT infrastructure monitoring and observability solutions and one of Inc. 5000’s fastest growing companies, today announced the company has achieved Veracode Verified Team status for Netreo’s full-stack monitoring and observability suite.

What Are the Differences Between Elastic Beanstalk, EKS, ECS, EC2, Lambda, and Fargate?

Life before containerization was a sore spot for developers. The satisfaction of writing code was constantly overshadowed by the frustration of attempting to force code into production. For many, deployments meant hours of reconfiguring libraries and dependencies for each environment. It was a tedious process prone to error, and it led to a lot of rework. Today, developers can deploy code using new technology such as cloud computing, containers, and container orchestration.

DevOps State of Mind Ep. 9: Recruiting for a DevOps Culture

Liesse Jones: Today we're joined by Anna-Marie Gutierrez-Lee, affectionately known as AMG, who's the Director of Talent Acquisition at LogDNA. She's passionate about mentoring recruiting teams and connecting talent to their dream careers, while fostering a genuine and positive candidate experience. Today, we're going to talk about how to recruit for a DevOps culture and why it's so important to bring more underrepresented talent into tech.

Why it's important to have a monitoring strategy

Developing breathtaking apps isn't easy. You have to generate amazing ideas, incubate projects, hire best-in-class talent, and scale your solution across various hiccups and road bumps. So, this shouldn't be too surprising: 99.5 percent of apps fail. But here's what is surprising: most apps don't fail because of bad ideas, budgetary issues, or scaling woes. According to surveys, nearly 90 percent of users leave apps due to ongoing performance issues — not a lack of interest, time, or money.

WhatsUp Gold Lesson: How Do I Configure a Device to Monitor My Voice over IP (VoIP) SLA?

WhatsUp Gold provides specialized Voice over Internet Protocol (VoIP) Service Level Agreement (SLA) monitoring necessary for tracking Quality of Service (QoS) and SLA metrics for your network of deployed VoIP devices (telephones, conference room devices and more). After you discover and configure devices for VoIP monitoring, you can view results and graphs on the WhatsUp Gold Custom Performance Monitors dashboard.

What's new in Sysdig - March 2022

Welcome to another iteration of What’s New in Sysdig in 2022! The “What’s new in Sysdig” blog has fallen to me, Jason Donahue, for the month of March! I am a Solutions Engineer based in New Jersey and a member of the Sysdig US East Enterprise team since September, 2021. I have worn many hats in my career, from Networking to Systems Administration to Software Engineer.

More Choice, Less Compromise: We're Taking You to the Edge!

It’s been a busy Winter at Cribl! Today we are officially announcing Cribl Edge, a next-generation agent that expands the scope of observability. In Edge, we’ve taken the very concept of “agent” and given it a Cribl power-up by taking our best-in-class observability pipeline technology built into Cribl Stream and moving it all the way out to edge systems.

Announcing Cribl Edge & Cribl Stream

In 2022, administrators are still managing agents which collect data for observability and security the same way they did 15 years ago: typing in configuration files by hand. A lot has changed since 2006 when Amazon announced AWS. Instead of racking and stacking servers in data centers, we’re spinning up compute resources in a variety of forms – at the click of a button, or automatically through APIs.

Customers First, Always: Thanks for Making the Best Even Better

We’ve come a long way in a short time and that is thanks to you, our customers. Cribl set out to listen to our customers and use that to guide us forward. Today we’re announcing Cribl Edge, a next generation agent designed to scale your most precious commodity: you. We’re also announcing a name change to the product formerly known as LogStream. Now, as with all our releases, it doesn’t stop there. We have some upgrades that all go towards allowing you to scale.

Video: How to configure and customize Grafana OnCall

Managing your on-call rotations just got a little less stressful. With Grafana 8.0, we introduced unified alerting, which centralizes alerting information into a single, searchable view. With the introduction of Grafana OnCall, an easy-to-use on-call management tool available in Grafana Cloud, you can now extend the alerting workflow in Grafana to ensure that the right notifications reach the right people at the right time using the right method.

Finding & Fixing Asymmetric Routing Issues

Asymmetric routing is a situation in which packets take one path to go from source to destination, but replies take a different path to return. Notice I called it a “situation” and not an “issue”? That’s because it’s not always a problem. It only becomes a problem where there’s something stateful in the path, like a NAT device or a firewall.

How to collect metrics and logs for NGINX using the OpsAgent

The Ops agent is Google’s recommended agent for collecting your application’s telemetry data and forwarding it to GCP for visualization, alerting and monitoring. The Ops agent combines a logs collector and a metrics collector into one single powerhouse. Some of the key advantages of using the Ops agent are outlined below.

Partner Amplification - Logz.io Achieves AWS Security Competency

We’ve got some outstanding news to share in the arena of security partnerships: Logz.io® Cloud-based SIEM has officially achieved Amazon Web Services (AWS) Security Competency! This designation within the Logging, Monitoring, SIEM, Threat Detection, and Analytics category further demonstrates Logz.io’s proven commitment to delivering best-in-class security.

Instrumentation: Measuring Application Performance at the Code Level

Technology is taking over the world at a great pace. Everyone is getting in the loop of technology, whether it be a company, organization, startup, or even a 2-year-old kid. Most companies are using the latest technologies for updated features. They took their business online in order to remain in the current market. They make some applications for their base products and run their business through that website or application. But having an application is challenging.

Zenoss Core Sunset

Last week, we announced that we are sunsetting Zenoss Community Edition, which was previously called Zenoss Core. (The title of this blog post refers to it as Zenoss Core, as that was the name for most of those 15 years and is the name by which most people know it.) Zenoss Community Edition was a free, on-prem monitoring tool the company made available for over 15 years, which had been downloaded millions of times. Zenoss Community Edition version 1.0 was released Nov. 15, 2006.

New Monitoring Features and Dashboards in eG Enterprise

Small, thoughtful, beautifully packed gifts are the best, just like our UI features. Sometimes it is the small things that infuriate IT users the most: those tiny annoyances in a GUI that make you use some annoying and clunky workaround several times a day. Our eG Enterprise product management team understands this, and as our customers know, we are always willing to consider even small feature changes to improve usability.

Mitigate Industry 4.0 technology challenges with full-stack observability

Manufacturers accelerating Industry 4.0 are facing challenges associated with complex cloud-based systems and highly interconnected things. Full-stack observability provides the insights needed to ease transitions and deliver organizational agility required for growth.

Decoding the robust Azure architectures with fail-proof monitoring

Have you just begun your cloud journey to Azure by moving away from on-prem? If so, it’s always better to opt for the right set of patterns and strategies for developing and monitoring your applications on the Azure cloud. In this webinar, Tord Glad Nordahl, Microsoft Azure MVP, exclusively exposed the secret sauce for building and monitoring innovative cloud-native applications. Major topics covered,

Introducing Cribl Stream

It took THREE rounds of approvals to say what we’re about to say: We’re dropping a Log 😳. Yes, we said it: we’re dropping the Log in LogStream. Cribl LogStream is now known as Cribl Stream to reflect the enhanced functionality it delivers. LogStream already processed a lot more than just Logs, so it’s now known as Cribl Stream. Today’s announcement isn’t just about a name change, though.

Using Telegraf to send syslog metrics to Graphite

When you own and operate software, it generates various types of logs from disparate sources such as databases, servers, and applications. The metrics from these important digital assets are what companies monitor continuously. When they show signs of unreliability, companies need to take swift action to fix the cause and prevent it from growing into a larger problem. The key to success in this activity is owning a good Syslog application and metrics software where you can clearly see metrics.

The Ugly Truth About (Most) Cloud Rightsizing Recommendations

Rightsizing is about finding the optimal cloud configuration options to ensure that you get the performance you need—within any given constraints you are operating under—at the lowest possible cost. This is a simple proposition, but deceptively so. For one thing, business requirements are constantly changing, meaning that your workloads must adapt to support them, which in turn changes their operating parameters.

What are Suspect Spans?

Suspect Spans surfaces a list of spans that correspond to where the most time in a transaction is spent. Instead of clicking into every trace in an attempt to identify the bad actor, check out the Spans tab or Suspect Spans section in every transaction summary and jump directly to the span that needs your attention. In this video, we dive more into what suspect spans are, and we go through a demo of how you can also use Suspect Spans as a complement to your performance monitoring.

Foolproof Cloud Monitoring: 6 Ways to Utilize the Tools at Your Disposal

The cloud offers unparalleled flexibility. However, that flexibility comes at a cost. The number of moving pieces increases. The environment becomes more heterogeneous. So, if you want to stay on top of things, you need a more comprehensive view of your cloud infrastructure. After all, you don’t want your customers to realize that something has gone awry before your people do. In this post, I’m going to talk about cloud monitoring.

Performance Monitoring and more updates to Sentry for Electron

For those who aren’t that familiar with it, Electron is an open-source framework that allows developers to build cross-platform desktop applications in JavaScript. Some of the most popular desktop applications like VS Code, Slack, Discord, and Atom, are all built in Electron.

WebSocket Application Monitoring

WebSockets have been around for over a decade now, but the real-time web existed long before they came. This preceding ‘real-time’ web was typically slower and hard to achieve. It was attained by hacking available web technologies that were not primarily built for real-time applications. There was no solution with TCP/IP socket-style capabilities in a web environment that could address all concerns associated with operating in a web environment.

Observability Pipelines & AIOps can make IT Smarter

Enterprise data systems are like busy family households. You see a constant flow of activity to varying degrees from room to room. This activity includes people wandering, opening and closing doors. And then there are other streams constantly flowing through the household- electricity, water, Wi-Fi networks and more. In modern enterprises, the data deluge is a critical issue. While we take the complexity for granted in a household, such is not allowed in a connected enterprise.

Unified Serverless Observability With OpenTelemetry and StackState v4.6

StackState has always believed in the importance of open source and open standards, and we’ve demonstrated our commitment through ongoing support of open technologies. From the beginning, StackState supported StatsD and OpenMetrics. Even our agent is open source, designed to help organizations easily onboard our platform and to give them an extensible open way to observe their services. StackState is now proud to announce our next big open source step.

Nexthink + Qualtrics: A New View on the Employee Experience

These last two years have been eye opening for many business leaders and board room directors. With the “great resignation” in full effect, the importance of employee experience to overall business success is stronger than ever. In the simplest sense, the happiness and wellbeing of employees can make or break the organization. As a leader in the future of work Nexthink is thrilled to announce our partnership with Qualtrics.

Debugging Java Equals and Hashcode Performance in Production

I wrote a lot about the performance metrics of the equals method and hash code in this article. There are many nuances that can lead to performance problems in those methods. The problem is that some of those things can be well hidden. To summarize the core problem: the hashcode method is central to the Java collection API, specifically to the performance of hash tables (in particular the Map interface hash table). The same is true of the equals method.

How relabeling in Prometheus works

Relabeling is a powerful tool that allows you to classify and filter Prometheus targets and metrics by rewriting their label set. The purpose of this post is to explain the value of Prometheus’ relabel_config block, the different places where it can be found, and its usefulness in taming Prometheus metrics. Much of the content here also applies to Grafana Agent users. For reference, here’s our guide to Reducing Prometheus metrics usage with relabeling.

Monitoring as code for modern DevOps teams

Software engineering teams that adopt “as-code” practices, like using configuration files and automated workflows instead of manual configuration and tools, gain major improvements in velocity. But even companies that enjoy the success of as-code practices for development and delivery lag behind in applying them to operational concerns like monitoring and observability.

8 UX Best Practices: to improve your design in 2022

Establishing your business’s online presence has become a fundamental need rather than a secondary step. In the wake of the 2019 pandemic, a large chunk of interaction, social and business, has moved online. The user’s journey is as important as functionality to stand out in the crowded digital marketplace.

Nagios Alternatives: Migrating from Nagios to Sensu

If you have worked in any part of the application delivery lifecycle long enough, there’s a very good chance you have directly used or been a consumer of Nagios. For a period of time in the early 2000s, it was the go-to solution for basic server monitoring and alerts. Fast forward 20 years and you might be surprised how many organizations still rely on Nagios for mission critical workloads — although not without a fair amount of duct tape and super glue challenges.

Observability vs Visibility - what's the difference?

Observability is a new term that’s slowly entered the mainstream over the last two years. Today it’s used in the context of monitoring, but it’s much more than that. And it also goes way beyond visibility. So, in this blog, we set out to explore observability vs visibility and find out, what’s the difference? In a recent podcast, our friends at Riverbed neatly explained that seeing and observing are two different things, and can be compared to hearing vs listening.

Apica Quick Guides - Browser Check Configurations

Have you ever wondered what that one checkbox does, where that button takes you or what a specific function does? These quick guides are designed to explain every function as quickly and precisely as possible so you can continue your monitoring without any disturbance. This guide will explain all of the browser check specific configurations and their individual functions.

Icinga Feedback Week 2022 - The results

We started the Feedback Week because we wanted to hear your thoughts about the Icinga Universe. Whether it is about Icinga as a company, the products or the Community: only by knowing what you feel and think about Icinga can we develop the best possible monitoring tool for you. So here we are, having recently finished the second Feedback Week with a total of nearly 1300 votes across Social Media and the Community!

How to Get Started with Heroku Logging

Heroku is a platform for deploying, running, and managing applications written in a variety of programming languages, including Python, Java, C#, JavaScript, PHP, and others. Heroku's goal is to free you up to focus on your applications rather than infrastructure management. Logging is usually included in infrastructure management. Heroku provides a high-level log maintenance tool. In this Heroku logging article, we'll learn how to get the most out of Heroku logs.

SRE Metrics: Four Golden Signals of Monitoring

SRE (site reliability engineering) is a discipline used by software engineering and IT teams to proactively build and maintain more reliable services. SRE is a functional way to apply software development solutions to IT operations problems. From IT monitoring to software delivery to incident response – site reliability engineers are focused on building and monitoring anything in production that improves service resiliency without harming development speed.

Observability versus monitoring in software development

To supervise the behavior of distributed applications and track the origin of service failures and downtime, developers often use traditional monitoring technologies and tools. However, this approach can fall short in its ability to measure the overall health of modern cloud-native architectures, which can span multiple hosting environments and encompass hundreds of microservices.

Why is Distributed Tracing in Microservices needed?

Microservices architecture allows technology companies to build application services around business capabilities. It enables rapid development and also boosts developer productivity. But it also introduces complexity. Troubleshooting and operating an internet-scale application based on microservices is hard. And that’s where distributed tracing comes into the picture. Traditional monolithic application architecture is easy to develop, deploy and monitor.

Uptime.com Transaction Check Basics in less than 3 minutes

Get a lightning fast intro into setting up the Uptime.com Transaction check to monitor your important processes like login forms, contact submissions, and shopping carts. For more in-depth info, check out our detailed video on Transaction Check best practices, and using the Uptime.com Transaction Recorder for an easy, no-code approach to configuring your synthetic monitoring.

What is Kafka Monitoring?

Apache Kafka is a distributed messaging system that can be used to build applications with high throughput and resilience. It is often used in conjunction with other big data technologies, such as Hadoop and Spark. Kafka-based applications are typically used for real-time data processing, including streaming analytics, fraud detection, and customer sentiment analysis. There are many derivatives such as Confluent Kafka, Cloudera Kafka, and IBM Event Streams.

5 Top Tools for Application Security Testing: Features to Look For, Pros and Cons

When it comes to application security testing, choosing the tool best suited for the job is critical. There are so many tools on the market that determining which one is best for your needs may be difficult. In this article, we will discuss five of the best testing tools and outline the features you should look for when making your decision.

Using InfluxDB as an IoT Edge Historian

InfluxDB is increasingly being used in IoT solutions to store data from connected devices. Now it can also be used on IoT edge gateways as a data historian to analyze, visualize and eventually transmit aggregated IoT data up to a centralized server. In this article we’re going to look at three simple ways you can connect an instance of InfluxDB on your IoT Edge device to another instance of InfluxDB in the cloud.

Meet VirtualMetric's VMware Monitoring

With VirtualMetric VMware Monitoring, you can see all your VMware clusters, vCenters, servers, datastores, virtual machines, resource pools, etc. You also get 360-degree observability over your VMware environment. Get detailed statistics on processors, memory, storage usage and network information. VMware Monitoring is a powerful and easy-to-use server monitoring and management tool.

Managed self-hosted Git service Gitea with the new integration for Grafana Cloud

Today I’m back with another integration available for Grafana Cloud, our observability platform that gathers all your metrics, logs, and traces under a single roof with Grafana. I’m going to highlight how you can use Grafana with Gitea, an open source forge software package for hosting software development version control. It uses Git as well as other collaborative features like bug tracking, wikis, and code review. It is a great choice for those who manage Git repositories.

Citrix Performance Monitoring: What, Why and How

Organizations in all verticals and sizes are deploying digital workspaces to offer secure, remote access to employees and partners – in many cases, across wide area networks. Citrix workspace technologies are the most popular form of digital workspaces. In this blog, we discuss what Citrix monitoring is, why it is important, and what tools you need to monitor Citrix infrastructures effectively to ensure optimal digital employee experience (DEX).

What is the Distributed Cloud?: The Hybrid Cloud Solution Driving the Future Workplace

“It’s in the cloud!” Whether you’re talking about workplace technology or struggling to explain file-sharing to a technologically-challenged relative, we’ve all been trying to learn the ins and outs of cloud computing over recent years. It’s no easy task, as cloud computing has been changing rapidly – which brings us to the latest evolution: the distributed cloud.

Proper Python Instrumentation: 5 Things to Keep In Mind

Python’s USP as a programming language is that it’s flexible, easy to use, and quick to get started and iterate with. These virtues have led it to become the most popular programming language in 2022 and be used by millions of developers. As Python applications continue to multiply and scale to cater to millions of users worldwide, instrumentation and monitoring tools play a role more crucial than ever – in building robust, performant software.

The Cost of Doing the ELK Stack on Your Own

So, you’ve decided to go with ELK to centralize, manage, and analyze your logs. Wise decision. The ELK Stack is now the world’s most popular log management platform, with millions of downloads per month. The platform’s open source foundation, scalability, speed, and high availability, as well as the huge and ever-growing community of users, are all excellent reasons for this decision.

Observability for the Software Industry: Top 4 Challenges and How Monitoring Can Help

Software companies face more challenges than ever before to keep pace in an ever-evolving industry that requires innovation and reliability. The pressure to adapt to fast-moving and sophisticated technologies can easily create silos among teams and a lack of visibility within organizations. With legacy technology stacks and systems now in the rearview mirror, the focus of large-scale software development has shifted to ensuring standardization and distributed computing in an increasingly serverless world.

3 Metrics to Monitor When Using Elastic Load Balancing

One of the benefits of deploying software on the cloud is allocating a variable amount of resources to your platform as needed. To do this, your platform must be built in a scalable way. The platform must be able to detect when more resources are required and assign them. One method of doing this is the Elastic Load Balancer provided by AWS. Elastic load balancing will distribute traffic in your platform to multiple, healthy targets. It can automatically scale according to changes in your traffic.

Galileo Enhancements for Brocade Data Collection

Why Right-Sizing in the Cloud is Everything

Are you migrating to Azure or another public Cloud? A client of ours did, and they didn’t use Galileo Cloud Compass. Do you know what happened? They didn’t experience the savings they expected, and their Azure invoices were a lot more than expected—about 60% more. There are a couple of reasons for this. One is that they didn’t have an accurate way to price their workload for the Cloud. Just think about understanding the pricing model for each of the major Cloud vendors!

Say goodbye to client network outages caused by configuration mishaps

Network downtime opens the gates to productivity loss and customer attrition and can affect business growth moderately or even severely. Usually, the reasons behind a network outage are the following: Errors in network endpoints: Things like a network bottleneck or a spike in temperature can interrupt a client’s network operations and then snowball into an outage. Operational slip-ups: According to research done by Uptime Institute, 70% of data center and service downtime is due to human error.

AIOps Lessons Learned: Be Careful When Selecting a Vendor

The concept of AIOps is simple: Infuse artificial intelligence (AI) into IT to make operations speedier and more efficient. In theory, AIOps at its best should lead to an autonomous IT environment in which functions can run themselves with little or no human intervention. In practice, the path to this nirvana state is anything but straightforward and raises several questions. Where should you start? How do you measure the value? Is AI ready to scale across production environments?

Interview with Tom Granot - Developer Observability, KoolKits and Reliability

In preparation for the upcoming Developer Observability Masterclass we’re hosting at Lightrun with Thoughtworks and RedMonk, I sat down for a brief interview with Tom Granot – the Director of Developer Relations at Lightrun. Tom will MC the event as he did for the Developer Productivity Masterclass we ran back in December.

A Monitoring Reality Check: More of the Same Won't Work

On December 7, 2021, Amazon’s cloud services suffered a major outage that affected not only Amazon services but also many third-party services we use day-to-day, including Netflix, Disney+, Amazon Alexa, Amazon deliveries and Amazon Ring. Causes for the outage, which began at 7:30 am PST and lasted nearly seven hours, were detailed in a Root Cause Analysis report published by AWS that shed light on factors that may have contributed to the extended length of the disruption.

Eliminate Network Blind Spots with SL1 and Restorepoint

ScienceLogic customers are well aware of the challenges inherent in the growing complexity of their IT infrastructures. That’s why you’re making investments in digital transformation and have adopted SL1 as your AIOps platform of choice for your IT operations monitoring and management strategy.

Taking care of your loved ones with Grafana and other open source solutions

Amon Reich is the founder of SmartLiving.Rocks based out of Schweinfurt, Germany, an IoT solutions provider for smart homes and small businesses. Amon maintains the open source SeniorenSmarthome project, which enables Ambient Assisted Living through Grafana dashboards and other open source technologies. I’ve been working in the field of smart technology for over 10 years.

Cloud Application Performance Monitoring

Cloud adoption is increasing at a rapid pace. The eG Innovations & DevOps Institute APM survey (“APM in the new normal”) indicates that 88% of organizations are using at least one form of cloud technology. Organizations move to the cloud for agility – they can deploy and have applications running in the cloud in minutes. Cloud computing also offers options for high availability, automated backups, and such.

startSpan vs. startActiveSpan

TL;DR: startSpan is easier and measures a duration. Use it if your work won’t create any subspans. startActiveSpan requires that you pass a callback for the work in the span, and then any spans created during that work will be children of this active span. I’m instrumenting a Node.js app with OpenTelemetry, and adding some custom instrumentation. For this important activity that I’m doing (let’s call it “retrieve number”), I’m creating a custom span.

Splunk Beyond Logs: Getting to Observability

Those of us of a certain age know well the saying “Nobody got fired for buying IBM.” In the log analysis and security world, we’ve become lucky to get to the point where people are saying “Nobody gets fired for buying Splunk.” Our success in these areas has definitely created a perception for what products Splunk has and what we can offer to our customers. The problem is that most of these perceptions don’t capture the full power of Splunk.

How Monday.com Accelerates Time to Triage with Code Observability

Monday.com was on a mission to better aggregate and manage server errors for their monolith backend. But, what started as a minor change turned into a “life-changing decision”—their words, not ours—to incorporate a whole new workflow for frontend, backend, and soon mobile. Join Software Engineer Roni Avidov as she explains how Monday.com started monitoring their client-side app alongside their backend to quickly uncover blindspots and accelerate time to resolution by nearly 20 minutes per issue.

How to Optimize Your WordPress Site With Pingdom Real User Monitoring

To keep their applications and websites available and accessible, today’s businesses put a lot of emphasis on infrastructure monitoring to ensure their servers are healthy and running. Amid the hustle, many companies overlook the need to monitor a different aspect of their application: user experience.

Network AF, Episode 11: The art of connecting with PacketFabric's Jezzibell Gilmore

In episode 11 of Network AF, Avi talks with Jezzibell Gilmore, co-founder and chief commercial officer (CCO) of PacketFabric. Jezzibell is a powerful woman in networking who is modernizing and paving the way for infrastructure in the digital universe. In the conversation, she shares how she’s weaving together technology, business drivers and cutting-edge innovation, while keeping her foot firmly on the ground.

Learn How Tanzu Observability Helps OpenShift Users Manage the Grafana Licensing Change

Grafana Labs recently announced that they are relicensing their core projects from Apache 2.0 to Affero General Public License (AGPL) version 3. This is great news for the open source community, since the new license is still Open Source Initiative–approved and adheres to an additional clause in which network access of any AGPL-licensed software counts as a type of distribution.

Finding value through problem localisation

As the tech ecosystem continues to swell and digital solutions become a beacon of productivity and profitability for companies across industries, development speed and agility are becoming differentiators — even for small companies on the outskirts of the ever-expanding tech fog. Here's how problem localisation can minimise your change failure rate, boost your deployment frequency, and create a stable and usable environment across your tech stack.

Advanced Network Traffic Analysis and Enhanced Security in the New, Easier-to-Use-Than-Ever WhatsUp Gold 2022.0

Part of our job here at Progress is to not only look at the challenges our customers are currently facing, but also to look at the technology landscape in general and consider the challenges they’re going to be facing. We don’t have a crystal ball, of course, but there are plenty of trends that are obvious to even the most casual observer. Security has always been important, but the last year has made it clear exactly how important it is for every organization with a network.

Returning to the Office? Learn Best Practices to Support ITOps

Whether you accept it or not, the return to the office is happening. Companies like Microsoft, Google, Apple, Goldman Sachs, and Citigroup have asked their employees to resume operations on-site. Overall, the number of Covid cases in the US has steadily declined. And many states are now lifting their mask mandates. So sooner or later, more businesses will follow suit, including yours.

Real-time distributed tracing for .NET Lambda functions

In 2020 we released distributed tracing for AWS Lambda functions written in Python, Node.js, and Ruby, providing you with health and performance insights across your serverless applications. Since then, we’ve expanded our support to additional Lambda runtimes such as Java and Go, and are pleased to announce that real-time distributed tracing is now also available for .NET Lambda functions.

Stay secure: Enhanced SAP HotNews integration in Avantra 21.11.4

Securing your SAP environments is critical to the operational success of your business. And SAP does a great job of trying to stay ahead of any vulnerabilities in their solutions by offering HotNews. As critical vulnerabilities are discovered, SAP weights their critical quality, declaring a level of severity and attributing a score - 10 being the most critical - along with a description and resolution of the patch.

The Uptime.com monitoring guide for launching your website

Websites are an amenity packaged as a utility. If your site isn’t reliable and fast, users have options to replace it and they probably will. How you build your site and how you maintain it impacts your site’s success. Regardless of whether you’re launching your website, relaunching, or even rebranding, you don’t have to break the wheel to build a solid website that can withstand the traffic and growth of your enterprise.

What does Pinal Dave think of SQL Monitor?

Last week we had the pleasure of speaking to SQL Authority’s Pinal Dave to show him some of our favorite SQL Monitor features! Pinal has used – and been a fan of – SQL Monitor since it launched in 2008 (fun fact: it was named SQL Response back then). There are, however, some newer features that Pinal isn’t too familiar with, and we were delighted to introduce those to him.

System Monitoring for AWS EC2 Cloud Instances with AWS CloudWatch

In this blog post, I follow on from my previous blog on AWS CloudWatch Part 1 of 2 to explore how you can go beyond basic agentless CloudWatch monitoring by deploying the CloudWatch agent and some of the key information and planning you need to do this. I’ll also cover how eG Enterprise offers out-of-the-box functionality to avoid complex JSON scripting or tooling to implement monitoring.

Building an Always-on Business Leaves No Room for Downtime

As is often the case with digital products, your users could be experiencing issues you might not be aware of. The unknown unknowns could include random bugs or memory leaks slowing down performance and, in many cases, those issues aren’t reported… folks just bail. If uptime is a core tenet of your business success, unreported issues and users moving on to the next best thing isn’t an option.

How To Get Buy In To Support Your Observability Efforts

We’re well into 2022, and it’s full steam ahead addressing challenges and moving IT and SRE projects to completion. Are you ready for the challenges ahead of you? Do you feel prepared to handle the work you know about…and the work that’s sure to come your way? Are you ready for the end-of-the-year budget planning process that will be here before you know it? To help, I’d like to share my learnings from 20+ years in IT.

Honeycomb Terraform Provider Now Officially Supported by Honeycomb

Previously announced as a community-led project, the Terraform provider for Honeycomb is now officially maintained by Honeycomb in partnership with Hashicorp. We recognize how valuable supporting configuration as code is for our customers, and this change in ownership affirms our commitment to ensuring your ability to quickly make the most of Honeycomb’s Management API.

A deep dive in public relations with Ilissa Miller | Network AF Episode 13

In this episode of Network AF, the network engineering podcast, Ilissa Miller, CEO of iMiller Public Relations, joins Avi to talk about her start working in infrastructure communications, and her advice for companies interested in engaging more with their customers and media.

Distributed Systems and the 21st century

At the end of the last century I had the opportunity to help in a very ambitious computer project: the search for radio messages emitted by extraterrestrial civilizations… And what the hell does it have to do with Distributed Systems? Recently my colleagues wrote an interesting article on distributed network visibility, which I really liked and I came up with the idea of taking it to the next level.

Passing Grade: Top 3 Recommendations for School Network Administrators

Digital demands from students and staff to keep pace with modern education methods have put a lot of pressure on IT operations. Consistently delivering quality service for all is a tall order. Technology advancements, IoT proliferation, device demand, lower cost, and ease of wireless access mean that supporting all the things is the job facing education network administrators.

Hybrid Infrastructure Monitoring and Alert Management on a Single Platform Made the Difference for Liquid IT

Liquid IT is a New Zealand IT services company that delivers cybersecurity, network connectivity and integrated workspace services to government and corporate clients in New Zealand. The firm manages a hybrid IT environment for its customers that includes Microsoft Azure cloud services, Cisco Meraki cloud network services, plus on-premises compute, storage and networking from the likes of VMware, Dell EMC, Nutanix, AeroHive and Palo Alto Networks.

What is Application Performance Management (APM)?

Your business can live or die based on the performance of the applications through which your customers access it. For a large company, downtime can mean thousands of dollars in lost revenue. Problems that make your application unusable could push away new customers, who will instead move towards solutions that have stayed on top of their problems. In this digital world, most businesses depend heavily on software to monitor and support various elements.

Running Tracealyzer 4 on Linux hosts

To run Tracealyzer 4 on Linux, the first thing you will need to install is Mono. For most distributions there’s a package called “mono-complete”, though some distributions and package systems may instead use simply “mono”. There may be additional requirements, in particular for Debian/Ubuntu and Fedora based systems. See below for distribution specific instructions. Mono version 5.14 (or newer) is required for Tracealyzer.

Financial Services Network Challenges: Compliance, Security and Availability Top Concerns

Financial services firms face three key network issues: maintaining compliance with an array of regulations, keeping a growing horde of financial data hungry hackers at bay, and earning the trust of users with an always-on responsive network. Financial data is so valuable, cybercriminals make getting it a top priority. And financial services networks are so interconnected and complex, there are all sorts of ways hackers can try to break in. The security threat to finance is more than bad.

How to Optimize Cloud Monitoring Costs Using Flow Logs in Progress Flowmon

This blog post discusses some of the best practices for balancing the costs of cloud traffic monitoring while maintaining a reasonable level of visibility. Progress Flowmon 12 has introduced the processing of native flow logs from Google Cloud and Microsoft Azure, plus it has enhanced support for Amazon Web Services (AWS) flow logs.

APM is Legacy. Distributed Tracing is Designed for Modern Teams

Some background. Having implemented at least 20 or more APM systems in production as an end-user at various companies, and both deployed and managed countless monitoring tools outside APM, I understand the role of the practitioner. Later on, I shifted to Gartner and led the APM Magic Quadrant for four years, finally spending another four years at AppDynamics (operating under Cisco after two years).

Debugging Race Conditions in Production

Race conditions can occur when a multithreaded application accesses a shared resource from more than one thread. Unless we have guards in place, the result might depend on which thread “got there first”. This is especially problematic when the state is changed externally. A race can cause more than just incorrect behavior. It can enable a security vulnerability when the resource in question can be corrupted in the right way. A good example of race condition vulnerabilities is mangling memory.
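As a minimal sketch of the kind of guard mentioned above, the following Python example protects a shared counter with a lock. The `Counter` class, the `run` helper and the thread and iteration counts are illustrative assumptions, not code from the article.

```python
import threading


class Counter:
    """Shared resource; the lock is the guard around the read-modify-write."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=10_000):
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run()
```

With the lock in place the final value no longer depends on which thread "got there first": every increment is applied exactly once.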

Introducing StackState 4.6: Harnessing the Power of Topology + Telemetry + Traces + Time

Companies depend on observability insights to provide reliable online services to their customers. To support their efforts, StackState is proud to announce a new version of our unique topology-powered observability software, StackState v4.6, available now. This new version brings powerful new capabilities to DevOps and SRE teams who need to maintain a deep understanding of how their stack is behaving to meet their SLOs.

Observability for AWS Fargate Deployments Powered by Graviton2 Processors

Today, cloud native technologies empower a number of organizations to build and run scalable applications in public, private and hybrid cloud environments. Developer and operation teams can build and deploy applications, APIs and microservices architectures with the speed and immutability of containers. Gartner predicts that by 2024, more than 75% of large enterprises in mature economies will be using containers in production.

Start with Python and InfluxDB

Although time series data can be stored in a MySQL or PostgreSQL database, that’s not particularly efficient. If you want to store data that changes every minute (that’s more than half a million data points a year!) from potentially thousands of different sensors, servers, containers, or devices, you’re inevitably going to run into scalability issues. Querying or performing aggregation on this data also leads to performance issues when using relational databases.
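The "more than half a million data points a year" figure can be checked with a few lines of arithmetic. This is a small sketch assuming a non-leap year and exactly one point per minute; the `points_per_year` helper and the series count of 1,000 are illustrative assumptions.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: non-leap year

# One data point per minute for a single sensor, server, container, or device.
points_per_series_per_year = MINUTES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR


def points_per_year(series_count):
    """Total points written per year across `series_count` independent series."""
    return series_count * points_per_series_per_year


# Hypothetical fleet size, to show how quickly the volume grows.
fleet_total = points_per_year(1_000)
```

One series produces 525,600 points a year, so a thousand of them already exceed half a billion rows.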

Linking Employee Happiness and Productivity: What is IT's Role?

It’s no secret that employee happiness and productivity often go hand-in-hand. But just how much impact does an employee’s happiness have on productivity – and is this a question that should concern IT leaders? It’s not exactly news that employees want to feel happy at work – but during the Great Resignation, we’re now seeing what lengths they’re willing to go to when they’re unhappy with their current employers.

How We Monitor Elasticsearch With Metrics and Logs

As an architect at SolarWinds, it's essential to work with our own monitoring tools as a form of quality control and source for innovation. As one of the largest players in the IT monitoring and management world, we're always thinking about ways to make it seamless for customers to work across our suite of tools. One of those tools I'll focus on today is SolarWinds® Loggly®—our log management and analytics product, which is also a part of our APM integrated experience.

What is Garbage Collection in Java?

For many, the world of Java is shrouded in mystery and endeavor. One such endeavor is garbage collection. There is many a viewpoint on garbage collection – whether it is good or bad, when is it needed, how often should it run, how to tune garbage collection operation, how to know when it is not operating as expected, and so on. In this educational post, we will try to clear the air on Java garbage collection and make it easy for developers and administrators to deal with it.

Threads in Java

A thread, in the context of Java, is the path followed when executing a program. All Java programs have at least one thread, known as the main thread, which is created by the Java Virtual Machine (JVM) at the program’s start, when the main() method is invoked with the main thread. In Java, creating a thread is accomplished by implementing an interface or extending a class. Every Java thread is created and controlled by the java.lang.Thread class.

Swarm Support Model

From a customer’s standpoint, it is always agonizing to wait for the resolution of a complaint about the product or service we have bought from a company. None of us would want to hear, “We have escalated your concern to our seniors; your patience is highly appreciated.” Let us switch to the other side of the table. Most organizations rely on a tiered approach to resolve an issue from a support perspective.

How to Implement Global View and High Availability for Prometheus

Ensuring that systems run reliably is a critical function of a site reliability engineer. A big part of that is collecting metrics, creating alerts and graphing data. It’s of the utmost importance to gather system metrics from several locations and services, and correlate them to understand system functionality as well as to support troubleshooting.

How to Detect Network Congestion | Obkio

Network Congestion, similar to road congestion, occurs when your network cannot adequately handle the traffic flowing through it. While network congestion is usually temporary, it can cause annoying network problems that can affect performance and can be a sign of a larger issue in your network. That's why you need an end-to-end monitoring tool to help you proactively detect network congestion. Obkio Network Monitoring continuously monitors end-to-end network performance from your local network (LAN, VPN), as well as third-party networks (WAN, ISP, and Internet Peering).

Application Discovery with DX Unified Infrastructure Management

Having any form of application discovery can be of great benefit. With these capabilities, you can determine what is deployed within your infrastructure and better understand what monitoring to apply to each device. When you know which applications are running within your environment, you can group devices by their associated applications.

Getting Started with C++ and InfluxDB

While relational database management systems (RDBMS) are efficient with storing tables, columns, and primary keys in a spreadsheet architecture, they become inefficient when there’s a lot of data input received over a long period of time. Databases designed specifically to store time series data are known as time series databases (TSDB). For example, an RDBMS might look like this.

Jitter vs Latency - What are the Differences and Why Those Things Matter

Jitter and latency are characteristics of traffic flow at the application layer, and both are metrics used to assess a network's performance. The major distinction between them is that latency is the delay across the network, whereas jitter is the variation in that latency. Increases in jitter and latency have a negative impact on network performance, so it's critical to monitor them regularly.
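To make the distinction concrete, here is a small sketch that computes both metrics from a list of latency samples. It assumes one simple definition of jitter, the mean absolute difference between consecutive samples; other definitions exist (RTP, for example, uses a smoothed estimate). The sample values and function names are made up for illustration.

```python
from statistics import mean


def average_latency(samples_ms):
    """Latency: the typical delay across the network, in milliseconds."""
    return mean(samples_ms)


def jitter(samples_ms):
    """Jitter: how much the latency changes from one sample to the next."""
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return mean(diffs) if diffs else 0.0


# Hypothetical round-trip times in milliseconds.
samples = [20.0, 22.0, 19.0, 30.0, 21.0]

latency_ms = average_latency(samples)
jitter_ms = jitter(samples)
```

A link can have high latency and low jitter (consistently slow) or low latency and high jitter (fast on average but erratic), which is why the two are tracked separately.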

Optimize your signal with Rollbar

Learn some helpful tips on reducing noise and improving the value of your Rollbar events. In this session, we will be discussing the different ways to improve your signal in a dual manner. First, we'll review the ways to improve and modify your item grouping, and then we'll discuss how to reduce noise inside your stack traces and boost the value of your individual events by enhancing their content.

We think Grafana Labs has built something special - and two prestigious lists agree

We have always thought of our organization as special. Our plans were never to build a traditional business, and we know we have a unique culture. But it is nice when others outside of our company recognize that Grafana Labs is something special, too. This week, we were excited to be included on two very prestigious lists: The Enterprise Tech 30 and America’s Best Startup Employers.

Dashboard Fridays: Log Analytics VM Updates

Join Adam Kinniburgh in this latest Dashboard Fridays episode, in which he showcases this VM Updates dashboard, built with the WebAPI tile for the CE and SCOM editions. Since there is a native connection to the Log Analytics workspace in the Azure edition, here it is created using the native Logs tile. This dashboard surfaces key metric data for Virtual Machines, regardless of where the servers are hosted.

Introducing New Storage Dashboards in the Cloud Monitoring Console (CMC)

Monitoring and gaining additional insights about usage of your Splunk Cloud Platform deployment is essential for effective management as a Splunk admin. Your Splunk Cloud comes with the Cloud Monitoring Console (CMC) app, which displays relevant information about the status of your Splunk Cloud environment using pre-built dashboards.

The Virtual Experience: 3 Future-Facing VDI Use Cases

In today’s world of flexible work, technology has to be one thing above all else: versatile. A decade ago, virtual desktop infrastructures (VDI) wouldn’t have been able to meet this moment – VDI used to be too narrow and restricted in terms of what use cases it could service. Fortunately, VDI has evolved dramatically since those days. Now, with the right resources, there’s very little you can’t accomplish through virtual desktops.

What Is Log Retention?

The idea of paying money to store logs nobody is looking at may seem like a waste. Well, that is until you need those logs. At that point, you see how valuable log retention is, especially if there’s a security or compliance issue. When you prioritize log retention, you can look back to investigate an incident or provide data for an audit — especially when you centralize log and metric data in one platform.

Running VictoriaMetrics on ARM-based processors

ARM processors are becoming more popular and more cost-effective according to many benchmarks. One of them was made by Percona for MySQL. Some of our users reported issues with VictoriaMetrics on AWS Graviton instances. The main concerns were higher CPU and disk IO usage compared to x86 instances of the same size and for the same workload. By that time, we had verified that VictoriaMetrics works fine on Raspberry Pi and IoT devices, but hadn’t done any optimizations for ARM builds.

What is IBM Cloud Pak for Integration?

IBM Cloud Pak for Integration (CP4I) is a platform that helps you quickly and easily integrate your hybrid cloud applications with the systems and applications that are important for running your business. It can help the different application teams and businesses that exist in your organization collaborate and ensure that they are working together at maximum efficiency.

Germany's most popular direct bank ING counts on Icinga

We’re proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

Detect Broken Forms

Broken forms can have a massive impact on businesses. Detecting problems as fast as possible is essential in ensuring you are providing the best service to your users and preventing major company issues. Does your website rely on forms in order to process payments, capture lead data or provide search capabilities? Do you have an ordering system trusted by hundreds of people to complete their daily jobs?

SaaS Observability Done Right

SaaS (software as a service) is the common model for many businesses today. Even longstanding behemoths such as Cisco and Microsoft have been strategically shifting their software products to SaaS and recurring revenue models (just think Office365 shift from licensed Office). These SaaS businesses need agility to move fast and remain competitive. This means agility in the IT stack, but also agility in the business models to support bottom-up GTM and product-led growth (PLG).

New Process for Exception and Performance Incident Emails

Before today, we would send an email for every Performance or Exception incident if a notification rule was triggered. This meant that if you deployed a new version of your app and your incident notification settings were set to "first in deploy", you'd get two emails if two errors occurred.

Help Yourself to Splunk Knowledge

How do I…? During your time as a Splunk customer, you will begin many of your questions this way. Our products have a lot of features to grasp, a lot of flexibility to master, and a lot of power to help you solve your business problems. Learning how to get the maximum value out of our capabilities can take some time. That is why there are dedicated groups of Splunk knowledge workers creating content to help you take advantage of opportunities quickly.

What's new in Avantra 21.11.4

As Product Manager for Avantra it gives me great pleasure to announce that our newest version, Avantra 21.11.4 is now available, packed full of new features and a few enhancements our customers have been asking for. Let’s dive into a few of the newest features that are the most exciting. As always, our release notes are publicly available here.

Best practices for alerting on Synthetic Monitoring metrics in Grafana Cloud

Ever wonder what your application looks like from the “outside in”? Synthetic monitoring can give you a global overview of your application from your customer’s point of view, observing how systems and applications are performing by simulating the user experience. One tool to help achieve this is the Synthetic Monitoring app, which is a blackbox monitoring solution available in Grafana Cloud. You can use Synthetic Monitoring to monitor your services from all over the world.

How to aggregate your Metrics using MetricFire

This article covers a popular topic: using aggregation rules for metrics. We will learn why it is important to use aggregations and what tools exist for working with them. We will also explore the benefits of using MetricFire's Hosted Graphite solution to store, process, analyze and monitor your metrics.

SQL Sentry Advisory Conditions Tutorials

As a customer success engineer, I hear directly from customers about what issues they need help solving with SolarWinds SQL Sentry®. This insight first motivated me to create posts for the SQL Sentry Tips & Tricks series, which explains how to get the most out of SQL Sentry. Now, I’d like to create more tutorials about what I consider to be one of the most powerful features in SQL Sentry: Advisory Conditions.

How to monitor RabbitMQ logs and metrics with Sumo Logic

As organizations have moved toward a microservices design pattern, the need for reliable and performant solutions that enable decoupled services to communicate with one another has grown. RabbitMQ is an open-source message broker designed for this purpose. We’ll discuss what RabbitMQ is, how it works, why it needs to be monitored and how Sumo Logic can effectively do this.

Why Uptime.com Chose Apdex as a Performance Monitoring Standard

Early Twitter was an adventure. Every day was an open question: would you be able to log in or did the next big story crash the platform? It was taking off and crashing and flying and crashing again. All in real time. It was an exciting time for the internet, and while everything has changed since then it got us thinking: why did we used to tolerate stuff just not working? And why do we still tolerate stuff not working?

AppScope 1.0: Changing the Game for Infosec, Part 1

This is one of a series of blogs in which we introduce AppScope 1.0 with stories that demonstrate how AppScope changes the game for SREs and developers, as well as Infosec, DevSecOps, and ITOps practitioners. In the coming weeks, Part 2 of this post will tackle another Infosec use case. If you’re in Infosec, at some point you’ve doubtless had to vet an application before it’s allowed to run in an enterprise environment.

Hot Storage vs. Cold Storage

When it comes to data storage, all data isn’t equal. After all, the data you use daily doesn’t need the same level of protection or ease of access as long-term hot storage vs. cold storage backup. A large percentage of a business’ data remains unleveraged due to data management and security challenges, which highlights the need to implement a data storage strategy.

Why is website monitoring so important?

Have you found yourself asking this question when seeing website monitoring solutions flash up on Google? Has your dev team been trying to convince you to get a monitoring tool but you’re not sure what the benefits are? Don’t worry, I’ve compiled a list of the top reasons why website monitoring is so important to you and your website. But you don’t have to take my word for it, read on and find out.

How to Detect Memory Leaks in Java: Common Causes & Best Tools to Avoid Them

There are multiple reasons why Java and the Java Virtual Machine-based languages are very popular among developers. A rich ecosystem with lots of open-source frameworks that can be easily incorporated and used is only one of them. The second, in my opinion, is the automatic memory management with a powerful garbage collector. The Java garbage collector, or in short, the GC, takes care of cleaning up the unused bits and pieces.

Building Digital Platforms for Adaptive Resilience: Looking Inside Gartner Predicts 2022

With digital service expansion putting pressure on IT, “Gartner Predicts 2022: Build Digital Platforms for Adaptive Resilience” is a helpful guide for I&O leaders with their sights on 2025. If you looked up “tech trends” right now, how many search results would you expect to see? 100 million? 500 million? Think again. Between blog posts, research reports, and news articles, you’d actually find roughly 1.5 billion search results.

A Complete Guide to Node.js Process Management with PM2

Process management refers to various activities around the creation, termination, and monitoring of processes. A process manager is a program that ensures that your applications always stay online after being launched. Process managers can prevent downtime in production by automatically restarting your application after a crash or even after the host machine reboots. They are also useful in development: they auto-restart an app once its source files or dependencies are updated.
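
The restart-on-crash behavior described above can be sketched in a few lines of Python. This is only a minimal illustration of the general idea, not how PM2 itself is implemented; the `supervise` function and its `max_restarts` and `delay` parameters are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Hypothetical helper for illustration only.
    Returns the number of times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return starts  # clean exit: nothing to restart
        if starts > max_restarts:
            return starts  # restart budget spent: give up
        time.sleep(delay)  # optional pause before relaunching
```

For instance, `supervise([sys.executable, "app.py"])` would keep relaunching a hypothetical `app.py` until it exits cleanly or the restart budget is spent. A real process manager does far more than this, such as handling logs and starting applications again after a reboot.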

How APA-Tech Uses Observability to Make Sense of Tons of Monitoring Data With Georg Höllebauer

APA-Tech is a managed service provider in Austria, responsible for all IT services within the Austrian Press Agency - Austria's national and largest press agency - as well as other customers. In this video, Georg Höllebauer, Enterprise Metrics Architect at APA-Tech, explains how he and his team use topology-powered observability to make sense of all the monitoring data they were collecting and get a better overall picture of their customer's IT environment.

InfluxData Launches InfluxDB University

Live and on-demand trainings simplify application building for faster Time to Awesome™. SAN FRANCISCO – March 9, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced the launch of InfluxDB University (InfluxDB U), an online education platform for customers and developers working with time series data.

Class is in Session - Announcing InfluxDB University

At InfluxData, it’s no surprise that we are passionate about time series data. Our team is committed to helping our community understand its capabilities and sharing easier and more efficient ways of working with InfluxDB, Telegraf and Flux. Our end goal is always to deliver faster Time to Awesome™ for our users. To this end, we’re excited to announce the launch of InfluxDB University.

11 DevOps Best Practices You Should Know to be More Productive

Software engineering teams are continually seeking methods to improve and speed up the software development process. DevOps, an engineering methodology that brings development and operations together, is one popular strategy. Development and operations teams are frequently isolated in traditional engineering companies, which can lead to conflict between these two vital arms.

How We Monitor Elasticsearch With Metrics and Logs

As an architect at SolarWinds, it’s essential to work with our own monitoring tools as a form of quality control and source for innovation. As one of the largest players in the IT monitoring and management world, we’re always thinking about ways to make it seamless for customers to work across our suite of tools. One of those tools I’ll focus on today is SolarWinds® Loggly®—our log management and analytics product, which is also a part of our APM integrated experience.

Backup and recovery procedures

Backups are a practically mandatory preventive measure in any environment, securing the most critical elements against possible damage or information loss. In this video, we show how to back up the main elements of Pandora FMS and how to recover them in a very simple way.

It's Time for Observability Built for a Digital World

Today marks the start of a new chapter at Catchpoint, as we launch our digital experience observability platform. In this post, I’ll share with you some of the wider contextual factors driving this launch, as well as how the continuous evolution of our platform supports a massive market need.

How to monitor your Apache Spark cluster with Grafana Cloud

Here at Grafana Labs, when we’re building integrations for Grafana Cloud, we’re often thinking about how to help users get started on their observability journey. We like to focus some of our attention on the different technologies you might come across along the way. That way, we can share our tips on the best ways to interact with them while you’re using Grafana Labs products.

Splunk UI and the Dashboard Framework: More Visual Control Than Ever

If you attended .conf21, or followed any Splunk blogs by Lizzy Li for the past year, then you likely have heard of Splunk Dashboard Studio — our new built-in dashboarding experience included in Splunk Enterprise 8.2 and higher and Splunk Cloud Platform 8.1.2103 and higher. With new, beautiful visualizations and the ability for more visual control over the dashboard, our customers and Splunkers alike have been creating beautiful and insightful dashboards to turn data into doing.

Anodot Cloud Costs vs. CloudHealth by VMWare

We are often asked what’s the difference between Anodot and CloudHealth. Since both platforms offer cloud cost management solutions, the differentiation might be unclear. In this article, we’ll quickly clarify what each platform is built for, and why — despite some overlaps in features — these are two fundamentally different creatures.

Case Study: What a Migration to DX APM SaaS Looks Like

If you’re running earlier versions of Application Performance Management (APM), including version 10.7, on-premises and considering upgrading to DX APM SaaS, you’re undoubtedly curious what the migration process might look like. In this blog post, I’m going to share the story of one of Broadcom’s Fortune 50 customers and how they successfully migrated more than 30,000 production agents while navigating time constraints around their busy holiday season.

Prefix Premium - Profile, Test & Fix Code As You Write It

Like all of us today, I’m buying more and more products and services online. But even the slightest hiccup in my digital experience might cause me to switch vendors. Multiply that risk by millions – the result of digital commerce growing at an exponential rate – and it’s easy to see how bad user experiences could literally sink a company.

End User Experience Monitoring: Why You Need It

These days, a new venture’s success begins and ends on customer experience. Due to a large number of similar digital products out there, end users will stop using a product immediately if the user experience is not seamless. Similarly, a mobile application delivering a substandard user experience (UX) will fall behind the competition. A good digital product can turn into a failure if it does not work for the end user.

Enhanced monitoring for your Azure Logic App

Implementing a business process can be challenging because you typically need to make various services work together. Think about everything your company uses to store and process data. How do you integrate all these products? Azure Logic Apps gives you pre-built components to connect to hundreds of services. You use a graphical design tool to put the pieces together in any combination you need, and Logic Apps will run your process automatically in the cloud.

Percepio DevAlert - The Device Feedback Loop

What if IoT device developers could be notified about real-world issues in IoT devices automatically and get detailed diagnostics on the very first occurrence? This is provided by Percepio DevAlert, a novel cloud service that gives real-world feedback about issues in the device software, that allows for rapid continuous improvement and for embracing DevOps in IoT device development. Learn from real-world usage and make a great product that beats the competition.

Keeping Federal and Local Government Networks Safe Through Monitoring

It is always big news when governmental organizations are attacked. And they are attacked frequently. Hackers love headlines, which is one reason to go after high profile government targets. But the real reason hackers love governmental organizations is because that’s where the juicy data is. Even small governmental organizations hold confidential and classified information—exactly the secrets state-sponsored groups and other cybercriminals drool over.

Webhooks for Raygun Alerting - Create custom third-party integrations

Since the introduction of Alerting to Raygun in late 2021, development teams have had more visibility into emerging issues than ever before. While the initial solution enabled you to get alert notifications by email, we knew that the next step was to give you more control over where you receive alerts.

Why You Need APM-and How it Works

There’s a lot to consider when engineering and implementing software, whether as an update patch or a newly-introduced product. End users have certain expectations when introduced to new or updated software—at the top of the list are aesthetics, ease of use, stability, and response time—the last two of which can be significantly improved when you employ application performance management or APM.

What Is Prometheus?

Prometheus is a metrics-based monitoring and alerting stack that is purpose-built for handling metrics generated by dynamic cloud environments such as Kubernetes. Another example target for Prometheus to gather metrics from could be a web app or an API server. Beyond monitoring and alerting, Prometheus can also perform arithmetic on any ingested time-series data, such as adding, multiplying, and averaging metrics over time.

Rollbar Certified for SOC 2 Type II and SOC 3

We are pleased to announce that an independent service auditor has certified that Rollbar meets SOC 2 Type II criteria and also SOC 3 criteria. This extends our security, data privacy, and compliance certifications. Last year Rollbar was certified to meet SOC Type I to go with our existing ISO 27001 certification. The ISO 27001 standard promotes continuous improvement of security processes and demonstrates our commitment to customer support, customer excellence, and data privacy and security.

Read: 2021 Gartner Market Guide for AIOps Platforms

We are excited to be named a Representative Vendor in the domain-agnostic AIOps platforms market in the 2021 Gartner Market Guide for AIOps Platforms. We believe that this validates our unique approach to delivering an observability solution that accelerates your success with: Read our summary of this Gartner research report, below. If you haven’t noticed, AIOps is taking off.

From Monitoring to Observability - Any Size, Anywhere

As you respond to a changing IT landscape, you must update your approach to supporting your organization’s services. We believe 2022 will be the year of Observability. While Observability means many things, SolarWinds can help you understand how it can transform your team and organization. Please join Rohini Kasturi, EVP and Chief Product Officer, and Richa Dhanda, VP of Product Marketing, as they share how SolarWinds is evolving our product portfolio to deliver comprehensive visualization into full-stack solutions for hybrid IT and cloud environments. Rohini will also preview some exciting developments coming in 2022.

APM's Evolution to Observability

The evolution of IT, the changing requirements for ensuring availability, the performance of critical applications, and the performance of the underlying infrastructure—no matter where it’s running—are all part of the application performance management (APM) story. Still, the transition to and need for observability are about much more than this.

Communication Breakdown: Deploying Datadog and New Relic Across Teams is Unwieldy

When I was an industry analyst at Gartner, we would often discuss whether people were in a centralized or decentralized cycle. In business, it’s normal to investigate options for creating innovation and moving quickly, or to focus on reducing cost and optimizing teams and technologies.

The Who, What and Where of Delivering Microsoft Teams Mobile Calling Service Quality

Microsoft Teams users, whether they use peer-to-peer calls, meetings, or PSTN Telephony, very often switch from their laptop to their mobile and expect reliable call quality on all devices. Monitoring and ensuring call quality for Microsoft Teams users, in general, is already challenging at times. But when it comes to Microsoft Teams mobile calling, there is added complexity for IT teams.

Getting Started with Splunk on Google Cloud

In April 2021, Splunk launched Splunk Cloud on Google Cloud. Since then, a large and growing number of integrations, applications, tools, and solutions have been created to enable or enhance use cases across data protection, productivity, safer remote working and other security visibility needs. We’ve highlighted a few of the more noteworthy additions below for any current or prospective users of Splunk Cloud on Google Cloud.

On the Brittleness of Dashboards

Dashboards are one of the most basic and popular tools software engineers use to operate their systems. In this post, I'll make the argument that their use is unfortunately too widespread, and that the reflex we have to use and rely on them tends to drown out better, more adapted approaches, particularly in the context of incidents.

Elastic Observability 8.1: Visibility into AWS Lambda, CI/CD pipelines, and more

Technologies such as serverless computing frameworks and CI/CD automation tools help accelerate software development lifecycles (SDLC) to give development teams a competitive edge in the marketplace. Armed with these technologies, teams can deploy and innovate faster and more frequently by automating repetitive tasks and eliminating the need to manage or provision servers.

AppScope 1.0: Changing the Game for SREs and Devs

SREs and Devs are used to solving problems even when an awkward or inefficient way is the only way. In AppScope 1.0, SREs and Devs have a new alternative to standard methods, that the AppScope team thinks will make that problem-solving a lot more fun. We in the AppScope team constantly hear firsthand about life in the SRE trenches. For this blog, we “interview” a fictional SRE/Dev whose thoughts and comments are a mash-up of things we’ve heard from real people we know.

Virtualization Management: What It Is, What It Does, and How It Can Streamline Your Dashboard

Let’s say that you’re a real estate investor with lots of buildings in your portfolio. But you choose not to employ a caretaker when you fully know that you aren’t always available to monitor every building. What do you think will become of some of them?

Using Centralized Log Management for ISO 27000 and ISO 27001

As you’re settling in with your Monday morning coffee, your email pings. The subject line reads, “Documentation Request.” With the internal sigh that only happens on a Monday morning when compliance is about to change your entire to-do list, you remember it’s that time of the year again. You need to pull together the documentation for your external auditor as part of your annual ISO 27000 and ISO 27001 audit.

Now available: Direct monitoring

For more than 7 years, StatusGator has monitored the world’s status pages and aggregated the status of more than 1,000 cloud services into custom dashboards. With a quick glance, you can see the status of every service you depend on. But what if a website you depend on does not have a status page? What about internal tools, sites you host yourself, and other services without public status information?

What Does it Take to Run a Smart City?

From smart lighting, to waste management, to air quality, IoT devices and other “smart city technologies” are rapidly changing life in urban areas. And with a recent MarketsandMarkets research report projecting spending in the market for smart city solutions will nearly double to over $873 billion by 2026, the trend is only set to accelerate.

The best maintenance pages for website downtime throughout the years

You might think that a customer seeing a maintenance page message when they land on your website is a bad thing but think again. If you’re clever with your branding, you can really show off your personality, and make website visitors feel better about not being able to access your website. Never thought that could happen, right? Well I’ve put together the best maintenance pages across the years to show you exactly how you can make planned website downtime look like a breeze.

Open sourcing our pay calculator

We all know that pay is just as hard as it is important. Having a team distributed over 11+ countries makes pay even harder than in a traditional setup. I joined Checkly in mid-2021 with the promise of a fair and transparent culture in a remote tech startup and my goal was clear: make Checkly one of the best employers in our industry. A topic that I wanted to tackle early on was pay. How can we make pay less nebulous and more transparent, fair, and predictable?

Kubernetes vs. Docker

Container technology is changing the way we think about building, shipping, and running applications. Containers are lightweight packages of software that include everything needed to run an application. This includes operating system components as well as libraries and other dependencies. Emerging technologies such as Docker and Kubernetes empower organizations to deliver quality software with speed and ease.
Sponsored Post

The Best Kubernetes Monitoring Tools

In this article, you'll learn about the best Kubernetes performance monitoring tools that are currently on the market. Although there are a number of application performance monitoring solutions out there, this article covers the best options in terms of their key features, functionalities, ease of setup, and the support garnered from each of their respective communities.

Updated: Cogent and Lumen curtail operations in Russia

Reader’s note: On Friday, March 4, we published this blog post to comment on Cogent’s decision to terminate their commercial relationships with their Russian customers. Today, March 7, another international telecom, Lumen, announced that it will also take action. We’ve updated this blog post to reflect the latest information we have.

Don't pay the price for not automating SAP operations

At Avantra, we always comment that the customers who need us most are too busy to meet with us. It is so common that SAP operations professionals turn up to meetings late, reschedule at short notice, or are not really present - instead, working on their laptop to solve problems. What is curious is that this is normalized, and has been going on for decades without change. How can this be? Let’s dive into this.

Clinical Digital Experience: Happy Staff, Healthier People, Lower Cost

It’s been a dynamic few years for healthcare, to say the least. The entire sector has been under immense pressure – not just for the clinical staff on the front lines but also for personnel behind the scenes. The rapid expansion of telehealth and remote work, ongoing application upgrades and refactoring, mergers and acquisitions, revenue pressure due to delayed procedures, staffing challenges of all kinds… All have had an impact on satisfaction and overall experience.

eSecurity Planet Ranks Flowmon in Best Network Monitoring Tools

Modern enterprise and SME networks are complex constructions. They comprise on-premises network equipment and servers, multiple public cloud infrastructure components, operational technology links to monitor physical items, edge networks, and large numbers of endpoint devices that connect from various locations over many different networks.

Effectively Bridging the DevOps - R&D Gap without Sacrificing Reliability

DevOps culture revolutionized our industry. Continuous Delivery and Continuous Integration made six sigma reliability commonplace. Twenty years ago, we would kick the production servers and listen to the hard drive spin; that was observability. Today’s DevOps teams deploy monitoring tools that provide development teams with deep insight into the production environment. Before DevOps practices were commonplace, production used to fail. A lot.

Server Monitoring and Alerting

Server and IT infrastructure monitoring are critical to ensuring the performance and longevity of your client systems. Even more so, remote monitoring technology, in particular, has helped define the entire modern IT industry. In this post, we’re going to discuss several of the main monitoring concepts, including metrics, alerting, and monitoring, and why they are important.

Using the New Flux "types" Package

As a strictly typed language, Flux protects you from a lot of potential runtime failures. However, if you don’t know the column types on the data you’re querying, you might encounter some annoying errors. Suppose you have a bucket that receives regular writes from multiple different streams, and you want to write a task to downsample a measurement from that bucket into another bucket.

Use Your Load Balancer to Monitor Application Health

HAProxy and HAProxy Enterprise collect a vast amount of information about the health of your applications being load balanced. That data, which uses the Prometheus text-based format for metrics, is published to a web page hosted by the load balancer, and since many application performance monitoring (APM) tools can integrate with Prometheus, it’s likely that you can visualize the data using the APM software you already have.
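
To give a feel for the Prometheus text-based format mentioned above, here is a minimal Python sketch that parses a few sample lines. The metric names in `SAMPLE` are made up for illustration and are not actual HAProxy metric names, and the parser assumes each sample line ends with its value and carries no trailing timestamp.

```python
# Made-up sample in the Prometheus text-based format (not real HAProxy output).
SAMPLE = """\
# HELP example_sessions_current Current number of sessions (made-up metric).
# TYPE example_sessions_current gauge
example_sessions_current{proxy="web"} 12
example_requests_total{proxy="web"} 1027
"""


def parse_metrics(text):
    """Map each series (name plus labels) to its value as a float."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last whitespace-separated token.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

Calling `parse_metrics(SAMPLE)` yields a dictionary with one entry per series. In practice you would rely on an existing Prometheus-compatible tool rather than a hand-written parser like this one.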

Understanding Log Management: Issues and Challenges

Log messages – also known as event logs, audit records, and audit trails – document computing events occurring in IT environments. Generated or triggered by the software or the user, log messages provide visibility into and documentation of almost every action on a system. So, with all that in mind, let’s explore all the biggest log management challenges of modern IT and the solutions for these problems.

Monitor real-time distributed messaging platform NSQ with the new integration for Grafana Cloud

Today, I am excited to introduce the NSQ integration available for Grafana Cloud, our platform that brings together all your metrics, logs, and traces with Grafana for full-stack observability. NSQ is a real-time distributed messaging platform designed to operate at scale, handling billions of messages per day. It’s a simple and lightweight alternative to other message queues such as Kafka, RabbitMQ, or ActiveMQ. This will walk you through how to get the most out of the integration.

Mapping Microsoft Teams Call Flows to the Microsoft Front Door - Martello Vantage DX Feature Spotlight

In late 2021, Microsoft published an article regarding the connectivity principles for securely managing Microsoft 365 traffic in order to get the best possible performance. In the section focused on avoiding network hairpins, Microsoft states that, as a general rule, the shortest and most direct route/call flow between the user and their closest Microsoft 365 endpoint will offer the best performance.

Get more insights from your Java applications logs

Today it is even easier to capture logs in your Java applications. Developers can get more data with their application logs using a new version of the Cloud Logging client library for Java. The library populates the current executing context implicitly with every ingested log entry. Read this if you want to learn how to get HTTP requests and tracing information and additional metadata in your logs without writing a single line of code.

The Top 4 Reasons to Start Your Observability Pipeline Journey with Cribl.Cloud

Talk to anyone in the tech space and you’ll likely hear horror stories of how home lab setups can grow out of control or about long lists of VMs used to test various software systems. As a Criblanian, I’m no exception – I have at least a half dozen instances of Cribl LogStream deployed everywhere from my local machine to Docker containers to a few EC2 instances in AWS.

OpenTelemetry and Distributed Tracing in JavaScript

In our Configuring OpenTelemetry in Ruby blog post, we showed how to configure OpenTelemetry in a Ruby on Rails back end. In this post, we’ll cover how to configure OpenTelemetry in front-end JavaScript to measure the performance of browser and mobile devices, and how to configure distributed tracing to work across front-end and back-end telemetry collection. Let’s dive in!

Searches in Loggly Simplified

SolarWinds® Loggly® was built to cut through large volumes of noisy log data to quickly pinpoint the exact events relevant to your search. Whether your log data is structured into neat field and value pairs which lend themselves to precise search queries or written in unstructured text blobs, Loggly enables you to extract meaningful insights from your logs—even if you’re not a query master.

Optimizing Web Performance: Understanding Waterfall Charts

Waterfall charts are diagrams that show how website resources are downloaded and parsed by the engine, laid out on a timeline that lets us see the sequence of, and dependencies between, resources. They help identify where important events happened during the loading process. They also make it easy to see how good or bad a website's performance is, showing you exactly what is slowing down your site.

How to Monitor the Health of your Applications

Between the meteoric rise of telehealth visits (which have seen a whopping 3,800 percent increase since COVID-19), the ever-increasing healthcare tech stack, and the up-and-coming wave of Medtech pushing the boundaries of patient and healthcare experiences, one thing is clear: healthcare is grappling with digital technology. There's an ever-so-tempting $3 trillion in cost savings on the horizon. But to ride that digital wave to success, you need to avoid wipeouts.

Supporting, Extending, and Protecting Virtual Environments

Virtualization remains the key technology of the data center. It's also been the foundational technology that allowed cloud to proliferate. In modern environments, to get the most out of cloud, multi-cloud, and hybrid cloud, organizations need to stay on top of the latest developments in virtualization. Watch this On-Demand event to learn about the latest technologies in virtualization and virtualization-adjacent areas, like containers, software-defined storage, software-defined networking, the software-defined data center, and more!

A sleek new trace filter and trace details tab, 50+ contributors in our tribe - SigNal 10

One who moves the hill sets off by taking away the rocks. This is our 10th monthly update, and looking back at it, I can’t help feeling proud of our consistent efforts. Numerous releases, GitHub issues, sprints, and standups have brought us here. And it’s incredible to see what small teams with purpose can build with a consistent effort. Without further ado, let’s see what our 10th edition has to offer.

Netdata Meetup: Real World Scenario on How to Install and Monitor from Scratch

In our first Netdata Meetup, Thiago Marques will show you how to install Netdata from scratch on a specific host and how to navigate the many in-depth Netdata dashboards. Thiago will also cover understanding metric distribution. Monitoring is about more than visualizing collected data, which is why we will also show where to find host notifications and how to use A.I. to further simplify correlating issues with hardware and software.

How to manage log files using logrotate

Logs are records of system events and activities that provide valuable information used to support a wide range of administrative tasks, from analyzing application performance and debugging system errors to investigating security and compliance issues. Large-scale production environments emit enormous quantities of logs, which can make them more challenging to manage and introduce the risk of losing important data if underlying resources run out of space.

What is a Good API Response Time?

It's hard to imagine a world without APIs. APIs connect our mobile phones or computers to do everything from making purchases and payments to interacting on social media, extracting or sharing data or any other computer to computer interaction in our business or daily life. If you want to open up a heated debate, ask one of your programming partners or developer colleagues what makes up a good API response time?

Dashboard Fridays: Log Analytics VM Insights

Join Adam Kinniburgh in this latest Dashboard Fridays episode, in which he showcases a Log Analytics VM Insights dashboard. This dashboard, built with the WebAPI tile for the CE and SCOM editions of SquaredUp, surfaces key metric data for Virtual Machines, regardless of where the servers are hosted. In this short video, we'll demonstrate how this dashboard was built using SquaredUp dashboards, the challenges it solves, and how you can easily replicate it in your own environment.

Azure DevOps: Fun with Observability Events and Alerts!

If you’re working with microservices in a large distributed environment, you’ve probably got your monitoring and logging on lock, and you may even be lucky enough to have properly instrumented APM (distributed tracing) for consumer calls. But, did you know you’re likely still facing an observability gap? How many incidents have you worked that required hours of sleuthing only to end with a single team needing to roll back a deployment? It’s more common than you may think!

Ask Miss O11y: Making Sense of OpenTelemetry: Who's There? The Resource.

Ah, I too have wondered about this. TL;DR: The Resource says what program is sending these spans and where it’s running. You can skip it if you define OTEL_SERVICE_NAME in the environment. When I’m setting up tracing (for instance, in a Node.js app), I have to create a Resource object in order to set up the OpenTelemetry SDK. If I don’t define that resource parameter, then tracing will still work. But my spans will show up with a service.name of unknown_service:node.
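The fallback described above can be sketched without the SDK. This is an illustration in Python, not the Node.js SDK's own code: `resolve_service_name` and its `process_name` parameter are hypothetical names, and the only behavior taken from the text is that OTEL_SERVICE_NAME supplies the name when it is set and that spans otherwise show up as unknown_service:node.

```python
import os


def resolve_service_name(env=None, process_name="node"):
    """Hypothetical helper: pick the service.name that spans would carry.

    `env` is any mapping of environment variables (defaults to os.environ).
    `process_name` stands in for the running program, e.g. "node".
    """
    env = os.environ if env is None else env
    # Assumption: an empty OTEL_SERVICE_NAME is treated the same as unset.
    configured = env.get("OTEL_SERVICE_NAME", "")
    if configured:
        return configured
    # No name configured: fall back to the placeholder described in the text.
    return f"unknown_service:{process_name}"


print(resolve_service_name({"OTEL_SERVICE_NAME": "checkout"}))
print(resolve_service_name({}))
```

Passing the environment in as a mapping keeps the sketch easy to try without touching the real process environment.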

Why You Should Monitor Your E-Commerce and How to Start

According to research by the SolarWinds® Pingdom® team, retail and e-commerce sites have plenty of room for improvement, as the global market generated revenues up to US$4.921 trillion in 2021. As more people look to the internet to make purchasing more convenient, online sales will become even more critical to businesses. What does this mean if you’re a seller?

Get to Know WhatsUp Gold Free Edition

We sympathize with the IT teams that keep their networks running – we really do. We understand it’s a thankless job where they’re ignored if everything’s working and blamed when everything’s not. That’s why we’ve tried to make our network infrastructure and application experience (AX) products as simple and intuitive as possible.

PerfOps vs. CDNPerf: Which Service is Right for You?

No matter which one you choose, all PerfOps products are about the data. We’re committed to providing high-value DNS, CDN, and Cloud analytics for providers and businesses. The data derived from our analytics software is the most accurate and robust DNS, CDN, and Cloud data in the world. Rest assured, as far as providers go, there are no “red-headed stepchildren” here. No bias. No number skewing. The data is what the data is. Pure and simple.

Sponsored Post

The Dirty Data Problem: Why Modernizing Infrastructure Monitoring is Pivotal to AIOps Success

Jeff Dean at Google Brain once said that the most sophisticated AI algorithms succumb to the quality of the dataset they rely on. That's a fancy way of saying: "Garbage in, garbage out." And if your organization is struggling with the effects of dirty data (inaccurate analytics, sub-optimal automations, and persistent problems with IT operations management), chances are you've got visibility gaps in your infrastructure that have you operating with a CMDB filled with inaccurate, incomplete, or obsolete information.

How Dapper Labs uses Grafana Cloud to meet the global demand of NFT Mania

Ever since a JPEG created by the digital artist Beeple sold for more than $69 million in 2021, the worldwide obsession with NFTs (non-fungible tokens) that represent digital collectibles, art, and media has been growing. A company at the forefront of the NFT world is the blockchain gaming studio Dapper Labs, which leverages blockchain to build addictive games (such as CryptoKitties), verify authentic digital collectibles, and run fan tokens for sports personalities and music artists.

A Look at the 6 Best Python Error Monitoring Tools in 2022

Errors are the necessary evils of software development. They bring to your attention critical information about what’s wrong with your application and what needs fixing before your end-users suffer. Error monitoring tools offer significant help in this cause by aggregating all the errors and issues your applications (and their end users) are struggling with under one roof and providing valuable insights to resolve these and optimize performance.

Pandora FMS wins the Open Source Excellence 2022 award along with four other SourceForge awards

We love uploading this kind of post to our blog. Articles in which we boast about our work and where all the effort of our team throughout the year comes to light. Because yes, we are rewarded once more, Pandora FMS is proclaimed winner in several categories in the SourceForge Awards. No more and no less than four awards, including the Open Source Excellence 2022 award, possibly one of the most desired and disputed in the industry in this specific sector.

Why You Need a Digital Experience Monitoring Strategy?

Websites are the economic engine for modern businesses and service providers. A user-friendly, always-on, secure site reassures visitors and shows customers, business partners and others you are serious about your business. As CTO, DevOps manager, or IT lead, you need a digital experience monitoring (DEM) strategy to prevent or minimize website or API downtime.

Burning Green - 3.5 million PCs Show Heavy e-Waste in Corporate IT

Almost since the moment the world shut down at the start of the pandemic, we’ve heard what a blessing remote and hybrid work is for environmental sustainability. With fewer of us driving and flying for work, it can feel like hybrid work is the sustainable solution we’ve dreamed of for so long. Yet at the same time, attention is also being cast on big tech companies like Facebook and Google for their role in producing (and curbing) e-Waste.

How to Setup AWS CloudWatch Agent Using AWS Systems Manager

Before we jump into this, it’s important to note that AWS Systems Manager is often referred to by its older name, SSM (Simple Systems Manager), which is still in use in some areas of AWS. AWS Systems Manager is designed to be a control panel for your AWS resources so you can manage them externally without having to SSH into the resources individually. What is important to remember with AWS Systems Manager is that features contained within the tool may incur additional charges.

The 10 Most Common HTTP Status Codes

As a typical Internet user, nothing is more frustrating than waiting for a web page to display, only to receive a “Page Not Found” 404 error status code. Sure, we try reloading the page, and sometimes that gets the gremlins to start working, but most times, the issue is out of our hands. For all of us typical users, we either go onto the next thing or find a different site. There’s a lot going on in the background that most of us are completely unaware of.

What Healthcare Companies Need from Network Management and Network Monitoring

IT pros in the healthcare industry have one of the toughest jobs imaginable. Herculean task number one is protecting patient data, with failure to do so bringing hefty HIPAA fines and more than a little bad press. Gargantuan task number two is stopping breaches (and then doing forensics if one busts through). Failing either of these is not exactly a confidence booster. Don’t forget, almost all hacks and breaches either attack the network itself or go across it to reach their target.

4 compelling reasons why you need a network discovery tool and 5 ways OpManager helps

Businesses now scale exponentially and so do their networks. Managing a hybrid IT environment that comprises wired, wireless, and virtual networks can be a challenging task for network administrators. However, continuous monitoring of these devices for fault and performance is crucial. Network discovery is key to successful monitoring solutions.

Sponsored Post

Detect and Fix Slow Proxies For SaaS and Cloud

Crowdsourcing is becoming an essential part of many enterprises, from big companies to startups. This is because it is an incredible way of creating an ecosystem to facilitate different processes within a business. Leveraging the Exoprise platform, companies can benchmark their slow proxies, detect, and fix slow network experiences. Crowdsourcing entails getting information, goods, or services from disparate people worldwide. Often, crowdsourcing is made possible through the magic of cloud-based applications and platforms because of the way the Internet connects people and organizations. Exoprise specializes in crowdsourced monitoring of cloud and SaaS services. We call it crowd-powered.

Sponsored Post

Golden Signals - Monitoring from first principles

Building a successful monitoring process for your application is essential for high availability. In the first of this three-part blog series, Safeer discusses the four key SRE Golden Signals for metrics-driven measurement, and the role they play in the overall context of monitoring. Monitoring is the cornerstone of operating any software system or application effectively. The more visibility you have into the software and hardware systems, the better you are at serving your customers. It tells you whether you are on the right track and, if not, by how much you are missing the mark.

Sponsored Post

Error monitoring and exception handling in large-scale software projects

Large-scale software projects don't care how many unit tests you put into your code. Or how sophisticated your CI/CD pipeline is. Or how robustly you run blue-green deployments to ease into newly-deployed code. These projects will inevitably find themselves subjected to your users, who will uncover bugs your team didn't catch and didn't even think to test for.

A 6-point action plan for software companies to manage the global chip shortage crisis

Since mid-2020, several global factors including a drought in Taiwan and the COVID-19 pandemic have resulted in a semiconductor chip shortage, which worsened further as demand for electronics shot up during the pandemic. As offices and schools moved to homes and digital services became essential, in 2020 alone, 297 million PCs were sold, up 11% over the previous year.

A Complete Guide to Tomcat Performance Monitoring

Metrics and runtime characteristics for application server monitoring are critical for the applications running on each server. Monitoring also helps to prevent or manage possible problems promptly. Apache Tomcat is one of the most widely used servers for Java applications. JMX beans or a monitoring tool like MoSKito or JavaMelody can be used in Tomcat performance monitoring.

Logz.io Now Fully PromQL Compatible

The popularity of Prometheus speaks for itself. The project doesn’t post official numbers, but there are at least 500,000 companies using this project today as one of the most mature CNCF projects – one that has over 40k GitHub stars as of the writing of this blog. And since Prometheus is highly interoperable, compatibility is key. This comes into play not only with the exporters, but also with long-term storage options and alerting systems.

Sematext recognized as one of the Best Software Products by G2 and Gartner review platform communities

At Sematext, we are dedicated to making troubleshooting easier for ops teams. We knew we were doing something right when we started to receive awards and positive reviews from our customers around the globe, ranging from startups to enterprise clients across a wide range of industries. In this post, we’re listing just a few of the recognitions Sematext Cloud has received from the community via review platforms such as G2, Capterra, GetApp or SoftwareAdvice.

Datadog On Rust

Rust is a programming language that has been gaining popularity over the past few years, with its adopters claiming that it helps them write faster, more memory-efficient, and more reliable software. At Datadog, many backend services are written in Go, but some teams have begun adopting Rust when building new services, especially when performance is critical.

New in Grafana 8.4: How to use full-range log volume histograms with Grafana Loki

In the freshly released Grafana 8.4, we’ve enabled the full-range log volume histogram for the Grafana Loki data source by default. Previously, the histogram would only show the values over whatever time range the first 1,000 returned lines fell within. Now those using Explore to query Grafana Loki will see a histogram that reflects the distribution of log lines over their selected time range.

Customer Panel: CDI and PROACT Discuss Observability Platforms for MSPs

In this customer panel video Chris Black, CTO Managed Services at CDI, and Per Sedihn, CTO & VP Portfolio & Technology at ProAct, talk about their partnerships with LogicMonitor. Topics include the importance of predictability, automation, and intelligence in observability platforms for Managed Service Providers, specific ways LogicMonitor helped both companies scale quickly, and the future of support for hybrid infrastructures with unified observability.

The Hidden Magic Of Extensions

As serverless architectures start to grow, finding the right troubleshooting approach becomes a business-critical aspect. This talk will dive into "the instrumentation approach": keeping track of internal events on the lambda and exporting processed telemetry data. As with any real-life project, we have to handle legacy code, multiple code owners, and a massive stack of serverless technologies. Our goal is to write as little code as possible, avoid any existing code changes, be cross-runtime, and leave no latency impact.

Serverless in production, lessons learned after 5 years

Serverless has changed the way we build software for the better. But it’s also a paradigm shift that challenges many of our pre-existing practices and habits, like how we test our code and how we monitor its health. In this session, Yan Cui will share many of the lessons he has learned from running serverless in production over the last five years, including tips on testing, observability, and how to keep your AWS cost in check.

Banks are enabling personalization with Elastic - your industry can, too

Think about the moments when something is presented to you that is just what you’re looking for. Those moments, when it feels like a company you trust knows you, are all too rare in commerce. And of course, presented incorrectly, they can even feel invasive. But done well, they solidify your relationship as a customer, and reinforce that you’re getting the service you deserve.

Automation Anxiety: What Role Do Humans Play in an AI-Driven Future Workplace?

What tasks can machines do for us? That’s what business leaders were pondering back when automation and AI were still new, cutting-edge technologies. Now the question has changed. What tasks can’t they do? We used to think of automation as a useful method to avoid doing the repetitive, menial tasks that require so little of our brain power and take up so much of our days.

Now in beta: Direct monitoring

Contact us to get access to our direct monitoring beta feature today! For more than 7 years, StatusGator has monitored the world’s status pages and aggregated the status of more than 1,000 cloud services into custom dashboards. With a quick glance, you can see the status of every service you depend on. But what if a website you depend on does not have a status page? What about internal tools, sites you host yourself, and other services without public status information?

Infographic: Achieving True Observability With the 4Ts

Ready to “rewind the movie” to see exactly what was going on in your stack at any moment in time? Ready to quickly go straight to the original source of the problem to solve issues faster? Check out our new infographic, Achieving True Observability With the 4Ts, to see how StackState’s unique 4T® data model correlates topology, telemetry and traces at every moment in time, to deliver real-time contextual insights into your entire IT landscape.

Sponsored Post

Intelligent Machine Monitoring

Artificial Intelligence (AI, also called Machine Learning) is certainly making its way in the world. Technologies such as Voice Recognition, Face Recognition, Predictive Analytics, Self-driving cars, and Robotics are now becoming embedded into our society. With the advent of big-data, these technologies can become more and more powerful and more and more a part of our everyday lives. I'm sure that there is much controversy over this. I'm sure that many people consider it invasive.

Sponsored Post

Simplifying ESX monitoring with OpManager

IT admins have to adapt to new market trends and networking concepts to meet ever-evolving IT demands. However, solely relying on physical components to support this changing landscape puts them at a disadvantage when it comes to scalability, network distribution, and cost-effectiveness. To remove the strain on physical components such as servers and to keep the capital expense in the optimal range, IT admins rely on virtualization. Most networks have started adopting virtualization even for their most resource-intensive applications.

Empower IT Help Desks: Deliver Business Value with IT Service Monitoring

On one of the EUC Slack forums, a question was recently posted asking what key lessons were learned in the last year and what changes organizations should investigate in the next year. The answers were revealing. Several folks pointed out that prior to the pandemic, we were operating within the comfort zone of carefully planned deployments with managed devices and managed networks.

Network observability, now publicly shareable

What fun is network observability if you can’t share what you see? That’s why we’ve added public link sharing to the Kentik platform. One of the greater missions of network observability is to break the boundaries of conventional monitoring. At Kentik, we focused our initial efforts on making complex infrastructure problems easy to visualize, understand and resolve. Now we’re tackling a follow-up mandate: to democratize network observability.

HAProxy Monitoring Guide: Important Metrics & Best Tools in 2022

HAProxy is one of the most popular pieces of software around when it comes to load balancers and reverse proxies. When you’re using it for these purposes, it’s especially important to monitor for both availability and performance, which will impact your SLIs and SLOs. In this post, we’ll talk about the main HAProxy metrics you should monitor and the best monitoring tools you can use to measure them.

How summary metrics work in Prometheus

A summary is a metric type in Prometheus that can be used to monitor latencies (or other distributions like request sizes). For example, when you monitor a REST endpoint you can use a summary and configure it to provide the 95th percentile of the latency. If that percentile is 120ms that means that 95% of the calls were faster than 120ms, and 5% were slower. Summary metrics are implemented in the Prometheus client libraries, like client_golang or client_java.
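To make the percentile statement concrete, here is a minimal sketch in Python. It is not how the Prometheus client libraries compute summaries (they do not keep every observation); it is an exact nearest-rank calculation over a small, made-up list of latencies, and the function name `nearest_rank_percentile` and the sample data are assumptions for illustration only.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Return the smallest observation that at least `percentile` percent
    of all observations are less than or equal to (nearest-rank method)."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range (0, 100]")
    ordered = sorted(values)
    # Rank is 1-based: the position in the sorted list that covers the
    # requested share of observations.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds for 20 calls to a REST endpoint.
latencies_ms = [40, 45, 50, 55, 60, 62, 65, 70, 72, 75,
                80, 85, 90, 95, 100, 105, 110, 115, 120, 450]

p95 = nearest_rank_percentile(latencies_ms, 95)
at_or_below = sum(1 for v in latencies_ms if v <= p95)
print(f"p95 = {p95} ms; {at_or_below} of {len(latencies_ms)} calls were at or below it")
```

With this sample, 19 of the 20 calls (95%) finish at or below the reported value and the single slow outlier lies above it, which is the situation the 120ms example describes.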

Monolithic Application Performance Monitoring

A monolithic architecture is one of the oldest architectures used for making software and applications for various companies. In layman's terms, Monolith means everything in a single box. Monolithic software is supposed to be self-contained, and its components are interrelated and interdependent. When it comes to monoliths, updating a single component or rolling out a new feature is difficult. Suppose any program component needs an update.

How to Use the Key Performance Metrics you Already Have to Improve the Microsoft Teams User Experience

In any organization, the IT operations teams bear the responsibility of providing reliable cloud services to users that are increasingly distributed, working from home, the office, or elsewhere. As a result, IT professionals are looking for solutions to achieve visibility of the user’s network from end to end to quickly identify and resolve bottlenecks and ensure maximum productivity and ROI of their cloud applications.

Serverless Heroes Discuss The Latest Trends in Serverless Development

Please note that the audio turns on at the 45 second mark. How Lumigo Monitors Its Production: Monitoring your production is challenging. Lumigo processes more than 1.5B Lambda invocations per month and digests 25TB of data monthly. Many things can go wrong. In this session, I will explain how Lumigo operates its production from tools to processes.

How Your Cloud Application Monitoring Data Can Help You Make Clearer Business Decisions

Companies are increasingly adopting cloud services. They’re cost-efficient, scalable, and shorten your time to market by getting your ideas out there as quickly as possible. However, cloud services don’t have extensive, built-in cloud application monitoring. This is where Netreo steps in. Let’s take a look at what cloud computing is. After that, we’ll dive into cloud application monitoring and how it affects business decisions.

OpenTelemetry (OTel) Is Key to Avoiding Vendor Lock-in

The promise of OpenTelemetry is that it can help you avoid vendor lock-in by allowing you to instrument your applications once, then send that data to any backend of your choice. This post shows you exactly how to do that with code samples that configure your application to send telemetry data to both Honeycomb and New Relic.

5 questions about Ansible that Elastic Observability can answer

While automating systems is seen as an imperative in boardrooms around the globe, automation teams — the teams on the ground — often lack the data to help them to industrialize their automation efforts and move from ad-hoc automation to strategic automation. In this automation-focused blog post, we will show how to instrument infrastructure automation with Elastic Observability.

How to Fix Request Method 'POST' Not Supported

Many errors can appear on a page you reach from any search engine. The most common ones are the 404 Not Found error and the 405 Method Not Allowed error. Once in a while, you might encounter this notification: Request method 'POST' not supported. This message gives the visitor no option but to leave the page. Look at it this way: the error message appears on your website, yet you depend on these search engines to bring more traffic to your website.

Cloud Governance: What It Is and Why You Need It

Every company exerts some level of effort to manage costs, performance, and risk in their hybrid cloud environment. But to ensure that those activities are performed consistently and efficiently across the board, you need a framework of policies, processes, controls, and tracking. In other words, you need cloud governance.

Desktop as a Service (DaaS) Stressing You Out? Read Expert Advice

DaaS (Desktop as a Service) simplifies virtual desktop and application delivery into a cloud-based subscription service that avoids the traditional complexities of on-premises VDI. As such, the potential benefits of DaaS are very attractive to IT teams that wish to establish greater control over desktops, improve security and speed-up provisioning. It’s no wonder that DaaS adoption and consideration is on the rise.

New in Slack: List all your services

StatusGator’s Slack integration is one of our most popular features. Our users love getting real-time notifications of outages from all the cloud services they depend on. Plus, with a few quick clicks you can check the status of any service you monitor on StatusGator, using our /statuscheck slash command. Now, you can also get the status of all the services in your StatusGator dashboard with one quick command: /statuscheck list. This will list every service on your dashboard.

Continuous Performance Improvement of HTTP API

The following guest post addresses how to improve your service’s performance with Sentry and other application profilers for Python. Visit Specto.dev to learn more about application profiling and Sentry’s upcoming mobile application profiling offering. We’re making intentional investments in performance monitoring to make sure we give you all the context to help you solve what’s urgent faster.

Trends in Education Networking

The EdTech space is evolving at a rapid pace. In addition to a drastic increase in remote learning, a wide range of new technologies and platforms are making their way into physical and virtual classrooms across the globe. However, discussions about trends in educational technology tend to overlook how they affect (and are affected by) the network. We’re here to do something about that.

Mobile optimisation: Is your website still struggling?

If you’re a website owner, manager, or developer, you know to always check your website’s performance and design on both desktop and mobile, especially since Google’s big drive for mobile-first. Google has even centred their 2021 Core Web Vitals around making sure that your website is efficient on mobile.