Operations | Monitoring | ITSM | DevOps | Cloud

October 2022

The Benefits of Data Observability to SMBs and How to Unlock Them

Data observability is a relatively new discipline in the fields of data engineering and data management. While many are familiar with the longstanding concepts of observability and monitoring in enterprise IT networks and infrastructure, data observability has only really come into the spotlight in the last two years. However, it has managed to turn a lot of heads in that short time.

Ensuring visibility with monitoring tools in 2022

Not long ago, monitoring tools were nice-to-have additions with few uses. However, as technologies scaled up and became more complex, keeping track of all the systems and their health became a huge challenge. As more and more brands started offering new digital services and migrating their existing platforms, competition skyrocketed, and staying on top of system health and proactively resolving potential incidents became crucial.

Installing the Hosted Graphite Heroku Monitoring & Dashboards Add-on

Hosted Graphite provides a complete infrastructure and application monitoring platform from a suite of open-source monitoring tools. Use Hosted Graphite and view all required metrics on beautiful dashboards in real time. Hosted Graphite offers a wide range of tools, add-ons, and plugins which make it possible to measure, analyze, and visualize large amounts of data about your applications with ease.

Why Observability Engineers Are Crucial for Great Data Management

If you’re unfamiliar with observability, you might think an “observability engineer” is just a fancy way to say data admin — but while observability engineers often work with data admins, they work toward different goals. Data admins monitor information to identify and fix known security issues. Observability engineers work to provide a complete picture of all the data a company aggregates and what it means for a business.

Observing AWS Lambda IoT devices

The internet of things is one of my favorite topics. IoT enables low-powered connected devices that open gateways from the digital to the real world. While I love tinkering away with an Arduino sketch and the latest Espressif or Arduino board, there is always an air of frustration when trying to build out what at first seems like simple functionality using one of these “smart devices” because of the limited view we have into their operations.

Getting Started with Python and Geo-Temporal Analysis

This article was originally published in The New Stack and is reposted here with permission. Working with geo-temporal data can be difficult. In addition to the challenges often associated with time-series analysis, like large volumes of data that you want real-time access to, working with latitude and longitude often involves trigonometry because you have to account for the curvature of the Earth. That’s computationally expensive. It can drive costs up and slow down programs.
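As an illustration of the trigonometry mentioned above, here is a minimal sketch of the haversine formula, a common way to estimate the great-circle distance between two latitude/longitude points. It is not taken from the original article; the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions chosen for this example, and the formula treats the Earth as a perfect sphere.

```python
import math

# Assumed mean Earth radius in kilometres; the real Earth is not a perfect sphere.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine term: several trigonometric calls per pair of points,
    # which is where the computational cost comes from at scale.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling `haversine_km` once per pair of points is cheap, but applying it across millions of time-stamped positions is the kind of workload the excerpt describes as computationally expensive.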

What can Elastic Synthetics tell us about Kibana Dashboards?

I like to leverage our technologies to ensure our products have a pleasant user experience. Elastic Synthetics enables you to configure it in an out-of-the-box experience directly through your Elastic Cloud deployment without the need to install anything! It also works across the globe with multiple locations you can choose from. Ever wondered how fast your web service is when accessed from Japan, Germany, or the eastern U.S.? Now you can do this by simply clicking on a checkbox.

On Building a Platform Team

It may surprise you to hear, but Honeycomb doesn’t currently have a platform team. We have a platform org, and my title is Director of Platform Engineering. We have engineers doing platform work. And, we even have an SRE team and a core services team. But a platform team? Nope. I’ve been thinking about what it might mean to build a platform team up from scratch—a situation some of you may also be in—and it led me to asking crucial questions. What should such a team own?

What is Kafka?

Apache Kafka is a popular open source platform for streaming, storing, and processing high volumes of data. In this video, we break down how Kafka works and how it’s able to provide you with a reliable, scalable, and highly performant service for managing events. We also touch on some key resources for effectively monitoring your Kafka deployments via Datadog.

Tales from the Kernel Parameter Side

Users live in the sunlit world of what they believe to be reality. But there is, unseen by most, an underworld. A place that is just as real, but not as brightly lit. The Kernel Parameter side (apologies to George Romero). Kernel parameters aren’t really that scary in actuality, but they can be a dark and cobweb-filled corner of the Linux world. Kernel parameters are the means by which we can pass parameters to the Linux (or Unix-like) kernel to control how it behaves.

Elastic Observability: What is it, and How Do You Get Started?

Elastic provides a rich set of Observability features beyond logging, such as metrics, tracing, OTel support, and rich ML/AIOps features. Getting started is as easy as deploying a single agent to collect and ingest metrics, logs, and traces from multiple sources such as K8s, AWS, and applications. Watch this video to see how simple it is.

Customers Demand Interoperability and Open Standards Are the Key

When I speak with customers, especially chief information security officers (CISOs), one of their most consistent requests is that they want interoperability. They want the software they buy to work with the software they have and plan to buy in the future. Nearly every organization, certainly every enterprise company, has an installed base of hardware and software representing a significant investment in time and money.

Ensuring visibility with monitoring tools in 2023

Not long ago, monitoring tools were nice-to-have additions with few uses. However, as technologies scaled up and became more complex, keeping track of all the systems and their health became a huge challenge. As more and more brands started offering new digital services and migrating their existing platforms, competition skyrocketed, and staying on top of system health and proactively resolving potential incidents became crucial.

Partnering to Build the Right Foundation for Your MSP

When RediTech installed their first RM tool, they hardly looked back, moving too fast to innovate on the up-and-running platform. But time caught up to them, and so did all the manual work. That's when they reached out to N-able N-hanced Services to revisit their foundations and build the right use cases for their remote monitoring and management to best support their partners and grow their business.

Modern AIOps doesn't just fix outages - it prevents them

Is your business one accidental click away from a major outage? We saw it happen with Atlassian earlier this year. You may already have an incident management strategy and monitoring, but is it adjusted for the ever-changing IT infrastructure and application architectures? Putting appropriate protocols in place ensures that one human code push can’t shut down an entire system for three weeks.

NiCE DB2 Management Pack 5.2 released

The database market has seen a tremendous boost over the past years. Although database environments have become much more reliable and performant, there is still good reason to monitor dedicated IBM Db2 on-premise or cloud deployments. The NiCE DB2 Management Pack complements highly advanced availability and performance monitoring based on Microsoft System Center Operations Manager, freeing up valuable admin time while keeping crucial deployments safely up and running.

Easy JavaScript error investigation with source maps

Hopefully by now you’ve taken your first sip of Elastic RUM, or real user monitoring, and seen the power of searching through traces and the User Experience metrics to gain insights into how users actually use and experience your application. One issue you may have experienced is the challenge of finding the source of errors for minified JavaScript files.

The Orion Platform and the SolarWinds Platform - What's the Difference?

For the better part of twenty years, the SolarWinds Orion Platform has provided the consolidated web console, alerting engine, reporting engine, and the API upon which our Network, Systems, Virtualization, Storage, and other management solutions rely. That technology was always evolving release-to-release, and the next stage in that evolution is a new name. Now called the SolarWinds Platform, it's built on the same technology that's been part of our customers' experiences for years.

StackState Observability Platform v5.1 - Context Is King

Context is king, in particular if you are troubleshooting your stack. Having all the right information from your observability platform to understand the behavior of your stack is fundamental for solving problems. With our StackState Observability Platform v5.1 release, StackState takes a big step forward to provide you even more information that is crucial for making decisions and for finding the root cause of an issue faster.

Data Normalization Explained: How To Normalize Data

Virtually every business utilizes some form of data collection, no matter how big or small. While large-scale enterprises have more established methods for collecting, storing, and analyzing data, smaller companies and start-ups are also beginning to understand the value of data collection and analysis. This is especially true in the age of Big Data and democratized data, where we have more data-driven insights available to us than ever.

Monitoring RPA Deployments With Splunk

When you first hear “Robotic Process Automation” (RPA) you might immediately think of a manufacturing line with a series of physical robots each doing their part to build something. RPA is SO much more than that! The “bot” in this sense is an AI powered piece of software that can interface with any system you run today just as a human would.

Minimizing network downtime by integrating network monitoring solutions with ITSM tools

As a network admin of an enterprise network, you know better than anyone how disastrous network downtime can be. The cost-of-downtime study conducted by Gartner in 2014 found that network downtime costs $5,600 per minute on average, but this number can range from $2,300 to $9,000 per minute. With organizations moving towards sophisticated networks built on hybrid infrastructures, network downtimes are becoming more frequent and costly.

Top 10 cAdvisor Metrics for Prometheus

cAdvisor (container advisor) is an open-source container-monitoring platform developed and maintained by Google. It runs as a background daemon process for collecting, processing, and aggregating data into performance characteristics, resource usage statistics, and related information about running containers. With built-in support for Docker and literally any other container type out of the box, cAdvisor can be used to collect data on virtually any type of running container.

What is hybrid cloud?

Hybrid cloud has become a popular computing model in recent times. Find out all you need to know, including its features, pros, and cons. As computing needs evolve, enterprises continuously find it difficult to scale their business offerings on private or on-premises computing environments. That’s why there are third-party or public cloud providers to enable businesses to carry out larger computational workloads.

Sponsored Post

Network automation tools and their importance in today's networks

A network, as we all know, is the linking of two or more devices for resource sharing, file exchanging, or electronic communication. In a huge network organization consisting of more than 10,000 devices, managing every device manually is a hectic task and near impossible for network admins. To overcome this challenge, a software-based feature known as network automation was invented. The main purpose of network automation is to automate tasks and reduce both the workload and human errors. This automation works through a network automation tool.

AIOps Rightfully Going Beyond CMDB in the Multi-Cloud Era

The absence of topology can be a key inhibitor for AIOps tools, creating blind spots for AIOps as they only have access to event data. A topology, an IT service model, or a dependency map is a real-time picture of tools and services that are connected and dependent on each other to deliver an IT service. Suppose an application is driven by cloud-native technology, connected with any kind of ephemeral systems (containers and microservices), and relies on storage, database, and a load balancing tool.

Observability vs. Monitoring: What is the Difference?

For several decades, IT monitoring has been deployed in different forms. The focus of IT monitoring has been to gather metrics about the operations of an IT infrastructure’s hardware and software assets to ensure that all the key functions are being performed as expected to support applications and IT services. In the recent past, the term Observability has been used as a synonym for “modern monitoring”.

What's new in Sysdig - October 2022

October has, as usual, been a busy month, and Sysdig announced many new features. In Sysdig Monitor, we announced the release of four new Advisories and YAML config support for Advisor. In Sysdig Secure, we released Severity filtering in Insights, Pod and Node activity view in Insights, and four new Falco rules added to the Rules Library. Each of these is discussed in detail below.

Goats on the Road: DevOp Struggles

The best part of my job is talking to you, our prospects, and customers, about your logging and data practices. I love listening to what you are doing and hope to accomplish, so I can get a sense of the end state. My goal is to brainstorm solutions that provide overall value across the enterprise, and not just aim for a narrow tactical win with limited impact. In late September, I hung out at a local DevOps conference in Brooklyn with the NYC Cribl sales team.

Insight and reliability through continuous synthetic testing in Kubernetes

Kubernetes has become the de facto standard for cloud-based applications. As companies migrate more and more workloads, ensuring reliable connectivity and performance is critical not just for user applications but also for the cluster itself. In this article, we will discuss how augmenting your system monitoring with in-cluster synthetic testing can give you proactive indicators that something might be headed for trouble.

Consumer Goods Company Saves Over 470 Hours and $23k of IT Support

Stable network connections are your business’s lifeline. Vital elements of your modern workplace—such as hybrid working, SaaS application usage, and online collaboration—depend on good network performance, contributing to employee productivity and satisfaction. But when network problems started to escalate for one of our customers, they realized they needed to look beyond their traditional network performance monitoring tools to solve the issue.

5 Times Nexthink Helped Customers Reduce IT Costs

When IT budgets are tight—and they almost always are—pressure comes down to cut costs, streamline technology stacks, and overall do more with less. And when a recession hits, you’re going to see your budget get slashed—unless you can transform your IT department from a cost center to a strategic partner in recession planning and cost efficiency.

5-Star OTel: OpenTelemetry Best Practices

Written by Liz Fong-Jones and Phillip Carter. OpenTelemetry, also known as OTel, is a CNCF open standard that enables distributed tracing and metrics collection from your applications. At Honeycomb, we believe that OpenTelemetry is the best way to ingest the high-cardinality and high-dimensional data that every system, no matter how complex or distributed, needs for observability.

A Demonstration of SolarWinds Observability | SolarWinds Day Virtual Showcase (Oct 22)

Join Head Geek Chrystal Taylor and VP of Product Josh Stageberg as they show off SolarWinds Observability— our new SaaS offering that unifies application, infrastructure, database, network, digital experience, and log analysis into a single, integrated platform. It lets you group together elements like microservices, hosts, databases, and websites and quickly determine the holistic health of your online application and how it impacts your business performance. Reduce cost, optimize performance, and ensure reliability for all your business-critical systems with SolarWinds Observability.

Understanding the Three Pillars of Observability: Logs, Metrics and Traces

Many people wonder what the difference is between monitoring and observability. While monitoring is simply watching a system, observability means truly understanding a system’s state. DevOps teams leverage observability to debug their applications or troubleshoot the root cause of system issues. Peak visibility is achieved by analyzing the three pillars of observability: logs, metrics, and traces.

A Demonstration of Hybrid Cloud Observability | SolarWinds Day Virtual Showcase (Oct 22)

Head Geek Chrystal Taylor and GVP of Product Brandon Shopp walk you through the latest updates to SolarWinds Hybrid Cloud Observability. HCO gives teams a more proactive solution, making them better informed and enabling them to focus on the most business-critical issues. This full-stack approach provides a centralized view of your IT infrastructure and services, and delivers powerful functionality to help businesses and organizations of any size maximize their time and resources.

Introducing Automatic UI Updates

Automatic UI Updates (AUIU) is a new cloud service that allows admins to get the most up-to-date UI experience between Splunk Cloud upgrades. Cloud admins gain early access to newly enhanced self-service tools through the AUIU opt-in service. Specified AUIU enhanced pages and tooling can now be delivered to customers up to three months faster. AUIU is a delivery service that allows for new UI pages and UI improvements to be integrated into Splunk Cloud deployments for specific enhanced admin pages.

What's New with WhatsUp Gold 2022.1

The newly released WhatsUp Gold 2022.1 provides an easy-to-use IT infrastructure monitoring solution that lets you find and fix problems fast. With its unmatched combination of out-of-the-box functionality, intuitive workflows, visual mapping, and system integrations, WhatsUp Gold 2022.1 offers users even more built-in infrastructure discovery and monitoring power, available from Day 1.

Getting ahead of application performance issues

When application performance is not managed proactively, it impacts the customer experience, costs business downtime, and takes time to fix the issue. And when an application is down, transactions can't happen, and staff lose productivity. This equates to a loss in revenue. Even worse, some organisations are oblivious to slowly performing applications until much later when trying to understand the underlying issue. By then, it's often too late, and they've already lost customers or money.

How To Grow Your WooCommerce Business In 3 Easy Steps

Brand loyalty might be on the decline, but ecommerce is on the rise. Ecommerce is on track to account for 24% of global retail sales by the end of 2026 (source: Statista). And as businesses reach growth limits in their local markets, the industry is seeing more ecommerce brands expanding onto the global stage. 76% of online shoppers have made purchases on sites outside of their own countries.

IBM MQ vs Apache Kafka: How Do They Differ?

Asynchronous communication between various CX applications has long been made possible by enterprise messaging solutions like IBM MQ and, more recently, Apache Kafka. Developers might assume that these two technologies are interchangeable. However, once they scratch the surface, critical differences between IBM MQ and Apache Kafka come to light.

Showcase dashboards securely and effortlessly with Skykit's offering in the Datadog Marketplace

For many organizations, making the most of the visibility Datadog offers into the health and performance of their infrastructure means displaying dashboards to stakeholders in various settings continuously and in real time. But the standard solutions for sharing dashboards to large-format displays can be onerous, involving sundry software and hardware and restrictive manual setups. These solutions can also pose significant security risks, since they tend to involve sharing passwords or devices.

25 Kubernetes Monitoring Tools And Best Practices In 2022

The Kubernetes platform is the standard for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. The problem is that monitoring Kubernetes infrastructure can be notoriously challenging. In this guide, we'll cover Kubernetes monitoring in more detail, including what Kubernetes metrics to track to improve visibility and control over your K8s containers, apps, microservices, etc.

Reduce MTTR and improve UX with Grafana Enterprise: Inside Optum's observability stack

Among the 12 greatest stressors in life, six revolve around healthcare issues. From loss of a loved one to pregnancy and even retirement, these events often involve interactions with healthcare services — interactions that can either add to an individual’s stress or, ideally, help alleviate it.

Should you put all your trust in the tools?

My father worked with some of the very first computers ever imported to Italy. It was a time when a technician was a temple of excellence built on three pillars: on-the-field experience, a bag of technical manuals, and a fully-stocked toolbox. It was not uncommon that missing the right manual or the correct replacement part turned into a day-long trip from the customers’ site to headquarters and back.

Observability and Security Data Are Littering the Enterprise Like Lint Under The Couch Cushions

How enterprises store and split up observability and security data is a great analogy to how lint, spare change, and partially-eaten bags of popcorn end up under couch cushions. Or when you tell your kids to clean up the house when company is coming over and they stash their toys and your tools in various nooks and crannies.

Enhance the Value of Your Data With Mezmo's Observability Pipeline

Organizations of all sizes rely on their observability data to drive critical business decisions. Production Engineers across Development, ITOps, and Security use it to understand their systems better, respond to issues faster, and ultimately provide more performant and secure user experiences. But while the value of observability data is well understood, teams struggle to derive value from it.

Optimize Java Application Performance by Monitoring JVM Metrics

Although Java has been around for 27 years, enterprise applications still favor it as one of their preferred platforms. Java's functionality and programming flexibility increased concurrently with technological advancement, keeping it a useful language for more than 25 years. Outstanding examples of this progression include new garbage collection algorithms and memory management systems.

New Honeycomb Features Raise the Bar for What Observability Should Do for You

As long as humans have written software, we’ve needed to understand why our expectations (the logic we thought we wrote) don’t match reality (the logic being executed). To that end, we developed techniques to help measure reality—logging text strings, or capturing aggregated metrics—and persevered, seeking out newer and fancier logging or monitoring solutions over the intervening decades.

Managing the hidden costs of cloud networking - Part 2

In the first post of this series, I detailed ways companies considering cloud adoption can achieve quick wins in performance and cost savings. While these benefits of the cloud certainly remain true in theory, realizing these benefits in practice can be increasingly difficult as applications and their networks become more complex.

Welcome to InfluxDB IOx: InfluxData's New Storage Engine

Two years ago I announced that InfluxData was working on a new core for InfluxDB, a project we named InfluxDB IOx. InfluxDB IOx is a cloud-native, real-time, columnar database optimized for time series data built in Rust on top of Apache Arrow and DataFusion. Today I’m excited to announce that we deployed our next-generation storage engine that’s built on InfluxDB IOx in our InfluxDB Cloud platform.

How to monitor server load

We often hear the term "load" used to describe the state of a server or a device. But what does it really mean? System load is a measure of the amount of computational work that a system performs. An overloaded system, by definition, isn't able to complete all its tasks on schedule, which affects the performance and productivity of the system. And while "load" often gets conflated with CPU usage, there's a lot more to it.
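To make the distinction between load and CPU usage concrete, the sketch below reads the 1-, 5-, and 15-minute load averages and divides them by the number of CPU cores. It relies on `os.getloadavg()`, which is only available on Unix-like systems; the helper names `load_per_core` and `current_load_per_core` are assumptions made for this example, and treating a per-core value above 1.0 as a sign of queued work is a rule of thumb rather than a hard limit.

```python
import os


def load_per_core(load_average, cpu_count):
    """Normalize a load average by core count (a heuristic, not a standard metric)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def current_load_per_core():
    """Return the 1-, 5-, and 15-minute load averages divided by the core count."""
    # os.getloadavg() is Unix-only and raises OSError if the value is unobtainable.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return tuple(load_per_core(value, cores) for value in (one, five, fifteen))


# Example usage on a Unix-like host:
#   one_min, five_min, fifteen_min = current_load_per_core()
```

Because load average counts tasks that are running or waiting to run, a machine can show a high load while CPU utilization stays modest, for example when many processes are waiting on disk.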

Monitoring your Network SNMP devices using Hosted Graphite

When you design an architecture to monitor your digital assets, whether software applications or hardware devices, you need to use different strategies depending on your monitoring target. The factors you want to consider can vary, including methods of retrieving monitoring data, frequency of data collection, and how you want to surface the metrics and insights you find to stakeholders. In this article, we will mainly discuss how we can monitor your network SNMP devices using Hosted Graphite.

The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. It seemed like the issue was a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween day. While the incident itself was an IT nightmare, Roblox’s detailed technical post-mortem several months later was an excellent way to bounce back.

HAProxy Logging Configuration Explained: How to Enable and View Log Files

HAProxy is generally the frontend layer of your application, which means it plays a critical role since all traffic first lands on this layer. Because of this, you need to make sure everything is working at this layer all the time, as any issue can directly impact your business. Therefore, having visibility on this layer is crucial. Visibility can come from two aspects: the metrics HAProxy emits and the logs it generates while handling requests.

Introducing Honeycomb Service Map: A Dynamic, Interactive, and Actionable View of Your Entire Environment

Today, we're announcing the launch of Honeycomb Service Map. This isn't your grandparent's version of a service map. This feature reimagines what it is that you want to know or investigate when looking at visualizations of how your services communicate with one another.

What is the True Cost of Low Employee Engagement?

Employee engagement is the key to boosting IT workplace productivity in an organization. But bad digital employee experience (DEX) can result in disengagement, hurting your company’s net income and increasing employee turnover. According to Gallup’s 2022 Workplace report, disengaged employees cost the global economy a whopping $7.8 trillion in lost productivity. For US companies, the cost is around $350 billion for a single year.

Kafka vs RabbitMQ - A Head-to-Head Comparison for 2022

A side-by-side comparison of the performance and architectural differences between the two popular open-source messaging systems. As a big data architect or a big data developer working with microservices-based systems, you might often face the dilemma of whether to use Apache Kafka or RabbitMQ for messaging. RabbitMQ vs. Kafka: which one is the better message broker?

Never too late for database performance monitoring

A reliable database system is necessary in the IT operations of an organization to ensure unhindered delivery of information. This is especially true when it comes to business-critical applications, as disruptions in a database system directly impact the end-user experience, ultimately harming your revenue and reputation. Learn about the difficulties inherent to database systems and how monitoring helps resolve these problems.

Import CSV Data into InfluxDB Using the Influx CLI and Python and Java Client Libraries

With billions of devices and applications producing time series data every nanosecond, InfluxDB is the leading way to store and analyze this data. Given the enormous variety of data sources, InfluxDB provides multiple ways for users to get data in. One of the most common formats for this data is CSV, or comma-separated values. This blog post demonstrates how to take CSV data, translate it into line protocol, and send it to InfluxDB using the InfluxDB CLI and InfluxDB Client libraries.
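The CSV-to-line-protocol translation mentioned above can be sketched with the Python standard library alone. This is not the method from the original post and does not use the Influx CLI or the client libraries; the function names, the column names, and the `weather` measurement are hypothetical. The sketch assumes every field is numeric, that timestamps are already integers in nanoseconds, and that the measurement name needs no escaping.

```python
import csv
import io


def escape_tag(value):
    """Escape commas, equals signs, and spaces, as line protocol expects in tags and keys."""
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def csv_to_line_protocol(csv_text, measurement, tag_columns, field_columns, time_column):
    """Turn CSV text into a list of line protocol strings (numeric fields only)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Tags are appended to the measurement, each prefixed with a comma.
        tags = "".join(
            f",{escape_tag(col)}={escape_tag(row[col])}" for col in tag_columns
        )
        # Fields are comma-separated and written here as floats.
        fields = ",".join(
            f"{escape_tag(col)}={float(row[col])}" for col in field_columns
        )
        lines.append(f"{measurement}{tags} {fields} {int(row[time_column])}")
    return lines


sample = "time,host,temp\n1667260800000000000,server a,21.5\n"
print(csv_to_line_protocol(sample, "weather", ["host"], ["temp"], "time"))
```

The resulting strings could then be written to InfluxDB by whichever route suits the deployment; string or boolean fields would need additional quoting rules that this sketch leaves out.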

Observing your application through the eyes of a user: A brand new synthetic monitoring experience is coming

Understanding whether your applications are not just available but also functioning as expected is critical for any organization. Third-party dependencies and different end-user device types mean that infrastructure monitoring and application observability alone are not enough to spot and minimize the impact of application anomalies.

Kentik Kube extends network observability to Kubernetes deployments

We’re excited to announce our beta launch of Kentik Kube, an industry-first solution that reveals how K8s traffic routes through an organization’s data center, cloud, and the internet. With this launch, Kentik can observe the entire network — on prem, in the cloud, on physical hardware or virtual machines, and anywhere in between.

Cracking Performance Issues in Microservices with Distributed Tracing

Microservices architecture is the new norm for building products these days. An application made up of hundreds of independent services enables teams to work independently and accelerate development. However, such highly distributed applications are also harder to monitor. When hundreds of services are traversed to satisfy a single request, it becomes difficult to investigate system issues.

Better Lambda Performance with Lumigo and the Serverless Framework

Lambda is the glue that holds serverless architectures together. Before its release, most users felt it was a matter of luck as to whether AWS would let you connect a service to another. If not, you had to spin up a VM or a container to transform the events from one service in a way that your target service could handle them. Since Lambda was easier to set up, people assumed that all code they would deploy on it would run faster and cheaper than on other compute services.

Using hyperautomation to help SAP customers grow their business - Avantra and Managecore

Hyperautomation is the business philosophy that the modern enterprise is using to rapidly automate SAP tasks and provide value back to the customer. A term coined by Gartner in 2019, hyperautomation and its rise are said to be indicative of how much RPA benefits businesses. In this episode of the EM360 Podcast, Editor Matt Harris welcomes John Appleby, CEO at Avantra, and Nick Miletich, CTO at ManageCore, to discuss.

How to audit an SAP system: A complete guide

The systems used by businesses today are complex and usually involve more than one software system. This can make it difficult for businesses to ensure that all of their systems are working together effectively. The SAP system, used by businesses around the world, is one of the most complex and comprehensive software systems used today.

How to monitor the health and resource usage of Kubernetes nodes in Grafana Cloud

The spine is essential to perform every activity, like crawling, walking, or swimming. Just as the spine is necessary to enable these functions, your Kubernetes infrastructure needs a backbone to be efficient and effective. So if Kubernetes clusters act as the spine of your architecture, then Kubernetes nodes are like the vertebrae — they make up a Kubernetes cluster in the same way the vertebrae form the spinal column.

Announcing New GitHub Actions + Honeycomb Integration Guide

If you build or maintain code in GitHub, the Honeycomb Buildevents Action can help you optimize the performance of your build pipelines in GitHub Actions. This blog introduces you to the gha-buildevents Action and a new hands-on quickstart guide that will show you the inner workings of GitHub Actions workflows, the buildevents tool, and the Honeycomb UI.

Intelligently Monitor Your Work From Home Infrastructure for Business Continuity

Working remotely has become the norm globally for businesses, specifically with regard to IT. Many companies have offices spread throughout the world, decentralized employees or contractors, or simply have a flexible work from home (WFH) policy. Remote work is an aspect of digital transformation that is often left out of conversations when it comes to ensuring business continuity and driving growth, despite research indicating that everything will continue to push towards hybrid work environments.

ITIM and Business Objectives

Every organization has business objectives (BO). These objectives can focus on numerous areas across the company and be related to almost anything within the organization. Identifying core objectives is important for the ongoing success of organizations. Objectives help keep organizations focused on what is deemed important for the future, which, of course, differs for each organization.

A Deep-dive into VictoriaMetrics - The Open Source Time Series Database & Monitoring Solution

This video walks you through the inner workings of VictoriaMetrics: All combined and in a nutshell, VictoriaMetrics boasts: Enjoy the video and let us know if you have any questions via the comments section.

Best practices for network perimeter security in cloud-native environments

Cloud-native infrastructure has become the standard for deploying applications that are performant and readily available to a globally distributed user base. While this has enabled organizations to quickly adapt to the demands of modern app users, the rapid nature of this migration has also made cloud resources a primary target for security threats.

adidas tracks sports activities with help from RabbitMQ

You go for a run, and when you’re done, your buddy, adidas Running, will tell you how you did. Maybe you hit a new goal this run or need encouragement to work harder next time. No matter what, behind the scenes there is a dedicated team of engineers relying on the stable foundation of RabbitMQ. Ever since sports tracking apps saw the light of day, it’s almost as if a workout without tracking is somehow less valuable to us. Imagine a run streak on day 1,000 without functioning measurements!

Sponsored Post

Network Performance Monitoring Is Only Step One

Incident response aims to identify, limit, and mitigate an incident. Whether such an occurrence is a security breach or a hardware failure, formulating and continuously strengthening an incident response strategy has become vital for all businesses in the digital age. Your incident response strategy consists of the processes your organization takes to handle incidents (such as network outages and service-impacting bugs) and the steps taken to mitigate incidents.

It's all about the preparation - Automated preparation!

Recently I joined a team to run a ‘Ragnar’. A ‘Ragnar’ is a running race where you and an additional seven teammates run a course that typically lasts around 24 hours nonstop. Our team joined over 300 other teams running this Ragnar trail course in the middle of the woods in northern Wisconsin. This course consisted of three loops: a three-mile loop, a six-mile loop, and a seven-mile loop.

Unified Observability: Announcing Kubernetes 360

Ask any cloud software team using Kubernetes (and most do); this powerful container orchestration technology is transformative, yet often truly challenging. There’s no question that Kubernetes has become the de-facto infrastructure for nearly any organization these days seeking to achieve business agility, developer autonomy and an internal structure that supports both the scale and simplicity required to maintain a full CI/CD and DevOps approach.

Real-Time Embedded Linux Observability with Pantavisor and InfluxDB

This article was originally published on HackMD and is reposted here with permission. Presently organizations are unable to monitor millions of embedded Linux devices in real-time. With so many different architectures and device types, aggregating telemetry and metrics and viewing that data in a centralized analysis tool is problematic. Onboarding embedded Linux devices into a telemetry service so that metrics can be easily observed is a significant challenge.

Reimagining the Modern Workplace Post-Pandemic

You’re probably bored with talking about Covid – we certainly are. But something that we still find interesting is that in the modern workplace, how people now interact with one another, how they work together and the communication tools that they use play a critical role in boosting their overall productivity. Because of the pandemic, many of us now split our time working between the home and the office – so how can we reimagine the modern workplace to get the most out of it?

Grafana and Cilium: Deep eBPF-powered observability for Kubernetes and cloud native infrastructure

Today, Grafana Labs announced a strategic partnership with Isovalent, the creators of Cilium, to make it easy for platform and application teams to gain deep insights into the connectivity, security, and performance of the applications running on Kubernetes by leveraging the Grafana open source observability stack.

Import Datadog Traces Into Honeycomb

Getting existing telemetry into Honeycomb just got easier! With the release of the Datadog APM Receiver, you can send your Datadog traces to the OpenTelemetry Collector, and from there, to any OpenTelemetry-compatible endpoint. Often, evaluating a new tracing solution requires re-instrumenting your applications from the ground up in a new vendor’s tooling. It’s a pretty high bar to clear just to see if a solution is worth adopting.

Google & Goliath Technologies Partner to Deliver ChromeOS Device Monitoring & Troubleshooting

Philadelphia, PA – October 24th, 2022 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software for hybrid cloud environments, announced a strategic partnership with Google to help businesses with a Google Enterprise license quickly identify and isolate the true root causes of IT performance issues, determining if it is related to Google ChromeOS & ChromeOS Flex devices or other components of their infrastructure by automatically and intelligently corre

Bring Efficiency to Log Management in DigitalOcean

The ongoing partnership between Papertrail and DigitalOcean led to the development of the Papertrail software as a service (SaaS) add-on in the DigitalOcean Marketplace. With the add-on, developers can add powerful, simple, and scalable Papertrail log management to their DigitalOcean infrastructure in seconds. In two earlier posts, we reviewed how the add-on helps teams simplify and centralize log management.

How Cribl's Suite of Solutions Help Prevent Zombie Data

In part 1 of this series, we talked about zombie data and what it means for your observability architecture. In this post, we’ll talk more about how to handle all of it. How well can your organization handle the firehose of data it’s collecting? Yes, you have the ability to collect it, but chances are you don’t have the financial or human resources available to analyze all of it effectively.

How We Built It: Getting Spooky with Splunk Dashboards

Dashboards are not just tools for businesses and other organizations to monitor and respond to their data, but can be a method of storytelling. All of our data has the potential to be crafted into compelling narratives, which can easily be accomplished with the help of Dashboard Studio’s customizable formats and advanced visualization tools. We can take a series of disparate datasets and bring them together in one place if they share a common theme — in this case, Halloween.

Condé Nast Uses Proactive Monitoring Solution to Better Serve its Media Empire

Global Media Company Resolves Nearly 50% of Network Detection Problems Before Readers Log On. Condé Nast, the international print and digital media company that publishes Vanity Fair, The New Yorker, Wired and more, was eager to find an automated solution for its network monitoring issues. Condé Nast wanted to provide those logging onto the digital versions of its numerous magazines with reading experiences uninterrupted by network or server holdups. The IT team required a swift and efficient way to create error reports, as opposed to their old way of using pencils and paper.

Sponsored Post

What Is the Controllability and Observability of Cloud Applications?

There are many computing resources used in different cloud application services to provide online software-as-a-service (SaaS). SaaS differs from traditional applications in that it works from a cloud computing environment. This means that both the application service as well as user data are being hosted by a cloud provider in the cloud. Therefore, the SaaS and data are accessible from anywhere as long as there's online access. This model provides a distinct advantage from a software perspective.

Bring Your Zombie Data Back to Life with Cribl Search

We’ve reached the point where our ability to collect data has actually exceeded our ability to process it. Nowadays, it’s commonplace for organizations to have terabytes or even petabytes worth of data sitting in storage, waiting patiently for well-intentioned systems admins to eventually analyze it.

OpenTelemetry Java - Your Guide to Getting Started

OpenTelemetry (OTel), an open source project under the Cloud Native Computing Foundation (CNCF), is a collection of tools, APIs and SDKs for generating and collecting observability data (mainly traces, metrics and logs) from cloud-native applications. An industry standard for distributed tracing and observability, OTel enables analyzing application health and performance to ensure production-readiness and support production monitoring.

Sponsored Post

What Makes Citrix Difficult for Synthetic Monitoring Solutions?

Synthetic monitoring and automated testing are a major advantage if you can use them. Unfortunately, some challenges stand in the way of running these technologies outside of a web browser. Trying to run synthetic monitoring on Citrix is even more challenging. Thankfully, 2 Steps has identified and solved these issues. In this post, we'll take you through the challenges that running a synthetic monitoring and automated testing solution on Citrix poses and tell you how 2 Steps makes those challenges go away.

Metrics in Minutes: Prometheus Metrics into Coralogix using Open Telemetry

In this video, we'll explore how to connect Prometheus to Coralogix, using the Open Telemetry collector as an abstraction layer. This deployment minimises 3rd party code in your system, and provides a completely open source path to integration, all while using industry standard, simple tooling.

Introducing PrivateLink Support for Enterprise

Network topology can get very complicated in the cloud, especially when you’re sending data to external SaaS providers. You will likely need to configure gateways and firewalls and keep close tabs on those points of egress. However, if your infrastructure exists within AWS, there’s a much simpler way and that’s through an AWS PrivateLink endpoint.

Modern IT Infrastructure Management: Three Pillars for Success

Your IT Infrastructure team faces untenable demands on their time and resources as your organization increasingly relies on complex hybrid infrastructures and an ever-growing set of technologies and cloud-based services. You can actually limit visibility and slow triage as you add monitoring tools to cover this expansion, blocking critical insight into your environment at the IT service level. What’s really required to be successful in today’s IT infrastructure environment?

How to monitor systemd service liveness

The life of a sysadmin or SRE is often difficult, but occasionally very simple things can make a huge difference. Basic monitoring of your systemd services is one of those simple things, which we sometimes overlook. The simplest question one would want answered is whether the thing that’s supposed to be running is actually running at all. If you use systemd services, you can guarantee an answer to that question within minutes using Netdata.

Goliath and ChromeOS - Full User Experience Visibility

Experience monitoring is a powerful tool and an advantage to any administrator. However, even with the tooling, the view isn’t always complete. Most monitoring tools focus only on the resources being accessed. Some bring in additional information about the hosting and networking those resources are on. But in any EUC solution, the device accessing those resources is integral to the end-user experience.

7 types of Redis latency and how to fix it

Redis is designed to be fast. In most cases, it is. However, there are times when Redis may be slow, due to network issues, disk latency, or other factors. When this happens, it is important to be able to detect the slow down and investigate the cause. Latency is the maximum delay between the time a client issues a command and the time the reply to the command is received by the client. Redis has strict requirements on average and worst case latency.

Google Cloud Managed Service for Prometheus

Welcome back to GKE Essentials! In this episode, Kaslin Fields explores a key element of your GKE observability: Google Cloud Managed Service for Prometheus. Watch to see how Google Cloud's fully managed multi-cloud solution for Prometheus lets you globally monitor and alert on your workloads without having to manually manage and operate Prometheus at scale.

What's the hype about Machine Learning?

Can it help businesses? Machine learning is an inescapable buzzword for many in the operations sector. Even friends and colleagues tend to make us aware of a new ML tool that may or may not be useful. While there are many ML tools in the market, not all are suitable for every business. Some tools, when tested, struggle to solve basic, everyday use cases. Therefore, when evaluating ML tools, other deeper questions and issues do arise.

How DevOps Monitoring Works: Concepts, Types & Best Practices

DevOps is an IT delivery concept that combines people, practices and tools with the shared goal of accelerating the development of applications and services. Adopting DevOps at enterprise level typically requires: The continuous development of DevOps practices, as well as other factors like the rapid pace of modern code changes, facilitates a need for DevOps monitoring: a set of tools and processes to support the entire software development lifecycle.

Introducing Splunk App for Chargeback

The Splunk App for Chargeback provides the ability to analyze and manage how internal business units, departments, and individuals are consuming Splunk resources. The three main features of the App are: The Splunk App for Chargeback allows you to manage, monitor, and forecast resource utilization in your shared Splunk environment. You can use Splunk App for Chargeback to focus on any business unit, department, or an individual user.

AIOps: The Future of IT Operations Management in 2023

AIOps stands for artificial intelligence for IT Operations. AIOps is a set of tools and algorithms that gather data from the entire IT environment, including different monitoring systems, log files and other IT data sources. It then analyzes and applies machine learning algorithms to determine the root cause of an incident. This means that instead of having to go through a long troubleshooting process by analyzing log files and manually looking for root causes, AIOps does it for you in minutes.

Nodejs Performance Monitoring | Monitor a full-stack Nodejs application with open-source tools

Nodejs tops the list of most widely used frameworks for server-side programming by developers. Powered by Google’s V8 engine, its performance is incredible. As a result, Nodejs has now become a critical part of the technology stack of large-scale enterprises and startups. And as Nodejs is based on JavaScript, it is also easier to learn and begin with. Nodejs is a single-threaded runtime for JavaScript, a dynamically typed programming language.

What's new in Avantra 23.0

Wow, what a year it has been since the release of our last major version 21.11. This year we’ve focused on automation, improving the automation engine we delivered with 21.11.0 across our minor releases for the year and we’ve already received great feedback from our customers. Avantra 23.0 brings all these improvements together with even more automation features, the next release of our new UX project, new automation integrations and much more.

The Top New Relic Competitors

Platforms like New Relic reduce enterprise cloud complexity, accelerate digital transformation, and simplify cloud management. The company serves over 10,000 businesses worldwide with AIOps, distributed tracing, browser monitoring, infrastructure monitoring, and synthetic monitoring solutions. As a result of New Relic's lack of onboarding support, some engineers have complained about burnout while implementing the solution.

Zen and the Art of Kubernetes Monitoring

The real beauty of this modern, cloud-fueled, DevOps-driven world that we are living in is that it’s so highly composable. In so many ways, we’ve been freed from the limitations and structures of the previous annals of software and technology history to build things the way that we want to, and however we choose to do so.

11 Top Website Uptime Monitoring Tools to Know

Uptime is the time period when your website and all of its contents are fully functional and accessible. It is expressed as a percentage: if your website is up 99% of the time, you can expect more than 7 hours of downtime every month. Downtime, as the name suggests, is when all or a portion of your service is unavailable. Disruptions result and potential customers are lost. Uptime monitors are useful for correctly logging, assessing, and informing on each downtime.
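
The downtime figure above follows from simple arithmetic, sketched below. The function name `downtime_hours` and the 30-day month are assumptions made for illustration; they do not come from the article.

```python
def downtime_hours(uptime_percent: float, period_hours: float = 30 * 24) -> float:
    """Return the hours of downtime allowed in a period at a given uptime percentage.

    The default period is a 30-day month (720 hours), an assumption for this sketch.
    """
    return (100.0 - uptime_percent) / 100.0 * period_hours


if __name__ == "__main__":
    # Compare a few common uptime targets over a 30-day month.
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows about {downtime_hours(pct):.2f} hours of downtime")
```

At 99% uptime this gives roughly 7.2 hours per 30-day month, which matches the "more than 7 hours" figure; each additional nine cuts the allowance by a factor of ten.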

When Cloud Native Stacks Misbehave - Pitfalls and Lessons Learned | Itiel Shwartz (Komodor)

In this session, Itiel Shwartz will demonstrate common failure scenarios - both app and infra related. We will laugh a little and cry a little, and then cover monitoring, observability & troubleshooting best practices methodologies such as metrics, distributed tracing, logging, network visualization and more. But cheer up! We’ll wrap up by introducing some helpful tools, in order to find and fix issues as fast as possible.

Bringing "Blameless" to Traffic Court | J. Paul Reed (Release Engineering Approaches)

What do modern incident analysis techniques and moving violations have in common? This Quick Bite tells the story of taking the same retrospective techniques the most innovative technology companies in the world use to understand their operational incidents... to traffic court, to help us all understand what really happened? What happened next? Come find out!

ScienceLogic Achieves Record Q3, Meeting Strong Demand for IT Transformation

I am thrilled to announce that ScienceLogic posted its best quarterly sales bookings in our company’s history! Amid warnings from the broader tech industry slowdown, ScienceLogic proved once again to be an outlier, as current macro trends continue to drive strong demand for IT transformation.

How to monitor Microservices?

Microservices are being used everywhere, and for good reasons. They provide many benefits, especially improved focus and a shorter time to market. Microservices bring complexities too. Monitoring microservices is complex simply because of how many there are. Monitoring a user transaction requires monitoring many microservices. Correlating the data from them manually to identify the root cause is a nightmare, especially in a complex environment with 100s or 1000s of microservices.

Ship AWS Cloudwatch Logs to Any Destination with OpenTelemetry

With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily aggregate logs across your entire infrastructure and send them to one or more analysis tools. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here.

How to manage high cardinality metrics in Prometheus and Kubernetes

Over the last few months, a common and recurring theme in our conversations with users has been about managing observability costs, which are increasing at a rate faster than the footprint of the applications and infrastructure being monitored. As enterprises lean into cloud native architectures and the popularity of Prometheus continues to grow, it is not surprising that metrics cardinality (a cartesian combination of metrics and labels) also grows.
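
To make the "cartesian combination" concrete, here is a minimal sketch of how label values multiply into time series. The metric, label names, and value counts are hypothetical, and the result is an upper bound, since not every label combination necessarily occurs in practice.

```python
from math import prod


def max_series_count(label_value_counts: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_value_counts.values())


if __name__ == "__main__":
    # Hypothetical labels on a single request-counter metric.
    labels = {"method": 5, "status": 10, "pod": 200}
    print(max_series_count(labels))

    # Adding one high-cardinality label multiplies the total again.
    labels["user_id"] = 1000
    print(max_series_count(labels))
```

With these assumed counts the bound grows from 10,000 series to 10,000,000 once the extra label is added, which is why a single unbounded label can dominate metrics cost.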

How to autoscale Grafana Loki queries using KEDA

Grafana Loki is Grafana Labs’ open source log aggregation system inspired by Prometheus. Loki is horizontally scalable, highly available, and multi-tenant. In addition, Grafana Cloud Logs is our fully managed, lightweight, and cost-effective log aggregation system based on Grafana Loki, with free and paid options for individuals, teams, and large enterprises.

Your IT Alerting Software is Failing You. We Can Help.

Sometimes an IT ticket is just an IT ticket. But far more often, when one or a few tickets are submitted, it means there are many more users and systems exposed to the same issue. IT issues can quickly get out of control and affect many employees, sometimes overnight. When these get out of control, they can become “top call drivers” that bring your team, department, business lines, and even entire business to a halt.

Iterating on an OpenTelemetry Collector Deployment in Kubernetes

When you want to direct your observability data in a uniform fashion, you want to run an OpenTelemetry collector. If you have a Kubernetes cluster handy, that’s a useful place to run it. Helm is a quick way to get it running in Kubernetes; it encapsulates all the YAML object definitions that you need. OpenTelemetry publishes a Helm chart for the collector. When you install the OpenTelemetry collector with Helm, you’ll give it some configuration.

A Simplified Introduction to Azure Database for PostgreSQL Flexible Server

I find it amazing how much opportunity and flexibility cloud environments are creating for organizations of all sizes. I’m seeing more and more companies experimenting with open-source software (OSS) relational database systems, which years ago would’ve been too complicated for the customer to set up. With Azure, you can spin up OSS systems like MySQL or PostgreSQL quickly to determine if the engine fits your needs. If it does, you can continue development on it.

Redis Monitoring: What Metrics Should You Measure to Ensure Performance

Redis is an open-source, BSD 3-clause licensed, highly efficient in-memory data store. It is used widely in the industry because of its incredible performance and ease of use. It can easily be used as a distributed, in-memory key-value store, cache, or message broker. It can hold virtually any data structure, making it highly versatile. Redis was architected and developed with speed in mind and designed to keep all the data in memory.

Top 3 Issue Alert Tips to Stop Noisy Notifications

Sentry Alerts ping you on Slack, Microsoft Teams, or PagerDuty when something needs your attention. However, too many alerts can turn your notification channel into an endless noise feed. I spoke with dozens of Sentry customers in the past 6 months, and something I heard over and over again was “Sentry can get noisy at times” and “There are days I can’t keep up with Sentry notifications because we get so many of them”. Does this sound familiar?

How to Reduce Costs With DevOps

As defined by Amazon Web Services, DevOps is the integration of cultural concepts, practices, methods, and tools which allow an organization to provide services and applications at high speed: advancing and improving their products at a much faster rate than those using traditional software development and infrastructure management processes. This allows organizations to serve clients more effectively and compete in the market.

Use Datadog Continuous Testing to release with confidence

Testing early and often in the development cycle is a must for ensuring that your application meets user expectations. Poor performance and errors can alienate users and prevent you from meeting crucial benchmarks and OKRs. Additionally, having to constantly implement fixes after new, under-tested features are added can fatigue developers and strain your resources, making your organization less nimble overall.

Leverage collaborative screen sharing with Datadog CoScreen

Remote collaboration tools have transformed how remote and hybrid teams work synchronously. But while the current popular chat forum and video conferencing solutions are inarguably helpful, few were created with software development and operations in mind. CoScreen is the only real-time collaboration tool designed specifically for remote and hybrid engineering teams that integrates both interactive screen sharing and video conferencing features.

Identify and redact sensitive data in APM, RUM, and Events stream with Sensitive Data Scanner

Customer-facing applications request and process many types of sensitive data, such as API keys, credit card numbers, and email addresses. As your application scales in size and complexity, it becomes harder to keep track of this sensitive data moving across more services, increasing the risk of data leaks.

Announcing PCI-Compliant Log Management and APM from Datadog

For any organization that stores, processes, or transmits cardholder data, monitoring can pose a particular set of challenges. The Payment Card Industry (PCI) Data Security Standard (DSS) dictates rigorous monitoring and data security requirements for the cardholder data environments (CDEs) of all merchants, service providers, and financial institutions.

Gain visibility and control of your cloud spend with Datadog Cloud Cost Management

To optimize its cloud investments, your organization needs internal stakeholders to act on shared knowledge about its cloud costs and cloud usage. But in practice, it’s difficult for organizations to gain a high degree of clarity about their cloud spending. The factors contributing to cost data are not normally visible to all stakeholders, and it’s often impossible to attribute costs to the teams, services, and applications that incurred them.

Dash 2022: Guide to Datadog's newest announcements

Today at Dash 2022, we announced new products and features that enable your teams to break down information silos, shift testing to the left, monitor cloud and application security, and more. Now, you can analyze cloud cost data alongside other telemetry, create synthetic tests for your mobile applications, and prevent malicious activity in your environment by blocking IPs directly from Datadog. We expanded Sensitive Data Scanner to include APM, RUM, and Events stream data.

Coming soon! Raygun Alerting's Microsoft Teams integration

Microsoft Teams is a popular integration request for our Alerting feature, and the Raygun team is busy at work making this feature available to all our customers. We will be notifying all our customers when this feature is available. You are also welcome to keep an eye out for the launch announcement in our Changelog. Since it’s such a popular integration request, we thought we’d share some specs and screenshots with you as we progress through the work.

The Open Source Observability Adoption and Migration Curve

Open source monitoring and observability tools can be found in production all over the world – whether they’re being used by startups or entire enterprise development teams. DevOps, ITOps, and other technical teams rely on tools like Prometheus, Grafana, OpenSearch, OpenTelemetry, Jaeger, Nagios, Zabbix, Graphite, InfluxDB, and others to monitor and troubleshoot their cloud environment.

Untangling Account Management With User Permissions

Companies, like most things, rarely grow in a straight line. Plants will take root where they can, and send shoots where they can to get the most sunlight, even if there are obstacles in the way. But vines and branches aren’t known for their efficient pathing, which can make a tangled mess of the whole plant. So get a good sun hat and some pruning shears ready; you’ll need them today! The difference between organic and structured growth is one of purpose and planning.

Your Business Requires a Resilient Internet

One of my initial surprises upon joining Catchpoint about five months ago was how much confusion there is in the observability market. Every single vendor has almost the same message around ensuring a great digital experience for your customers or employees or both. Of course, these experiences are critical to get right, but for the most part many of these solutions, at best, help to ensure that sites are live and available, and that they are reachable by some users.

How you can use the Pandas Python collector to monitor weather data

Netdata just launched a Pandas collector. Pandas is a de facto standard for reading and processing most types of structured data in Python, so if you have some CSV/JSON/XML data, either locally or via some HTTP endpoint, containing metrics you’d like to monitor, chances are you can now do this easily with the Pandas collector instead of developing your own custom collector as you might have in the past.

Cost Advisor: Optimize and Rightsize your Kubernetes Costs

Kubernetes has broken down barriers as the cornerstone of cloud-native application infrastructure in recent years. In addition, cloud vendors offer flexibility, speedy operations, high availability, SLAs (service-level agreements) that guarantee your service availability, and a large catalog of embedded services. But as organizations mature in their Kubernetes journey, monitoring and optimizing costs is the next stage in their cloud-native transformation.

Observing Schrödinger's Python App

As a developer, I love the versatility of Python. Over the years I have used Python for so many different use cases: game development, APIs, IoT, machine learning, and web development. It can scale tall applications in a single bound and take on any challenge faster than you can pip install flask. Something you learn very quickly in the world of app development is to build everything for scale.

Scheduling Tasks in PHP

In the scenario where you want to execute tasks repeatedly at a specific time and have full control over when they are executed and how the results are handled, it makes sense to build this into your application instead of setting up a cron job, for example. I’d like to give you a quick example of how you can achieve this in PHP using two great libraries, ReactPHP and cron-expression. ReactPHP is an event-driven programming library that has an event loop at its core.

Everything You Need to Know About SolarWinds Observability - Our Transformational Subscription Service

Transformation is key to being at the forefront of the tech industry, and over the past two years, I’ve been excited to lead an outstanding team of developers and engineers as we’ve embarked on evolving our monitoring tools toward observability. With this in mind, we’re excited to announce two significant product releases today. The first is a completely new product offering and subscription service we call SolarWinds® Observability.

It's Time to Rethink Observability and Rethink SolarWinds

Everyone in the information technology industry understands “change” is guaranteed. People are creative and constantly striving to find more efficient ways to solve problems and more innovative ways to deliver services to consumers. But keeping up with the constant cloud and internet technology shifts and taking advantage of all the new capabilities is a harrowing task for digital organizations.

Sysdig Cost Advisor: Optimize your Kubernetes Costs

Every company running its applications on the cloud needs to estimate its operating costs, but running workloads on Kubernetes clusters across multiple providers often makes it hard. Without Kubernetes context in the cloud billing reports, users aren’t able to group costs or effectively assign the resources to the proper cost center. To address these gaps in Kubernetes cost monitoring, we are excited to announce Cost Advisor, a new feature in Sysdig Monitor that will give you visibility into Kubernetes costs and automatically help you identify areas to reduce them.

SolarWinds Hybrid Cloud Observability - Evolving Beyond Monitoring

Learn more about SolarWinds® Hybrid Cloud Observability and how it can help organizations of all sizes and industries optimize performance, help ensure availability, and reduce remediation time across on-premises and multi-cloud environments by increasing visibility, intelligence, and productivity.

SolarWinds Observability - A Unified Full-Stack Solution for DevOps Teams

SolarWinds® Observability is a SaaS offering that unifies application, infrastructure, database, network, digital experience, and log analysis into a single, integrated platform. The solution is designed to grow and expand to accommodate whatever kind of environment you manage.

Find and Fix Bottlenecks in Your Gradle Builds With OpenTelemetry and Honeycomb

Today, I’d like to share with you a new community-contributed integration that helps you optimize and debug your Gradle builds. This new Gradle plugin is available today, is free to use, and you can use it immediately with a free Honeycomb account.

Microsoft Teams Isn't Down Because of Microsoft. There's A Lot More Going On

Another ticket rolls in complaining about Teams service quality – calls keep breaking up, or the video quality regularly drops out. And when each one mentions that Microsoft Teams often does this, it’s easy to pass the blame. Sure, people jump to conclusions when they’re frustrated by tech, but the truth is that Microsoft Teams itself is rarely the problem.

Tracking website changes to receive alerts when new Calendly appointments become available

Teams and individuals worldwide use Calendly to help them with appointment scheduling without the hassle of juggling their calendars and conflicting meetings. With features like automated reminders and routing forms, it’s no surprise that it’s considered one of the best scheduling apps on the market.

Don't Know What to Monitor? L.E.T.S. Start with 4 Metrics!

Software monitoring, how does it work? “We paid for a bunch of tools but we don’t know what we should be looking at. There are tons of charts that don’t seem to mean anything!” If you talk to people about software monitoring you’ve inevitably heard something similar to this. With so many possible metrics it can feel like searching for a needle in a haystack. Even with curated dashboards there is inherent confusion about what is important.

Cloud Logging pricing for Cloud Admins: How to approach it & save cost

Flexera’s State of the Cloud Report 2022 pointed out that significant cloud spending is wasted, a major issue that is getting more critical as cloud costs continue to rise. In the current macroeconomic conditions, companies focus on identifying ways to reduce spending. To effectively do that, we need to understand the pricing model. We can then work towards the challenges of cost monitoring, optimization, and forecasting.

Container Observability

In the recent past, container-based deployment architectures have played a significant role in improving applications on multiple fronts. Containers are all-inclusive packages containing lightweight services which are easy to spawn and terminate. However, container-based deployments can comprise hundreds of individual services and their replicas spinning up and down at any moment.

The 7 Pillars of DEX Visibility: How Nexthink provides a complete picture of your Digital Workplace

The ‘last mile’ is a term used in the logistics and telecommunications industry to characterize the obstacles businesses face trying to deliver products to customers during the very last part of their supply chain. You might have a world-class infrastructure, high-end technology, and the most sophisticated processes and skilled resources, but if your product doesn’t reach the customer at the right time and place, your business suffers.

Monitor ALL of Salesforce with Exoprise

Salesforce customer relationship management (CRM) performance is critical to sales, marketing, and customer service for smooth business operations. As a result, Information Technology (IT) teams and developers often customize the cloud platform using thousands of apps, plugins, and APIs available in the Salesforce AppExchange Store to boost collaboration and improve efficiency.

How to create a data integration strategy for your organization

Developing a strategy for integrating data across your organization helps ensure that everyone has access to the most up-to-date data in a secure way. This article provides an example of a strategy you can use to develop your own. Despite the global digital acceleration of data use cases, many companies still struggle to be data-driven.

The Ultimate Guide to Containers and Why You Need Them

Containers have long been used in the transportation industry. Cranes pick up containers and shift them onto trucks and ships for transportation. Container technology is handled in a similar vein in the software world. A container is a new and efficient way of deploying applications. A container is a lightweight unit of software that includes application code and all its dependencies such as binary code, libraries, and configuration files for easy deployment across different computing environments.

How to track AWS costs with the AWS Cost Explorer app for Sumo Logic

From Sumo Logic’s inception over a decade ago, we made a strategic bet to go all in with Amazon Web Service (AWS). Today, many of our customers rely on Sumo Logic to gain unified visibility into their growing number of AWS services, cut troubleshooting time and unlock comprehensive root cause analysis for complete issue resolution.

Bridge Your Data Silos to Get the Full Value from Your Observability and Security Data

In my work as a technical evangelist at Cribl, I regularly talk to companies seeing annual data growth of 45%, which is unsustainable given current data practices. How do you cost effectively manage this flood of data while generating business value from critical data assets?

Serverless observability: Lumigo or AWS X-Ray

Observability is a measure of how well we are able to infer the internal state of our application from its external outputs. It’s an important measure because it indirectly tells us how well we’d be able to troubleshoot problems that will inevitably arise in production. It’s been one of the hottest buzzwords in the cloud space for the last 5 years and the marketplace is swamped with observability vendors. Different tools employ different methodologies for collecting data.

0 to Observable: From Kubernetes Logs to Container Observability with Coralogix

In this video, we begin with a local Kubernetes cluster. From there, we will add a collector agent, the OpenTelemetry Collector, and configure it to push logs to Coralogix. However, we won't stop there. We'll then use the Logs2Metrics feature to transform those logs into some key container metrics, and visualise them using a DataMap. From 0 to observable in 15 minutes.

How Logz.io Uses Observability Tools for MLOps

Logz.io is one of Logz.io’s biggest customers. To handle the scale our customers demand, we must operate a high scale 24-7 environment with attention to performance and security. To accomplish this, we ingest large volumes of data into our service. As we continue to add new features and build out our new machine learning capabilities, we’ve incorporated new services and capabilities.

Stop Trusting Container Registries, Verify Image Signatures

One of InfluxData’s main products is InfluxDB Cloud. It’s a cloud-native, SaaS platform for accessing InfluxDB in a serverless, scalable fashion. InfluxDB Cloud is available in all major public clouds. InfluxDB Cloud was built from the ground up to support auto-scaling and handling different types of workloads. Under the hood, InfluxDB Cloud is a Kubernetes-based application consisting of a fleet of micro-services that runs in a multi-cloud, multi-region setup.

Get better insights from industrial IoT data with Grafana

Varland Plating has been in the electroplating business since 1946. At their industrial job shop in Cincinnati, Ohio, they perform complex electrochemical treatments on steel, brass, and copper manufactured parts to create everything from corrosion-resistant building materials to decorative metals.

ANS increases visibility for customers with easy implementation

LogicMonitor has helped ANS provide data and visibility to their customers, which is a critical competitive advantage. LogicMonitor’s unified observability solution stood out from competitors through its ease of implementation and enterprise scalability.

The Future of the Network is End User Experience. Are you Ready?

When you think of a network, what comes to mind? MySpace, Facebook, Instagram, TikTok… LinkedIn if you’re all business? No, no, no, not what I meant! Not social networks – IT networks. What comes to mind now? Switches? Access points? Firewalls? A score of ethernet cabling? For us too, the network symbolizes all the traditional things that Auvik helps IT teams around the world manage and monitor.

How BGP propagation affects DDoS mitigation

We often think of DDoS attacks as volumetric malicious traffic targeted against organizations that effectively take a service offline. Most frequently detected by anomalous behavior found in NetFlow, sFlow, IPFIX, and BGP data, what may not be well understood is how the DDoS mitigation works and how it’s possible to visualize the effectiveness of the mitigation during and after an attack.

What is Cloud Application Performance Monitoring?

Due to its complex nature compared to the on-prem architecture, the cloud presents several challenges to maintaining observability in your apps. This is why you need a robust cloud monitoring solution to care for your cloud-based apps and your cloud. As more and more companies move their operations to the cloud, the need for cloud monitoring solutions continues to grow. Understanding the various aspects of cloud monitoring and how it affects your application is essential.

How to monitor the impact of Puppet Runs using Hosted Graphite

In this article, we will look at what Puppet is and why it is important to monitor Puppet server metrics. We will also analyze the tools that help us monitor Puppet’s performance. Ultimately we will learn about the benefits of using Hosted Graphite by MetricFire to monitor Puppet server metrics. Sign up for MetricFire free trial today or book a demo with the MetricFire team to understand how you can take advantage of its monitoring solutions.

Announcing Rollbar Live In-App Chat Support

What’s better than great support? Live in-app support with a real person when you need it. During the last two months, we have been rolling out our live chat capability for all Rollbar users, regardless of the plan you are on. That's right; you can now speak to a real person from our customer engineering team to help answer your product-related questions from within the Rollbar application. Our goal is to provide more support channels when you need them, without having to wait for an answer.

How BGP propagation affects DDoS mitigation

Doug Madory, Kentik director of internet analysis, and Phil Gervasi, director of tech evangelism, discuss the nuance of coordinating the mitigation of a DDoS attack and how we can use Kentik to see the propagation of BGP announcements on the public internet before, during, and after the DDoS attack mitigation.

Your Infrastructure Monitoring Tool May Be Holding Back Productivity

Productivity is one of the measures economists use when looking at the health and growth (or lack thereof) of economies. Productivity growth is the ability of people to do more with the same or only marginally more effort. So, when Henry Ford introduced his assembly line to automobile manufacturing, he dramatically increased employee productivity. In the case of Henry Ford, the benefits of massive increases in productivity included.

What is Istio Service Mesh, and Do I Need It?

Development teams build modern applications using microservice architectures. Individual services are built and maintained by separate teams, and then these services are combined using container-based orchestrators to comprise a complete product offering. Microservices are a standard development method because they allow teams to iterate releases, providing ongoing new customer-facing features and bug fixes without needing to redeploy an entire platform or app.

SOAP vs REST vs JSON - a 2023 comparison

SOAP vs REST vs JSON are comparisons that are frequently made in discussions about web services. While SOAP and REST are both leading approaches to transferring data over a network using API calls, JSON is a compact data format that RESTful web services can use. Deciding whether you should create a SOAP vs REST API is an essential question if you are planning to provide a web service. Each architectural style has its own use cases, benefits, and limitations.
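
As a rough illustration of the difference in payload style, the sketch below builds the same hypothetical "get user" request twice: once as a JSON body that a RESTful service might accept, and once as a SOAP 1.1 envelope. The operation name `GetUser`, the `userId` field, and the `http://example.com/users` namespace are made up for this example; only the SOAP 1.1 envelope namespace is standard.

```python
import json
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/users"  # hypothetical application namespace


def build_json_body(user_id: int) -> str:
    """Serialize the request as a compact JSON document."""
    return json.dumps({"userId": user_id})


def build_soap_body(user_id: int) -> str:
    """Serialize the same request wrapped in a SOAP Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetUser")
    ET.SubElement(request, f"{{{APP_NS}}}userId").text = str(user_id)
    return ET.tostring(envelope, encoding="unicode")


json_body = build_json_body(42)
soap_body = build_soap_body(42)
print(json_body)
print(soap_body)
```

Both strings carry the same single field; the SOAP version is longer because the envelope, body, and namespaces are part of the message itself.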

Apache Kafka in the Airline, Aviation and Travel Industry

Apache Kafka is the de facto standard for event streaming use cases across industries. Many use cases can be applied to the aviation industry, too. Concepts like payment, customer experience, and manufacturing differ in detail. But in the end, it is about integrating systems and processing data in real-time at scale. For instance, omnichannel retail with Apache Kafka applies to airline, airports, global distribution systems (GDS), and other aviation industry sectors.

Exoprise Extends the Service Watch Platform to Optimize the Digital Employee Experience

The secret to enterprise success is about driving digital business acceleration. With a new unified communication as a service (UCaaS) experience-based solution, Exoprise is at the forefront of improving workplace productivity and increasing employee retention.

Collect GitHub audit logs and scanning alerts with Datadog

For most organizations, GitHub is mission critical. Your GitHub repositories likely also contain some of your organization’s most sensitive data. GitHub provides tools to help you protect and govern this data, with tools such as audit logs, code scanning alerts, and secret scanning alerts. However, analyzing these logs and alerts through GitHub’s UI can be challenging. For example, looking for trends in your code scanning alerts over time through GitHub’s UI is just not possible.

How to Enrich Logs and Metrics with OpenTelemetry Using BindPlane OP

Data enrichment is the process of adding additional context or attributes to telemetry data at the source that increases its value during analysis. OpenTelemetry, a collaborative open source telemetry project with the largest organizations in the observability space, can be configured to enrich logs and metrics from dozens of sources. This blog will show you the basics of how to use BindPlane OP to easily deploy and configure OpenTelemetry to enrich data from a source.
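
The core idea of enrichment can be sketched in a few lines of plain Python. This is not BindPlane OP or OpenTelemetry code; the `enrich` helper, the attribute names, and the sample log record are assumptions chosen only to show what "adding context at the source" means.

```python
from typing import Any, Dict

# Hypothetical attributes describing where the telemetry came from.
SOURCE_ATTRIBUTES = {
    "environment": "production",
    "region": "us-east-1",
    "team": "payments",
}


def enrich(record: Dict[str, Any], attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with attributes added, keeping existing keys."""
    enriched = dict(attributes)
    # Values already present on the record take precedence over defaults.
    enriched.update(record)
    return enriched


log_record = {"message": "payment declined", "severity": "WARN", "region": "eu-west-1"}
enriched_record = enrich(log_record, SOURCE_ATTRIBUTES)
print(enriched_record)
```

Letting the record's own values win is a design choice made for this sketch; a real pipeline might instead prefer the source-level attributes or keep both under different keys.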

Q&A from Our Recent Observability Webinar

Earlier this month I hosted the “Everything You’ve Heard About Observability is Wrong (Almost)” webinar. Thanks to all of you who attended. I wanted to follow up with the attendees as well as those who were not able to join. As promised, it wasn’t the same old Observability presentation that we have grown accustomed to: you know, all marketing with little value.

How to monitor web servers and their performance

Web servers are among the most important components in modern IT infrastructures. They host the websites, web services, and web applications that we use on a daily basis. Social networking, media streaming, software as a service (SaaS), and other activities wouldn’t be possible without the use of web servers. And with the advent of cloud computing and the movement of more services online, web servers and their monitoring are only becoming more important.

Debugging Just Got Faster and Easier With New Enhancements to BubbleUp

BubbleUp is Honeycomb’s machine-assisted debugging feature and is one of our most powerful differentiators. It leverages machine analysis to cycle through all of the attributes found in billions of rows of telemetry to surface what is in common with problematic data compared to baseline data. This explains the context of anomalous code behavior by surfacing exactly what changed when you don’t know which attributes to examine or index, dramatically accelerating the debugging process.

Sentry Performance Issues | Detect N+1 Database Queries

Learn about Sentry's new performance issues feature that detects N+1 queries. These are actual videos submitted by Sentaurs for our monthly Show-N-Tell. We have not edited them except for obscuring personal information that may appear in screenshots. Some videos may include screenshots that contain fictitious usernames or email addresses for illustrative purposes.

Grafana 9.2 release: Troubleshooting Grafana panels with a new support feature

Ever run into issues building a panel in your Grafana dashboards? To help with those issues, the current support process for Grafana, Grafana Cloud, and Grafana Enterprise often requires many cycles where we request more information. This can be slow, frustrating for both our users and our support teams, and the process makes it difficult to reproduce issues without access to similar data.

Autoscaling Checkly Agents with KEDA in Kubernetes

Checkly private locations enable you to run browser and API checks from within your own infrastructure. This requires one or more Checkly agents installed in your environment where they can reach your applications and our Checkly management API. You need to ensure you have enough agents installed in order to run the number of checks configured in the location. We have a guide to planning for redundancy and scaling in our documentation.

Where Are You In Your Observability Journey?

Observability is the ability to see and understand the internal state of a system from its external outputs. Logs, Metrics, and Traces, collectively called observability data, are three external outputs widely considered to be three pillars of observability. Now more than ever, organizations of all sizes must employ the necessary processes and technologies to harness the power of their data and make it more actionable.

How to monitor HTTP endpoints

The HTTP protocol has become the de facto standard application layer protocol of the internet. From publicly available web sites and APIs to “inter-process” communications in REST based microservice architectures or large Service Oriented Architectures based on SOAP, you find HTTP being used again and again, due to its simplicity and our familiarity with it. How many protocols can you name that have memes for their status codes?
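
A basic endpoint check can be sketched with the Python standard library alone. The status buckets (`up`, `client_error`, `down`) and the function names below are assumptions made for illustration, not part of any particular monitoring product.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse health bucket."""
    if 200 <= status < 400:
        return "up"
    if 400 <= status < 500:
        return "client_error"
    # 5xx responses and any unexpected codes are treated as an outage.
    return "down"


def check_endpoint(url: str, timeout: float = 5.0) -> str:
    """Request url once and report its health bucket."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar transport errors.
        return "down"
```

Calling `check_endpoint("https://example.com/health")` (a hypothetical URL) on a schedule and alerting when the result is not `up` is the simplest form of the checks that dedicated tools run from many locations.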

Startup and running configuration management

Configurations are considered the heart of network infrastructure. They are often adjusted to improve the overall workflow of the network environment. One small unnecessary change to a configuration can bring down an enterprise’s entire network infrastructure. Therefore, the changes made to configurations must always be checked to ensure they are in sync with the devices to improve efficiency and performance. A network configuration is generally divided into two parts: 1.

InfluxDB Cloud Native Collectors, Enterprise and Industrial IoT Examples - Part 2

Learn how to deploy InfluxDB Cloud’s Native Collectors with Kepware and the Things Network. In Part 1 of the blog series, we discussed connecting Kepware to InfluxDB using the new InfluxDB Cloud feature Native Collectors! As promised, let’s now discuss how to connect an Enterprise IoT platform, The Things Network to InfluxDB. Before we get to the juicy tutorial let’s run through a quick reminder.

Monitoring HPC system health with Grafana and Psychart

Nicolas Ventura is a critical facilities engineer at NERSC, with experience in both mechanical and computer systems. The National Energy Research Scientific Computing Center (NERSC) is a modern data center that’s home to two powerful high-performance computing (HPC) systems used for worldwide scientific research in genetics, physics, geology, and more. As such, the infrastructure team at NERSC has to closely track the facility conditions to ensure optimal operations.

Product Update - Custom Data Retention Periods for Buckets Made Easy

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product to meet developers where they are, to ensure their happiness, and accelerate Time to Awesome. This week, we are covering a product release that we think will save you time and effort when using InfluxDB with data retention requirements.

No query, no problem: How LM Logs is built for everyone

So your team has access to a logging tool? Great! What’s the first thing you want to find? The latest config change gone wrong? Data from 30 days ago when a specific server was at high capacity? Or maybe you’d like to access logs for a certain IP on a certain day for specific HTTP and servers with counts and averages. Hopefully there was training to teach you the specific query languages and expert skills required to answer these questions.

How Instacart Rebuilt Their Release Monitoring Workflow

For a company like Instacart, one of the largest grocery delivery services in the US, a single bug in the codebase could impact millions of customers, shoppers, and their orders. When it came to a major release last year, Instacart’s infrastructure engineering team realized their existing workflow for monitoring the health of hundreds of microservices was no longer sustainable. They needed a better way to detect issues in their codebase before they impacted users.

How to Tail Kubernetes Logs: Using the Kubectl Command to See Pod, Container, and Deployment Logs

Logs are a critical aspect of any production workload, as they give you insight into what is happening in your system and tell you which components may be having issues. The traditional method of looking at logs involves basic Linux commands like tail, less, or sometimes cat.

15 best iOS crash reporting tools for 2023

Picking the best iOS crash reporting tools available in 2023 is a tall order. The market has continued to get more competitive, and a best-in-breed tool needs to monitor crashes, generate crash reports, filter and group errors, plus perform other tasks on top. In this article, we’ve collected the 15 best iOS crash reporting tools to help you make the right decision for your particular requirements.

Overwhelmed with network infrastructure monitoring tools? Why go for many when you just need OpManager Plus?

Network infrastructure monitoring is a crucial part of modern IT business. You need a flawlessly functioning network to deliver services and products to the end users. As the size and complexity of a network grows, so do the stakes. Any issue in a large enough network will cause multiple repercussions, and network administrators fight an uphill battle trying to troubleshoot them. With the right sort of monitoring tool, you can ensure a better experience for both the users and the admins.

Route logs to third-party systems with Datadog Log Forwarding

Large organizations often rely on multiple monitoring tools, security platforms, and auditing systems to meet the diverse needs of their observability, security, engineering, and compliance teams. Because these teams may use the same logs for many different use cases—including detecting potential threats or breaches, troubleshooting errors, and gauging the effectiveness of new features—it can be difficult to effectively standardize and route data.

Discover the values behind log patterns with Pattern Inspector

Whether you’re rushing to troubleshoot an incident or proactively performing a security audit, the trial-and-error process of searching through millions of logs for key information can be time-consuming and cumbersome. To help you quickly surface important details from large swaths of log data, Datadog’s Log Explorer allows you to search and filter your logs, create visualizations, as well as group your logs by fields, patterns, or transactions.

Monitor Azure Cosmos DB for PostgreSQL with Datadog

Azure Cosmos DB for PostgreSQL is a fully managed relational database service for PostgreSQL that is powered by the open source Citus extension. With remote query execution and support for JSON-B, geospatial data, rich indexing, and high-performance scale-out, Cosmos DB for PostgreSQL enables users to build applications on single- or multi-node clusters.

How to Monitor SD-WAN Networks

With the increasing use of cloud-based applications, businesses are more reliant than ever on the Internet to deliver WAN traffic. As a result, they’re migrating from MPLS networks to hybrid WAN architectures and SD-WAN technology. Keep reading to learn how to monitor SD-WAN networks with Network Monitoring to identify issues that native SD-WAN monitoring features often miss.

Partnerships and The Important Role of the UK IT Reseller Community 'The Channel'

For over 40 years, the vast majority of UK organisations have chosen to get their IT advice and solutions from specialist IT selling organisations, collectively called resellers. The principal functions resellers provide are advice, products, and services. Today resellers are categorised into sub-groups which better define their type or focus, such as Managed Service Provider (MSP), Value Added Reseller (VAR), Enterprise, Systems Integrator (SI), etc.

7 log management challenges and solutions

Arthur Conan Doyle's Sherlock Holmes famously said, "You see, but you do not observe." Collecting application logs exhaustively and interpreting them to support business objectives are two different things. Application logs, also called app logs, event logs, and audit trails, are automatically generated records of computational events in IT environments.

Improve Application Reliability With 4T Monitors

StackState’s new 4T Monitors introduce the ability to monitor IT topology as it changes over time. Now your observability processes can trigger alerts on changes in topology that don’t match an ideal state, on deviations in metrics and events and on complex combinations of parameters. Monitoring topology as part of your observability efforts enriches the concept of environment health by adding the dimension of topology.

Monitoring Websites on Black Friday with Sematext

Black Friday is one of the most challenging holidays of the year. In this video, we will take a look at how Sematext Cloud, a full-stack monitoring solution, can help you monitor and troubleshoot any issues you may have in this upcoming holiday. Have full visibility over your stack and send alerts to the correct people when something goes wrong.

How to monitor Istio with Sysdig

In this previous article, we talked about how to monitor the Istio service mesh in Kubernetes with the out-of-the-box observability stack. This time, we will walk you through monitoring the Istio service mesh with Sysdig Monitor and how to troubleshoot issues. Istio service mesh provides special characteristics and functionalities for microservices running on Kubernetes.

Flows vs. packet captures for network visibility

Recently, I saw some discussion online about how flow data, like NetFlow and sFlow, doesn’t provide enough network visibility compared to doing full packet captures. The idea was that unless you’re doing full packet captures, you’re not doing visibility right. Because I’ve used packet captures so many times in my career, I admit there’s a part of me that wants to agree with this.

Open source documentation will improve collaboration

There’s always a thrill to see something that you’ve dreamed of coming to life. And for us, open source docs is the realization of that dream. In simple terms, open source docs mean that the documentation is freely available for anyone to modify. This is a part of the modern documentation movement, being able to make changes to keep pace with modern development cycles.

What To Know About Microsoft Azure PostgreSQL Hyperscale

As organizations adopt cloud technologies and modernize their applications, the data they generate and ingest often grows exponentially, leaving them with difficult choices for storing and using this data. Customers are beginning to explore moving away from traditional relational database management systems (RDBMS) because of the data volume to be ingested, as these RDBMS often cannot handle workloads.

5 key benefits of Kubernetes monitoring

Kubernetes made it much easier to deploy and scale containerized applications, but it also introduced new challenges for IT teams trying to keep tabs on these newly distributed systems. Ops teams need proper visibility into their Kubernetes clusters so they can track performance metrics, audit changes to deployed environments, and retrieve logs that help debug application crashes.

Unleash the Full Power of Playwright With @playwright/test

Welcome to day three of our very first Launch Week! The last two days we shared how alerting became so much better and how we moved one level up by completing our SOC 2 Type 1 audit. Are you curious what’s next? I’ll tell you, but first let’s set the stage and look at Microsoft’s Playwright.

Why Use a Purpose-Built Time Series Database?

For many workloads, using a time series database is a smart choice that saves time and storage space. Developers and companies have more database choices than ever. Choosing the right database for a project saves time when writing and querying data. As companies work with larger datasets to make increasingly intelligent and automated systems, efficiency is key.

The State of Security Data Management in 2022

Today, Cribl is releasing The State of Security Data Management 2022 in collaboration with CITE Research. The report examines the challenges that enterprises are facing as they work to balance evolving business priorities with cyber threats. The underlying survey was conducted in September 2022 and covered 1,000 senior-level IT and security decision-makers. The survey found that, although most organizations are confident in their data management strategy, few believe it’s actually sustainable.

Authors' Cut-Gear up! Exploring the Broader Observability Ecosystem of Cloud-Native, DevOps, and SRE

You know that old adage about not seeing the forest for the trees? In our Authors’ Cut series, we’ve been looking at the trees that make up the observability forest—among them, CI/CD pipelines, Service Level Objectives, and the Core Analysis Loop. Today, I'd like to step back and take a look at how observability fits into the broader technical and cultural shifts in technology: cloud-native, DevOps, and SRE.

Eliminate Data Transfer Fees from Your AWS Log Costs

As businesses generate, capture, and seek to analyze more data than ever before, they often find themselves limited by high data storage costs, expensive data processing fees, and high management overhead. For organizations who wish to expand their log analytics programs and become more data-driven, maximizing cost efficiency has become a critical operational objective.

Webinar Snippet: How to Monitor SD-WAN Networks

A snippet from our webinar on "How to Monitor SD-WAN Networks." Obkio's Solutions Architect, Sam, explains monitoring SD-WAN Internet connection with Obkio. Obkio is a simple Network Monitoring & Troubleshooting SaaS solution that continuously monitors network and core business applications performance to identify intermittent issues and improve the end-user experience.

Webinar Snippet: How to Monitor SD-WAN Networks - Monitoring Agents

A snippet from our webinar on "How to Monitor SD-WAN Networks." Obkio's Solutions Architect, Sam, explains deploying Monitoring Agents in key network locations. Obkio is a simple Network Monitoring & Troubleshooting SaaS solution that continuously monitors network and core business applications performance to identify intermittent issues and improve the end-user experience.

5 main features that make up the best website monitoring tools

We live in an era where people would rather take the crowded rush-hour subway than drive through slow moving traffic. Website visitors are likely to bounce if a page takes more than four seconds to load. Once gone, there’s a high probability they’ll never be back. Keeping a close watch on your website performance is the only solution for maintaining a high traffic website.

Send metrics and traces from OpenTelemetry Collector to Datadog via Datadog Exporter

OpenTelemetry is an open source, vendor-neutral observability framework that provides tools, APIs, and SDKs to collect and standardize telemetry data from cloud-native applications and services. One of OpenTelemetry’s key components is the OpenTelemetry Collector, which receives and processes data before using exporters to route it to the destinations of your choice.

Sponsored Post

Monitoring Transaction Log Files for PCI compliance

File Integrity Monitoring, also known as FIM, is a must-have feature for anyone in charge of security. With FIM, one can detect when a critical file, such as a file that belongs to the operating system or a key configuration file, is changed. In most cases, configuring FIM is straightforward: if the file changes, generate an alert.
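The "if the file changes, alert" rule can be sketched with content hashes. This is a minimal illustration using only the Python standard library, not the approach of any particular FIM product; the function names (`file_digest`, `take_baseline`, `changed_files`) are hypothetical, and a real deployment would also need to store the baseline somewhere tamper-resistant.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def take_baseline(paths):
    """Record a digest for every watched file."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(baseline):
    """Return watched paths that are missing or whose content differs from the baseline."""
    changed = []
    for path, digest in baseline.items():
        if not Path(path).is_file() or file_digest(path) != digest:
            changed.append(path)
    return changed
```

Calling `changed_files(baseline)` on a schedule and alerting on a non-empty result covers the simple case. Files that change constantly by design, such as transaction logs, need a different rule than a plain hash comparison.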

Exoprise Expands Network Visibility into Microsoft Teams and Leading UCaaS Applications

As businesses pivot towards multiple UCaaS platforms, the latest Exoprise monitoring solution offers deep application and network intelligence to support a modern workforce with a great digital experience.

Forward logs from the OpenTelemetry Collector with the Datadog Exporter

OpenTelemetry is an open source set of tools and standards that provide visibility into cloud-native applications. OpenTelemetry allows you to collect metrics, traces, and logs from applications written in many languages and export them to a backend of your choice.

Get the Coverage You Need in the User Experience Monitoring Space

Understanding user behaviour can help companies and organisations improve customer engagement, sales conversions, and customer service. On the other hand, not understanding user behaviour can result in lost customers and missed opportunities. So what is user behaviour, how does it affect your business, and how can you use it to improve your marketing efforts?

Cloud Computing - Complexity in Observability

Cloud computing has become a mainstream technology, and it's now easier to get started with the cloud than ever before. However, while the benefits of the cloud are undeniable, there are still challenges that you will have to face when using cloud services. One of these challenges is how difficult it can be to monitor an app or infrastructure. Cloud computing is a complex environment, so monitoring everything in it is challenging.

Biggest Mistakes Companies Make When Evaluating & Purchasing APM Software

Most modern companies that offer web or mobile apps use APM at some stage to enhance their growth. APM tools help you understand what's going on inside your app. They help you know when something breaks, and they also help you learn how to make sure it doesn't happen again. However, choosing the right APM solution for your product is complex. If you select the wrong tool, you may end up discarding it because it does not enable you to meet your observability objectives.

How to setup the native Teams connector in SCOM 2022

With the introduction of SCOM 2022, SCOM now comes with native integration with Microsoft Teams, replacing the old Skype for Business integration. This powerful new update gives users seamless unidirectional access to the latest alerts and notifications within their Teams channels. Once set up, notifications will pop up in the Microsoft Teams channels of your choice, keeping you informed of critical issues as they arise.

Microservices or a monolith - which one are you?

One analogy of a microservice architecture that I personally like is the idea of a large office setting with disparate departments communicating through an internal mail system. I imagine manilla envelopes being passed around, carried on carts through hallways, up elevators—passing the information one department needs to the next department.

Checkly Completes SOC 2 Type 1 Audit

A Service Organization Control (SOC) audit is one of the most extensive tests an organization can undergo to demonstrate the ongoing maintenance of high-level information security. Today, we’re thrilled to announce that Checkly is SOC 2 Type 1 compliant after completing a successful audit by an accredited auditing firm. This demonstrates that Checkly’s information security policies, procedures, and practices meet the SOC 2 guidelines for security and data privacy.

Time Series Forecasting with Python and Facebook Kats

Time series analysis is the study of a sequence of data points and records that are collected over a constant period. The analysis indicates how a variable or a group of variables has changed and helps in discovering underlying trends and patterns. Time series data is generally used for forecasting problems by predicting the likelihood of future data based on historical information.

Logging From Mobile Devices Using Cribl.Cloud

Mobile devices have changed our world. They come with us everywhere and provide invaluable services. One nagging problem is how to get data out of your mobile device. Specifically, logging metrics and events can be a trial. Opening up a public-facing port, managing the log receivers, coding… Wouldn’t it be nice if this was simplified? This article will demonstrate how easy delivering logs can be using Cribl.Cloud and simple HTTP POSTs – for free.

Expanding the Definition of the Network with Saaslio & Boardgent

Welcome to the last mile of the office network! I’m thrilled to extend our warmest welcome to two new additions to the Auvik family: Saaslio and Boardgent. Combining their product feature set and industry expertise with Auvik will extend our capabilities to provide a unique toolset designed to solve problems for MSPs and internal IT departments, and to effectively manage the end-user experience regardless of location and which applications are being used.

Feature update: Interface Topo and Saved Maps landing page

At LogicMonitor, we build with the customer in mind. We continue to iterate on the design and development of our platform to deliver a modern and intelligent customer experience. With that in mind, did you catch the important announcement in the Looking Ahead section of the v.175 Release Notes? The note announced that “LogicMonitor is in the process of rolling out a new user experience (UX) for customer portals.

How to Simplify Your Graphite Metric Ingestion Pipeline with Histograms

Many organizations relying on Graphite leverage telemetry provided through StatsD. If you rely on Graphite in combination with StatsD telemetry, you’re likely suffering from aggregation bloat. In a typical Graphite ingestion pipeline, applications emit data points via UDP, which are then received by an aggregator such as StatsD. Most StatsD servers only offer static aggregations, which must be configured upfront.
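To make the first hop of that pipeline concrete, here is a minimal sketch of an application emitting one data point over UDP in the StatsD plain-text format (`name:value|type`). The helper names and the metric names are hypothetical, and the default address assumes a StatsD-compatible aggregator listening locally on its conventional port, 8125.

```python
import socket


def statsd_line(name, value, metric_type="ms"):
    """Format one metric in the StatsD plain-text form `name:value|type`."""
    return f"{name}:{value}|{metric_type}"


def send_metric(name, value, metric_type="ms", host="127.0.0.1", port=8125):
    """Send a single data point over UDP, fire-and-forget."""
    payload = statsd_line(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

For example, `send_metric("api.request.latency", 42)` sends a timer sample and `send_metric("api.requests", 1, "c")` increments a counter. Because UDP is connectionless, the sender gets no confirmation that the aggregator received anything.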

Observability Pipelines: Helping Your Data Do More

With an exploding volume of data and systems comes the need for observability, or the ability to understand the internal states of a system from knowledge of its external outputs. As a result, observability data's importance is at an all-time high. Businesses spanning every industry use it in various ways to respond to issues, increase agility, mitigate risk, and ultimately provide better experiences for their users. It’s an incredibly valuable commodity.

Datadog alternatives for cloud security and application monitoring

If you work in IT or DevOps, unless you’ve been living on a remote island without Internet access, you’ve likely heard of Datadog, a popular platform for monitoring cloud applications. Datadog collects and interprets data from various IT resources. The resulting insights assist in managing performance and reliability challenges to deliver a better end-user experience.

How to monitor DNS query response time

DNS (Domain Name System) servers translate human-readable web addresses into their actual IP addresses for network access. DNS response time is the time it takes a DNS server to receive the request for a domain name’s IP address, process it, and return the IP address to the browser or application requesting it. When it comes to DNS response times, the lower the better; values under 100 ms are generally considered acceptable, depending on the application.
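A rough way to observe this from an application's point of view is to time a name lookup. The sketch below is an assumption-laden illustration: it times the system resolver through `socket.getaddrinfo`, so local caches can make repeated lookups look faster than the DNS server really is. The function names and the injectable `resolve` parameter are hypothetical conveniences.

```python
import socket
import time


def dns_response_ms(hostname, resolve=socket.getaddrinfo):
    """Time one lookup of `hostname` and return the elapsed milliseconds."""
    start = time.perf_counter()
    resolve(hostname, None)
    return (time.perf_counter() - start) * 1000.0


def is_acceptable(elapsed_ms, threshold_ms=100.0):
    """Apply the rule of thumb quoted above: under 100 ms is acceptable."""
    return elapsed_ms < threshold_ms
```

A check might call `is_acceptable(dns_response_ms("example.com"))` at a fixed interval and alert when several consecutive samples fail. The threshold is a parameter because the acceptable range depends on the application.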

How to Improve Your Shopify Adoption

The ecommerce industry is booming. While the pandemic accelerated the need for businesses to digitalize, advancements in technology and the surge in available marketplaces also helped ease buying and selling online. According to Shopify, the ecommerce industry is expected to grow by almost $11 trillion by 2025. Online stores are popping up every day – with an estimated 12-24 million ecommerce sites globally.

WhatsUp Gold 2022.1: Monitor More of What Matters, Right Out of the Box

Progress WhatsUp Gold release 2022.1, available as of October 12, 2022, offers users even more built-in infrastructure discovery and monitoring power, available for immediate use. It’s been a banner year for WhatsUp Gold users. Packed with enhancements, WhatsUp Gold 2022.1 layers onto the powerful improvements already released in 2022.0 last March.

Checkly Announces Business Milestones, SOC 2 Compliance, New Playwright Test Runner, GitHub Sync, Enhanced Alert Notifications, and More

Company experiences impressive momentum with thousands of users and over 600 customers now using Checkly to monitor apps and APIs quickly and reliably, conducting more than 3.8 Billion browser and API checks.

AIOps Provider ScienceLogic Acquires Machine Learning Analytics Provider Zebrium to Provide At-A-Glance Root Cause Visibility

Moving toward its goal of freeing up resources of enterprise IT teams and optimizing digital experiences, AIOps and hybrid-cloud IT management provider ScienceLogic has acquired machine learning analytics firm Zebrium to automatically find the root cause of complex, modern (i.e., containerized, cloud-native) application problems.

Adding middleware to Go HTTP client requests

The Go standard library has fantastic support for making HTTP requests. However, sometimes a request needs to be modified or an action needs to be taken upon response (e.g., logging a response). In many cases, adding these logs to every request would involve a lot of duplicate code. In other cases, accessing the client might not be possible because of restricted access in a different package or third-party library. Thus, I introduce RoundTrippers.

User Journey Secure Variable Storage

We’re excited to announce that we now offer Secure Variable Storage for synthetic user journey scripting! If you have sensitive data that you wish to use in journey scripts then you can now store them in your account’s own encrypted vault. This is useful for data such as email addresses, passwords, usernames and API keys.

Monitoring the AVD Broker

A legacy way to deploy applications and desktops on Azure was often to put some Server 2022 VMs (Virtual Machines) in Azure running the standard RDS (Remote Desktop Services) roles – session hosts, brokers, gateway etc. and then pay the compute costs for your session hosts, brokers, and gateways, set up a public IP address and open ports.

How AIOps enhances operational efficiency

Digital data is everywhere, and its sheer volume and ambiguity often make it challenging for us humans to analyze. That’s why we use a special branch of AI called artificial intelligence for IT operations (AIOps) to reveal the deeper structure of copious data. AIOps sits at the intersection of big data and machine learning to improve the efficiency of IT operations.

Checkly Alerting Improvements and Our New Slack Community

Welcome to day one of our very first Launch Week! In the upcoming days, we’ll release new features every day. We’ll share new alerting capabilities, unlock the power of Playwright, and talk about new ways to control and write your Browser checks. It’ll be a nice mix of features and improvements, trust me! To kick things off, let’s have a look at what’s new in the world of monitoring and alerting.

What's New in Checkly Launch Week

The Checkly development team is continually improving our platform and user experience, and we’re excited to unveil some new features that we’ve been working on during our first ever Checkly Launch Week. From October 11th through 14th, we’ll be announcing and discussing our latest innovations, new features, functionality, and capabilities for users every single day of the week.

How We Earn It: High Customer Satisfaction

One of the gratifying things about working at Cribl is receiving daily validation that we’re making customers’ lives easier, and solving their real problems. Every time someone tells us something like this, our hearts gladden, and a goat angel gets its wings: Numbers like those also translate into…numbers. When we surveyed customers in our most recent quarter, our CSAT (Customer SATisfaction) score was above 90%.

Introducing Software Delivery Shield for end-to-end software supply chain security

Organizations and their software delivery pipelines are continually exposed to growing cyberattack vectors. Coupled with the massive adoption of open source software, which now helps power nearly all of our public infrastructure and is highly prevalent in most proprietary software, businesses around the world are more vulnerable than ever. Today’s organizations need to be more vigilant in protecting their software development infrastructure and processes.

Product Update - Arduino Onboarding Made Easy

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product to meet developers where they are, to ensure their happiness, and accelerate Time to Awesome. This week, we are covering a product release that we think will save you time and effort when onboarding to time series and InfluxDB using Arduino.

Grafana 9.2 release: New Grafana panel help options, Grafana oauth updates, simplified variable editor for Grafana Loki, and more!

Welcome to Grafana 9.2, a jam-packed minor release with a wide range of improvements to help you create and share Grafana dashboards and alerts. Along with new developments for public dashboards and support for Google Analytics 4 properties, Grafana 9.2 offers new ways to connect with support teams about panel issues, a simplified query variable editor for Grafana Loki, improvements to access control, and much more.

Authors' Cut-Shifting Cultural Gears: How to Show the Business Value of Observability

At Honeycomb, the datastore and query systems that we manage are sociotechnical in nature, meaning the move to observability requires a sociological shift as much as it does a technical one. We've covered the technical part in several prior discussions for our Authors’ Cut series, but the social aspect is a little squishier. Namely: How do you solve the people and culture problems that are necessary in making the shift to adopt observability practices?

3 Simple Ways to Augment Microsoft Call Quality Dashboard Data

The Microsoft Call Quality Dashboard (CQD) does a solid job of helping admins check call and meeting quality across their Teams setup. But wouldn’t it be handy if you could get more out of CQD and make optimizing Teams for your business easier? Here are 3 ways to do just that. Although Microsoft provides the core quality data for you to accurately monitor Teams performance, there are other things you can do to augment your call quality dashboard.

How to spot issues in Microsoft Teams - even if users don't open tickets.

The latest release of Martello Vantage DX includes new data rich dashboards to monitor Microsoft Teams performance including actionable insight into call volume, top ten affected users, locations, and meetings. See how you can quickly spot Microsoft Teams performance issues with our latest release.

How to Keep Your System Visible in the Age of Remote Working

Monitoring IT infrastructure and services has always been an essential IT prerequisite. However, your IT monitoring system and security measures need to upgrade with an exponential increase in the number of remote users post-pandemic. For instance, consider this: At the end of a work day, you are notified that one of your critical services has gone down. But the problem is that five teams support different processes of that service.

A Guide To Opentelemetry Collector

This article gives you a quick overview of the key attributes you should know to get started with the OpenTelemetry Collector for your next telemetry project. The OpenTelemetry Collector plays an important role as an integral component of any project that involves distributed tracing. Simply put, the Collector is a data pipeline service that collects telemetry data.

How to monitor host reachability

Most sysadmins and developers have at some point used a few of the popular Linux networking commands or their Windows equivalents to answer the common questions of host reachability, that is, whether a host or service is reachable and how fast it responds. One of the simplest and most common checks is to ping a host to verify that it’s reachable from where you issue the command, and to see the total time it takes for the host to receive your request and reply.
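The same two questions, "is it reachable" and "how fast does it respond", can be approximated in a script. Note that this sketch deliberately swaps ICMP ping for a TCP connection attempt, since sending ICMP usually requires elevated privileges; it therefore tests a specific service port rather than the host itself. The function name `check_tcp` is hypothetical.

```python
import socket
import time


def check_tcp(host, port, timeout=2.0):
    """Try a TCP connection and return (reachable, elapsed_ms)."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        reachable = False
    return reachable, (time.perf_counter() - start) * 1000.0
```

For instance, `check_tcp("example.com", 443)` reports whether an HTTPS port accepts connections and how long the handshake took. A host that blocks the chosen port will look unreachable here even if it would answer a ping.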

How to add a store locator to your ecommerce storefront with Elastic

One thing that adds value to a business’s ecommerce presence is the ability for customers to easily find physical stores with an interactive map. Store locators can be built quite rapidly — all you need to place them on a map is geographic location in latitude and longitude. In this post, I’ll outline the pieces needed to put together a proof-of-concept store locator that could later be added to an ecommerce website.

Optimize Your Infrastructure Resources for Continuity of Network Monitoring Operations

Leverage DX NetOps network monitoring software metric projections to manage demands for IT resources proactively and cost-effectively. Capacity is a finite resource, so your IT Team needs to carefully analyze current and future use case scenarios to prevent potential bottlenecks. Presenting key network metric projections in a simple and unified manner is paramount to predict capacity and to understand when more resources are needed, allowing you to plan accordingly.

Anodot Named Momentum Leader on G2's Fall Grid

We are proud to announce that Anodot has been named Momentum Leader on G2’s fall grid for Cloud Cost Management Software. G2 is the largest and most trusted software marketplace. More than 60 million people annually use G2 to make smarter software decisions based on authentic peer reviews. G2 is disrupting the traditional analyst model and building trust by showcasing the authentic voice of millions of software buyers.

Effective Log Management and Analysis as an Enabler for Observability

Traditionally, when monitoring or troubleshooting active incidents, engineers access logs directly on the source system. However, modern IT environments are now too complex, and engineers can no longer manage and analyze logs effectively this way. With the adoption of microservices and the use of cloud-native infrastructure, it’s no longer feasible.

Interval Selector in Sentry Dashboards

Demo of the new time interval selector feature within the Sentry Dashboards tab. These are actual videos submitted by Sentaurs for our monthly Show-N-Tell. We have not edited them except for obscuring personal information that may appear in screenshots. Some videos may include screenshots that contain fictitious usernames or email addresses for illustrative purposes.

New State of DevOps Report 2022: Broadcom Sponsors Key Research

For DevOps teams looking for insights on how to improve, it’s invaluable to leverage the learnings of others. At the same time, given the wide range of DevOps teams’ expertise, tenure, and organizational dynamics, it’s also clear that one size does not fit all. That’s why efforts like this year’s “Accelerate State of DevOps Report” are so important. We’re proud to be a sponsor of this 2022 edition.

Why is data replication important?

High availability. This is what every monitoring tool needs to ensure that you never compromise on IT infrastructure visibility. On top of high availability, do you really want to enable all available features on your production system? It is important for the monitoring tool to have a low footprint on your CPU consumption and memory usage. Let’s dive deeper into the recommended way of configuring Netdata to ensure high availability and a low resource footprint through data replication.

How to Monitor and Troubleshoot Multi-cloud Applications

Multicloud and hybrid cloud applications are deployed on multiple cloud vendor platforms, including on-premises private cloud. While these platforms offer tremendous benefits by providing a reliable and scalable platform for fuelling digital transformation, they also add significant monitoring complexity. Site reliability engineers (SREs) need multicloud monitoring visibility.

Automate Troubleshooting of Applications Running on Kubernetes

StackState is an out-of-the-box solution to observe your entire Kubernetes stack, identify problems, automatically highlight the changes that cause them and provide the full context you need for efficient and effective troubleshooting. Our clear and affordable pricing makes it easy to get started today.

New Honeycomb Integration With ServiceNow

Today, I’d like to tell you about a new community-contributed integration that connects Honeycomb to your ServiceNow workflows. My new integration reimagines what’s possible when connecting observability tools with ITSM systems. This post explains how it works and how to get started with it.

What is Observability: A Beginner's Guide

Observability is a methodology that you incorporate into your enterprise architecture to provide greater visibility into what is happening. It helps us determine the internal states of a system from its external outputs and allows technicians to identify bottlenecks, predict issues, and mitigate them. As the architectures of IT systems become more complex and distributed, we use observability to meet the need to measure their internal states.

Collect traces, logs, and custom metrics from your Google Cloud Run services with Datadog

Google Cloud Run is a managed platform for the deployment, management, and scaling of workloads using serverless containers. You can deploy workloads in the cloud or, using Cloud Run for Anthos, on your on-prem infrastructure.

Common SQL Server challenges and how Applications Manager's SQL Performance Monitor helps you overcome them

Database management systems are an essential component of business applications. Over the years, MS SQL has earned its place in the hearts of database administrators (DBAs) as the most trusted relational database management system. It is still the go-to choice for many DBAs as it helps them leverage its extensive capabilities across various dimensions such as security, portability, transaction processing and analytics.

How to Detect Anomalies and Why You Should Care

Companies today are relying on technology more than ever thanks to widespread digital transformation and cloud initiatives. And this is increasing the need for safe, efficient and reliable IT environments. But maintaining operational IT stability is very difficult when considering the complex and dynamic nature of today’s IT environments. In fact, IT environments are constantly changing, with new network devices, users and software versions coming into existence.

SAP S/4HANA Private Cloud vs. Public Cloud: Which Is Right for You?

The SAP private cloud has been around for a while and was designed to provide secure, reliable, and scalable computing services to smaller organizations. Choosing the right platform upon which to lay the foundation of your entire business's operations is no small task. This article should help you reach a clearer decision. To learn how Avantra can help your company succeed, give it a try today, for free.

Observability Is a Data Analytics Problem

Observability is a hot topic in the IT world these days. It is oftentimes discussed through the lens of the “three pillars of observability”: Logs, Metrics and Traces. Indeed these telemetry signal types help us understand what happened, where it happened and why it happened in our system.

Flux: The Key to Edge Data Replication with InfluxDB

EDR enables developers to use the full capabilities of InfluxDB at the edge. Developers also can use that same data in the cloud for different purposes. Flux is the data scripting and query language for the InfluxDB time series database platform, enabling useful features such as Edge Data Replication (EDR).

Getting the Most from Microsoft Teams PSTN Functionality: The Experts Weigh In

The popularity of Microsoft Teams Calling has many organizations looking at their options to add Teams PSTN capabilities to their Microsoft Teams deployment. While it’s important to select the right PSTN option for your business, it’s even more important to recognize the complexity that PSTN will bring to your Teams deployment and have a plan to deliver good service quality with the technical support resources you have.

3 Tips to Deliver Microsoft Teams Service Excellence

As we continue to see Microsoft Teams usage skyrocket, now more than ever, users are depending on Microsoft Teams service excellence to maintain productivity. But it can be challenging to deliver a reliable user experience in today’s modern workplace. There are many factors in the IT environment impacting Microsoft Teams performance, and IT teams typically don’t have full visibility into them, or the service quality delivered to end users.

How to install a Site24x7 APM Insight Java agent in a Spring Boot application

This video will walk you through the process of installing the Site24x7 APM Insight Java agent in a Spring Boot application. With the APM Insight Java agent installed, you can monitor your entire application, track every transaction that occurs, identify transaction errors, and optimize transactions before your end users are impacted.

How to install a Site24x7 APM Insight Java agent in a WildFly server 8.x and above-standalone setup

This video will walk you through the process of installing the Site24x7 APM Insight Java agent in a WildFly server (in a standalone setup). With the Site24x7 APM Insight Java agent installed, you can monitor your entire application. You'll be able to track every transaction that occurs, identify transaction errors, and optimize transactions to prevent your end users from becoming impacted.

What are Core Web Vitals? | Core web Vitals explained in 7 minutes

Core Web Vitals are a set of metrics used by Google to analyze your site's performance and user experience. If your site scores poorly on any Core Web Vitals metric, Google will rank your site lower than other websites. In this explanation video, we will look at the meaning of Core Web Vitals and a few of the most common causes of poor Core Web Vitals scores.

Grafana k6 one year later: Lessons learned after an acquisition

A few years ago, I was meeting with venture capitalists and private equity firms about the future of k6, the open source performance testing tool that we created in 2016 and open sourced in 2017. After talking about the k6 product mission — to give modern engineering teams better tools to build reliable applications — one investor challenged us to create an even bigger vision for the company: What if we acquired a company to broaden the k6 story?

How to monitor a website for changes in visa appointment availability?

Anyone that has ever applied for a visa and wanted to schedule an interview appointment remembers and knows the struggle of refreshing the page continuously, waiting for a new session to become available. The battle became even more real during the pandemic when embassies started giving out fewer appointments that quickly filled up, leaving the rest empty-handed and impatient.

How to filter metrics by label?

It is sometimes easy to get lost in the mountain of metrics and infinite number of dimensions when working with an infrastructure monitoring tool. Being able to filter metrics by label and visualize only what is relevant to the current scope of monitoring and troubleshooting becomes absolutely crucial to the success of SREs, sysadmins, and DevOps professionals.
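The idea of label filtering can be illustrated without any particular tool. In this minimal sketch, the data model (a list of dicts with a `labels` mapping), the sample values, and the `filter_by_labels` helper are all assumptions made for illustration; they are not the API of any monitoring product.

```python
def filter_by_labels(metrics, **wanted):
    """Keep metrics whose labels contain every requested key/value pair."""
    return [
        m for m in metrics
        if all(m["labels"].get(k) == v for k, v in wanted.items())
    ]


# Hypothetical sample data: three series with host and env labels.
metrics = [
    {"name": "cpu_usage", "labels": {"host": "web-1", "env": "prod"}, "value": 0.71},
    {"name": "cpu_usage", "labels": {"host": "web-2", "env": "staging"}, "value": 0.12},
    {"name": "mem_usage", "labels": {"host": "web-1", "env": "prod"}, "value": 0.55},
]

prod = filter_by_labels(metrics, env="prod")
print([m["name"] for m in prod])  # prints ['cpu_usage', 'mem_usage']
```

Passing several labels narrows the result further: `filter_by_labels(metrics, env="prod", host="web-1")` keeps only series that match both.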

So You've Troubleshooted the Alert. Now What?

Welcome to the companion post to So You Received an Alert. Now What? Last time, we broke down the process between receiving the Uptime.com check alert and figuring out what broke. Today, we’re going to show you how to communicate your efforts so that everyone – your end users, coworkers, and bosses – knows what’s going on. Your first step is to update your Status Page, your central hub for incident management and communication.

Datadog on gRPC

Datadog, the observability platform used by thousands of companies, is made up of hundreds of services that communicate over the network using gRPC, an RPC framework, which makes gRPC a critical component of Datadog’s reliability. As teams investigated incidents related to their services, they discovered that some of them were gRPC related. But were there common patterns to those incidents? Could we use them to learn more about gRPC and how to use it better?

Tutorial: How to Use ChaosSearch with Grafana for Observability

In my last blog post, Building a Cost-Effective Full Observability Solution Around Open APIs and CNCF Projects, we introduced using ChaosSearch in combination with the most popular open source front- and back-ends in the application observability space. In case you missed it, the TL;DR version is that you can use a variety of open source projects and open API-based components to build the best-of-breed observability stack of your choice rather than relying on expensive, all-in-one solutions.

It's a Three-Peat For Cribl with Awards from Comparably

When we began the week, we had zero awards from Comparably. As we end the week, we now have a three-peat of awards. Cribl was recognized among 70,000 companies out of 15 million ratings – winning top honors for Happiest Employees, Best Compensation, and Best Perks and Benefits. We’re thrilled to be recognized by Comparably, and we’re looking forward to continuing our pursuit of being the best place to work.

External Services Monitoring for Python

Python web applications are taking over more and more of the internet. However, with great Pythonic power comes great responsibility: ensuring that your web applications consistently deliver in terms of performance and reliability. It is one thing to build and ship an application and another to continually monitor and maintain it on the internet.

Where Are My App's Traces? Understanding the Black Magic of Instrumentation

Many developers don’t know what instrumentation really is, and those who do don’t really understand the black magic that takes an application and makes it emit telemetry, especially when automatic instrumentation is involved. On top of that, each programming language has its own tricks. I wanted to unwrap this loaded topic on my podcast, OpenObservability Talks. For this topic I invited Eden Federman, CTO of Keyval, a company focused on making observability simpler.

Building a Performant iOS Profiler

Profilers measure the performance of a program at runtime by adding instrumentation to collect information about the frequency and duration of function calls. They are crucial tools for understanding the real-world performance characteristics of code and are often the first step in optimizing a program. Apple and Google have first-party profiling tools, but they are only usable for local debugging during development.
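To make the idea of instrumentation concrete, the sketch below wraps functions in a decorator that records how often each one is called and how long it runs in total. It is written in Python purely for illustration and is not how the iOS profiler in the post works. The names `profiled`, `call_counts`, and `total_seconds` are invented for this example.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function call frequency and cumulative duration.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def profiled(func):
    """Instrument a function so every call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if the function raised.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@profiled
def square(n):
    return n * n


for i in range(3):
    square(i)
```

After the loop, `call_counts["square"]` holds the number of calls and `total_seconds["square"]` the accumulated wall-clock time.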

Upgrading our Google Lighthouse Service

We are always looking for ways to improve our products and processes at RapidSpike. We’ve recently completed an upgrade to our Google Lighthouse service software to Google’s latest version 9.6.7. This required us to almost entirely rearchitect how we run the tests, which allowed us to take advantage of the latest AWS technologies. This blog post explains what this change required.

How to Gain Observability into Your CI/CD Pipeline

We all know that observability is a must-have for operating systems in production. But we often neglect our own backyard: our software release process. We noticed we made that mistake here at Logz.io. We were wasting time and energy handling failures in the CI/CD pipeline, which made our Developer-on-Duty (DoD) shifts tedious. That’s why it’s critical to incorporate your observability practices into your CI/CD pipeline.

GitLab CI/CD Job Templates!

As I mentioned in my last blog post, we use GitLab pipelines for packaging. We have a lot of software, like Icinga, Icingaweb and its various modules, which we want to build across multiple different operating systems. This results in a huge number of jobs and pipelines, doing very similar stuff. We have a lot of code repetition, and this is bad – code repetition means higher code maintenance, and it invites bugs.

5 Website Architecture Tips That You Should Know About

Imagine being lost in a new place, and your maps don't help much. That sounds frustrating, right? Yet, it's precisely how your website visitors feel if you don't have functional website architecture to guide them through your content. Website architecture is the mind map that guides your visitors and search engines through the content on your website.

How to monitor Oracle DB with Google Cloud Platform

Monitor Oracle DB in Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub, and makes it easy to collect and ship telemetry from dozens of sources directly to your Google Cloud Platform. You can check it out here! Below are steps to get up and running quickly with observIQ’s Google Cloud Platform integrations, and monitor metrics and logs from Oracle DB in your Google Cloud Platform.

The future of observability is cloud-native and unified

Building modern, cloud-native applications introduces new challenges to teams and organizations. As these systems grow and scale, struggles abound: inconsistent performance monitoring experiences across siloed tools, wasteful performance management practices with duplicated efforts, and mounting frustration from colleagues and customers. Surmounting these challenges requires multiple sources of data and truly unified observability.

Announcing Grafana Cloud Link, a gateway from any local Grafana instance to Grafana Cloud

If you’ve had a local Grafana instance for any length of time, it’s likely dialed in just how you like it, and that’s a good thing. If you are working within Grafana Cloud, by contrast, you are using a heavily opinionated experience that our teams are building, managing, and provisioning. As a result, we serve up solutions that users can work with out of the box and can use to build their stack.

Missing indexes in PostgreSQL? How to quickly identify it

While working on improving the Netdata PostgreSQL collector, we were monitoring our production PostgreSQL instance and something caught our attention immediately. The rows fetched ratio seemed really, really low for one particular database… there were missing indexes in PostgreSQL! Rows fetched ratio is the percentage of rows that contain data needed to execute the query (rows fetched), out of the total number of rows scanned (rows returned).
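The ratio described above can be written as a small function. This is a minimal sketch, assuming the two counters are already available as integers. In PostgreSQL they would typically come from the `tup_fetched` and `tup_returned` columns of `pg_stat_database`, though how the Netdata collector reads them is not stated in the excerpt. The function name is illustrative.

```python
def rows_fetched_ratio(rows_fetched: int, rows_returned: int) -> float:
    """Percentage of scanned rows (rows returned) that were actually needed (rows fetched)."""
    if rows_returned == 0:
        # Nothing was scanned, so there is no ratio to report.
        return 0.0
    return 100.0 * rows_fetched / rows_returned


# A database that scans 1,000 rows to use only 5 of them has a very low ratio,
# which is the kind of signal that can point at a missing index.
low_ratio = rows_fetched_ratio(5, 1000)
```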

BindPlane OP Enterprise Beta Announcement

Since introducing BindPlane OP earlier this year, we’ve received a lot of feedback asking for the enterprise features you require to deploy in production, with functionality like SSO, RBAC, and audit reporting all surfacing to the top of that list. Today we’re launching BindPlane OP Enterprise in beta, which introduces support for LDAP and AD authentication. We’d love for you to try it out and let us know what you think.

Kubernetes ErrImagePull and ImagePullBackOff in detail

Pod statuses like ImagePullBackOff or ErrImagePull are common when working with containers. ErrImagePull is an error happening when the image specified for a container can’t be retrieved or pulled. ImagePullBackOff is the waiting grace period while the image pull is fixed. In this article, we will take a look at.
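As a small illustration of where these statuses show up, the sketch below scans a pod object (for example, the parsed JSON from `kubectl get pod <name> -o json`) for containers that are waiting with one of the two reasons. The helper name `image_pull_problems` is made up for this example, and the sample pod is fabricated test data.

```python
# Waiting reasons that indicate the image could not be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_problems(pod: dict) -> list:
    """Return (container name, reason) pairs for containers stuck on an image pull."""
    problems = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        # A container state has exactly one of: waiting, running, terminated.
        waiting = status.get("state", {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in IMAGE_PULL_REASONS:
            problems.append((status.get("name"), reason))
    return problems


# Illustrative pod status, not output from a real cluster.
example_pod = {
    "status": {
        "containerStatuses": [
            {"name": "web", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}
stuck = image_pull_problems(example_pod)
```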

StatusGator vs IsDown: The Best IsDown Alternative

StatusGator and IsDown are two products that provide status page aggregation and vendor status monitoring. At first glance, these two products look the same, but they are quite different in reality. StatusGator was launched in 2015, and has been aggregating status data for more than 7 years. IsDown is a newer alternative that is similar but lacks many key features that StatusGator has. To make the best choice, we will closely examine the differences between the two.

How merchants can protect revenue with AI-powered payment monitoring

Smooth payment operations are critical for every merchant’s success. At its most basic level, a seamless and reliable payment process is the key to assuring transaction completion, which is at the very core of a merchant’s financial strength. However, when payment data systems fail to deliver insights about issues regarding approvals, checkouts, fees or fraud, the result is revenue loss and sometimes customer churn.

Keeping Your Microsoft 365 Estate In-Check with Netreo

Teams meeting not loading? Outlook mailbox not refreshing? Imagine starting Monday morning with either of these two issues. It can certainly hinder anyone’s work schedule. Microsoft Teams, SharePoint, Outlook and other Microsoft 365 services have become essential in day-to-day work life. An outage in any of these services causes panic among the workforce, followed by questions on whether the trouble is their system or the application.

Basics of Retrace APM

Retrace is an award-winning, easy-to-use SaaS application monitoring solution, combining APM reporting with error and log management in a centralized location. The main APM component of Retrace provides code-level application performance visibility for 6 of the most popular programming languages: .NET, Java, PHP, Python, Ruby and Node.js. Retrace APM metrics give users insight into the number of requests that are being made to an application’s endpoints and how those requests perform.

Beating the odds: How log data helps detect and lower MTTR

Depending on your business, MTTR stands for mean time to repair or mean time to recovery – but it can also mean resolution, resolve, or restore. No matter how you define it, the basic measurement is the same: it’s the time it takes from when something goes down to when it is back and fully functional. This includes everything from finding the problem to fixing it. For ITOps teams, keeping MTTR to an absolute minimum is crucial.
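The measurement described above is a simple average. A minimal sketch, assuming each incident is recorded as a pair of timestamps (when the service went down and when it was fully functional again). The function name and the sample incidents are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_recovery(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - down) across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [restored - down for down, restored in incidents]
    return sum(durations, timedelta()) / len(durations)


# Two made-up incidents: one lasting 30 minutes, one lasting 90 minutes.
incidents = [
    (datetime(2022, 10, 3, 9, 0), datetime(2022, 10, 3, 9, 30)),
    (datetime(2022, 10, 12, 14, 0), datetime(2022, 10, 12, 15, 30)),
]
mttr = mean_time_to_recovery(incidents)
```

Because each duration runs from the moment of failure to full restoration, it covers detection, diagnosis, and the fix itself, in line with the definition in the excerpt.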

Will Automation Replace the IT Workforce?

Whether you work in Manufacturing, Tech, or Retail, you’ve likely considered the impact of automation in your industry. The rapid digital transformation brought on by the COVID-19 pandemic forced many leaders to face this concern head-on. But for IT, there is no cause for alarm. Automation is not designed to replace the workforce; it is designed to be IT’s greatest asset.

Boost Application Experience with Network Behavior Analysis

“Network behavior analysis (NBA) is a network monitoring program that ensures the security of a proprietary network. NBA helps in enhancing network safety by watching traffic and observing unusual activity and departures of a network operation,” explained Techopedia. “Network behavior analysis monitors the inside happenings of an active network by collecting data from many data points and devices to give a detailed offline analysis.

RTOS Tracing, your way

When debugging an RTOS-based system, tracing can often give a better understanding of the real-time behavior of your system. Percepio Tracealyzer supports two main types of RTOS tracing, snapshot and streaming, both offering the same powerful visualization although streaming allows for collecting longer traces. The Percepio trace recorder offers several options to allow developers to adjust the tracing setup to fit their target system and their analysis needs.

Introducing ManageEngine CloudDNS for all your critical DNS infrastructure management needs

The DNS is the most critical part of network infrastructure and the only doorway to the internet. We at ManageEngine, a division of Zoho Corp, understand this criticality well and have carefully designed software that helps IT infrastructure management professionals securely manage their domains’ DNS records and elude modern problems in DNS management. We’re excited to introduce ManageEngine CloudDNS, the first critical infrastructure management software in our portfolio.

How to Increase Workplace Productivity, Tools, and More!

Wondering what the best way for employees to work more efficiently is? According to a Gallup poll, an engaged and productive workforce can increase corporate profits by as much as 21%. With the talent competition only increasing, every company’s IT and business leaders need to consider implementing workplace productivity strategies. In this article, we share tips for optimizing productivity levels for a modern workforce and exploring new ways of measuring it daily.

Sponsored Post

What Is Service Automation?

Service automation is the process of automating processes, events, tasks, and business functions. It offers multidimensional visibility of the business, which helps you streamline the business process. It integrates the domain functionality tools with various layers of automation within a unified interface or workflows.

Monitor Workday with AVM Consulting's integration in the Datadog Marketplace

Workday is a cloud-native solution for enterprise management. With its multitenant architecture, which supports a wide range of integrations and enables centralized administration of multiple Workday instances (i.e., tenants), Workday provides a unitary framework for managing human resources, financials, payroll, recruiting, analytics, and more. Along with Workday’s cloud-based delivery model, this multifaceted support offers flexibility that can be critical at enterprise scale.

Monitor application performance from the Datadog mobile app

When you’re on-call for a critical service and get alerted to an issue that could impact customers, you need quick access to key performance metrics in order to effectively troubleshoot. But all too often, digging into this data requires you to switch from your mobile device to a laptop, restricting your ability to troubleshoot on the go and disrupting your routine.

What's Happening To Middleware In The Cloud-Native Era?

Spending two decades in the middleware field has given me deep insight into the evolution of this technology domain. I began my career as a software engineer in a platform group, building reusable components using technologies like object linking and embedding (OLE), the distributed component object model (DCOM) and common object request broker architecture (CORBA).

IT infrastructure monitoring

IT infrastructure monitoring, taken as a whole, is about keeping track of the health and performance of all the IT assets in a network environment. The network management system gathers data on various metrics, like availability, health, performance, and utilization. Then IT infrastructure monitoring transforms this data into useful statistics that help enterprises scale their businesses.

Sponsored Post

Microsoft 365 Monitoring Use Cases

Enterprises moving to Office 365 cloud-based applications require a new approach to ensuring deployment success. Of course, your end-users should receive a fantastic application experience, whether with Teams, SharePoint, OneDrive, etc. But when problems surface with slowness or call quality, the Microsoft Service Health Dashboard provides no visibility beyond their network - leaving IT admins in the dark. Today, I'll walk through a few critical Microsoft 365 use cases for monitoring purposes and how Exoprise Digital Experience Monitoring solutions can help.

Elastic Announces Innovations to Transform the Way Organizations Search, Observe and Protect their Data

The new changes to Elastic Search will simplify the Elastic Cloud on AWS experience with automatic provisioning of Elastic Agent to easily ingest data from any AWS service, and will improve search relevance with machine learning-based hybrid scoring.

Monitoring Kubernetes with Hosted Graphite by MetricFire

In this article, we will be looking into Kubernetes monitoring with Graphite and Grafana. Specifically, we will look at how your whole Kubernetes set-up can be centrally monitored through Hosted Graphite and Hosted Grafana dashboards. This will allow Kubernetes Administrators to centrally manage all of their Kubernetes clusters without setting up any additional infrastructure for monitoring.

How to Monitor Google Cloud Interconnect and Network Performance

Google Cloud Interconnect promises data transfers with low latency, and high availability - but how can you make sure that it’s actually performing as promised? Continuously monitoring Google Cloud Interconnect performance is the key to identifying slowdowns, high levels of packet loss, and other problems affecting Google Cloud. Keep reading to learn how to do it all in minutes using Obkio Network Monitoring!

Reimagining nmon Using InfluxDB

IBM engineer Nigel Griffiths built nmon in the 1990s to monitor operating system performance data for AIX. Since its original launch, Griffiths has revisited and revamped nmon. For example, he built an open-source version for Linux. Despite drastic changes in the very nature of computing and exponential growth in storage, memory, and compute power, it wasn’t until 2018 that Griffiths sought to completely rewrite the tool and bring it into alignment with modern computer systems.

TL;DR Deep Linking Dashboards

If you’re an InfluxDB and InfluxDB UI user, you’ve almost certainly created dashboards. However, if you’re building dozens of dashboards in the InfluxDB UI, you might have come across the need to deep link related dashboards. In this tutorial we’ll learn how we can use the table view with Flux, string interpolation, and variables to deep link users to other dashboards.

The Monitoring Problem: Too Many Tools + Too Much Time = No Room for Innovation

Continuous availability and unceasing innovation are prerequisites for today’s digital businesses. So it makes sense that business leaders invest heavily in teams and tools to monitor digital apps and services. In theory, these tools should also free up time for engineers to push new functionalities that wow customers. But do these investments actually result in more uptime and customer-delighting innovations?

Elastic Universal Profiling helps you deliver fast, affordable, and efficient services

So, what is Universal Profiling™? Universal Profiling™ is fast emerging as an important component of observability. A standard feature inside hyperscalers since approximately 2010, the technology is slowly percolating into the wider industry. Universal Profiling™ allows you to see what your code is doing all the time, in production across a wide range of languages and can profile both user-space and kernel-space code.

Django Monitoring and APM Benefits

Django is growing to become one of the most popular web frameworks, and it's built on top of Python, among the easiest programming languages to start with. As the number of companies releasing Django apps increases, it is natural that the need for Django monitoring will also increase. In this guide, we will share the benefits of implementing monitoring in a Django application. Without further ado, let's begin!

Getting to That Elusive "Inbox Zero" With Custom Alerts and Codeowners

Forethought is a leading AI company providing customer service solutions that transform the customer experience. As a high-growth startup with a fast-expanding engineering org, its teams had to deal with compounding complexity, leading to challenges measuring the impact and health of their services. Forethought’s core engineering team maintains common services between other internal teams, infrastructure, data, and tools and, as they added more engineers, the original team split into five.

Data Centers Need MSPs

A recent Honeywell survey reinforced something many in the MSP world already knew: data centers need managed service providers. 96% of facility managers indicated remote management is important, but only 34% had remote management capabilities in place. That’s surprising in a digital world where data centers enable critical business functions, and it represents an epic business opportunity for MSPs looking to enter the data center world.

SLO walkthrough: measuring microservice performance

To improve reliability, we need to measure it, and to measure it we use SLOs (Service Level Objectives). Or at least, that’s what Google SRE has popularized. In practice, it can be difficult and time-consuming to identify the right things to measure, to get to the right data, and to surface the results in a way that engages the stakeholders and teams involved. And all this is especially hard as we scale our teams and applications across multiple technology stacks.
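To give a feel for what measuring against an SLO involves, here is a minimal sketch that computes a request-based service level indicator (SLI) and the share of the error budget still unspent. The 99% target, the request counts, and the function names are assumptions chosen for illustration and do not come from the walkthrough itself.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service level indicator: fraction of events that met the objective."""
    if total_events == 0:
        raise ValueError("no events observed")
    return good_events / total_events


def error_budget_remaining(good_events: int, total_events: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        raise ValueError("a 100% target leaves no error budget")
    return 1.0 - actual_bad / allowed_bad


# Hypothetical window: 10,000 requests, 9,950 of them fast and successful, 99% target.
observed = sli(9_950, 10_000)
remaining = error_budget_remaining(9_950, 10_000, slo_target=0.99)
```

With these numbers the service allowed itself 100 bad requests and used 50 of them, so roughly half of the budget remains for the rest of the window.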

First page of Hacker News, 9000+ GitHub stars, improved dashboards and documentation - SigNal 17

Welcome back to our monthly product updates - SigNal! So happy to share with everyone that we were trending on the first page of Hacker news over the weekend, and the whole team was so upbeat about it. Last month, we shipped a lot of dashboard and documentation improvements to make SigNoz better for our users. We also participated in conferences, gave talks on open source, and sipped a lot of coffee! Let us see what humans of SigNoz were up to in the month of September 2022!

Icinga + Guacamole

One of Icinga’s greatest strengths is its ability to integrate with other systems and use those systems’ data to enrich monitoring. It can write time-series data to InfluxDB, Graphite or even Prometheus with our icinga2-exporter. It can talk to different data sources so that hosts and services can be created and managed automatically. This means that lots of manual work is eliminated.

Cloud-native observability from customer to kernel

From its inception as a powerhouse for logging, Elastic Observability has grown into a comprehensive solution for full-stack multi- and hybrid-cloud observability. Given the increasing complexity of the cloud-native world, the major challenge for observability is twofold: getting deeper and more frictionless visibility at all levels of applications, services, and infrastructure, and making sense of the overwhelming amount of data that is available.

Fintech Industry: Are Your IT, DevOps, and Engineering Teams Siloed?

The Cambridge English Dictionary defines a silo as “a part of a company, organization, or system that does not communicate with, understand, or work well with other parts.” Siloing can exist at various organizational levels: siloed departments, siloed teams within a department, and even siloed engineers within a team. In any industry, siloing can cause issues with alignment, communications, and overall delivery, but in fintech, there are additional risks.

7 Must-Have Steps for Production Debugging in Any Language

Debugging is an unavoidable part of software development, especially in production. You can often find yourself in “debugging hell,” where an enormous amount of debugging consumes all your time and keeps the project from progressing. According to a report by the University of Cambridge, programmers spend almost 50% of their time debugging. So how can we make production debugging more effective and less time-consuming?

How Cortex can help you get the most out of Datadog

With Datadog’s Dash conference right around the corner, we at Cortex have been thinking a lot about best practices for observability. To get the most out of an application performance monitoring (APM) vendor like Datadog, you want to make sure monitoring and observability are built into launch and production readiness checklists.

Device discovery: The path to total network visibility

For an organization to prevent cyberattacks, it first needs complete visibility into all the events that occur within its network. With this visibility, the organization can analyze risky behavior by users and entities, and take the necessary steps to proactively secure itself. However, if an attack were to still happen, the organization again needs complete visibility to identify how and from where the attacker entered the network.

Sponsored Post

What Is a Business Service & How Can AIOps Support It?

We talk a lot about business services and how to keep them running at peak efficiency, but every now and then these questions arise: what exactly is a business service, how can I provide the best possible experience to my users, and how does it affect the performance of my IT estate?

Sponsored Post

Core Web Vitals e-commerce analysis: part two

In 2021, Google introduced Core Web Vitals, three criteria to measure if a website is fast, stable, and responsive enough to give visitors a good digital experience. These factor into search ranking and have a powerful influence on customer behavior. But while Google has been urging the web performance community to get on board for more than two years, many are still falling short. We pulled data from the Chrome User Experience Report to conduct our own Core Web Vitals analysis, finding that even some of the largest e-commerce brands aren't passing these thresholds.

Using Lumigo OpenTelemetry Distributions with other backends

When we set out to trace applications running outside of AWS Lambda, there was little doubt in our minds that building on top of OpenTelemetry was by far the best course of action. There are many reasons for this, but chiefly, it is a question of coverage. At its most fundamental level, achieving coverage requires as-wide-as-possible support for technologies, and interoperability among instrumentations.

Sponsored Post

Why Composable Analytics Matter for Multi-Cloud AIOps

There’s plenty of loaded terminology and buzzword bingo when it comes to the latest advances in cloud application delivery. Especially when it comes to multi-cloud – which should merely mean multiple cloud instances when modern cloud applications really leverage multiple hybrid IT operating models, atop both existing business silos and newer microservices application workloads.

Feature Focus: September 2022

Another month has come to a close, so I’m back again to take you through what’s new and noteworthy from the month of September. If you missed last month’s blog, this will be a monthly recurring series to keep you posted with the latest and greatest at Honeycomb. There’s a ton to cover, so I’ll dispense with the preamble and dive right in.

Set up instant SNMP monitoring with the new SNMP integration in Grafana Cloud

Simple Network Management Protocol (SNMP) is an internet protocol that is used to collect information about network devices and manage them. Most of the modern devices connected to a network support SNMP, such as routers, switches, servers, printers, and more. There are three different versions of SNMP (v1, v2, and v3). It most commonly operates on UDP ports 161 and 162. The most common versions being used are v1 and v2. The data can be collected from a network device through SNMP via polling.

10 Tools for Monitoring Mobile Apps from a Network Point of View

With so many apps to choose from, mobile users no longer have much patience for apps that don’t work well. This isn’t just about bugs and crashes; users also think about how fast the app works and how much battery it uses. But have you ever thought about what would happen to the business if your live applications running on the client’s systems went down or didn’t work as expected?

Cloud Monitoring further embraces open source by adding PromQL

As Kubernetes monitoring continues to standardize on Prometheus as a form factor, more and more developers are becoming familiar with Prometheus’ built-in query language, PromQL. Besides being bundled with Prometheus, PromQL is popular for being a simple yet expressive language for querying time series data. It’s been fully adopted by the community, with lots of great query repositories, sample playbooks, and trainings for PromQL available online.

Sentry Custom Performance Metrics

This is a demo of Sentry's Custom Performance Metrics and how to query them in both Discover and Dashboard pages. Custom Performance Metrics are measurements attached to a Sentry transaction just like FCP or LCP, with the major difference being that they're defined by the user rather than Sentry. These are actual videos submitted by Sentaurs for our monthly Show-N-Tell. We have not edited them except for obscuring personal information that may appear in screenshots. Some videos may include screenshots that contain fictitious usernames or email addresses for illustrative purposes.

We have redesigned our entire service

As of today, Oh Dear is in a brand new jacket. We've totally redesigned Oh Dear's UI. Our app doesn't only look better, but we've also made it much easier to use. We feel that our new design should speak for itself, so we highly recommend visiting the home page, browsing around a bit, registering an account or logging in, and discovering the redesigned app yourself. If you've been using Oh Dear before, you'll notice that we polished everything, and the UX should be much better.

Monitoring Strategies: An Introductory Guide With 5 Examples

Monitoring is an integral part of most organizations. The monitoring process usually consists of several tools that, combined, show you information about whatever you're monitoring - applications, infrastructure, networks and so forth. While monitoring may seem like an obvious practice to some, it can be challenging to establish the best monitoring strategy for your organization.

Incident response: Unlocking knowledge and breaking down silos

In a world of monolithic applications and microservices, responding to incidents can be a painful process, involving multiple people with siloed knowledge jumping between different tools to find the relevant data and take action. Individuals within a business often hold the knowledge of how a particular component works, or how it depends on other services. The key to successfully responding to incidents is unlocking this knowledge and breaking down the silos between teams.

Release webinar: SquaredUp v5.6

SquaredUp v5.6 is all about the most highly requested features from our customers. Amongst all the improvements, you’ll see the new Tree View and Sunburst visualisations to help you spot root cause more quickly across your SCOM environment, Enterprise Applications and EAM-X monitored objects. Tune into this release webinar to see the new features in action!

Fixing SCOM Blind Spots - Introducing EAM-X, and loads more with SquaredUp v5.5!

In this webinar you’ll learn about all the latest updates included in SquaredUp v5.5, as well as an exclusive demo of our brand new licence tier, EAM-X. What is EAM-X? EAM-X is a new tier of SquaredUp built to extend SCOM’s visibility beyond Microsoft’s domain, making it the ultimate single pane of glass! Have you ever had an outage because of a dependency not monitored by SCOM? Even with Management Packs, SCOM has blind spots, and blind spots = business risk.

Unity Performance Testing Tools & Benchmarks

The following guest post addresses how to improve your services’ performance with Sentry and other application profilers for Unity. Learn more about Sentry’s Profiling product or try it out now if you’re already a Sentry user. We’re making intentional investments in performance monitoring to give relevant context to help you solve what’s urgent faster.

Featured Post

Fixing Slow Databases: Improving App Performance Overnight

There's no denying database applications have come a long way over the past few years. Despite all the improvements, however, they're still far from perfect. Sometimes, they even feel painfully slow. A seemingly quick and easy task can end up taking hours for no good reason. The result? Angry users, suspicious managers, and a generally unhappy team.

5 Microservices Challenges and Blindspots for Developers

Microservices are loosely coupled services that are organized around business capabilities. In an ideal microservices architecture, each service can be developed and deployed independently. To form a functional application, these separate services communicate with each other in the production environment (and even beforehand).

Cloud Providers Health Report - September 2022

Check our September 2022 health report on the most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. The source data is made available by the cloud providers themselves via their status pages. We normalize it and use it to generate the report.

Security Best Practices at MetricFire

At MetricFire, we treat your data as our data, and we secure our data. Security is prioritized at every level of our infrastructure so you can have peace of mind that your data is sent and stored safely. Keeping MetricFire secure is fundamental to the nature of our business. One of our key priorities is to secure our customers’ metrics and trust. We diligently ensure that we comply with industry security standards so that our customers can trust that their metrics are safeguarded.

What is Network Discovery?

Something we've said before is, "You can't protect what you can't see," and it's so true. How can you protect your network if you can't see ALL of it? Enter: Network Discovery. Network Discovery is the fundamental principle of network monitoring and management. At its core, the process of discovery allows you to find and identify everything connected to your network. It allows you to see what's on your network, figure out what's connected, and more.