November 2022

Java vs Python: Code examples and comparison

As two of the most popular and practical languages out there, should you choose Java or Python for your next project? Is one of these languages a clear-cut better option? The answer is a long one. According to GitHub’s annual Octoverse report, Python has now climbed to the second most popular language in usage, pushing Java down to third place.

Overcome your virtual machine monitoring woes with OpManager

As enterprises move towards a digital-first strategy, they rely much more on their IT infrastructures. An organization’s infrastructure drives the entire business and thus must be aligned with the organization’s business goals. This crucial task of managing an enterprise’s infrastructure brings its own share of difficulties to the table. However, the primary concern is to ensure the infrastructure’s scalability and optimal performance.

Troubleshooting Azure SQL Database Performance Issues

When applications suffer performance degradation, the root cause is often a database problem. In this guide, we’ll show you 7 ways to troubleshoot your Azure SQL database performance issues using metrics and insights from the eG Enterprise monitoring solution.

AWS and InfluxDB - Reflections on re:Invent 2022 Keynote

Amazon re:Invent is a major technology event every year. At this year’s re:Invent, the keynote by AWS CEO Adam Selipsky made a concerted effort to draw connections between technology and some of the key challenges that people around the world, and in some cases beyond the terra firma of Earth, face. While the presentation touched on a wide range of topics, one overarching theme was the intersection of the physical and digital worlds, and the role technology plays in bridging that divide.

Tracing with InfluxDB IOx

Tracing has always been a key use case for time series data. But admittedly, it’s also one that past versions of InfluxDB could not handle as well as we wanted. One of the roadblocks was the cardinality issue. Tracing data is, almost by definition, high cardinality data and prior to InfluxDB IOx, high cardinality data could affect query performance.

Search Observability Data In-Place: Store Where You Want, Query When You Want

When we created Cribl Search, we wanted to give systems administrators the ability to query data without having to spend resources on collection and processing first — but we didn’t stop there. With Search, we’re also making it possible to query all the data you’ve already collected, processed, and kept in places like object stores, file systems, analytics tools, S3 buckets, or other data stores.

Network Performance Reporting Feature Release

Obkio’s long-awaited Network Performance Reporting feature is now available for all users of Obkio’s Network Monitoring App. Create custom network monitoring reports to simplify and visualize the analysis of information in your network. Learn more about how to access and use Network Performance Reporting with Obkio.

Logz.io's New Features: Easier, Faster, and More Cost-Efficient Observability

Our product strategy this year was relatively simple. Many observability practitioners we spoke with complained that observability was oftentimes slow, heavy, complex, and costly – which can be summed up in our CEO’s recent blog on modern observability challenges. While our customers didn’t report similar challenges, we wanted to further distance ourselves from this typical observability experience.

Broadcom Software Debuts the Experience-Driven NOC at DoDIIS 2022

Command, control, and communications (C3) systems are fundamental to all military operations, and the network is the backbone to keeping the warfighter up to date and out of harm's way. The right network modernization strategies will enable the latest C3 capabilities to provide real-time situation awareness and decision support for today’s military operations.

Kubernetes 1.26 - What's new?

Kubernetes 1.26 is about to be released, and it comes packed with novelties! Where do we begin? This release brings 37 enhancements, on par with the 40 in Kubernetes 1.25 and the 46 in Kubernetes 1.24. Of those 37 enhancements, 11 are graduating to Stable, 10 are existing features that keep improving, 16 are completely new, and one is a deprecated feature. Watch out for all the deprecations and removals in this version!

Kubernetes Health-Check: The Most Critical Health Conditions To Monitor

Kubernetes can generate so many types of new metrics (millions every day) that one of the most complex aspects of monitoring your cluster’s health is filtering through these metrics to decide which ones are important to pay attention to. In fact, in a survey that Circonus conducted of Kubernetes operators, uncertainties around which metrics to collect was one of the top challenges to monitoring that operators face.

TraceQL: a first-of-its-kind query language to accelerate trace analysis in Tempo 2.0

The much-anticipated release of Grafana Tempo 2.0, which we previewed at ObservabilityCON 2022, will represent a huge step forward for the distributed tracing backend. Among the biggest highlights will be TraceQL, a first-of-its-kind query language that makes it easier than ever to find the exact trace you’re looking for.

Kentik Kube: Container Network Performance Monitoring

In this brief demo, Phil Gervasi explains how you can use Kentik Kube to monitor K8s network performance among your containers on-premises and in public cloud. With Kube, you can get granular visibility into container performance in terms of packet loss, packet size, protocol and application activity, TCP flags, and network latency.

Release 1.37.0: Infinite scalability, database tiering, and much more

Another release of the Netdata Monitoring solution is here! Important notice: this release fixes two security issues, one in streaming authorization and another in the execution of alarm notification commands. All users are advised to update to this version or any later one!

What is the Real ROI of an ITIM Tool?

It may seem a bit strange, but recently I’ve been asked about Return on Investment (ROI) calculators a fair amount. Generally the discussion is if a given vendor should (or should not) have an ROI calculator on their website. Some reasons for justifying an ROI calculator include making it easier for potential clients to understand the value they will receive from a solution and the basic cost involved.

Bringing Codecov into the Sentry Family: Where Code Coverage Meets Application Monitoring

Today Codecov is joining the Sentry family. Codecov began as a code coverage reporting tool in 2014 and has since emerged as a market leader in the test analytics space. Codecov makes coverage actionable for over two dozen test frameworks, and has helped over a million software developers improve their approach to testing, coverage, and code reliability. You might be asking, what do test analytics have to do with application monitoring?

How We Made JavaScript Stack Traces Awesome

Sentry helps every developer diagnose, fix, and optimize the performance of their code, and we need to deliver high quality stack traces in order to do so. You might have noticed a significant improvement in Sentry JavaScript stack traces recently. In this blog post, we want to explain why source maps are insufficient for solving this problem, the challenges we faced, and how we eventually pulled it off by parsing JavaScript.

Close the Cloud Monitoring Gap with Network Observability

To fully capitalize on the promises of digital transformation, IT leaders have come to recognize that a mix of cloud and data center infrastructure provides several business advantages, including increased agility, cost efficiencies, global availability, and, ultimately, better customer experiences.

7 Incident Management Best Practices to Improve Business Efficiency

Think about the last time your IT systems had an outage: How did your team react to it? Were they organized with a clear idea of how best to resolve the issue? Or was it chaotic, with people firing questions from all directions and customer service channels ablaze with requests for help? Digital technology disruptions are typical (and even expected) at the workplace, but it doesn’t have to be chaotic, with teams rushing around to extinguish the metaphoric fire.

Visualize everything with Uptrends Grafana integration

Grafana has long been known as a leading open source platform for those needing beautifully rich, composable operational dashboards. The notion of being able to connect disparate data sources to Grafana for improved monitoring of infrastructure, log analytics, and overall better operational efficiency is an increasingly alluring prospect for those in fintech, ecommerce, and other industrial sectors.

Golden signals in seconds with Universal Service Monitoring

Whether you are a site reliability engineer, DevOps engineer, or application developer, you need visibility into the health and performance of every service you run or support. But in complex, dynamic environments, it can be difficult to ensure that all services are accounted for.

Cribl Search: The Most Powerful Tool for Querying Data at Its Source

One of the most useful features of Cribl’s flagship solution Stream is its ability to separate the wheat from the chaff in your data’s journey from source to destination. Stream allows you to control what data goes to what system; Cribl Search takes this to the next level by controlling what data should be collected before it is ever put in motion.

Oh....The Things You Can Test with Built-in Data Generators in Cribl Stream

Picture this! The coffee is hot, the keyboard is ready to rock, the bandwidth is unused, and the software is deployed (or the cloud is waiting patiently)…. but the data is missing! That’s right, most of us have been there. In our industry, it is very common for data to be the lowest common denominator for many projects.

Top 9 Synthetic Monitoring Tools in 2023

In both the corporate and personal areas, a significant portion of daily life is spent glued to screens in an era where digital transformation is widespread. We engage with many digital services, apps, and websites daily. During these digital encounters, we occasionally encounter websites with slow-loading pages or 404/server error messages.

Slash MTTR and avoid costly downtime with improved cross-team collaboration

Every second counts when IT teams are called upon to resolve business impacting issues. In modern enterprises, poor communication, fragmented toolchains and spiralling IT complexity can conspire to slow down incident response, putting service availability and ultimately customer satisfaction in peril.

Announcing Logz.io Open 360: One Platform for Open Source Observability

Today, I’m thrilled to announce the introduction of Logz.io Open 360™. This is a major step in our journey. Open 360 is a unified platform for modern engineering teams requiring end-to-end observability across logs, metrics and traces—delivered in an intuitive user interface. Open 360™ is specifically designed to enable engineers to have deep monitoring and insights into distributed systems.

Getting started with unified observability for AWS in less than 10 minutes using Terraform

This video provides a step-by-step guide on how to observe AWS environments. It takes only about 10 minutes of working time to get a fully configured Elastic cluster that is actively collecting data from your AWS environment.

Monitor your mobile apps with Embrace's offering in the Datadog Marketplace

Embrace is a mobile application monitoring solution that helps you track and troubleshoot mobile app performance by combining data analytics, real user monitoring, network performance monitoring, and hardware monitoring in a single platform. We’re pleased to partner with Embrace to offer an out-of-the-box Embrace Datadog app and software license in the Datadog Marketplace.

Announcing TISAX-compliant observability for the automotive industry and its suppliers

Many organizations face complex regulatory requirements when it comes to monitoring the health and performance of their service and application infrastructure. As part of our ongoing commitment to providing a comprehensive monitoring solution for all customers, we’re pleased to announce that Datadog has achieved TISAX Assessment Level 2 (AL2) certification.

Improve your EC2 rightsizing recommendations with Datadog and AWS Compute Optimizer

While cloud solutions can give you greater flexibility as you scale your infrastructure, limited visibility into resource utilization makes provisioning the right amount of compute resources challenging. To ensure that every workload is fully supported, many organizations may opt to over-provision, which leads to overspending. Or, in an attempt to maximize cost savings, organizations may under-provision, leaving workloads unsupported and risking serious performance impacts.

Visualizing Time Series Data with Chart.js and InfluxDB

Time series data is a sequence of data points generated through repeated measurements indexed over time. The data points originate from the same source and track changes at different points in time. Times series data includes data like stock exchange data, monthly inflation data, quarterly gross domestic product (GDP) data, and logs from IoT sensors.

When Stream Meets Lake: Cribl Integrates With New Amazon Security Lake to Help Customers Address Data Interoperability

We’re excited to announce that Cribl integrates with Amazon Security Lake. Amazon Security Lake allows customers to build a security data lake from integrated cloud and on-premises data sources as well as from their private applications using the Open Cybersecurity Schema Framework (OCSF).

Grafana 9.3 release: Enhanced navigation, Grafana localization, Grafana Alerting updates, and more!

Welcome to Grafana 9.3! In our continued efforts to make Grafana more accessible and easier to use, we are excited to showcase new updates to improve navigation, introduce localization, and much more. Read our What’s New documentation to learn all about the latest release and for more details, refer to the changelog.

The Top 3 Data Applications to Drive Your Customer Experience (CX)

The data and analytics space is booming and changing how we analyze and use data across industries. According to Fortune Business Insights, the global big data analytics market is projected to grow from $271.83 billion in 2022 to $655.53 billion by 2029 (at a CAGR of 13.4%). This boom has only been further accelerated by the recent pandemic as industries are shifting to digital solutions to address expanded market opportunities.
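As a quick sanity check on the quoted growth rate, the compound annual growth rate can be recomputed from the two endpoint figures. This is only a minimal sketch: the function name `cagr` is ours, and it assumes seven compounding years between 2022 and 2029.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars.
# Assumption: seven compounding periods (2022 -> 2029).
rate = cagr(271.83, 655.53, years=2029 - 2022)
print(f"{rate:.1%}")
```

Under that assumption, the result agrees with the 13.4% figure cited in the teaser.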

Flowmon Anomaly Detection & MISP

Back in 2021, we introduced the integration between MISP, a community threat intelligence sharing platform, and Flowmon ADS. The integration turns indicators of compromise shared through MISP into actionable intelligence. Flowmon ADS automatically picks up the latest indicators of compromise using the MISP API and leverages them to detect adversary activities in the target network. The integration is available in Flowmon ADS 11.2 and newer versions.

New AWS services? No problem! How Sumo Logic is evolving to meet your AWS observability needs

Every year, AWS re:Invent demonstrates the pivotal role that AWS plays as companies build their modern applications. Investing in a growing portfolio of AWS services can be key to a business’s ability to compete. But this often means additional complexity: more services, across more regions and accounts. Organizations need a way to ensure their applications running on AWS services are reliable and secure. Enter Sumo Logic.

A Complete Guide to PostgreSQL Performance Tuning: Key Optimization Tips DBAs Should Know

PostgreSQL is an open-source relational database that is highly flexible and reliable and offers a varied set of features. Even though it is a complex database, it provides great integrity and performance. Also, you can deploy it on multiple platforms, including a light version for websites and smartphones. Because you can deploy Postgres in different ways, it comes out of the box with only some basic performance tuning based on the environment you’re deploying on.

Applications Manager bags another token of excellence

We are thrilled to announce that ManageEngine Applications Manager won Best Application Performance Monitoring Brand at Enterprise IT World’s CIO Select Awards 2022, which took place in Bangalore along with Enterprise IT World’s Hybrid Cloud Summit & Awards 2022. Among many reputable organizations offering application performance monitoring (APM) tools, our product stole the show by having the most efficient and user-friendly features.

SQL Server Monitoring: What metrics to track

SQL Server Monitoring has become an essential part of modern-day applications since a major chunk of these applications rely heavily on a database. It is therefore important to monitor your metrics and make the best out of your database services. SQL Server Monitoring offers plenty of metrics to choose from. We will be breaking down the five key categories that an SQL server provides for a comprehensive view of their functionality.

Top 8 Web Application Performance Metrics

Web application performance metrics help determine certain aspects that impact the performance of an application. This article discusses eight key metrics. Web performance is rapidly evolving, with new trends like Google’s Core Web Vitals (CWV), increasing use of images and video on web pages, and the rollout of 5G. We’ll present three methods that can help you adapt to these changes and boost the performance of your web applications.

Cleaning up your microservice resources

Managed services and serverless deployments have become increasingly popular tools in the software development process. This means that organizations are focusing less on infrastructure resources and more on the functionality and security of applications. Managed services such as DynamoDB, Step Functions, and API Gateway, which are crucial to serverless architectures, come with associated costs.

How Banco Itaú tracks 1.5B daily metrics on-prem and in AWS with Grafana and observability

Brazil’s Banco Itaú is the largest bank in Latin America, so when performance and uptime issues impact its applications, the reverberations can be massive. “It can impact the whole economy of Brazil. It can damage other banks’ business too,” Ana Paula Genari Martin, SRE manager at Banco Itaú, said in her recent ObservabilityCON talk. And keeping those applications running is no small feat, considering the size of their digital operations.

Track and triage errors in your logs with Datadog Error Tracking

Reducing noise in your error logs is critical for quickly identifying bugs in your code and determining which to prioritize for remediation. To help you spot and investigate the issues causing error logs in your environments, we’re pleased to announce that Datadog Error Tracking is now available for Log Management in open beta.

New Honeycomb Integrations Let You Bubble Up Lurking AWS Issues

Today, we’re announcing the expansion of Honeycomb integrations with various AWS services. This update now covers a much wider swath of AWS services, makes it easier to integrate your AWS stack with Honeycomb, and with our new BubbleUp enhancements, you’ll be identifying and debugging hidden issues in your AWS stack faster than ever.

3 Most Common SD-WAN Issues

We’ve all been stuck in traffic congestion on the road at some point in our lives. Traffic congestion may happen when there are too many cars on the road, or when there’s an accident or a closed street. Network congestion isn’t too different from that, but instead of cars causing congestion, it’s network traffic. That’s why, in this article, we’re running you through how to detect network congestion with Network Monitoring tools.

Cribl Supports Multiple AWS Account Monitoring and Analytics with New Account Factory Customization

Keeping with our mission of helping customers gain radical levels of choice and control with their observability data, we’re excited to announce full support for the Amazon Web Services (AWS) Account Factory Customization solution within AWS Control Tower console. Customers can now use AWS Control Tower to define account blueprints that scale their multi-account provisioning in a streamlined manner.

Mitigate cold starts in your Java Lambda functions with Datadog and AWS Lambda SnapStart

AWS Lambda enables engineering teams to build modern, scalable services without the need to provision underlying infrastructure resources. But monitoring Lambda functions requires visibility into performance indicators that differ from those of traditional architectures—and cold starts are a key example.

Ubuntu Logs: How to Check and Configure Log Files

Ubuntu provides extensive logging capabilities, so most of the activities happening in the system are tracked via logs. Ubuntu logs are valuable sources of information about the state of your Ubuntu operating system and the applications deployed on it. The majority of the logs are in plain text ASCII format and easily readable. This makes them a great tool to use for troubleshooting and identifying the root causes associated with system failures or application errors.

How to install the Site24x7 APM Insight .NET Core agent

This video will walk you through the process of installing the Site24x7 APM Insight .NET Core agent. With the APM Insight .NET Core agent, you can monitor your web applications built in .NET Core 2.0 and above. You can track HTTP requests, SQL queries, errors, exceptions, web API calls, and remote calls in your ASP.NET Core applications hosted on IIS or Kestrel web servers. This installation method works in both Linux and Windows environments.

Seeing vs. Understanding - The Power of Trace Visualization

It’s common in our everyday language to conflate seeing and understanding when the two are actually very different things. For example, if every day for the last few years we spoke briefly and wrote down the total number of Covid cases in the world, it would be easy to see some trends in the data—you would see the data. But if we present the same data drawn as a chart, it’s easy to understand where the spikes and dips are and when the situation got really bad.

Crous Paris Mitigates Network Outages Before Students Notice with Progress WhatsUp Gold

The Paris location serves approximately 300,000 students and 1,000 staff. With so many people on campus, Crous Paris needs their network and servers to be in top-notch working condition. WhatsUp Gold allows us to fix a problem quickly and simply, without having to connect to each server individually in order to locate the malfunction.

Cloud Providers Health Report - October 2022

Check our October 2022 health report on the most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. The source data is made available by the cloud providers themselves via their status pages. We normalize it and use it to generate the report.

Cloud Providers Health Report - November 2022

Check our November 2022 health report on the most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. The source data is made available by the cloud providers themselves via their status pages. We normalize it and use it to generate the report.

Web API Monitoring Explained: A Helpful Introductory Guide

An API, or application programming interface, is a collection of tools, protocols, and subroutines that can be used when building software programs or applications. APIs make software development easier by providing reusable components and a set of clearly defined communication protocols. Recently, APIs have come to mean web services, but there are also APIs for software and hardware libraries, operating systems, and databases.

Using ClickHouse with MetricFire

In the analytics domain, fast and reliable storage is an important aspect for businesses to handle a large amount of data. There are different types of data storage including RDBMS, NoSQL, data lake, data warehouse, and graph database. Among these, the most widely used is RDBMS that powers various systems and applications of companies of all sizes. RDBMS is easy to use and straightforward to understand thanks to its table-based (or column-based) data format.

Citrix delivery group is running out of resources

The Load Capacity Usage percentage, or Load Evaluator Index, for a Citrix Virtual Apps and Desktops or DaaS multi-session machine delivery group represents the percentage of resources allocated to that delivery group that are in use by virtual desktop or application sessions. It is the ratio between the sum of all measured Load Indexes and the sum of all maximum Load Indexes for the multi-session machines in the delivery group. A Load Index represents the load on a multi-session machine.
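The ratio described above can be sketched in a few lines of Python. This is an illustration rather than Citrix's own implementation: the function name and the sample values are hypothetical, and the maximum Load Index of 10000 per machine is an assumption that should be checked against your environment.

```python
from typing import Sequence

# Assumption: each machine's Load Index tops out at 10000.
MAX_LOAD_INDEX = 10_000


def load_capacity_usage(load_indexes: Sequence[int],
                        max_load_index: int = MAX_LOAD_INDEX) -> float:
    """Percentage of a delivery group's load capacity currently in use."""
    if not load_indexes:
        raise ValueError("delivery group has no machines")
    total_measured = sum(load_indexes)                  # sum of measured Load Indexes
    total_maximum = max_load_index * len(load_indexes)  # sum of maximum Load Indexes
    return 100.0 * total_measured / total_maximum


# Hypothetical delivery group of four multi-session machines.
usage = load_capacity_usage([2500, 4000, 10000, 7500])
print(f"{usage:.1f}%")
```

With these sample values, one machine is fully loaded while the group as a whole sits at 60% of its capacity, which is why the group-level percentage is a useful early warning that a delivery group is running out of resources.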

StackState Named Market Leader by Research in Action

Earlier this year, StackState was named a Market Leader in the “2022 Research in Action (RIA) Vendor Selection Matrix (VSM) for Observability.” This is great recognition of the innovative path that we are on. We have focused on topology-powered observability, supported by our unique 4T® Data Model.

AppSignal for Node.js 3.0 Introduces OpenTelemetry Support

After a period of beta testing, we're happy to announce the launch of our latest AppSignal for Node.js package. This package features six new integrations and uses the OpenTelemetry framework for reliable telemetry data collection. OpenTelemetry is an open standard that facilitates the instrumentation of standardized telemetry data collection. AppSignal is committed to using OpenTelemetry in new integrations, and our Node.js integration is the first to use the standard.

An Introduction to Apache Parquet

A look at what Parquet is, how it works and some of the companies using its optimization techniques as a critical component in their architecture. As the amount of data being generated and stored for analysis grows at an increasing rate, developers are looking to optimize performance and reduce costs at every angle possible. At the petabyte scale, even marginal gains and optimizations can save companies millions of dollars in hardware costs when it comes to storing and processing their data.

What is Network Flow Monitoring, and Why You Shouldn't Live Without It

Old network salts likely know all about network flows and the value of network flow monitoring. As a former News Editor for Network World and Editor in Chief of Network Computing, I count network flows among my old stomping grounds. In fact, I remember when Cisco invented NetFlow in the late 1990s to collect traffic data from its routers and switches so it could be analyzed by network pros.

How Microsoft Has Further Developed the Microsoft Teams Environment

Over the past three years, the Microsoft Teams environment has become the de facto voice communication tool for a large percentage of organizations. But with many companies having implemented their Teams setup rapidly in the face of a dramatically changing business setting, some IT teams have been struggling to know how to get the most out of their Teams environment.

ManageEngine turns 20 | This is our story

ManageEngine turns 20 this year and we want to take this opportunity to thank our customers, who have made this journey so special. In this video, our leadership team takes you through the journey of ManageEngine, from its inception to the road ahead. Our leaders discuss the role our customers have played in creating and perfecting solutions to help simplify your IT needs and how your input and feedback has not only shaped us into the organization we are today, but continues to prepare us for the future.

Building AppStack Dashboards

AppStack™ helps you diagnose and troubleshoot performance problems faster—from applications to servers, virtualized infrastructure, databases, and storage systems. The AppStack dashboard shows you all performance information across the application stack with relationships between various elements of your environment. Kevin M. Sparenberg, SolarWinds THWACK Community Evangelist, shows you how the dashboard works and why it’s so important for troubleshooting application issues in your IT infrastructure.

What CIOs Need to Know About Digital Experience Monitoring

Many enterprises are taking a customer-centric approach to meet customer demands more efficiently and upscale their businesses. However, several siloed communication and workflows within these organisations often make it challenging to achieve the desired experiences for customers. Additionally, the initiatives adopted by the enterprises usually don't provide an extensive understanding of the end users' experiences. The market's answer to this problem is monitoring software. Enter digital experience monitoring.

What is Network Performance Monitoring?

When you think about improving visibility into your IT environment, monitoring applications and the infrastructure that hosts them might be the first thing that comes to mind. Although infrastructure and application monitoring are two important components of an overall monitoring strategy, an equally vital part – and one that is easier to overlook – is network monitoring. If your network fails or underperforms, your applications will also experience problems.

A Modern Guide to MySQL Performance Monitoring

According to results from the Stack Overflow Developer Survey 2022, nearly half (46%) of respondents say they use MySQL, making it the most widely-adopted database technology among developers today. This popularity is due in no small part to MySQL’s unique features that help it handily meet the needs of modern applications, from small software projects to business-critical systems.

Making the Most of CloudWatch Log Insights: 7 Best Practices

Amazon CloudWatch provides Log Insights, a feature that uses a proprietary query language with several basic commands. It provides sample queries for common AWS service log types, as well as query auto-completion. Learn more about CloudWatch Log Insights capabilities and how to use them.

Hidden Costs of Cloud Networking: Optimizing for the Cloud - Part 3

I used the first two parts of this series to lay out my case for how and why cloud-based networks can effectively “Trojan horse” costs into your networking spend and highlighted some real-world instances I’ve come across in my career. In the third and final installment of this series, I want to focus on ways you can optimize your personnel and cloud infrastructures to prevent or offset some of these novel costs.

Less is more ... or more is more? Decide for yourself with Icinga DB Web list view modes

With Icinga DB Web you can now customise Icinga Web’s list views to your needs. While in one scenario you might be more interested to see as many objects as possible at a glance, in another scenario detail attributes of only a few objects will be more important to you. Yet, in the first case, you would even be distracted by more detailed information.

The Biggest Ecommerce Challenges This Black Friday

We recently featured in Ecommerce Age. If you missed the write up, you can catch up in full, here… As ecommerce continues to outdo the high street, Black Friday sales are becoming as much of a tradition as Christmas dinners. But shoppers are very influenced by external factors, from the economy to website experiences. We outline the key ecommerce challenges this Black Friday…

On-premises, Cloud First or Cloud Repatriation - What's the Trend? Which is Best?

Should you leave the cloud? Is cloud migration reversing? Is cloud repatriation a growing trend? On-premises vs. Cloud – which is best? How do you futureproof your IT and protect your business through recession or uncertainty, whether that’s in Cloud or On-prem? Today’s article is about keeping your options open within the context of recent economic and technological trends.

Going Beyond Infrastructure Observability: Meta's Approach

What’s the ultimate goal of bringing observability into an organization? Is it just to chase down things when they’re broken and not working? Or can it be used to truly enable developers to innovate faster? That’s a topic I recently discussed with David Ostrovsky, a software engineer at Meta, the parent company of social media networks Facebook and Instagram among others. He was my guest on the most recent episode of the OpenObservability Talks podcast.

Import JSON data into InfluxDB using the Python, Go, and JavaScript Client Libraries

Devices, developers, applications, and services produce and utilize enormous amounts of JSON data every day. A portion of this data consists of time-stamped events or metrics that are a perfect match for storing and analyzing in InfluxDB. To help developers build the applications of the future, InfluxDB provides several ways to get JSON data into InfluxDB easily.

Elasticsearch vs Splunk - Which tool to choose for Log Management?

Developing software is an art in itself. From building to shipping, developers have to keep iterating the process to make improvements to the existing ones. Developers around the world spend hours on building quality logging into their applications. But this logging is only efficient when we have one-page or two-page applications where debugging through logs is relatively easy.

What's in an instrumentation? An SQS and Python study

At Lumigo, we keep improving the coverage and quality of our distributed tracing instrumentation to give you, through Lumigo’s transactions, the most accurate and intuitive representation of how your distributed system behaves. In this blog, we cover a recent development for the Amazon SQS instrumentation in Lumigo’s OpenTelemetry distro for Python, providing a seamless experience for a scenario that otherwise would result in confusing, broken transactions and lost insights.

Understanding N+1 Database Queries

N+1 queries are among the most common problems developers face. An N+1 database query problem occurs when you query the database for N items, and each of those N items needs additional data fields that are not in the same table but are required for the use case. Generally, this issue is handled at database design time, but no single solution solves every problem efficiently; some need to be solved by brute force.
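As a rough illustration of the pattern described above, here is a minimal sketch using Python's built-in sqlite3 module. The `authors` and `books` tables and both function names are hypothetical, chosen only for this example. The first function issues one query for the list plus one query per item, while the second fetches the same rows with a single JOIN.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO books VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'), (3, 3, 'Kernels');
        """
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the N authors, then one extra query per author: N+1 in total."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = []
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result.extend((name, title) for (title,) in rows)
    return result, queries


def titles_joined(conn):
    """Fetch the same data in a single round trip with a JOIN."""
    rows = conn.execute(
        """
        SELECT a.name, b.title
        FROM authors a JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    ).fetchall()
    return rows, 1
```

With three authors, the first function runs four queries and the second runs one, and both return the same rows. A JOIN is only one possible remedy; batching the lookups with an IN clause is another common option.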

Observability vs Monitoring: Which is Better?

Distributed architectures are becoming an increasingly important source of application services for organizations. Advances in observability and monitoring are being driven by this trend. But exactly how do observability and monitoring differ from one another? It's essential to know when something goes wrong in the application delivery chain so you can identify the root cause and resolve it before it has an impact on your business. Monitoring and observability offer a two-pronged strategy.

The Basics of Using AWS EventBridge for Observability

As you adopt modern, serverless, microservices-based architectures, it can become more challenging to monitor and understand the state of your applications at any given time. That’s where event bus capabilities from services like Amazon EventBridge can come in handy. AWS EventBridge can help you build loosely coupled, event-driven architectures and applications, and deploy new features faster.

Grafana crosses 1 million mark for active instances

It’s hard to think of a use case that Grafana hasn’t been used for. When Torkel Ödegaard launched the Grafana open source project with his first commit in December 2013, “my goal was to make time series data accessible for a wider audience, to make it easier to build dashboards, and to make graphs and dashboards more interactive,” he said.

Optimize Your NOC with DX NetOps and Automic Automation

Learn how Automic can help NetOps speed up triage, reduce MTTR, and alleviate the load on your NOC. Today’s network team is committed to advancing the speed of triage and resolution. This can be a titanic task in modern networks, which are fueled by SDN and becoming more agile and dynamic than ever. These new networks have introduced traffic, congestion, and outages on a greater scale.

What's New in Sysdig - November 2022

What’s New in Sysdig is back again with the November 2022 edition! I am Matt Shirilla, an Enterprise Sales Engineer based in Texas, and I am very excited to update you with the latest feature releases from Sysdig. For Sysdig Monitor, this month brings new filtering for AWS Cloudwatch Metric Streams and a new Lambda Extension for AWS Lambda Telemetry API, plus the release of new Advisories.

The Incident Retrospective Ground Rules

I joined Honeycomb as a Staff Site Reliability Engineer (SRE) midway through September, and it’s been a wild ride so far. One thing I was especially excited about was the opportunity to see Honeycomb’s incident retrospective process from the inside. I wasn’t disappointed! The first retrospective I took part in was for our ingestion delays incident on September 8th.

Grafana Worldmap Panel

Grafana Worldmap is a free-of-cost panel used to display time-series metrics over a world map. Users can choose to visualize their data based on cities, states, countries, or any other segregation they like as long as they have a coordinate for each data point. Each data point comes in the form of circles that vary in size depending on the value of data and can get color-coded as per thresholds.

Is your website strategy ready for Black Friday?

The countdown has begun. No, not for the excitement of New Year’s Eve festivities nor for an awe-inspiring space launch. We’re talking about Black Friday and Cyber Monday, the annual mad rush of frantic Christmas shoppers looking for deals after Thanksgiving. In 2021 on Black Friday, consumers spent $109.8 billion online, up 11.9% compared to 2020. Shoppers spent another $10.7 billion on Cyber Monday, 1.4% down from $10.8 billion in 2020.

3M achieves growth, performance and reliability with AppDynamics

As one of the biggest SAP shops in the world, identifying the root cause of system performance issues impacting business operations is critical to 3M. The conglomerate needed a monitoring system that could pinpoint the problems to allow for faster resolution. By deploying AppDynamics, 3M discovered they could address risk areas in their SAP production environments and adopt it as a daily monitoring tool across non-SAP applications.

Wait... Elastic Observability monitors metrics for AWS services in just minutes?

The transition to distributed applications is in full swing, driven mainly by our need to be “always-on” as consumers and fast-paced businesses. That need is driving deployments to have more complex requirements along with the ability to be globally diverse and rapidly innovate.

Cribl Search: An Innovative New Way to Search Observability Data

These days, administrators typically have to deploy multiple tools to search through all of their datasets – then they get to spend the little free time they have left over dreaming of a world where they could search multiple distributed datasets simultaneously, similar to existing web search tools. They might have one tool for Splunk, another for Elastic, and some may even still be using grep or some other cumbersome function to search non-correlated data.

Citrix Print Manager Service is not running

The Citrix Print Manager Service is part of the Citrix Virtual Desktop Agent (VDA) software which runs on a single or multi session machine. It is used for the Citrix Advanced Universal Printing Architecture. It takes care of the client printer mappings between a user client and the VDA within an ICA session. This service heavily relies on the Windows Print Spooler service which spools print jobs and handles interaction with the printer.

What is API Observability?

Mission-critical apps that are deployed on the cloud drive today's modern enterprises, which in turn power their businesses. These applications' fundamental units are microservices, which tiny development teams created to enable speedy feature releases to the market. APIs serve as the ties that bring these microservices together so they can cooperate.

Monitoring Network Outages at the Edge and in the Cloud

Gathering data to explore a problem with power outages creating connectivity issues and ultimately draining a laptop battery. Monitoring locations that have intermittent power and/or connectivity outages can be challenging. In this article, I’ll show how to use InfluxDB, an open source time series database, InfluxDB Cloud and Edge Data Replication to store data locally and send it to a central location whenever possible.

Why Centralized Log Management? Understanding the Use Cases

Centralized log management provides various benefits across an organization. The fundamentals of log management offer a wide variety of business use cases. Whether you’re managing event log data manually or realizing you need more than an Open Source solution, finding the right internal champions can make your life easier. Understanding the business use cases and strategic impact centralized log management provides can help you gain the internal buy-in you need.

Grafana Cloud Access Policies: Say hi to the new Cloud API keys

Until recently, Grafana Cloud users had to rely on API keys to read and write data to and from the composable observability platform. These API keys had minimal features, which limited administrators’ ability to manage account access on a granular level. We’re keenly aware of these shortcomings, and we’ve been working to overhaul and replace these API keys with something more flexible, more reliable, and more secure.

Redis: Open Source vs. Enterprise

Are you curious about the difference between open-source Redis and Redis Enterprise? Of course, Redis Enterprise is a hosted service that runs Redis db on behalf of its customers, while open-source Redis is available for anyone to use. However, there's also a key difference between open source and enterprise in how the clusters are implemented. In order to understand the difference, we need to know what Redis Clusters are.

3 Website Monitoring Tools I'm Thankful for in 2022

Thanksgiving can seem like a stereotypical American holiday, filled with images of family and friends gathered around tables overflowing with food. But harvest celebrations are far older than the United States. People have gathered in late autumn to enjoy the fruits of their labors for generations, long before the first Pilgrim arrived in the New World. The annual harvest feast is a time to look back on and enjoy the hard-earned comforts.

Intelligent Network Monitoring and Visibility - and Why It Matters

As today’s networks grow increasingly complex, your network application and security teams are all struggling with network communications, hidden threats and misconfigurations. Each of these issues allows various vectors, increasing the likelihood of a cybersecurity incident. Often an incident occurs because an organization lacks a single pane of view, the right information sharing, and an automated, inter-departmental workflow.

Measuring application performance in Swift using transactions

So you’re building a mobile app that’s performing big data requests or crunching big data. With Sentry’s Custom Instrumentation you can keep an eye on those big data-handling functions. Let’s see how you can implement them in your Storyboard and SwiftUI projects.

Have fun again creating! Get to know visual console and dashboard editing

Did you know that the visual console editor allows users to design the final look by dragging elements with the mouse? You just have to choose the background and icons that represent the status of each relevant aspect you want to display. We tell you this and much more in the video on our workshop.

Can Observability Push Gaming Into the Next Sphere?

The gaming industry is an extensive software market segment, reaching over $225 billion US in 2022. This staggering number represents gaming software sales to users with high expectations of game releases. User acquisition takes up a large part of software budgets, with $14.5 billion US spending globally in 2021. User retention is critical to the success of any game, especially where monetization requires driving in-app purchases and ad revenue.

BGP Monitoring with Catchpoint: Finding and Fixing BGP Issues - FAST

BGP is effectively the postal service of the Internet. Without BGP, traffic doesn't move. So, when there's a configuration issue, or worse, malicious activity – the repercussions can be huge. That's why constant monitoring of BGP traffic is crucial. In this ten-minute video, Solutions Engineer Zach Henderson explains why BGP issues can damage your bottom line and then shows how to quickly detect, analyze and resolve them with Catchpoint's market-leading BGP Monitoring solution.

How Grafana unites Medallia's observability stack for faster, better insights

California-based Medallia captures feedback signals — in-person interactions, customer surveys, call centers, social media, etc. — to help businesses improve their customer experience. In much the same way, the company’s Performance and Observability Engineering team captures observability signals to optimize the experience for internal users.

Grafana vs. Chronograf and InfluxDB

How can you judge Grafana vs. Chronograf and InfluxDB? Monitoring various systems is a crucial component of continuous maintenance. You can look at different parameters of the monitored system and take corresponding actions for certain conditions. For example, engineers can prevent server failure when they see the load on the server approaching its critical point. If the numbers of processed transactions (or registered users) exceed the expected level, you can celebrate your success.

Datadog acquires Cloudcraft

A well-designed cloud architecture is essential to ensure that the underlying infrastructure stays operational, within budget, and compliant over time. These days, organizations are rapidly spreading their infrastructure across a broad, complex mesh of interconnected resources and services. It can be difficult to make high-level decisions about the design and management of these systems. This is why many organizations are now turning to cloud infrastructure modeling tools.

ITSM Tasks Got You Down? Make Like a Dad on Christmas Eve and Throw Out the Manual.

Automation is a smart investment in efficiency, productivity, and profitability. According to VentureBeat, companies that invested in automation technologies began to see results almost immediately, including an average 7% increase in revenues. In total, U.S. companies that adopted automation in 2021 generated an extra $195 billion in revenue per month, adding 7.1 million jobs to the economy.

RedHat OpenShift monitoring with Splunk's OpenTelemetry Operator

Do you have an instant view of all the full-stack automated operations in your OpenShift environment? Would you like to monitor your self-service provisioning as code, to better understand health and performance? Have you been struggling to resolve service issues and reduce the time taken for troubleshooting across all your Kubernetes deployments? We’ve got you covered!

observIQ awarded Fall 2022 Intellyx Digital Innovator Award

We talk to a lot of the top research firms about our business and the evolving space of observability so we are honored to be awarded the Fall 2022 Intellyx Digital Innovator Award. Intellyx focuses on enterprise digital transformation and the leading edge innovators helping to drive change in the enterprise IT marketplace. They give this award to vendors who make it through its rigorous selection process and deliver a successful briefing. The award is not based on being a client.

Eating Our Own Goat Food: Using Our Own Products

Here at Cribl, we’re big on GoatFooding. We not only prepare but consume our own products, in our own products. Today we’ll pull back the curtains to shine a light on how we use Cribl products within our Cribl.Cloud service. Cribl is a pioneer in the observability space, so what better way to use our products than by observing Cribl.Cloud?

Jaeger Tracing: Pros, Cons, Alternatives and Best Practices

OpenTelemetry (OTel) is an open source, CNCF (Cloud Native Computing Foundation) project that provides tools, APIs and SDKs for observability data collection (i.e., logs, metrics and traces) from cloud-native applications. Developers can use the data collected from OTel to monitor and analyze application health and performance. To leverage the data and its insights, you can export the data to external solutions, like APMs, open source Jaeger and Zipkin, Helios, and others.

Solve code-level bottlenecks with Profiling for Node.js

Profiling is an important tool in every developer’s toolkit because it provides a granular view into the execution of your program from your production environment. This is an important concept, as performance bottlenecks can often be very hard or even impossible to reproduce locally due to external constraints or loads only seen in a production environment.

Solve code-level bottlenecks with Profiling for Python

Profiling is an important tool in every developer’s toolkit because it provides a granular view into the execution of your program from your production environment. This is an important concept, as performance bottlenecks can often be very hard or even impossible to reproduce locally due to external constraints or loads only seen in a production environment. Python is one of the most popular programming languages available, and it is one of the core technologies we use at Sentry.

Global bank transforms incident alert management & communications

One of the top 10 largest financial services companies in the world, with 200,000+ employees worldwide, serving tens of millions of customers. With operations in more than 60 countries, the Interlink Incident Alert Management app serves an audience of thousands of service owners and business stakeholders across 20+ global markets.

Prepare your IT systems for Black Friday with best practices and strategies from Ulta Beauty

For retail and e-commerce companies, exponential traffic spikes are a holiday season tradition that often peaks on Black Friday. And Ulta Beauty knows this all too well. As the largest beauty retailer in the US today, operating 1,300 stores nationwide, Ulta experienced the growing pains of its on-premises IT environment: slow rollouts, tedious infrastructure management, and a lack of visibility into critical systems. Sound familiar?

Crash Reporting & Real User Monitoring for React applications

In this blog post, I’m going to talk about how to integrate Raygun4JS with React at a deeper level than what is provided out-of-the-box. None of these things are needed for Raygun4JS to do its primary job (reporting errors that happen on your website) but provide useful extra value for determining how your React application is performing and what is going wrong when an error occurs.

Announcement: Raygun Launches Public API

Today marks a new era for Raygun. With the launch of our new Public API, you’ll be able to use the powerful metrics surfaced in Raygun in exciting new ways. This is our first step towards being API first, so watch for future developments. At Raygun, we pride ourselves on delivering invaluable customer-centric insights into the health of your software.

Citrix Audio Redirection Service is not running

The Citrix Audio Redirection Service is part of the Citrix Virtual Desktop Agent (VDA) software which runs on a single or multi session machine. It is responsible for redirecting audio over the Audio Virtual Channel to be used within the ICA session. If this Windows service has stopped running, users with a virtual desktop session on the respective session machine have no sound.

Searching Observability Data Just Became Point & Shoot

The traditional approach for searching observability data is tried-and-true: once all the search staging is accomplished, we can perform high-speed, high-performance, deep-dive analysis of the data. But is this the best way or even the only way to search all that observability data? The answer to the first question is maybe, as it depends on what you are trying to accomplish. The answer to the second question must be a resounding no.

Manage your Raygun quota effectively with these tips

If you have any concerns about going over your plan’s monthly event quota, this guide will help you implement strategies to manage your Raygun error quota effectively. The key to staying within your quota is to keep the Raygun dashboard organized. This helps you save time too, as you’ll prevent non-critical errors staying active, ensuring you and your team are only alerted to the errors that matter. We recommend implementing the following actions to manage your event quota effectively.

Kafka performance monitoring metrics

In this article, we will look at the metrics used to monitor Kafka performance and why it is important to monitor them constantly. We will also look at the process of monitoring metrics for Kafka using Hosted Graphite by MetricFire. To learn more about MetricFire, book a demo with the MetricFire team or sign up for the free trial.

Partitioning for Performance in a Sharding Database System

Partitioning can provide a number of benefits to a sharding system, including faster query execution. Let’s see how it works. In a previous post, I described a sharding system to scale throughput and performance for query and ingest workloads. In this post, I will introduce another common technique, partitioning, that provides further advantages in performance and management for a sharding database.
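To make the query-speed benefit concrete, here is a minimal sketch of partition pruning. It assumes time-based partitioning by calendar day, which the post summarized above may or may not use. The `DayPartitionedTable` class and its methods are hypothetical names for illustration only. A range query touches just the partitions whose day overlaps the requested window and skips the rest.

```python
from collections import defaultdict
from datetime import datetime


class DayPartitionedTable:
    """Toy in-memory table that stores rows in one partition per calendar day."""

    def __init__(self):
        # Maps a date to the list of (timestamp, value) rows for that day.
        self.partitions = defaultdict(list)

    def insert(self, ts: datetime, value: float) -> None:
        self.partitions[ts.date()].append((ts, value))

    def query(self, start: datetime, end: datetime):
        """Return (rows, partitions_scanned) for start <= ts < end."""
        scanned = 0
        rows = []
        for day, part in self.partitions.items():
            if day < start.date() or day > end.date():
                continue  # pruned: this partition cannot contain matching rows
            scanned += 1
            rows.extend(r for r in part if start <= r[0] < end)
        return sorted(rows), scanned
```

With five days of data loaded, a query covering two of those days scans at most three partitions rather than all five. The gap widens as the table grows, since the cost follows the size of the query window and not the size of the table.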

How to install the Site24x7 APM Insight Java agent in a Docker container

This video will walk you through the process of installing the Site24x7 APM Insight Java agent in a Docker container. Docker itself is the whole environment that helps you run, build, and manage your application, allowing APM to achieve its goals more quickly.

Architecting for Reliability

As modern systems become increasingly more complex, the risk of incidents and outages increases. Old approaches to reliability can sometimes be adapted to novel system designs, but other times new methods need to be invented. In this panel session moderated by Datadog’s Jason Yee, you’ll hear from SRE leaders and systems architects across the industry about how they’re designing and operating systems to achieve greater reliability.

Democratizing Observability

DevOps principles have helped many organizations improve cross-team collaboration, which has in turn led to increased reliability and velocity in the development lifecycle. In this session moderated by Jason Yee, we hear from panelists who have applied these same DevOps principles to observability, helping them unlock data-based insights and empower teams to make smarter, more informed decisions.

How to centralize thousands of data sources with Grafana: Inside Adform's observability system

Over the course of two decades, Adform grew from a dream between friends huddled in a basement to a leading advertising tech platform powering more than 25,000 clients worldwide. Success brought external accolades, but it also created the need for internal innovation to support the company’s continued growth. In 2018, Adform was still operating in startup mode, which meant developers and teams cherry-picked the tools that worked best for them.

Installing the HG Heroku Monitoring & Dashboards Add-on

HG or HostedGraphite provides a complete infrastructure and application monitoring platform from a suite of open-source monitoring tools. Depending on the setup, you can choose Hosted Graphite as your data source and view all required metrics on beautiful Grafana dashboards in real time. Hosted Graphite offers a wide range of tools, add-ons, and plugins that make it possible to measure, analyze, and visualize large amounts of data about your applications with ease.

Is Your Ecommerce Site Ready for Black Friday and Cyber Monday?

The holiday shopping season is one of the most stressful periods for operators of retail and ecommerce businesses, as the seasonal surge of holiday shoppers can put massive amounts of stress and strain on even the most well-architected websites. Here’s a recent example from 2021: The Office Depot website suffered an outage during Cyber Monday that knocked the online shop offline for hours, impacting the ability of customers to place orders online.

Scaling Ingest With Ingest Telemetry

With the introduction of Environments & Services, we’ve seen a dramatic increase in the creation of new datasets. These new datasets are smaller than ones created with Honeycomb Classic, where customers would typically place all of their services under a single, large dataset. This change has presented some interesting scaling challenges, which I’ll detail in this post, along with the solution we used, and how we leveraged Honeycomb’s own telemetry to scale Honeycomb.

How IT Departments Can Manage the Infrastructure to Boost Microsoft Teams Call Quality

The person speaking to you can only hear a sort of robot voice and the dreaded pixelation of your face on a video chat has descended on your important call with your team or, much, much worse, with a VIP client – does panic set in? Well, if your IT team doesn’t have a good way to boost Teams call quality then yes. Inconsistent Teams call quality is a problem plaguing businesses in every sector, and it’s an issue that can have a real negative impact on an organization.

Using Playwright and Checkly to create an ecommerce synthetic monitoring check (no audio)

This short video shows the creation of a synthetic monitoring check using Playwright and Checkly. Note: If you’d like to follow along with the steps in the video, make sure that you have Playwright installed first.

Delivering a Successful Microsoft Teams User Experience - Microsoft MVP, Nick Cavalancia

Think delivering Microsoft Teams service to your users is simple? Microsoft MVP Nick Cavalancia lists all of the factors at play in delivering a successful Teams user experience. Spoiler alert: most of them have nothing to do with Microsoft.

What Is Splunk & What Does It Do? An Introduction To Splunk

Hi! We’re Splunk, and we’re glad you’re visiting us today. Honestly, we hear from people far and wide about “What does Splunk do?”, “Does the name Splunk mean something?” And of course, “How can I learn Splunk?” I wrote this article to help answer all these questions for you and point you towards whatever question you want answered.

Dashboard Design: Visualization Choices and Configurations

There are many visualization types and configurations available to choose from. In general, keep your visualizations as simple and straightforward as possible to avoid distraction and highlight only the most important information. If there is too much unnecessary information on the page it can be overwhelming and focus can be misdirected to unimportant details.

SD-WAN Troubleshooting: How to Troubleshoot SD-WAN Networks

SD-WAN networks are more popular than ever. With the increasing use of cloud-based applications, many businesses rely on SD-WAN services to deliver optimal Internet, cloud, and UC performance. But like all networks, SD-WAN can experience network issues that affect user experience and network performance. So keep reading to learn how to troubleshoot SD-WAN networks using Network Monitoring.

Kubernetes Monitoring: Metrics, Tools & Best Practices

Monitoring any type of resource can be challenging. But Kubernetes monitoring is a special kind of challenge. Not only are there a variety of different Kubernetes layers and resource types to monitor, but collecting monitoring data from Kubernetes can be difficult if you use a managed Kubernetes service that limits your access to the underlying infrastructure. For all of these reasons, Kubernetes monitoring requires a different approach.

AppSignal's Future with OpenTelemetry

AppSignal is a strong supporter of open-source technology. We owe so much of our modern world to the unseen, hard-working software developers who build and maintain the many technologies that make everything from reading this article to sending a message from your phone possible. That's why we're investing in OpenTelemetry, the open-source standard for telemetry data collection, rather than developing our own independent standard.

What Is Uptime and Why Is It Important?

Remember the last time you tried to visit a website or pay a bill and the spinner just kept going and going? That site needed uptime monitoring! “Uptime monitoring” refers to the practice of tracking a website’s availability and performance quality over time. This type of monitoring includes services that report on the availability of a website or server. Monitoring tools ensure that your website or server is running smoothly.
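As a small illustration of what tracking availability over time can mean in practice, the sketch below computes an uptime percentage from a series of periodic check results. The `Check` record and the `availability_percent` function are hypothetical names for this example, and it assumes checks are evenly spaced so that each one carries equal weight.

```python
from dataclasses import dataclass


@dataclass
class Check:
    timestamp: int  # seconds since the epoch when the probe ran
    ok: bool        # True if the site responded successfully


def availability_percent(checks):
    """Share of successful checks, as a percentage (assumes evenly spaced checks)."""
    if not checks:
        raise ValueError("no checks recorded")
    up = sum(1 for c in checks if c.ok)
    return 100.0 * up / len(checks)
```

For instance, nine successful probes out of ten gives 90% availability for that window. Real monitoring services typically also track response time and probe from several locations, which this sketch leaves out.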

InfluxDB is 5x Faster vs. MongoDB for Time Series Workloads

At InfluxData, one of the common questions we have regularly been asked by developers and architects alike over the last few months is, “How does InfluxDB compare to MongoDB for time series workloads?” This question might be prompted for a few reasons. First, if they’re starting a brand new project and doing the due diligence of evaluating a few solutions head-to-head, it can be helpful in creating their comparison grid.

Yes, You Subscribed Correctly. The OPC UA Client Listener Plugin Has Been Released!

This article would not be possible without the contribution of Lars Stegman. The OPC UA Client Listener Plugin was his own contribution to a long-standing issue. Telegraf now includes a new plugin highly anticipated by the community: the OPC UA Client Listener Plugin. So you might be asking yourself: what is the big deal? There was already an OPC UA Plugin — how is this different?

Customer Story: Intercom Reduces MTTWTF With Observability and Distributed Tracing

Intercom’s mission is to build better communication between businesses and their customers. With that in mind, they began their journey away from metrics alone and towards complete observability. The first step was tooling, and they learned quickly that trying to work with multiple solutions was not the answer.

Apica Quick Guides - URLv2 Check Configurations

Have you ever wondered what that one checkbox does, where that button takes you or what a specific function does? These quick guides are designed to explain every function as quickly and precisely as possible so you can continue your monitoring without any disturbance. This guide will explain all of the URLv2 check-specific configurations and their individual functions.

Elasticsearch Tutorial | Getting Started Guide for Beginners - Sematext

Elasticsearch is the search engine of choice for some of the very largest companies in the world. Elasticsearch is loved by so many because of its searching and indexing capabilities. In this short Elasticsearch tutorial for beginners, we will learn what Elasticsearch is and what the advantages of indexing data are. We will also work through some practical examples where we will deploy our own Elasticsearch cluster, manually create an index, use a bash command to import large amounts of JSON documents and run our first search (query) within Elasticsearch.

RUM now offers React Native Crash Reporting and Error Tracking

React Native has become the predominant development framework for cross-platform mobile applications. By interacting with native APIs largely under the hood and requiring only a fractional proportion of platform-specific code, it allows you to build applications for iOS, Android, and the browser using the same declarative JavaScript. But this cross-platform adaptability has its downsides.

AWS recognizes Sysdig as an Amazon Linux 2022 Service Ready Partner

Sysdig is pleased to announce that we’ve achieved the Amazon Linux 2022 Ready designation as part of the Amazon Web Services (AWS) Service Ready Program. Amazon Linux 2022 (AL2022) is the newest Linux operating system from AWS available to support your workloads running on Amazon EC2. The team at Sysdig validated AL2022 with Sysdig Secure and Sysdig Monitor to ensure full support for our container security and cloud-native monitoring capabilities with this latest OS.

Webinar Snippet: How to Troubleshoot SD-WAN Networks | Obkio

Check out the snippet from our latest webinar, How to Troubleshoot SD-WAN Networks. Obkio VP and Lead Network Geek explains Obkio's SD-WAN monitoring design. Obkio is a simple Network Monitoring & Troubleshooting SaaS solution that continuously monitors network and core business applications performance to identify intermittent issues and improve the end-user experience.

Avoid These Five Cloud Networking Deployment Mistakes

When transitioning from physical infrastructure to the cloud, it’s easy to think that your networks will instantly be faster, more reliable, and produce windfalls of cost savings overnight. Unfortunately, this wishful line of thinking fails to account for some of the complexities of cloud networking and is one of the biggest drivers of the cloud deployment mistakes we see.

Dash Panel Discussion: What Users Really Want

Measuring user experience is typically done by tracking metrics like latency and purchase frequency. But these metrics can often obscure real user sentiment. In this panel session moderated by Miranda Kapin, you can learn about better ways to uncover how users are truly experiencing your application and methods for improving their engagement.

What is SD-WAN technology, and how does it work?

In recent years, Software-Defined WAN Technology (SD-WAN) has changed the way networking professionals secure, manage, and optimize connectivity. As organizations continue to implement cloud applications, conventional backhaul traffic processes are now inefficient and can cause security concerns. SD-WAN is a virtual architecture that enables organizations to use different combinations of transport services that can connect users to applications.

How to design a microservices architecture with Docker containers

Application development trends guide industries (tech and non-tech alike) toward a more cloud-native and distributed model with digital-first strategies. Many organizations are adopting new technologies and distributed workflows. Software development pipelines enable teams to collaborate efficiently and maintain productivity. However, organizations that were early to embrace modern application development strategies and tools, including containerization and multi-cloud environments.

Monitoring Your Fleet With Memfault Training

Releasing a connected device in today’s world without some form of monitoring in place is a recipe for trouble. How would you know how often or if devices are experiencing faults or crashing? How can the release lead be confident that no connectivity, performance, or battery-life regressions have occurred between the past and current firmware update? In this training session we will go over.

The Evolution of IBM Integration Bus to App Connect Enterprise

IBM Integration Bus was one of the first messaging middleware applications to be developed and it has gone through many iterations to reach the stage we are at today with App Connect Enterprise. Like any software application, it has become more feature-rich as time has passed and each iteration has marked a new milestone in the capabilities that it has delivered. We will trace some of the evolutionary paths of IBM Integration Bus to see how it came to be where it is today.

Authenticating Icinga 2 API Users with TLS Client Certificates

When interacting with the Icinga 2 API, the client is commonly authenticated using a password provided via HTTP basic auth. Icinga 2 also supports a second authentication mechanism: TLS client certificates. This is a feature of TLS that also allows the client to send a certificate, just like the server does, allowing the server to authenticate the client as well.

Jon Leighton on How the Digital Employee Experience is Central to a People-First IT

Digital Employee Experience (DEX) is grossly understated in today’s corporate landscape, despite the growing interest in digital transformation in companies everywhere. Technology has long been central to the workplace, but not enough effort has been put into the experience of those who use it. Enterprises, especially global enterprises, can spend upward of tens of thousands of dollars on software and digital appliances to make work easier for their team.

Essential Tips to Build a Great Small Business Website

Without an online presence, running a business is no longer possible. You can't rely on only social media to reach out to people and provide all information related to your business. From products and descriptions to prices and opening hours, people look for everything online. If you have products or services, you need at least a small business website to let people know what you deal with. It surely helps you provide a market, reach more customers, and gain popularity in just a short period.

Scaling Throughput and Performance in a Sharding Database System

Understand the two dimensions of scaling for database query and ingest workloads, and how sharding can make scaling elastic — or not. Scaling throughput and performance are critical design topics for all distributed databases, and sharding is usually a part of the solution. However, a design that increases throughput does not always help with performance and vice versa. Even when a design supports both, scaling them up and down at the same time is not always easy.

Independence with OpenTelemetry on Elastic

The drive for faster, more scalable services is on the rise. Our day-to-day lives depend on apps, from a food delivery app to have your favorite meal delivered, to your banking app to manage your accounts, to even apps to schedule doctor’s appointments. These apps need to be able to grow from not only a features standpoint but also in terms of user capacity. The scale and need for global reach drives increasing complexity for these high-demand cloud applications.

Why Hybrid Network Monitoring is Key for Retailers This Holiday Season

The main street in my town is mostly lined with mom-and-pop shops, and I love to support these businesses. Large online retailers keep making it harder for these stores to compete, so I think it’s important to keep doing business with them when I can. Lately, it’s been interesting to see that these mom-and-pop shops increasingly have something in common with the largest online retailers: They’re reliant on the internet to deliver their goods and services to consumers.

Suppressing Dissent: The Rise of the Internet Curfew

In the evening on September 30, people across Cuba found their internet service cut. The residents of this Caribbean nation had begun protesting their government’s tepid response to Hurricane Ian which had wrought destruction a week earlier. Internet service returned to normal the following morning, but this outage wasn’t caused by storm-related damage. This blackout was a deliberate act, a fact confirmed when service dropped out for the same period of time the following day.

Grafana vs. Splunk

Are you trying to choose between Grafana and Splunk, but can't find enough information about their capabilities? In this blog, we highlight why a user should select Grafana or Splunk as part of their monitoring stack and what the user benefits of each are. Also, you can check out what it's like to make your own Grafana dashboard using our MetricFire free trial. Get onto the product in minutes and see if you prefer Grafana over Splunk.

10 Best Error Monitoring Tools to Use in 2024

Software has changed the world we live and work in. The people behind the code are the real heroes, but they require the technology to ship better software, faster. So how do they do this? They have to work smarter, be proactive and respond to problems quickly. Let's look at the top 10 error monitoring tools on the market to help you find the best solution for you and your team.

Cribl.Cloud Is Now On AWS Marketplace!

As of 2022, 49% of enterprise workloads and data are in a public cloud, and that number is expected to increase by 6-7% over the next year. Why? With big cloud moves come big benefits: optimized performance, reduced management overhead, and cost savings on data centers. However, it also comes with the struggle to get a handle over never-ending data growth. Customers are looking to Cribl to help route and process that data at scale and need a seamless way to get started within minutes.

5 Tips For Consumers To Shop Safely This Black Friday

While it makes for bleak reading, the frenzy of sales and online shopping activity surrounding Black Friday means this pre-holiday season is a key period for cybercriminals. And each year we see an increase in cyberattacks during what should be a feel-good time. The picture is all the more worrying in 2022, as this Black Friday weekend (25th-28th November) falls on the same date as the USA vs England World Cup game, a highly anticipated day of betting for bookmakers.

Reduce Data Costs: Log Sampling with OpenTelemetry and BindPlane OP

Redundant logs are a common nuisance in observability pipelines of all kinds. In large environments, excess logs can multiply data costs to unsustainable amounts. Log sampling is the process of randomly sampling logs to produce the same valuable insight with dramatically reduced data flow. Configuring agents in a pipeline to appropriately sample logs can be a pain. Pipeline managers, like BindPlane OP, make that process simple and scalable.
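
As a rough illustration of the idea of random log sampling described above, here is a minimal Python sketch. It is not how BindPlane OP or OpenTelemetry implement sampling; the function name `sample_logs`, the `ratio` parameter, and the sample log lines are all hypothetical and exist only for this example.

```python
import random


def sample_logs(lines, ratio, seed=None):
    """Keep each log line independently with probability `ratio` (0.0 to 1.0)."""
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    return [line for line in lines if rng.random() < ratio]


# Made-up, highly redundant log lines (illustrative only).
logs = [f"GET /health 200 request={i}" for i in range(10_000)]

# Keep roughly 10% of the lines.
kept = sample_logs(logs, ratio=0.1, seed=42)
print(f"kept {len(kept)} of {len(logs)} lines")
```

Because every line is kept or dropped independently, the retained volume is only approximately `ratio` times the input, which is usually acceptable when the logs are redundant to begin with.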

How Do You Measure Application Performance?

Web performance isn’t just about how long a website needs to render all its page elements—it also covers techniques for monitoring an application’s runtime, user-defined transactions, component response times, and network requests. The important thing is using performance data to evaluate the success of your app or service, whether you’re trying to compare different versions or introduce new capabilities.

Reducing MTTR for DevOps and SREs with PagerDuty Process Automation and InfluxDB

Mean time to resolution (MTTR) is a metric that transcends industry and technology. It’s a measure of how quickly, on average, support teams identify, act, and resolve IT issues and incidents. Because MTTR directly relates to service quality, maintaining a low MTTR is a critical goal for DevOps and SRE teams. These teams have a vested interest in resolving issues quickly because escalating incidents to higher levels of the support team increases response and resolution times.

My Most Surprising Discoveries from The SRE Report 2023

I’ve had the honor and privilege of authoring The SRE Report for the last three years. For the 2023 version, this included working with some amazing individuals like Anna Jones, Kurt Andersen, and Steve McGhee. Download The SRE Report 2023 here (no registration required).

5 Reasons Why OpenTelemetry is the Future of Observability

It has been said that open source is eating the world and in the observability space, the project behind this movement is OpenTelemetry. The project is quickly becoming the standard for instrumentation and collection of observability data. Why is an open standard and open-source approach to instrumentation and data collection so compelling? This talk will provide five reasons why OpenTelemetry is disrupting the observability market.

Generate RUM-based metrics to track historical trends in customer experience

Datadog Real User Monitoring (RUM) provides end-to-end visibility into the user experience and performance of your browser and mobile applications. RUM allows you to capture and retain complete user sessions for 30 days. This means you can pinpoint bugs, prioritize issues, and determine fixes with data collected across an entire quarter.

Pipeline Profiling: Or How I Learned to Stop Worrying and Isolate the Problem

It’s that time of year again! If you’re not a procrastinator, you’ve probably already blown out your sprinklers for winter and are looking forward to the snow and holidays ahead. Well done, irrigation purists! I, on the other hand, am an Olympic-level procrastinator and will usually wait until the last moment before NWS forecasts a 10″ snow for the night, then frantically search for my air compressor.

The Human Element of Tech Development

Opportunities for growth are all around us, but it takes the ability to be open and an eager growth mindset to see them. In this episode, David Noblet, Co-Founder + Chief Architect at ChaosSearch, shares how he and his team find innovative ways to improve digital services for their clients by constantly taking inspiration from their daily lives.

Grafana 9.2: Create, edit queries easier with the new Grafana Loki query variable editor

As part of the Grafana 9.2 release, we’re making it easier to create dynamic and interactive dashboards with a new and improved Grafana Loki query variable editor. Templating is a great option if you don’t want to deal with hard-coding certain elements in your queries, like the names of specific servers or applications. Previously, you had to remember and enter specific syntax in order to run queries on label names or values.

Building a Multi-Tenant Insurance Platform

In 2020, CoverWallet—a multi-tenant insurance platform—was acquired by Aon, which led to a rapid expansion in both the size and global presence of its engineering organization. In his talk, CoverWallet’s Hylke Alons walks through the changes that were necessary to meet their platform's new expectations, including improving growth and scalability while ensuring reliability, automating security, and reducing maintenance. He also discusses some best practices for scaling up engineering and product teams to handle demand in a complex and highly regulated industry like insurance.

Optimize Trace Memory with Scout

Application performance monitoring (APM) is a process of monitoring and analyzing performance issues within an application. In monolithic architecture, monitoring the performance of an application using APM tools was straightforward. However, once the application adopts microservice architecture, the application becomes more complex, and the business functionalities flow into different microservices to complete the workflow.

Answering the FAQ of CPU temperature monitoring

Have you ever wondered how productive we could be if we could measure and monitor our brains and be alerted every time we overused them? While there’s no practical way to measure the performance of the human brain without expensive medical equipment, you can track metrics like these for your computer’s brain—its central processing unit (CPU). A device’s performance depends on the condition of its CPU; a device cannot function properly without a CPU.

Kubernetes Audit Logs - Best Practices And Configuration

Kubernetes is the de facto leader of container orchestration tools. With the growing popularity of micro-service-based development, Kubernetes emerged as the go-to tool to deploy and manage large-scale enterprise applications. However, with the plethora of features offered by Kubernetes, it is a complex tool to manage and operate. This article will focus on how to configure Kubernetes Audit Logs so that you can have the records of events happening in your cluster.

Using Run IDs to Track Run Times Of Overlapping Jobs

Healthchecks.io recently got a new feature: run IDs. Run IDs are client-chosen UUID values that the client can optionally add as a “rid” query parameter to any ping URL (success, /start, /fail, /log, /{exitcode}). What are run IDs useful for? Healthchecks.io uses them to group events from a single “run”, and calculate correct run durations. This is most important in cases where multiple instances of the same job can run simultaneously, and partially or fully overlap.
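
To make the "rid" query parameter concrete, here is a minimal Python sketch that builds the ping URLs for one run. The base URL below is a hypothetical placeholder, not a real check, and the helper name `ping_url` is an assumption made for this example; only the "rid" parameter and the /start suffix come from the description above.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the ping URL of your own check.
PING_URL = "https://example.com/ping/your-check"


def ping_url(base, rid, suffix=""):
    """Return a ping URL that carries the run ID as the "rid" query parameter."""
    return f"{base}{suffix}?{urlencode({'rid': str(rid)})}"


# One client-chosen UUID per run, reused for every event of that run.
rid = uuid.uuid4()
start_url = ping_url(PING_URL, rid, "/start")  # signal that the run has started
success_url = ping_url(PING_URL, rid)          # signal that the same run succeeded

print(start_url)
print(success_url)
```

Reusing the same `rid` for the start and success pings is what lets overlapping runs of the same job be told apart, since each run has its own UUID.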

11 Best Tools to Monitor and Debug JavaScript in 2023

JavaScript is one of the most widely used programming languages for creating dynamic, interactive websites. However, there may be instances where a function is not operating as intended because of a coding error while creating JavaScript projects. Therefore, the majority of developers hunt for JavaScript debugging tools to avoid problems and identify errors before execution.

Five Playwright Tips to Level Up Your Testing Game

I joined Checkly a few months ago, and because our platform enables you to use Microsoft’s Playwright to run your synthetic monitoring, I started getting my hands dirty with the end-to-end testing framework. I’m a massive fan of learning in public, so I started publishing weekly Playwright tips on YouTube. Did you miss a few videos? Don’t sweat it! Here’s the first collection of Playwright tricks I discovered over the last few months.

Edge + AppScope: Unlocking New Insights You Didn't Know Existed Was Never This Easy!

The moment has finally arrived! “Yes! I do” “Yes! I do” With great joy, I now introduce to you the newly married Edge and AppScope! Beginning the journey of a lifetime, let’s give it up for this power couple! Together they offer auto-discovery, central management, high scalability, high-fidelity data collection, and rich observability.

Watch: How to get started with Grafana Phlare for continuous profiling

A big piece of news to come out of ObservabilityCON in early November was the launch of Grafana Phlare. Phlare is an open source, horizontally scalable, highly available, multi-tenant continuous profiling aggregation system. Continuous profiling has been dubbed the fourth pillar of observability, after metrics, logs, and traces. The idea behind Phlare was sparked during a company-wide hackathon at Grafana Labs.

How Observability Pipelines Save Your Budget

Our recent blog post about observability pipelines highlighted how they centralize and enable data actionability. A key benefit of observability pipelines is that users don't have to compare data sets manually or rely on batch processing to derive insights; this can be done directly while the data is in motion. As a result, teams get access to the data they need to make decisions faster.

Announcing New CircleCI + Honeycomb Integration Guide

If you’re writing software today, then you likely use a CI/CD pipeline to build and test your code before deploying it to production. Having a fast and efficient build pipeline saves you development time, shortens feedback loops, and helps you ship features faster. Conversely, slow and unreliable build pipelines are full of lost productivity and sadness.

Ramp Up (MSP Edition) | Ep 01 | David Beck on Hierarchical Multi-Tenancy | Rapid Onboarding

Here is the first episode of Ramp Up - MSP Edition where David Beck gives a quick take on how Hierarchical Multi-Tenancy helps rapid onboarding for MSPs. Also, follow us on social media channels to learn about product highlights, news, announcements, events, conferences and more.

3 Challenges of Kubernetes Monitoring (With Solutions)

Kubernetes monitoring is complicated. Knowing metrics on cluster health, identifying issues, and figuring out how to remediate problems are common obstacles organizations face, making it difficult to fully realize the benefits and value of their Kubernetes deployment. Understanding how to best approach monitoring Kubernetes health and performance requires first knowing why Kubernetes observability is uniquely challenging.

Replaying flows and troubleshooting issues in mobile app development using OpenTelemetry

iOS and Android apps are often a common component of distributed applications, forming a key part of the software architecture. These mobile apps provide another way to access data and perform actions on various services, requiring tight integration between the apps and the components which serve the data and control it.

MTTD: An In-Depth Overview About What It Is and How to Improve It

In this post, we'll learn all about the incident metric mean time to detect (MTTD). We'll see how to measure it and look at its relationship with other incident metrics like MTTR (mean time to recover). Both metrics give useful insights into your incident recovery ability.
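
As a small worked example of measuring this metric, the sketch below computes MTTD as the average gap between when an incident started and when it was detected. The incident timestamps are made up for illustration, and the function name `mean_time_to_detect` is an assumption of this example rather than anything from the post.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents):
    """Average of (detected - started) over a list of (started, detected) pairs."""
    gaps = [detected - started for started, detected in incidents]
    return sum(gaps, timedelta()) / len(gaps)


# Made-up incidents: (time the problem began, time it was detected).
incidents = [
    (datetime(2022, 11, 1, 10, 0), datetime(2022, 11, 1, 10, 12)),   # 12 minutes
    (datetime(2022, 11, 3, 14, 30), datetime(2022, 11, 3, 14, 38)),  # 8 minutes
    (datetime(2022, 11, 7, 9, 15), datetime(2022, 11, 7, 9, 25)),    # 10 minutes
]

mttd = mean_time_to_detect(incidents)
print(mttd)  # 0:10:00
```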

Most Citrix VAD/DaaS stack components are not Citrix

A Citrix Virtual Apps and Desktops or Citrix Desktop as a Service deployment does not comprise Citrix components only. For instance, SQL databases are crucial components of every Citrix VAD stack and therefore need to be monitored and managed intensively. Besides whether your Citrix delivery controllers have proper access to them, you obviously need to track the database performance and capacity as well.

How we run our Python tests in hundreds of environments really fast

Not in a reading mood? You can also watch the talk I gave at DjangoCon 2022. One of Sentry’s core company values is “for every developer”. We want to support every developer out there with our tools. But not every developer uses the newest or widely adopted tech stack, so we also try to support older versions of libraries and frameworks.

Where is data center architecture headed to?

You know what data centers* are; we’ve told you a lot about them on this blog. Today, however, it is time to look at one particular aspect, the singleness of their architecture**, as well as the role they play in the present and the one they will play in the future. * Physical facility that organizations use to host their information, applications, critical data… **There’s a good example of alliteration, great rhetorical figure. So let’s go!

Everything You Need to Know About Synthetic Testing

Part two of a three-part guide to assuring performance and availability of critical cloud services across public and hybrid clouds and the internet. Monitoring your user traffic is critical for knowing the quality of the digital experience you are delivering, but what about the performance of new cloud or container deployments, expected new users in a new region, or new web pages or applications that don’t have established traffic? This is where synthetic testing can be invaluable.

Using Kentik Synthetics for Your Cloud Monitoring Needs

The final post of a three-part guide to assuring performance and availability of critical cloud services across public and hybrid clouds and the internet. Kentik Synthetics adds integrated, autonomous, pervasive performance test telemetry to our market-leading network traffic analytics and observability platform, the Kentik Network Observability Platform. For modern clouds, proactive synthetic monitoring can no longer be delivered as a standalone tool.

17 Popular Java Frameworks for 2023: Pros, cons, and more

In 2023, Java is still the third most popular programming language in the world. It encompasses a vast ecosystem and more than 9 million Java developers worldwide. Java’s popularity comes down to a few key advantages; it’s a platform-independent language (write once, run anywhere) that follows the object-oriented programming paradigm and is straightforward to understand, write, and debug.

Keep track of Core Web Vitals for transactions

If you are looking to improve the quality of your website and offer a superior user experience, then you are likely looking to improve SEO by whatever technologies are available to accomplish that goal. Google has been pushing out user-centric improvements — eliminating keyword stuffing, making page load time a ranking factor, and so on — meant to make navigating your website easier and improve SEO.

Tableau Review: Tableau vs MetricFire

Every day, businesses monitor system resources for performance, security, and workflows. Otherwise, they jeopardize day-to-day operations when issues go unnoticed. Tableau presents itself as a data-driven monitoring tool that enhances data analysis of physical and virtual server environments. But just how good is it?

SolarWinds Review: SolarWinds vs. MetricFire

SolarWinds is a network and application monitoring solution, but primarily a network monitoring solution. Founded in 1999, the company has built an online community of 150,000 registered users. However, monitoring has come a long way since the early 2000s. How does SolarWinds stack up against MetricFire in terms of features and pricing? In this article, we break down the comparison into easily digestible, unbiased information to help you make an informed decision.

ELK Review: ELK vs. MetricFire

CPU, memory use, latency, network bandwidth. These are just some of the monitoring metrics businesses analyze for security and performance. But successful data-driven organizations delve deeper than this. These companies probe millions of real-time metrics for unexpected insights and predict outcomes weeks, months, and years into the future. ELK helps them do this. It's a data analytics platform from open-source developer Elastic.

New Relic Alternative for Python

Python is one of the most used languages among developers. There are many reasons why Python is so popular among developers, which we will discuss in this blog. Because of Python's popularity, it is used in many business applications, so monitoring a Python application is crucial. New Relic is one of the oldest tools for Python monitoring. But New Relic competitors are growing rapidly; hence, if you do not like the New Relic user interface, many New Relic alternatives exist.

Hidden Losses: The ROI of Insight into Microsoft Teams User Experience

If your business relies on Microsoft Teams to keep employees connected and productive, it is highly likely that some portion of your user base is experiencing problems such as delayed or failed calls every day. If this is a surprise to you, you may be wondering what the impact of this is on your business. In the second edition of our blog series, we look at how to measure the impact of these undetected issues, and the Microsoft Teams ROI of implementing solutions to detect them more proactively.

Choosing an Observability Pipeline

An observability pipeline is a tool or process that centralizes data ingestion, transformation, correlation, and routing across a business. Production engineers across ITOps, Development, and Security teams use them to more efficiently and cost-effectively transform their telemetry data to drive critical decisions. Businesses of all sizes can enjoy several benefits and gain a significant competitive advantage by implementing an observability pipeline.

Top 5 FinOps Tips to Optimize Cloud Costs

The efficiency, flexibility and strategic value of cloud computing are driving organizations to deploy cloud-based solutions at a rapid pace. Fortune Business Insights predicts the global cloud computing market will experience annual growth of nearly 18% through 2028. As the cloud becomes one of the most expensive resources for modern organizations, cloud financial management, or FinOps, has become a critical initiative.

RedHat OpenShift monitoring with Splunk's OpenTelemetry Operator

Do you have an instant view of all the full-stack automated operations in your OpenShift environment? Would you like to monitor your self-service provisioning as code, to better understand health and performance? Have you been struggling to resolve service issues and reduce the time taken for troubleshooting across all your Kubernetes deployments? We’ve got you covered!

Ensure your Kubernetes workloads are achieving their full potential with Splunk Observability

Kubernetes provides a strong foundation for delivering containerized services. While these capabilities can extend your application’s potential, the platform also introduces new dynamics not present in traditional host-based services. See first hand how Splunk’s Observability platform provides infrastructure monitoring views, to ensure the pods and containers delivering your workloads are continuously monitored and well understood.

Citrix NetScaler Denial-of-Service (DoS) JavaScript Triggers

Are you aware when Denial-of-Service (DoS) JavaScript triggers occur on your Citrix NetScaler devices to block DoS attacks? Do you know when exceptions are made based on 'valid' DoS cookies? MetrixInsight for NetScaler monitors for these occurrences and alarms you whenever this happens (too often).

12 top PHP frameworks for web developers to consider in 2023

PHP, or Hypertext Preprocessor (originally Personal Home Page), is an open-source server-side scripting language used for developing dynamic websites and web applications. It’s extremely popular, too — more than 75% of all websites were still using PHP as of October 2022, with no signs of slowing down any time soon. It’s free to download and use and powerful enough to run some of the biggest websites on the internet (WordPress, Facebook, and Wikipedia, just to name a few).

Grafana vs. Tableau

When it comes to visualization tools, there are various options, all designed for different kinds of data. Some of the most recognized among them include Grafana and Tableau. If you’re not sure which one to use, this article should give you a better idea of what kind of purpose each one has and which one will suit your needs best. One great way to find out what tool works best for you is to try it out! Try out Grafana in seconds on MetricFire's Hosted Grafana free trial.

Binero.Cloud transitions from Nagios to Icinga to enhance user experience

We're proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That's why we're now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

OpenSearchCon: Together after 18 Months

OpenSearch was created by the community for the community to continue to keep an open-source alternative to ElasticSearch and Kibana. The project has been hard at work for the last 1.5 years building, launching and iterating on this important initiative. Some remarkable milestones have been achieved, including over 5,800 stars on GitHub with 19 different community-led projects.

One Click Visibility: Coralogix expands APM Capabilities to Kubernetes

There is a common painful workflow with many observability solutions. Each data type is separated into its own user interface, creating a disjointed workflow that increases cognitive load and slows down Mean Time to Diagnose (MTTD). At Coralogix, we aim to give our customers the maximum possible insights for the minimum possible effort. We’ve expanded our APM features (see documentation) to provide deep, contextual insights into applications – but we’ve done something different.

What Causes False Positive Alerts?

According to Orca Security’s 2022 Cloud Security Report, 59% of respondents received over 500 alerts a day, with more than 42% of them being false positive alerts. And 62% of them said it has contributed to employee turnover. With numbers like this, it’s no wonder why developers dread the false positive alert. They waste time, energy, and money for everyone in every technology space, whether it is cloud or web services. It’s time to change that.

A look under the hood at eBPF: A new way to monitor and secure your platforms

In this post, I want to scratch at the surface of a very interesting technology that Elastic’s Universal Profiler and Security solution both use called eBPF and explain why it is a critically important technology for modern observability. I’ll talk a little bit about how it works and how it can be used to create powerful monitoring solutions — and dream up ways eBPF could be used in the future for observability use cases.

Sponsored Post

AIOps Hurdles Not Many Vendors Talk About

According to one survey, 94% agree that AIOps is “important or very important” to manage network and cloud applications performance. AIOps intends to help customers contextualize humongous data volumes and streamline IT operations with automation. As IT infrastructure grows in complexity, alerts flood IT Ops centers and Ops teams drown in managing the deluge.

The Leading Sumo Logic Alternatives

Using Sumo Logic, you can analyze both metrics and logs simultaneously. Developed in 2010, this solution provides a powerful query language and scheduling support. Sumo Logic's production monitoring features provide visibility into production issues. Instead of manually writing alerts, the platform offers pre-configured alert templates (which Logit.io also offers), which makes setting up alerts easier and faster.

How to use Cribl Stream and ChaosSearch for Next-Gen Observability

The market for enterprise observability solutions is growing in 2022, as organizations search for more effective ways to maintain security and oversight of increasingly complex and distributed IT systems. Traditional observability solutions like Splunk, Datadog and New Relic are still widely used by enterprises to analyze logs, metrics, and traces from their IT environments. But as enterprises generate increasing volumes of log data, two things tend to happen.

Grafana alerting

A lot of organizations are using Grafana to visualize information and get notified about events happening within their infrastructure or data. In this article, we will show how to create and configure Grafana Alert rules. To get started, log in to the MetricFire free trial, where you can send metrics and make Grafana dashboards right on our platform.

Kentik takes network observability to KubeCon 2022

If you’re an engineer trying to fix real problems with your apps, looking at just one small part of the picture isn’t going to cut it. This is why Kentik is so focused on helping you understand what’s going on beyond single k8s instances, and it’s a big part of what network observability is all about. This was Kentik’s message at KubeCon 2022, which was a memorable event for us.

Expanded Datadog Lambda extension capabilities with the AWS Lambda Telemetry API

In 2021, we partnered with AWS to develop the Datadog Lambda extension, which provides a simple, cost-effective way for teams to collect traces, logs, custom metrics, and enhanced metrics from Lambda functions and submit them to Datadog.

How to Monitor SNMP with OpenTelemetry

With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily aggregate data across your entire infrastructure to any or multiple analysis tools. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here.

Achieve observability with Site24x7 and AWS Lambda Telemetry API integration

The Lambda Telemetry API empowers users to integrate monitoring and observability tools like Site24x7 with their Lambda functions. Site24x7 is an AWS-reviewed Lambda Service Ready Program Partner and was announced as a launch partner in the AWS Lambda Telemetry API feature release. Customers, AWS partners, and the serverless community can use the Lambda Telemetry API to receive telemetry streams from the Lambda service, including function and extension logs, and metrics coming from the Lambda platform.

AWS Lambda Telemetry API: a new way to process Lambda telemetry data in real-time

Back in 2020, we covered the launch of Lambda Extensions and the subsequent release of the Lambda Logs API. These features aren’t designed for the average Lambda user. But they allow vendors to build better tools by giving them much-needed access to the Lambda execution environment.

How many data sources do you monitor? Find out how you measure up in our Observability Survey

Here at Grafana Labs, we’re deeply committed to our “big tent” philosophy — the idea that disparate data sources, from different software providers, in different industries, built for completely different use cases, can come together in one composable observability platform. As part of that commitment, we’ve set out to hear from our community about their observability practice and what they hope to see in this space in the future.

What is Logging as a Service (LaaS)?

Logging as a Service, or LaaS, is a proven approach to managing and monitoring high-volume log data in modern dynamic environments. LaaS allows companies to manage log data regardless of whether it comes from applications, servers, or devices. With LaaS, companies can more easily aggregate and collate data, scale and manage storage requirements, set up notifications and alerts, and analyze data and trends. It also allows teams to customize dashboards, reports, and visualizations.

Solving for cloud/multi-cloud network complexity

Networking in multi-cloud / hybrid cloud / data center environments continues to grow in complexity and so does the inherent challenge of monitoring traffic and resource utilization. Join industry expert and podcaster Eric Wright as he leads a discussion with Kentik and Alkira about observability practices and methods for network, cloud, virtualization, and application ops teams.

A practical guide to capturing production traffic with eBPF

Monitoring HTTP sessions offers a potentially powerful way to gain visibility into your web servers, but in practice, doing so can be complex and resource-intensive. Extended Berkeley Packet Filter (eBPF) technology allows you to overcome these challenges, giving you a simple and efficient way to process application-layer traffic for your troubleshooting needs.

Looking Below the Surface to Understand the Microsoft Teams User Experience

Microsoft Teams has become a critical communication and collaboration tool for thousands of enterprises around the world. Teams is keeping today’s hybrid workforce connected and productive through video and voice calls, chat and connectivity to other Microsoft 365 applications like Outlook and SharePoint.

Martello Garners Top Marks for Proactive Microsoft Teams Monitoring Tool

The abrupt transition to remote work that has occurred over the past few years has caused Microsoft Teams to become an indispensable application for many companies. As such, it is absolutely imperative to make sure that Teams users have a good experience, particularly with regard to call quality. Herein lies the problem. Although Microsoft has provided some rudimentary tools that can be used for user experience monitoring, those integrated tools are inadequate for enterprise use.

How to detect anomalies in logs, metrics, and traces to reduce MTTR with Elastic Machine Learning

Elastic Observability has extensive machine learning capabilities that support and improve analysis in APM. Learn techniques for correlating and detecting anomalies of telemetry data from APM agents for a particular application.

Three multi-tenant isolation boundaries of Kubernetes

Many of the benefits of running Kubernetes come from the efficiencies that you get when you share the cluster – and thus the underlying compute and network resources it manages – between multiple services and teams within your organization. Each of these major services or teams that share the cluster is a tenant of the cluster – and thus this approach is referred to as multi-tenancy.

Citrix NetScaler HA Failover Monitoring

When your NetScaler appliances are running in a High Availability (HA) pair, you should be aware when failovers occur, as this indicates an issue on one or both of the appliances, which you need to solve before both appliances have the same issue. If both appliances are affected, they will fail over continuously, which is known as HA failover ‘flip-flopping’.

Citrix NetScaler Unsaved Configuration

Are you aware when an administrator has forgotten to save configuration on a Citrix NetScaler device? If configuration is not saved after a configuration change, it will be lost when the NetScaler appliance restarts. This can obviously have severe consequences. MetrixInsight for NetScaler will alert you whenever unsaved configuration is detected.

Using the REST API Monitoring Support in WhatsUp Gold

With the REST API Monitoring in WhatsUp Gold's Application Performance Monitoring (APM), you can monitor applications by creating components in a custom Application Profile. For example, if you need to monitor PAS for OpenEdge, you can use the integrated Application Profile to query its OEManager and HealthCheck APIs. Our team of engineers has developed a comprehensive technical guide detailing exactly how to set up this kind of functionality in WhatsUp Gold.

Network Monitoring for Dummies

Network Monitoring For Dummies helps you recognize best practices for monitoring and managing your organization's network, how to grasp network monitoring fundamentals, define alerts and actions, and much more. In this special edition, you'll learn best practices and key concepts for network monitoring from WhatsUp Gold's Product Expert Mark Towler and author and editor Doug Barney.

Cluster Monitoring with Prometheus and Rancher

In this article, we present an overview of cluster monitoring using Rancher and Prometheus as well as provide some brief setup tutorials for both tools. We further introduce a metric visualization tool called Grafana that transforms your Prometheus time-series data into graphs and visualizations. MetricFire specializes in monitoring systems. You can use this product with minimal configuration to gain in-depth insight into your environment.

How to Monitor Redis Performance

In this article, we are going to look at how to monitor Redis performance using Prometheus. This will allow Redis Administrators to centrally manage all of their Redis clusters without setting up any additional infrastructure for monitoring. To follow the steps in this blog, sign up for the MetricFire free trial, where you can use Graphite and Grafana directly on our platform.

How to monitor NGINX web servers?

Web servers are among the most important components in modern IT infrastructures. They host the websites, web services, and web applications that we use on a daily basis. Social networking, media streaming, software as a service (SaaS), and other activities wouldn’t be possible without the use of web servers. And with the advent of cloud computing and the movement of more services online, web servers and their monitoring are only becoming more important.

How to monitor and troubleshoot Apache web servers

The Apache HTTP Server (Apache HTTPd) is one of the most popular open source web servers available. HTTPd was also the first project developed by the Apache Software Foundation, which now supports hundreds of well-known projects including Kafka, Cassandra and Hadoop. Netdata has a public demo space where you can explore different monitoring use-cases. Check out the Apache demo room to explore and interact with the charts and metrics described here.

Kaplan [InfluxData], Breck [Tesla] | Value of Building Great Developer Experience | InfluxDays 2022

Join Evan Kaplan, CEO at InfluxData, and a long-time InfluxDB community member as they discuss how to create a stronger developer experience (DX). Hear from Evan and Colin Breck, Cloud Platforms Lead at Tesla Energy Products, to learn more about industry best practices and how organizations can improve the experience for developers.

Observability Pipelines Have Never Been This Easy to Set Up, Manage, and Troubleshoot

Did you know? Baby goats stand and take their first steps within minutes of being born. There is no stopping the goats! Likewise, there’s no stopping the goats at Cribl from innovating! Our product herd is growing rapidly, and it’s time for us to share the latest and greatest about our head of the product herd – Cribl Stream. (If you haven’t figured it out yet, we love goats here at Cribl. We even have a goat mascot; meet Ian.)

What's New With Cribl.Cloud: Search, BYO IdP, AWS Marketplace, and More

If you have an Amazon Prime membership, you probably got it because of fast and free shipping. And then you discovered your membership also lets you watch TV, movies, and even live sports through Prime Video. When Amazon acquired Whole Foods, your membership included grocery delivery in under 2 hours! The more I explore my Prime membership, the more I find new and exciting access to services I don’t have to sign up for.

Livin' on the Edge With Cribl Edge 4.0: Featuring Improved Scalability, Enhanced Fleet Management, and AppScope Integration

Cribl CEO Clint Sharp first announced Cribl Edge in March of 2022. Our SVP of Marketing, Abby Strong, complemented the announcement with a well-rounded blog post discussing why Cribl Edge is the first fully manageable and auto-configurable agent designed to collect telemetry data at scale. Even Aerosmith gave the product a shoutout! Well, not really, but wouldn’t it be fun if it was true? 🙂 We’re thrilled to be back with exciting news about the latest release of Cribl Edge (4.0).

Cribl's New User Interface: Simple, Accessible, and User-Friendly

The last year has been HUGE for us here at Cribl; we’ve seen explosive growth across our business. For those of us in the Product teams, one of the most exciting areas of growth has been the launch of two new ground-breaking products. First, we launched the first fully manageable and auto-configurable agent designed to collect telemetry data at scale – Cribl Edge, which enables customers to move data collection, processing, and routing out into the data source itself.

Advancing Observability: Cribl Search and New Product Enhancements Available Today

Product launch day is our favorite here at Cribl. It’s the culmination of hard work from our entire team and, better yet, the first time our customers get their hands on our latest innovations. And today is a big one. Our newest product, Cribl Search, is now generally available on Cribl.Cloud.

How to get the most from SAP on Azure

In this post, we’ll show you how to get started with SAP on Azure. We’ll also cover the key features of SAP on Azure and how they can benefit your organization. We’ll show you how to set up SAP on Azure using Azure Active Directory (AD) and how to configure your SAP on Azure account. We’ll also show you how to access your SAP on Azure account. Give Avantra a try today for free.

Getting Started with Fluentd for Data Collection

Fluentd is an open source data collector capable of retrieving and receiving event data from several sources and then filtering, buffering, and routing data to different compatible destinations. It utilizes a plug-in system to help you quickly set up specific inputs, apply any required filtering, and send data to your preferred data ingestion platform. Fluentd supports multiple sources and destinations, and it can be deployed to multiple operating systems, including Windows, Linux, and macOS.

How Time Series Data Empowers Telcos to Stay Competitive

Time series databases can help telecommunications companies become more reliable, efficient and productive. The telecommunications industry is undergoing rapid change as a handful of new technologies and government actions change the underlying business landscape and create space for new companies to innovate and disrupt the established players.

TL;DR Python, Pandas Dataframes, and InfluxDB

InfluxDB has over a dozen client libraries so developers can get started more easily and program in the language they’re most comfortable with. One of our most popular options is the Python client library. InfluxDB supports not just Python but pandas, a tool popular with data scientists for analyzing and manipulating data. You can use the client library to output data from InfluxDB into a DataFrame format pandas can ingest, and you can write pandas DataFrames directly to InfluxDB.

Getting Started Using Scripts with InfluxDB

Using scripts with a time-series database helps developers streamline application development, scale workloads and build lean integrations. Time-series data is everywhere, and that reality isn’t going to change. The very nature of time-series data means that time-series workloads differ from a lot of other kinds of data. Given the prevalence of time-series data in our modern, connected world, it’s more important than ever to ensure that developers have tools to manage it.

Cribl's Fall Launch: Beyond the Pipeline

What's new in Cribl's Fall release? Stream 4.0: A UX refresh, new DB collector, and a Pipeline profiling capability for better visibility and reduced time to resolution. Cribl.Cloud 4.0: BYO IdP, cloud-hosted queueing for sources and destinations, and the ability to purchase a Cribl.Cloud subscription directly from the AWS Marketplace. Edge 4.0: The addition of fleet management, AppScope Edge integration, enhanced Kubernetes support, and the power to handle up to 15k Edge nodes for even more visibility, at scale.

Experience at 35,000 Feet w/ Derek Whisenhunt (Southwest Airlines)

This week we bring you another special “live from the road” episode of the DEX Show – as we sat down with Southwest Airlines’ Derek Whisenhunt ahead of his amazing talk at Experience Everywhere in New York City! If you’ve ever wondered what separates a best-in-class airline from the rest of the pack, this episode’s for you.

Grafana Agent 0.29.0 release: New OpenTelemetry components

Today the Grafana Agent team is excited to announce the release of Grafana Agent v0.29.0. This September, we introduced a new way to easily run and configure Grafana Agent called Grafana Agent Flow, our new dynamic configuration runtime built on components. Within Flow, we are also embracing Grafana Labs’ big tent philosophy by introducing OpenTelemetry (OTel) Collector components and converters for traces, metrics, and logs in Agent v0.29.0.

Touching Grass With SLOs

One of the things that struck me upon joining Honeycomb was the seemingly laissez-faire approach we took towards internal SLOs. From my own research (beginning with the classic SRE book, following Google’s example), I came to these conclusions: If you read the original SRE book when it was released, before the workbook came out, these conclusions all made sense.

The 3 Keys To A Successful Microsoft Teams Meeting Room Strategy

A Microsoft Teams rooms strategy might seem like more effort than it’s worth on the surface, but you’d be surprised how much impact it can make. With many businesses now mixing in-person and remote Teams meeting participation, planning and optimizing how it works just makes sense.

Diagnose Any Microsoft Teams Problem in 3 Clicks or Less!

In this video, we demonstrate how to diagnose Microsoft Teams problems with hop-by-hop analysis and insights in real time, all in three clicks or less. Whether your users work from home, the office, or anywhere in between, a superb call-quality experience is a must. How can IT operations staff ensure that? Using a combination of synthetics and real user monitoring (RUM), support teams can now get comprehensive visibility into Teams performance and use those insights for optimization.

"Managing OpenTelemetry Through the OpAMP Protocol" by Mike Kelly, observIQ

Managing thousands of data collection agents across just as many servers can overwhelm DevOps teams. Open Agent Management Protocol (OpAMP) is a new network protocol from the OpenTelemetry Project that enables remote management of OpenTelemetry collectors, allowing them to report their status to a server, receive configuration from it, and receive agent package updates. This eliminates the need to create new custom distributions and redeploy, drastically simplifying agent management.

Network Documentation Best Practices: What to Create & Why

Everybody agrees network documentation is extremely important, but there tends not to be a lot of agreement on what that documentation should include. The short answer is that it should include everything that’s relevant—but what that means varies between networks. For example, in a really small network with one switch and a firewall, and perhaps a single wireless access point, there isn’t much to document. It might be enough to put everything in a single diagram.

How retailers are uncovering insights and driving more conversions this holiday season

This year, global ecommerce transactions are expected to grow by over 12% during the 2022 holiday season. As the industry continues to rely more on ecommerce, retailers are looking at new ways to improve customer experience and provide a safe and secure shopping journey. In an increasingly competitive space, many retailers are leveraging their data assets to accomplish more, spark innovation, create more personalized experiences, and drive higher conversion rates.

Mobile Cloud Computing: Overview, Challenges and Scope

The process of delivering mobile apps utilizing cloud technology is known as mobile cloud computing (MCC). Complex mobile apps today carry out activities including authentication, location-aware features and providing users with customized communication and content. As long as your device is online, mobile cloud computing enables you to store and access data anywhere. This makes it possible to send data without difficulty whenever required.

How to Use Quarkus Live Coding in Docker

Here at LogicMonitor, we like to experiment and try new things. In fact, one of our core tenets is “better every day”. This means that we’re always hungry to learn new things, experiment, innovate, and take chances. One thing we’ve been experimenting with lately (and loving!) is the Quarkus framework. We love how lightweight it is and that it’s built from the ground up with Kubernetes in mind.

geeks+gurus: How Ulta Beauty digital services shine for the holidays

For many online retailers, the bulk of sales happen during the holiday season. It is critical everything goes off without a hitch. In this session, longtime digital services veteran Omar Koncobo, IT Director of Ecommerce/Digital and Marketing Systems at Ulta Beauty, discusses his top tips for successful holidays learned from seasons past: when to start preparing for the holiday traffic spikes; lessons learned from Ulta on scalability; when things go wrong: spotting problems and fixing them fast; and managing costs and preparedness.

AWS Lambda Telemetry API: Enhanced Observability with Coralogix AWS Lambda Telemetry Exporter

AWS recently introduced a new Lambda Telemetry API giving users the ability to collect logs, metrics, and traces for analysis in AWS services like CloudWatch or a third-party observability platform like Coralogix. It allows for a simplified and holistic collection of observability data by providing Lambda extensions access to additional events and information related to the Lambda platform.

Configuring Fargate custom application metrics in CloudWatch using Prometheus

Over the past few months, Helios has experienced rapid growth resulting in our user base increasing, our services multiplying, and our system ingesting more data. Like all tech companies that need to scale, we wanted to avoid our performance becoming sluggish over time.

Sumo Logic's investment in OTel

When teams collect data without full observability of what others on the team can see, it becomes clear that no one’s picture is truly accurate. In this picture, all of the people are wearing blindfolds and feeling around to see what is in front of them. One thinks this creature is a spear, another thinks it is a tree trunk, and another a rope. As long as they cannot observe what the others can, there is poor data fidelity.

Monitoring Windows 365 Cloud PCs

A new era is here, where cloud computing is taking a growing share of the digital workspace market. Microsoft are well placed to capture an increasing part of this market to deliver digital workspaces to end users to run their business applications. Using Azure as the umbrella we see new Microsoft services arise each year. Last year the Windows 365 service was born with a Cloud PC service offering both Business and Enterprise subscriptions.

How to Monitor MongoDB: Key Metrics to Measure for High Performance

Monitoring distributed systems like MongoDB is very important to ensure optimal performance and constant health. But even the best monitoring tool will not be efficient without fully understanding the metrics it gathers and presents, what they represent, how to interpret them, and what they affect. That’s why it is crucial not only to collect the metrics but also to understand them.

Understanding the Three Pillars of Observability: Logs, Metrics and Traces

Many people wonder what the difference is between monitoring and observability. While monitoring is simply watching a system, observability means truly understanding a system’s state. DevOps teams leverage observability to debug their applications or troubleshoot the root cause of system issues. Peak visibility is achieved by analyzing the three pillars of observability: logs, metrics and traces.

How does GeoDNS work?

Latency is the key differentiator when it comes to application performance on the internet. Reduced latency accelerates the delivery of apps. DNS resolving is the first step towards application delivery and it involves a series of steps. Prior to Anycast DNS, the DNS servers responsible for resolving users’ DNS requests were sitting continents away from many users, contributing to the high latency and slow delivery of applications.

The importance of using the latest browser version for monitoring

Security is one word that is at the tip of everyone’s tongue when it comes to ensuring that web browsers are up to date. In August 2022, after zero-day vulnerability CVE-2022-2852 was identified as a critical risk, Google issued a new set of Chrome security updates and tech-industry folks carried on with business as usual. In general, security vulnerabilities of all types are an all-too-common game of Whac-A-Mole. Once a security flaw is discovered and patched, another pops up to take its place.

Observability is Still Broken. Here are 6 Reasons Why.

In an era where there’s no shortage of established best practices and tools, engineering teams consistently find that preventing, detecting and resolving production issues is only getting harder. Why is this the case? Our most recent DevOps Pulse Survey highlighted alarming trends to this end.

Empower the SREs - Conclusions from The SRE Report 2023

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

Scaling Up, One Network Bottleneck at a Time

Processing data at scale involves moving packets through a network—but what happens when that network isn't cooperative? Anatole Beuzon, a Software Engineer at Datadog, discusses how he investigated and resolved network issues in Datadog’s larger data-processing apps and how you can apply these same methods to your own production workloads.

Ask a Site Reliability Engineer (SRE)

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.

FinOps and Cloud Cost Optimization

As companies scale, it’s become increasingly important to keep cloud cost management and optimization top of mind. In this talk, Yuval Yogev from Sygnia walks you through Sygnia’s optimization journey of cutting their total cloud costs in half. Yogev also shares insights into how you can optimize your own organization’s cloud usage and spend.

Deploying OpenTelemetry Organizationally: From Proof of Concept to In-Production at Scale

Observability involves telling a coherent story about an entire system. Over the years, video streaming service Pluto TV has had to navigate many storytellers in terms of observability vendors, tools, and formats before settling on OpenTelemetry to analyze and compare features across its many destination platforms. During this presentation, you'll see how Bharathi Ramachandran—Engineering Manager at Pluto TV—used OpenTelemetry to implement his initial proof of concept and get his entire organization shipping observability data at scale.

Changing Perspectives: A Deep Dive into the Security Posture of 600+ Real-World AWS Environments

Earlier this year, Datadog released the “State of AWS Security” study, which examined real-world data from more than 600 organizations and AWS accounts to understand the security posture of global AWS users who also leverage the Datadog Cloud Security Platform. Join Datadog’s Christophe Tafani-Dereeper and Andrew Krug as they explore some important insights from this study, such as the top ways organizations are breached on AWS and how tooling like Datadog Cloud Security Posture Management can help.

Auditing Your Automation's Access: Using More Automation

Between CI/CD pipelines, container orchestrators, and developer debugging tools, more and more automation is needed to scale your systems. But how do you know if that automation is accessing the right systems at the right time? And how do you ensure that your automation is safe from exploits by unauthorized users?

Develop and Deploy a Python API with Kubernetes and Docker

Docker is one of the most popular containerization technologies. It is a simple-to-use, developer-friendly tool, and has advantages over other similar technologies that make using it smooth and easy. Since its first open-source release in March 2013, Docker has gained attention from developers and ops engineers. According to Docker Inc., Docker users have downloaded over 105 billion containers and 'dockerized' 5.8 million containers on Docker Hub. The project has over 32K stars on GitHub.

Splunk Platform Demo

Splunk Platform enables you to search, analyze, and visualize your data across your technology landscape. Take a look at how Splunk offers an extensible data platform that powers unified security, full-stack observability, and limitless customer applications. This Splunk Platform overview and demo will show you how Splunk can help you make data transformations to further accelerate your cloud-driven initiatives.

Develop and Deploy a Python API with Kubernetes and Docker - part II

In part I of this tutorial, we developed a Python API then we used Docker and Docker Compose to containerize the application and create a development environment. In part II, we are going to discover some other details about Docker and Docker Compose as well as how to deploy the same app to a GKE cluster.

Prometheus vs. Zabbix

For a successful business, you need to introduce an effective monitoring system covering all areas of your business and infrastructure - servers, databases, services, overall traffic, and even revenue collected. The users of this monitoring system can be system administrators, software engineers, information engineers, as well as all sorts of analysts.

Time Series Forecasting with PyTorch and InfluxDB

Time series data (also known as time-stamped data) refers to a collection of observations (data points) measured over time. When plotted on a graph, one of the axes for this type of data will always be time. Because time is part of every observable entity, time series data can be used in all kinds of industries, like the stock market, weather data, logs, and traces.

Are you a network observability champion?

At Kentik, we pride ourselves on being innovators and thought leaders in network observability. “Kentik is network observability” is more than a slogan for us. It’s an idea that informs our product roadmap and guides our problem-solving with customers. We’ve done a lot to explain network observability to prospects.

The Hidden Cost of Overlogging

Logging is usually an afterthought. When we're creating a feature, we often neglect to think about how we'll observe the actual behavior, and even if we do end up adding proper logging messages - we rarely consider the implications those log lines will have on our bottom line. It's a hidden cost factor, but one that continuously comes up in conversations we have with practitioners. To help shed light on this topic, we gathered a panel of industry veterans, who will come together on November 8th to discuss the problem in depth and offer a completely new approach to solve it.

How Is Uptime Calculated?

Any modern organization depends heavily on the health of its network and servers. If a server goes down, it can seriously impact a business’s ability to provide services for clients and customers to get work done. If network admins don’t know a server went down, the problem could quickly worsen. No one may realize there is a problem until the support lines are loaded with calls, and everyone needs to scramble first to find the issue and then fix it.

Create AWS CloudWatch metric alerts with Lumigo

Amazon CloudWatch monitors metrics of your Amazon Web Services (AWS) resources in real time and can trigger alarms when a metric goes above or below certain thresholds. Typically, Amazon CloudWatch sends out alarms by posting a message to an SNS (Amazon Simple Notification Service) topic, which distributes the message via several mediums, including email, SMS, and Lambda functions. Setting a CloudWatch alarm can be complex.
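The threshold-and-notify flow described above can be sketched in plain Python. This is an illustration only, not the CloudWatch, SNS, or Lumigo API: the `Alarm` class, the `evaluate` function, and the metric names are hypothetical, and the `notify` callback merely stands in for publishing to a notification topic.

```python
import operator
from dataclasses import dataclass

# Map a direction keyword to the comparison it implies.
COMPARATORS = {"above": operator.gt, "below": operator.lt}


@dataclass
class Alarm:
    """Hypothetical alarm definition: fire when a metric crosses a threshold."""
    name: str
    threshold: float
    direction: str  # "above" or "below"

    def breached(self, value: float) -> bool:
        return COMPARATORS[self.direction](value, self.threshold)


def evaluate(alarms, metrics, notify):
    """Check each alarm against the latest metric values.

    `alarms` maps a metric name to its Alarm; `notify` is any callable that
    accepts a message string (a stand-in for posting to a topic).
    """
    fired = []
    for metric_name, alarm in alarms.items():
        value = metrics.get(metric_name)
        if value is not None and alarm.breached(value):
            notify(f"{alarm.name}: {metric_name}={value}")
            fired.append(alarm.name)
    return fired


# Example with made-up metric values.
messages = []
alarms = {
    "cpu_percent": Alarm("HighCPU", 80.0, "above"),
    "free_disk_gb": Alarm("LowDisk", 10.0, "below"),
}
fired = evaluate(alarms, {"cpu_percent": 91.5, "free_disk_gb": 42.0}, messages.append)
```

In this made-up run only the CPU alarm fires, because free disk space is still above its lower bound.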

Top 5 reasons to monitor Citrix PVS with GripMatix PVS SCOM Management Pack

GripMatix SCOM Management Pack solution for monitoring Citrix Virtual Apps and Desktops and Citrix DaaS, MetrixInsight for Citrix VAD/DaaS, includes comprehensive monitoring for Citrix Provisioning Service, also known as Citrix PVS. Check out our top 5 reasons to monitor this advanced Citrix software-streaming technology with GripMatix PVS SCOM Management Pack, to increase your Business Continuity and Performance.

What is AIOps (Artificial Intelligence for IT Operations)? AIOps Use Cases

The volume of data that IT systems generate nowadays is overwhelming, and without intelligent monitoring and analysis tools, it can result in missed opportunities, alerts, and expensive downtime. However, with the advent of Machine Learning and Big Data, a new category of IT operations tool has emerged called AIOps. AIOps can be defined as the practical application of Artificial Intelligence to augment, support, and automate IT processes.

How Major League Baseball Scales Kubernetes Monitoring

Millions of global baseball fans tuned into the World Series last week, and we at Circonus were proud to help our customer, Major League Baseball, ensure they provided those fans with seamless viewing experiences. To celebrate our partnership, we’re rolling the replay on how MLB overcame Kubernetes observability challenges with Circonus as the league quickly scaled its Kubernetes deployment.

Observability Data Documentation Best Practices

A few weeks back, I got the chance to sit down with our very own Jordan Perks from the Cribl Customer Success Team. Jordan is an Observability subject matter expert AND knows a thing or two about Cribl Products! After geeking out a bit about data best practices, we started chatting about enabling our customer champions to have different conversations with stakeholders across their organizations. When someone becomes an observability engineer, they step into a much different role.

10,000+ GitHub stars, Enterprise edition, and Performance Benchmarks - SigNal 18

Starting this one off on a different chord! Startups are challenging yet fulfilling. There is always a long road ahead - full of challenges and uncertainties. Yet, sometimes we hit a milestone and feel nostalgic about the journey completed. 10,000+ GitHub stargazers seemed like a distant dream, but finally, it’s here. Moments like these are special to the team. We are grateful to the developer community for their continued support since our inception.

How to monitor Windows logs with the updated Windows integration for Grafana Cloud

As we all know, Windows is one of the most popular operating systems in the world. It has a dominant share in the desktop computer market, with more than 70% of the machines running the operating system. It makes sense, then, that the Windows integration is also one of the most used and popular integrations in Grafana Cloud.

What Can OpenTelemetry Distributed Tracing Architecture Do for Frontend Developers?

When developers talk about the options OpenTelemetry opens up to them, one of the most powerful use cases is troubleshooting distributed architectures. With OTel data and insights, developers can identify bugs and solve a wide range of issues across various types of architecture and flows. These include asynchronous flows, flows with Lambda functions, and many more.

10 things you should know about using AWS S3

Almost everyone who’s used Amazon Web Services (AWS) has used Amazon Simple Storage Service (S3). In the decade since it was first released, S3 storage has become essential to thousands of companies for file storage. While using S3 in simple ways is easy, at a larger scale it involves a lot of subtleties and potentially costly mistakes, especially when your data or team are scaling up. Here are the most important things about AWS S3 that will help you avoid costly mistakes.

How to decide on self-hosted vs managed Apache Airflow

Apache Airflow is an open-source orchestration platform that enables the development, scheduling and monitoring of tasks using directed acyclic graphs (DAGs). Here at Sumo, our team has been using this technology for several years to manage various jobs relating to our organization’s Global Intelligence, Global Confidence, and other data science initiatives. We self-host the tool through Kubernetes.
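To make the DAG idea concrete, here is a minimal sketch of how a dependency graph determines a valid run order. It uses Python's standard-library `graphlib` rather than Airflow itself, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one run order in which every task follows its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle, which is why
    the graph has to be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

A scheduler built on this idea can run any task whose dependencies have all finished. A cycle would make that impossible, so it is rejected up front.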

Australia's Victoria University optimizes student learning experience using OpManager

Victoria University is a world-renowned educational institution based in Melbourne, Australia. The public research university has several campuses comprised of six academic colleges, six research institutes, seven research centers, and a polytechnic institution. It provides courses in higher education and technical training. The IT staff at Victoria University chose OpManager to monitor its IT infrastructure and enhance its network performance.

What Is Observability?

In today's complex, multi-cloud environments, IT and engineering teams are under increasing pressure to respond to errors affecting their entire system. Therefore, IT operations, DevOps, and SRE teams are all striving to gain complete observability across these increasingly complex and diverse computing environments. But what exactly does observability mean?

Digital Immunity and Internet Resilience: On Point in Orlando

The world is getting back to normal… I attended my first conference after three years of almost no travel and Gartner did not disappoint. The Gartner IT Symposium//Xpo™ drew thousands of attendees to Orlando, Florida from around the globe and I think everyone was as excited (and - a bit - nervous) as I was to be back together again. As Gartner does annually, they identified Top Strategic Technology Trends for 2023.

How to Reduce Telegraf Binary Size with a Customized Telegraf Agent

Is Telegraf too big for your device? Too many plugins in one binary getting you down? Let me introduce you to the Telegraf custom builder – the new tool for reducing Telegraf’s overall memory and disk footprint. In this blog, we will discuss the “what, why, when” and also how to use the new custom builder.

An Interview with Melissa Person, IT Leader at PVH

In today’s business climate, innovation is critical to business success, and IT leaders are pressed to consistently innovate at a pace that the business has come to expect. LogicMonitor is thrilled to feature a Q&A with Melissa Person, the Global Vice President of IT Infrastructure and Operations for PVH, the parent company of iconic fashion brands such as Calvin Klein, Tommy Hilfiger, Van Heusen, and others.

Grafana Labs Writers' Toolkit: This is the way

At Grafana Labs, we understand that clear, informative technical documentation is critical to users’ success, whether they’re just getting started or trying to quickly troubleshoot an issue. That’s why the Documentation and Technical Writing Team at Grafana Labs is pleased to announce the launch of our very first, very own writers’ toolkit.

Are You Getting Everything You Can From Your ITIM and ITSM Integration?

IT Service Management (ITSM) tools are for many organizations the lifeblood of the help desk and possibly the entire IT department. Some would argue these tools are the lifeblood of the entire organization, as well. Many IT departments live and die by their IT Service Management tools, using them to track everything from support tickets to change control requests, provisioning and de-provisioning of resources and more.

Microsoft Teams Troubleshooting Performance and Connection Issues

How many times has this happened? You’re on a Microsoft Teams call, and your call disconnects, lags or freezes - so you go to Google to look up how to solve the problem. Well look no further! We’re teaching you the how to’s of Microsoft Teams troubleshooting to solve common Microsoft Teams performance issues and connection issues using Network Monitoring!

Turn-Key Infrastructure and Application Monitoring

The way businesses obtain infrastructure has changed dramatically over the past decade, as Infrastructure-as-a-Service (IaaS) has taken the place of self-hosted infrastructure for most IT deployments. At the same time, it has become common to build complex infrastructures that blend components from multiple providers – such as two or more public clouds (aka. multicloud infrastructure) or mixing an on-prem data center and a public cloud (aka. hybrid cloud infrastructure).

Rule sitting on your throne: New Update Manager

Pandora FMS now offers weekly semi-automatic updates, even in the Open Source version. The feature builds on an improved Update Manager system, which was previously available only in the Enterprise version: you can check online for an update and, on request, download it and update your Console automatically.

How Sentry uncovered an N+1 issue in djangoproject.com

Sentry recently launched Performance Issues, a feature to help developers discover and fix common performance problems in their projects. We tested this project internally and with alpha users, so when we finally turned it on for all Sentry users, we were delighted (and dismayed) to hear from Carlton Gibson, current Django fellow and great human, that Sentry had.

Hey! Let's talk AIOps!

As the rapid digital transformation has put a lot of pressure on IT organizations to be more proactive and agile, DevOps principles and practices have been an invaluable resource. However, to remain at the top of the game, organizations need an even stronger solution. So, what’s the answer? AIOps (artificial intelligence for IT operations), of course!

observIQ Announces Enterprise Edition of Open Source Observability Pipeline BindPlane OP

Continuing its commitment to open source observability, observIQ announces the enterprise edition of BindPlane OP. BindPlane OP provides the ability to control observability costs and simplify the management of telemetry agents at scale while avoiding vendor lock-in.

How to mute alerts during maintenance windows or scheduled backups?

The health management APIs in Netdata allow teams to eliminate unnecessary alerting during scheduled maintenance, testing, auto scaling events, and instance reboots. For all SREs, it is absolutely crucial to filter out expected events during maintenance windows and quickly pinpoint critical issues in your infrastructure. Every minute counts when troubleshooting, and any distractions that may hijack the troubleshooting process should be subdued.

Modern Canadian MSSP drives next-gen MDR with Logz.io and Tines

Today’s Managed Security Service Providers (MSSPs) are trying to grow their business quickly, improving margins and onboarding customers with high-quality tool sets that scale with the business. This means reducing cost, improving onboarding time and building the next generation of Managed Detection and Response (MDR) to deal with threats that are increasing in volume and sophistication.

Monitoring Cloud Database Costs with OpenTelemetry and Honeycomb

In the last few years, the usage of databases that charge by request, query, or insert—rather than by provisioned compute infrastructure (e.g., CPU, RAM, etc.)—has grown significantly. They’re popular for a lot of the same reasons that serverless compute functions are, as the cost will scale with your usage. No one is using your site? No problem: you’re not charged.

Performance Monitoring for AWS NICE DCV VDI and Cloud Protocol

In today’s article, I will be highlighting eG Enterprise’s monitoring capabilities for Amazon’s AWS NICE DCV VDI protocol that was used first in Amazon’s AppStream 2.0 and now subsequently also in WSP 2.0 for the Amazon WorkSpaces service for digital workspaces.

InfluxDays Recap - Paul Dix and the Journey of InfluxDB

According to the old adage, life’s a journey not a destination. The same can be said for software. It’s unlikely that any developer would ever say that something they built was truly done. There are always bugs to squash, features to add, and updates to implement. As a company intensely focused on time and the context of time, it comes as little surprise that these themes played a significant role in Paul Dix’s presentation for InfluxDays.

What to Expect from Flux 1.0

This week at InfluxDays we announced that Flux 1.0 is coming soon. Version 1.0 of Flux lang is a commitment to no longer make breaking changes to the Flux language. Importantly, today’s Flux scripts will work on Flux 1.0, and no breaking changes will be introduced between now and the release of Flux 1.0. Along with version 1.0, we have some features we are also releasing soon. Here are the features we have coming and a short explanation of why you might want to leverage them.

OpenTelemetry vs. Prometheus

OpenTelemetry and Prometheus are classified as monitoring tools, but they also have significant differences that your company should know about. For cloud-native applications, OpenTelemetry is the future of instrumentation. It’s the first critical step that allows companies to monitor and improve application performance. OpenTelemetry also supports multiple programming languages and technologies.

How to Index and Process JSON Data for Hassle-free Business Insights

If your IT department is generating a tsunami of JSON-based log and event data, ChaosSearch® JSON Flex® can fast-track automatic, flexible indexing for custom insights of your valuable business data. JavaScript Object Notation (JSON) has become the de facto standard for log and event data created by business applications and services. The easy-to-read, semi-structured format can hold a wealth of information and statistics.

Sematext Updates Review Ep. 1 | Features and Product Updates Overview

Sematext has undergone some amazing updates this year. In this episode of Sematext Update Review (SUR), we will look at these features in detail and how they can help you monitor your full stack. Sematext is an all-in-one monitoring solution for your IT stack. Monitor Logs, Infrastructure, synthetic runs, or real user experiences all in one place.

Top Highlights from Experience Everywhere: New York!

This year’s Experience Everywhere is in the books – and just as we expected, the New York crowd helped us end this unforgettable four-city tour on a high note. We knew we had to bring something special for our big return to a live conference, and our incredible attendees and a star-studded lineup of speakers helped us do just that.

How to correlate performance testing and distributed tracing to proactively improve reliability

At ObservabilityCON, we announced our first step towards launching a native integration between Grafana k6 load testing and Grafana Tempo tracing (k6 x Tempo) in Grafana Cloud. We created k6 x Tempo to help dev, testing, and operation teams analyze their performance test results more effectively and proactively improve the reliability of their business-critical applications.

New GKE dashboards and metrics provide deeper visibility into your environment

Google Kubernetes Engine (GKE) is a managed Kubernetes service that enables users to deploy and orchestrate containerized applications on Google’s infrastructure. Datadog’s GKE integration, when paired with our Kubernetes integration, has always provided deep visibility into the health and performance of your clusters at the node, pod, container, and application levels.

OpenTelemetry, Auto-Instrumentation and Splunk Observability Cloud: A Jump Start

Have you been meaning to learn about OpenTelemetry and the integration of all available application and service telemetry? If you like to learn things by doing, get ready to dive in and have some fun with OpenTelemetry and Splunk Observability Cloud. Quickly learn more about OpenTelemetry auto-instrumentation and collectors at your own pace with these walkthroughs and guides.

AIOps (artificial intelligence for IT operations)

Artificial intelligence for IT operations (AIOps) is an umbrella term for the use of big data analytics, machine learning (ML) and other artificial intelligence (AI) technologies to automate the identification and resolution of common IT issues. The systems, services and applications in a large enterprise produce immense volumes of log and performance data. AIOps uses this data to monitor assets and gain visibility into dependencies within and outside of IT systems.

Platform Engineering: DevOps Evolution or a Fancy Re-name?

Everyone’s talking about Platform Engineering these days. Even Gartner recently featured it in its Hype Cycle for Software Engineering 2022. But what is Platform Engineering really about? Is it the next stage in the evolution of DevOps? Is it just a fancy rebrand for DevOps or SRE? As a veteran of the PaaS (Platform as a Service) discipline about a decade ago, and a DevOps enthusiast at present, I decided to delve into this topic, peel off the hype, and see what it’s about in practice.

SAP system refresh automation

SAP system refresh automation is extremely powerful when leveraged with care; system refreshes are complex and challenging processes to manage. System refreshes can be fraught with risk for organizations with critical data due to their level of complexity. Mitigating this risk comes down to knowing the benefits of automation and how the processes work. This article will help you: To try out Avantra's SAP automation features, sign up for a free trial.

Announcing Grafana Phlare, the open source database for continuous profiling at massive scale

At ObservabilityCON in New York City today, we announced a new open source backend for continuous profiling data: Grafana Phlare. We are excited to share this horizontally scalable, highly available database with the open source community — along with a new flame graph panel for visualizing profiling data in Grafana — to help you use continuous profiling to understand your application performance and optimize your infrastructure spend.

ObservabilityCON 2022: A guide to new OSS projects, LGTM stack updates, and more from Grafana Labs

ObservabilityCON 2022 is taking place today with a host of exciting announcements, from new OSS projects, partnerships, and integrations to the latest easy-to-use features in the Grafana LGTM stack. “As an OSS company, we prioritize interoperability. The big tent is at the heart of everything we do, and Grafana is at the heart of the wider ecosystem,” says Grafana Labs Co-founder and CEO Raj Dutt.

Introducing Grafana Faro, an open source project for frontend application observability

Today, during the ObservabilityCON 2022 keynote session, we announced a new open source project for frontend application observability, Grafana Faro. The project is launching with a highly configurable web SDK that instruments web applications to capture observability signals. This frontend telemetry can then be correlated with backend and infrastructure data for seamless, full-stack observability.

True Network Monitoring ROI Means Upgrading Your NetOps Strategies

Running network operations today is more complicated than ever before. Networks are growing in size and complexity due to modern network architecture adoption, but organizations are not hiring more staff to compensate for this increased complexity and demand. The prevailing mentality seems to be to do more with what you have.

InfluxData Announces New Platform Enhancements at InfluxDays 2022

SAN FRANCISCO, November 2, 2022 – Today, InfluxData, creator of the leading time series platform InfluxDB, announced significant product enhancements at InfluxDays 2022, its annual developer and community event. New features including InfluxDB Script Editor, Telegraf Custom Builder, and Flux 1.0 support developers working with time series data, allowing them to do more with less code.

More Capabilities, Less Code: Announcing Platform New Features at InfluxDays 2022

The InfluxDB platform has evolved a lot over the past decade. But with every innovation we’ve added to the platform, the focus behind our efforts has remained the same: Build cool stuff for people who build cool stuff. What we mean by this is we want to make it incredibly easy for users to build valuable applications with their time series data. We do that by offering a wide range of tools, features, and resources that meet builders on their terms.

Low-Code/No-Code: The Past & Future King of Application Development

Business organizations that want to save money and be competitive take into consideration the time costs associated with investments in new technologies. Will the efficiency gains translate to a rapid return on investment? Will users embrace the change and be more productive? Or will those investments be a hassle to employees and result in time-wasting workarounds and a fallback to inefficient, manual processes?

Distributed Tracing: Build vs. Buy

With serverless and containerized applications becoming a norm, workloads and integrations are spread across multiple cloud environments. As these apps become increasingly more distributed, monitoring also becomes more complicated with siloed and incomplete telemetry. This is where distributed tracing brings great value. It enables end-to-end visibility in your modern and complex application.

Grafana ObservabilityCON 2022 Keynote

Live from New York, it’s Grafana ObservabilityCON! Join CEO/Co-founder Raj Dutt, VP of Technology Tom Wilkie, and members of the Grafana Labs engineering team for a look at the latest developments in the open and composable LGTM (Loki-Grafana-Tempo-Mimir) observability stack. Spoiler alert: There will be some exciting announcements!

Introduction to continuous profiling

In this video, we will review the advantages of continuous profiling, the difference between continuous profiling vs. traditional profiling, and how continuous profiling can fit into your overall observability strategy. You’ll also learn more about Grafana Phlare, the new open source continuous profiling database, and get a look at the latest flame graph visualization in Grafana.

Monitoring MongoDB performance metrics (WiredTiger)

This post is part 1 of a 3-part series about monitoring MongoDB performance with the WiredTiger storage engine. Part 2 explains the different ways to collect MongoDB metrics, and Part 3 details how to monitor its performance with Datadog. If you are using the MMAPv1 storage engine, visit the companion article “Monitoring MongoDB performance metrics (MMAP)”.

Nastel XRay Release 1.5

Nastel XRay 1.5 release builds on industry analyst acclaim for leading AIOps & transaction observability vendor. Nastel Technologies, the leader in integration infrastructure management (i2M) solutions, announced today significant enhancements to its versatile AIOps and Transaction Observability solution, including machine learning for integration management, and visualization of business flows and IoT locations.

Top 8 Open Source Dashboards

Before exploring open-source dashboard tools, we first need to learn about Dashboards and how they can be useful. A dashboard is a data visualization and management tool that visually tracks and analyzes the Key Performance Indicators (KPIs), business analytics metrics, infrastructure health and status, and data points for an organization, team, or process. It can be used to present operational and analytical business data with interactive data visualizations to your team.

Pandora's Flask: Monitoring a Python web app with Prometheus

We eat lots of our own dog food at MetricFire, monitoring our services with a dedicated cluster running the same software. This has worked out really well for us over the years: as our own customer, we quickly spot issues in our various ingestion, storage, and rendering services. It also drives the service status transparency our customers love. Our customers include large multinational coffee brewers, game companies, and other data science/SaaS companies.

Recapping Our Inaugural SolarWinds Day Event

Our inaugural SolarWinds Day event was a smashing success! From the announcement of our SolarWinds® Observability solution—which was built fully in the cloud—to important updates to our on-premises SolarWinds® Hybrid Cloud Observability solution, this was our biggest day of product launches since the founding of SolarWinds. It was exciting to be a part of the event and to see so many people participate and engage in the discussion.

User Experience for Observability

Modern software applications involve multiple layers of code and services, working together to meet increasingly demanding user requirements. To achieve this, systems became distributed, providing improved scalability and fault tolerance at the cost of added complexity. However, this innovation brought new challenges to basic troubleshooting and performance monitoring to maintain the health of systems. It’s for these reasons that observability is trending.

High Five! Splunk Honored With Five TrustRadius Best Software Awards

Customers have spoken, and we’re feeling the love. Splunk has just been honored with no fewer than five “Best Software” Awards from TrustRadius! Based exclusively on customer reviews, Splunk Enterprise Security (ES) took home the top spot in three categories: Best Software for Enterprise, Best Software for Mid-Sized Businesses, and Best Software for Small Businesses.

A New Era of Sentry

Today we are releasing Dynamic Sampling, available to all new customers, and opt-in for existing customers. This goes beyond a new feature however and is an overhaul to the way we package Sentry’s Performance Monitoring product. We are saying goodbye to the days of static, magic number sampling configured within the SDK and moving to a world of flexibility.

What is Jaeger Distributed Tracing?

Distributed tracing is the ability to follow a request through a software system from beginning to end. While that may sound trivial, a single request can easily spawn multiple child requests to different microservices with modern distributed architectures. These, in turn, trigger further sub-requests, resulting in a complex web of transactions to service a single originating request.
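The fan-out of child requests can be pictured as a tree of spans, each pointing at its parent. The sketch below is a simplified illustration, not Jaeger's data model or client API: the `Span` fields and the service names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, stripped-down span: one unit of work within a trace."""
    span_id: str
    parent_id: Optional[str]  # None marks the originating request
    service: str


def depth(span, by_id):
    """Count how many parent links separate a span from the root request."""
    levels = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        levels += 1
    return levels


def render(spans):
    """Return one indented line per span, showing the call hierarchy."""
    by_id = {s.span_id: s for s in spans}
    return ["  " * depth(s, by_id) + s.service for s in spans]


# One originating request that fans out into child and grandchild requests.
spans = [
    Span("a", None, "gateway"),
    Span("b", "a", "orders"),
    Span("c", "b", "inventory"),
    Span("d", "a", "billing"),
]
lines = render(spans)
```

Because every span records its parent, the full request path can be reconstructed even when each service reports its spans separately.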

Sponsored Post

Are Prometheus & Grafana Sufficient To Support Modern IT?

When discussing Prometheus and Grafana, our VP of Service Delivery said to me, "We can say whatever you want on our website or on a blog post but what REALLY makes a difference in terms of $$$ is DOING it in the field. Applying this in the field with real customers is where everything gets real. My customer was going full tilt to build a project for testing Prometheus, Grafana, ELK and Splunk for leveraging data intelligence until we stopped them. We told them unabashedly: 'Gents, we're sorry but that's just a messy strategy. You should be using XRay for that.' Sometimes you just have to go balls out with a customer ... well, they listened, and we delivered."

IBM MQ Prevents Message Duplication or Loss

A digital platform may have billions of messages flowing through it each day, with real-time updates considered the standard by customers and enterprises. Ensuring that messages aren’t duplicated or lost in the process is an arduous task, and one that is the focus of IBM MQ, an enterprise-grade messaging solution that has been on the market for over 25 years.

Sponsored Post

What Is MLOps? Machine Learning Operations and Its Role in Technology Transformation

Across all industries, businesses are investing in applications and services powered by artificial intelligence (AI) and machine learning (ML) to boost productivity and gain a competitive advantage.

Sponsored Post

JavaScript unit testing frameworks in 2022: A comparison

Choosing a JavaScript unit testing framework is an essential early step for any new front-end development project. Unit tests are great for peace of mind and reducing software errors. You should always make the time to test. But which framework should you choose for your project? We examined 11 of the most popular JavaScript unit testing frameworks according to stateofjs.com, to help you decide which is best for you.

Sponsored Post

Microsoft 365 Monitoring for CIOs: Challenges and Solution

Microsoft Office 365 monitoring for the entire company is a big achievement. When the pandemic took hold worldwide, the once "nice-to-have" applications became the "go-to" apps. During this time, CIOs became responsible for relocating their employees to a remote workplace and providing reliable end-user services, such as Microsoft Teams, Outlook, SharePoint, and more. But even as the pandemic subsides, hybrid work gains prominence, and events like Microsoft Ignite return in person, monitoring the end-to-end connectivity and network performance of Microsoft 365 remains a high strategic priority for CIOs.

How to monitor Nginx

Are you interested in learning how to monitor Nginx? In this post, we'll explain how Nginx works and how you can use Hosted Graphite to monitor it. First, we'll look at what Nginx monitoring is all about and how it can work together with Prometheus. Nginx, pronounced like “engine-ex”, is an open-source web server that, since its initial success as a web server, is now also used as a reverse proxy, HTTP cache, and load balancer.

10 Best Open Source Switch Port Monitoring Tools

Switch port monitoring is one of the most crucial facets of network management. It provides insights not only into network switch port status but also into CPU load, memory utilization, historical port utilization, and more. Investing in switch port monitoring improves network-related performance across your organization and optimizes port usage. As a result, you'll enhance security, reduce cybercrime, optimize networks, enhance compliance, and safeguard your entire IT infrastructure.

9 Best Open Source Network Monitoring Tools

Network monitoring is a critical component of your network management strategy that provides valuable insights into network-related problems which can affect your organization. When you monitor networks regularly, you'll mitigate risks like overloaded networks, router problems, downtime, cybercrime, and data loss. All successful companies invest in network monitoring tools that provide accurate insights into performance, speed, security, and productivity.

Integrating Heroku Metrics with Amazon CloudWatch Metrics

Application monitoring plays a critical role in the success of your digital products. As you monitor various performance metrics such as usage of CPU, memory, network traffic, and more, you can swiftly take pre-emptive actions before things develop into a larger problem. In spite of the importance of monitoring, the task can become challenging when your infrastructure exists across multiple cloud platforms including AWS and Heroku.

DX UIM 20.4 CU5: What's New and Why Upgrade

For DX Unified Infrastructure Management (DX UIM) customers, there are significant benefits to be gained by staying current with the latest releases—and that’s especially true now. The latest version, release 20.4 cumulative update (CU) 5, offers a significant number of enhancements and new capabilities. This new release provides teams with a number of advantages, including increased flexibility, improved operational efficiency, and enhanced insights.

Visualizing GraphQL Traces in Microservices

One of the things that most excites me about what we at Helios are doing differently than anyone else is trace visualizations. While there are many ways to troubleshoot microservice architectures, a good visual overview goes a really long way to speeding up understanding and therefore accelerating time to a resolution. When your manager asks, “Why did that break down?” with Helios you can answer quickly with accurate data—this is the value of the Helios platform.

OpsRamp Patch 2.0 - Solving Your OS Patching Challenges

OpsRamp’s Operating System Patch Management module is a flexible, yet powerful capability provided to all OpsRamp platform customers or licensed separately. With our SaaS-based OS Patching solution for Windows and Linux endpoints, you can automate the entire patch management process from identification of missing OS patches to the process of patch installation.

SolarWinds Day October 2022 - Observability for All

To kick off our first SolarWinds Day, we are pleased to announce the launch of SolarWinds Observability, our new cloud-native SaaS offering. It combines application, infrastructure, database, network, digital experience, and log analysis into a single, integrated platform. This full-stack approach provides a centralized view of your IT infrastructure and services and delivers powerful functionality to help businesses and organizations of any size maximize their time and resources.

Saving your team from alert fatigue

It's a story as old as the web itself: someone on your team gets excited to install a new tool. The tool promises to finally give you a clear view into the problems your users have with your product. Your team agrees to give it a go. The errors start coming... ...and they don't stop coming... Soon enough, most of your team has either created an email filter to manage all the alerts, or has unsubscribed themselves entirely. Just like all the other tools. Welcome to alert fatigue.

Interlink Software Achieves Cyber Essentials Certification

Cyber Essentials is a UK government-backed scheme, developed by the National Cyber Security Centre. Since its inception, the scheme has become the benchmark for IT security, helping organizations to deploy technical controls to guard against the common types of cyber-attacks and improve data security.

October Monthly Product Update - InfluxDB New Engine and More!

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product in sync with developer needs to ensure their happiness and accelerate Time To Awesome. This month is very special. We now have a new engine that significantly increases the “horsepower and torque” for InfluxDB.

Faster MQTT Data Collection with InfluxDB

Native MQTT eliminates the need to write custom code, orchestrate additional technology layers or incorporate additional hosting services. MQTT is a powerhouse within the Internet of Things (IoT) space. Its pub/sub model and lack of defined payload structure make it infinitely adaptable to the needs of modern sensors, devices and systems. IoT data is also time-series data.

Announcing general availability of Elastic APM .NET agent profiler auto-instrumentation

A few months back, we introduced the beta release of Elastic APM .NET agent profiler auto-instrumentation. Fast forward to today, we're excited to announce the general availability (GA) of this powerful capability that allows the .NET APM agent to automatically instrument .NET Framework, .NET Core, and .NET applications without requiring code changes or recompilation.

Current DevOps Problems & How Scout APM Solves Them

Most software companies rely on DevOps at some scale to aid their software development and deployment processes. DevOps has recently seen a major increase in popularity due to the advent of cloud-based tools and automation possibilities. DevOps can help you completely forget the woes of deploying software and focus on building better apps and providing a holistic experience for your end user. However, just like other things in tech, DevOps is not perfect.

Nexthink Infinity: Automation & Remediation Overview

Learn how your IT team can automate and fix anything with Nexthink Infinity Automation and Remediation. Your IT team will find issues faster by retrieving unlimited data, deliver instant fixes behind the scenes with no employee disruption, and send targeted self-help notifications to speed resolutions. Deliver targeted self-help and automated fixes to speed resolution today!

The SolarWinds Platform

The SolarWinds Platform is designed to connect with your critical business services, to provide flexibility, visibility, and control—wherever your environment lives and wherever you’re going next. It’s the simplicity you expect from SolarWinds, with deployment models to support you today and tomorrow, from on-premises to cloud-native SaaS solutions.

Set up Performance Monitoring With Tonal

We’re drowning in dashboards with no clues or clear steps to help us take action on our app’s performance. But as our eyes glaze over, we’re missing bugs and slowdowns, or catching them weeks too late. Join our interactive co-working session with Max Lapides, Senior Manager of Mobile Software Engineering at Tonal, and learn how to customize performance monitoring in real time with our latest product updates.

Watch Grafana Labs CEO, Co-founder Raj Dutt discuss why companies need observability

Grafana Labs CEO and Co-founder Raj Dutt sat down with “NYSE Floor Talk” ahead of ObservabilityCON to discuss why companies are increasingly focused on observability as a means to improve customer satisfaction. In his conversation with Judy Khan Shaw, host of “NYSE Floor Talk,” Dutt also talked about Grafana Labs’ big tent philosophy and the growth of Grafana Labs and the Grafana open source community.

Kentik Market Intelligence just increased its IQ - introducing KMI Insights!

Early this year we launched Kentik Market Intelligence (KMI). If you missed it, KMI enumerates transit and peering relationships as well as produces rankings based on the volume of IP space transited by ASes in different geographies. Using tables and charts, KMI offers a global view of the internet out-of-the-box without any configuration or setup. KMI uses public BGP routing data to rank ASes based on their advertised IP space.

Managing your Kubernetes cluster with Elastic Observability

As an operations engineer (SRE, IT manager, DevOps), you’re always struggling with how to manage technology and data sprawl. Kubernetes is becoming increasingly pervasive and a majority of these deployments will be in Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), or Azure Kubernetes Service (AKS). Some of you may be on a single cloud while others will have the added burden of managing clusters on multiple Kubernetes cloud services.

Employee Engagement: Product Overview

Learn how Nexthink Infinity bridges the gap with contextual two-way communications, cutting through the digital workplace noise with attention-grabbing notifications that employees respond to. Don’t waste time chasing down employees who ignore emails. Instead, provide critical information, share a survey, or help them fix issues automatically to improve their experience and productivity with Nexthink Employee Engagement.

How to Assemble the Ultimate Network Management Toolset

Enterprise Management Associates (EMA) research determined that network infrastructure teams are challenged to monitor infrastructures that include hybrid public and private clouds. Their findings are that teams need to be more strategic in building their network management toolset. Based on this latest research from EMA, this white paper offers a guide to building the ultimate network management toolset.

What is Network Log Management?

It's no secret: if you manage a network, every device connected generates a log. These logs tell an extremely important story about events that have happened on your network and are a vital part of helping you easily understand network activities, user actions and more. With WhatsUp Gold's Log Management you can monitor, filter, search, and alert on logs for every device in your network while also watching for meta trends like log volume changes.