Operations | Monitoring | ITSM | DevOps | Cloud

April 2022

A practical approach to Active Directory Domain Services, Part 4: AD groups and OUs

Active Directory (AD) objects are rarely managed as standalone entities. In Part 3 of this series, we covered practical exercises for creating and managing two of the most critical AD objects, namely users and computers, after setting up a laboratory AD environment on virtual machines. To manage AD effectively, knowledge of and practical experience with AD groups and organizational units (OUs) are imperative. In this fourth part of our series, we’ll elaborate on this.

SAP cloud integration - A definitive guide

Deciding between SAP cloud integration methods can be confusing. There are so many options out there, each offering different benefits. In this article, we’ll discuss the benefits and drawbacks of SAP cloud on Rise integration methods. We’ll also cover the steps you’ll need to take to choose the method that’s right for your organization. Take an in-depth look at SAP cloud on Rise integration, what it involves, and how you can begin planning your own path.

Fixing SCOM's blind spots

Since its origins as MOM 2000, SCOM has been widely regarded for being awesome at two things: firstly, being the hands-down best solution for monitoring Windows Server environments (it is Microsoft’s monitoring tool, after all); and secondly, being an extensible monitoring platform that can act as a single pane of glass across your data center environment.

Now in Beta: Better Uptime Integration

StatusGator has a wide variety of use cases: from education to help desk to IT and managed services and DevOps, too. All corners of an organization depend on cloud services, and StatusGator gives you visibility into the status of all of your vendors. We’ve heard over and over from our DevOps users that alerts and notifications for their teams are already centralized into a single incident management platform such as OpsGenie, PagerDuty, or FireHydrant.

Ask Miss O11y: When should you delete instrumentation?

When do you delete instrumentation? You delete instrumentation when you delete code. Other than that, if you’re doing things right: almost never. One of the best things about Honeycomb is that it completely transforms the incentives around preserving instrumentation. With metrics-based tools, the most valuable metrics are always custom metrics. You need to define a custom metric for literally any question you might ever want to ask about the app and its utilization or performance.

The AIOps Journey by NN Bank: Driving Business Performance With Observability in Financial Services

Too often IT sees the impact of issues on infrastructure and eventually determines the causes, but has a hard time figuring out the relationship between them. Most enterprise organizations have some form of monitoring in place but are drowning in data from the systems in use and not getting the visibility they need to understand what is going on.

How to Ensure Microsoft Teams Meeting Reliability

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Limit the Impact of Microsoft 365 and Microsoft Teams Outages

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Ensure Microsoft Teams PSTN Connectivity and Monitor its Service Quality

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Easily Manage Microsoft Teams Device Compliance

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Manage Microsoft 365 and Microsoft Teams SLAs

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How Theia Scientific and Volkov Labs use Grafana and AI to analyze scientific images

Dr. Christopher Field is Co-Founder, President, and Principal Investigator at Theia Scientific. With formal education in analytical chemistry and instrumentation, Chris has expertise in scientific hardware and software design, deploying embedded Linux devices for Internet of Things (IoT) and sensor fusion applications, and developing computer vision and image processing pipelines for cell analysis.

How to Monitor Third Party Services Used to Deliver Microsoft Teams

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Ensure Microsoft Teams Video Call Quality with the Return to the Office

Dependence on Microsoft Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. In this series of videos, we explore the most common Microsoft Teams performance use cases that our customers managed with our help. Our Microsoft Teams monitoring solution enables IT to prioritize and resolve problems to optimize the performance of Microsoft Teams, Microsoft Office 365 and business applications delivered to the business lines and customers.

How to Monitor Redis with OpenTelemetry

We’re excited to announce that we’ve recently contributed Redis monitoring support to the OpenTelemetry collector. You can check it out here! You can use this receiver with any OTel collector, including the contrib collector, observIQ’s distribution of the collector, and Google’s Ops Agent, to name a few examples.

What Data Types Does SQL Server Support?

There are many types of data compatible with SQL Server, and it’s important to understand what they are to avoid issues with non-compatible data types. Understanding the compatible data types is also fundamental to understanding the data type precedence, which determines what type of data will result when working with objects of two different types.
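To make the idea of data type precedence concrete, here is a minimal Python sketch. It models only an abbreviated subset of SQL Server's precedence order, listed highest first, and the helper name `result_type` is hypothetical. It returns the type that an expression mixing two types would be implicitly converted to.

```python
# Abbreviated subset of SQL Server's data type precedence, highest first.
# The full list in Microsoft's documentation contains many more types.
PRECEDENCE = [
    "datetime2", "datetime", "float", "decimal", "bigint", "int",
    "smallint", "tinyint", "bit", "nvarchar", "varchar", "char",
]


def result_type(left: str, right: str) -> str:
    """Return whichever of the two types ranks higher in PRECEDENCE.

    Raises ValueError for a type that is not in the abbreviated list.
    """
    return min(left, right, key=PRECEDENCE.index)


print(result_type("int", "varchar"))    # int
print(result_type("decimal", "float"))  # float
```

Precedence only decides the target type of the implicit conversion. Whether a particular value can actually be converted is a separate question: a `varchar` that does not hold a number still fails when it is combined with an `int`.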

Interlink Software: Enterprise AIOps Platform Mobile App

To protect the availability of the services your customers rely on, AIOps adoption is an imperative for large enterprises. Interlink Software’s AIOps platform applies machine learning to automate ITOps; reducing alert noise, performing event correlation, anomaly detection and root cause determination. As the world emerges from the Covid-19 pandemic, organizations are increasingly embracing the flexibility of home and hybrid working.

Avoid IT's Worst Nightmare by Monitoring Proactively - Not Reactively!

The day-to-day operations of a business rely on having a top-of-the-line network performance. Customers, employees, managers and business partners require fast access to applications, services and data. When bottlenecks or security breaches occur, identifying these threats immediately is an absolute must. Monitoring networks is necessary for IT departments, but doing so effectively is an even more significant challenge.

California trucking company Quik Pick Express handles downtime and troubleshooting issues using OpManager

Quik Pick Express, a California-based trucking company, provides a variety of services, including container drayage, warehousing services, container shipping, transloading, and much more. The company services seven locations in the state of California, with its IT infrastructure network being critical for its business. After a thorough evaluation, Quik Pick Express’ IT team chose OpManager, an all-inclusive network monitoring solution, to monitor its business-critical IT infrastructure network.

Monitor Knative for Anthos with Datadog

Developed and released by Google in 2018 with contributions from IBM, VMware, Red Hat, and other companies, the Knative project is designed to make it as simple as possible to build, deploy, and scale serverless containers across your existing Kubernetes infrastructure. By operating on top of Google Anthos, Knative for Anthos takes this even further by allowing developers to build and deploy applications across any hybrid environments that include both on-prem and cloud-hosted serverless clusters.

Get More Value From Your Logs Without Compromising Costs

Everyone at LogDNA is (unsurprisingly) obsessed with the power of log data. It is the single source of truth for what is happening in your environment and, when used correctly, provides the insights needed to deliver better experiences. Now more than ever, people across various teams understand the value of having easy access to log data within key workflows.

3 Key Benefits to Web Log Analysis

Whether it’s Apache, Nginx, IIS, or anything else, web servers are at the core of online services, and web log analysis can reveal a treasure trove of information. These logs may be hidden away in many files on disk, split by HTTP status code, timestamp, or agent, among other possibilities. Web access logs are typically analyzed to troubleshoot operational issues, but there is so much more insight that you can draw from this data, from SEO to user experience.
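As a small illustration of web log analysis, the Python sketch below parses access log lines in the Common Log Format and counts responses per HTTP status code. The sample entries are made up for the example, and the function name `count_statuses` is hypothetical. Real logs often use custom formats, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Common Log Format prefix: host ident user [time] "request" status size
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_statuses(lines):
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Made-up sample entries, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Apr/2022:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Apr/2022:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153',
    '203.0.113.5 - - [10/Apr/2022:13:56:01 +0000] "POST /login HTTP/1.1" 200 512',
]

status_counts = count_statuses(SAMPLE)
print(status_counts)
```

The same named groups (`host`, `path`, `time`) could feed other breakdowns, such as requests per client or the most frequently requested paths.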

Deployment-time testing with Grafana k6 and Flagger

When it comes to building and deploying applications, one increasingly popular approach these days is to use microservices in Kubernetes. It provides an easy way to collaborate across organizational boundaries and is a great way to scale. However, it comes with many operational challenges. One big issue is that it’s difficult to test the microservices in real-life scenarios before letting production traffic reach them. But there are ways to get around it.

Troubleshooting Alerts the Right Way: As a Team

At Netdata, we love two things more than anything else: Our goal is to make troubleshooting and monitoring as seamless as possible with the open-source Agent. This includes giving you pre-configured alerts so that you get notified immediately when a disruption occurs. The Netdata Agent comes with over 250 pre-configured and optimized alerts.

High-Performance Javascript in Stream - Why the Function in Your Filter Matters

Being a Cribl Pack author, I frequently receive questions related to why I chose to implement a certain functionality inside my Packs the way I did. A few lives ago, I worked for a Fortune 250 oil & gas company where I managed our SIEM environment. We didn’t have much in terms of system resources, so we needed to make everything run as efficiently as possible. (Maybe that’s where I get my love for performance from?)

4 Mobile Vitals to Keep a Pulse on Your Flutter Applications

Flutter is one of the fastest-growing open source cross-platform development frameworks. The likes of BMW, Google Pay, Tencent, and iRobot all use Flutter to quickly build and maintain mobile applications. In fact, Flutter was used by 42% of software developers in 2021, surpassing React Native as the most popular cross-platform mobile framework.

Best practices for monitoring mobile app performance

In a crowded and competitive market, mobile app developers must offer continuous availability and a frictionless user experience to minimize churn. Monitoring and maintaining mobile apps presents unique challenges. Since mobile apps run on a wide range of devices, it can be difficult to get clear visibility into client-side performance.

Apache Kafka Consumer Lag Monitoring

The world lives by processing the data. Humans process the data – each sound we hear, each picture we see – everything is data for our brain. The same goes for modern applications and algorithms – the data is the fuel that allows them to function and provide useful features. Even though such thinking is not new, what is new in recent years is the requirement of near-real-time processing of large quantities of events processed by our systems.

CNCF Live: Power up your machine learning - Automated anomaly detection

Our Analytics & ML lead Andrew Maguire recently had a chance to share our new Anomaly Advisor feature with the wider CNCF community. In his demonstration he did some light chaos engineering (using Gremlin and stress-ng) to generate some real anomalies on his infrastructure and watch how it all played out in the Anomaly Advisor in Netdata Cloud. There were also some great questions and discussion from the audience around ML in general and in the observability space itself.

Is it DDOS or is it you?

Server load can tell you a lot about your day-to-day user traffic. A sudden spike in server traffic can indicate an attack, but that’s not always the case. As website and performance monitoring become more mainstream, and you add a wider variety of backend testing and web monitoring checks to your infrastructure – you have to ask the question – Is that spike in server traffic DDOS? Or is it me…

How Nexthink Experience Complements Unified Endpoint Management

I’ve worked in IT for over 20 years and specifically in End User Computing (EUC) for the last 10 years, notably working for Citrix and Dell Technologies. I want to share with you some of the key differences between a Unified Endpoint Management (UEM) platform and a Digital Employee Experience (DEX) platform (such as Nexthink Experience), how they complement one another, and where they overlap.

Status Pages Now Have Favicons

It’s no secret, we love favicons (short for “favorites icon”) here at StatusGator. Our aggregated status dashboards and status pages feature the favicons of all the services you depend on, aiding in recognition. Favicons are shown when you bookmark, or add a site to your favorites. And, more importantly, they are shown in the tabs of your browser. Now, your own StatusGator aggregated status page can have a custom favicon, too!

13 Best Performance Monitoring Tools for Java

The Java programming language is simple to learn and use, and it is frequently used by web developers to create applications. However, monitoring the performance of a Java-based application might be difficult because it is not a simple process. Implementing multiple monitoring tools to track Java logs, metrics, infrastructure data, and other operational factors is necessary for troubleshooting inefficiencies.

Kubernetes Incident Response Best Practices

Inevitably, organizations that use technology (regardless of the extent) will have something, somewhere, go wrong. The key to a successful organization is to have the tools and processes in place to handle these incidents and get systems restored in a repeatable and reliable way in as little time as possible.

CI/CD & DevOps Pipeline Analytics: A Primer

Tracking application-level and infrastructure-level metrics is part of what it takes to deliver software successfully. These metrics provide deep visibility into application environments, allowing teams to home in on performance issues that arise from within applications or infrastructure. What application and infrastructure metrics can’t deliver, however — at least not on their own — is breadth.

Introducing the new Confluent Cloud integration for Grafana Cloud

At Grafana Labs, we’re continuing to expand our platform of Grafana Cloud integrations that make it easier than ever to connect and monitor external systems. These integrations enable you to answer the big picture questions in your organization and tell your observability story.

4 steps to bring network observability into your organization

The vast majority of corporate IT departments have a network monitoring solution. Typically that solution is built on standalone software platforms. If that’s you, this post is for you. You’re probably hearing a lot about “observability” these days. Generally, that’s the ability to answer any question and explore unknown or unexpected problems to deliver great digital experiences to your users.

Real User Monitoring vs Synthetic Monitoring Comparison: What Should You Use? | Sematext

What is a real user monitoring tool? And what is a synthetic tool? Which monitoring tool do you really need? In this comparison video, we will look at the pros and cons of monitoring your site with synthetic vs. real user monitoring tools. Ultimately, we will see that these two technologies work together to ensure that your website runs well and is optimized for the end-user.

Logbook: Team Discussion and Full Incident History

We've launched a feature that will help you fix errors and performance issues as a team! 🎉 With Logbook you get the full incident history. Read and leave team comments, see which notifications were sent at what time, and see team activity for changes in incident states. It's now easier than ever to see what the current state of an incident is.

Overview of Check Types

00:26 Web Checks

00:31 HTTP(S) Check

01:13 Transaction Check

01:45 API Check

02:38 RUM Check

03:12 Malware and Virus Checks

03:37 SSL Check

04:00 Network Checks

04:04 WHOIS/Domain Expiry Check

04:19 DNS Check

04:43 Ping (ICMP) Check

04:59 NTP Check

05:22 SSH Check

05:34 TCP/UDP Port Checks

05:54 Email Checks

05:59 IMAP, POP, SMTP and Domain Blacklist Checks

06:37 Custom Checks

Elastic on Elastic: How we saved $100,000/month by keeping our own software up to date

Let's start with the bottom line: When we upgraded to Elasticsearch 7.15 last year, our internal observability clusters saw a reduction in inter-node traffic from 464TB to 204.5TB per day. We monitored this reduction through subsequent upgrades and noticed its impact on our data transfer and storage costs. So here it is: upgrading saved Elastic $3,500 per day, or approximately $100,000 a month, or $1.2 million annually.
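The figures quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 30-day month and 365-day year are assumptions, and the variable names are illustrative.

```python
# Figures quoted in the text.
before_tb_per_day = 464.0
after_tb_per_day = 204.5
savings_per_day_usd = 3_500

# Reduction in inter-node traffic, in absolute and relative terms.
reduction_tb = before_tb_per_day - after_tb_per_day
reduction_pct = reduction_tb / before_tb_per_day * 100

# Assumption: a 30-day month and a 365-day year.
monthly_usd = savings_per_day_usd * 30
annual_usd = savings_per_day_usd * 365

print(f"Traffic reduction: {reduction_tb:.1f} TB/day ({reduction_pct:.1f}%)")
print(f"Monthly savings: ${monthly_usd:,}")
print(f"Annual savings: ${annual_usd:,}")
```

Under those assumptions the traffic drop is 259.5 TB per day, roughly 56%, and the daily saving works out to $105,000 a month and about $1.28 million a year, in line with the rounded figures quoted.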

AppScope 1.0: Changing the Game for Infosec, Part 2

We’re introducing AppScope 1.0 with a series of stories that demonstrate how AppScope changes the game for SREs and developers, as well as Infosec, DevSecOps, and ITOps practitioners. This blog is the second of two Infosec stories. For both Part 1 and Part 2, Randy Rinehart, Principal Product Security Engineer at Cribl, contributed extensively.

VMware vSphere vs. ESXi vs. vCenter

VMware is best known for developing software packages that support virtualization. Using the company’s solutions, your business can adopt a multi-cloud environment. It also works to modernize applications and support security initiatives. You can even create an anywhere workspace for your team. One of the products that will help you accomplish these things is known as vSphere. vSphere is a suite of products. This means you can download and use components of the suite individually.

Extend Device Lifecycles and Increase Employee Happiness

See how Nexthink’s IT department extended hardware refresh cycles while increasing employee happiness by offering device choices and incorporating sentiment and environment stewardship into the process. Like many organizations, our IT team has fallen behind on hardware refresh cycles due to the global chip shortage and ongoing supply chain issues.

Optimizing Microsoft Teams PSTN for the Hybrid Workforce

Martello Vantage DX™ enables IT teams to understand and share Microsoft Teams user experience metrics with every stakeholder involved in the service delivery. Gain insight into Microsoft Teams’ telephony user experience; whether the calls are direct or using PSTN connectivity such as Calling Plans, Direct Routing or Operator Connect.

Network Monitoring Safety Checklist-5 Steps to Take

Network monitoring is a vital part of your security architecture and layered security portfolio. Step one in your network monitoring safety checklist is making sure you have network monitoring in the first place. The rest of our steps assume you have network monitoring and lay out five ways to optimize its value.

The Great Resignation - What's at Stake for IT?

Roughly 47.4 million people quit their jobs and left the workforce last year in search of better ones, leading to what we now call the Great Resignation. Then, as the economy re-opened and companies intensified hiring efforts, millions of people switched careers, searching for better working conditions and higher salaries. Experts say the trend will continue as the Gen Z population reshapes the labor market.

Slack's New Logging Storage Engine Challenges Elasticsearch

Elasticsearch has long been the prominent solution for log management and analytics. Cloud-native and microservices architectures, together with the surge in workload volumes and diversity, have surfaced some challenges for web-scale enterprises such as Slack and Twitter. My podcast guest Suman Karumuri, a Sr. Staff software engineer at Slack, has made a career of solving this problem. In my chat with Suman, he discusses for the first time in a public space a new project from his team at Slack: KalDB.

TL;DR InfluxDB Tech Tips: From TICKscripts to Flux Tasks

If you’re a 1.x user of InfluxDB, you might be a Kapacitor user as well. If so, you’re also familiar with TICKscripts, the data processing and transformation language for Kapacitor, the batch and stream processor for InfluxDB. Kapacitor is a great tool, but it’s largely a black box, so using and implementing TICKscripts to execute data processing tasks, checks, and notifications can be a challenging developer experience.

How To Configure Flowmon and WhatsUp Gold

In the previous blog post, “Flowmon and WhatsUp Gold: Discover application experience issues through single pane of glass”, we demonstrated how IT Infrastructure Monitoring (WhatsUp Gold) and Network Performance Monitoring & Diagnostics (Flowmon) work seamlessly together to report on application performance, user experience and infrastructure status.

C-Suite Reporting with Log Management

When security analysts choose technology, they approach the process like a mechanic looking to purchase a car. They want to look under the hood and see how the product works. They need to evaluate the product as a technologist. On the other hand, the c-suite has different evaluation criteria. Senior leadership approaches the process like a consumer buying a car.

Observability for New Teams: Part 1

Any significant shift in an organization’s software engineering culture has the potential to feel tectonic, and observability (o11y for short)—or more specifically, Observability Driven Development—is no different. Leaning into observability, which calls for tool-enhanced investigation, hypothesis testing, and data richness can be cumbersome even for the most veteran of teams.

Reducing False Positives in Capped Campaigns

As the adtech industry continues to expand and the volume of ads sold and served grows exponentially, the only way to manage the business is through programmatic advertising. This approach utilizes data insights and algorithms to automatically serve ads to the right user, at the right time, on the right platform, and at the right price. The speed and scale of online advertising means that adtech companies need to collect, analyze, and act upon immense datasets instantaneously, 24 hours a day.

The new Check Overview is now live!

Today, I'm excited to share the release of a long-planned and requested feature - our new Check Overview Page. Until now, Checkly enabled you to troubleshoot single alerts, but a deep dive into the long-term performance trends was limited. That is not the case anymore. In the new Check Overview, we’re introducing the enhanced analytics in four distinct categories: The update is focused on two important outcomes.

Set up and observe a Spring Boot application with Grafana Cloud, Prometheus, and OpenTelemetry

Spring Boot is a very popular microservice framework that significantly simplifies web application development by providing Java developers with a platform to get started with an auto-configurable, production-grade Spring application. In this blog, we will walk through detailed steps on how you can observe a Spring Boot application by instrumenting it with Prometheus and OpenTelemetry and by collecting and correlating logs, metrics, and traces from the application in Grafana Cloud.

Peering, edge computing, and community with Grant Kirkwood | Network AF Episode 16

Chief Technology Officer and Co-founder of Unitas Global, Grant Kirkwood, joins Network AF to discuss motivations for starting the company and where they're at currently. Avi and Grant talk about what it is like to be a service provider and a solution provider (MSP) in one, and how it plays into what Avi calls the APIfication of networks and IT strategy.

Using AI & ML for Application Performance (APM)

Today, IT and site reliability engineering (SRE) teams face pressure to remediate problems faster than ever, within environments that are larger than ever, while contending with architectures that are more complex than ever. In the face of these challenges, artificial intelligence has become a must-have feature for managing complex application performance or availability problems at scale.

Cloud Log Management Strategy & Best Practices

For IT Operations and Site Reliability Engineering (SRE) teams, logging is nothing new. In fact, collecting and analyzing logs is one of the oldest cornerstones of performance management. Logs have been part and parcel of APM workflows for decades. Yet the logging strategies that worked in eras past often fall short today. That’s thanks to the advent of cloud-native computing, which has ushered in fundamental new challenges in the way teams aggregate, analyze, and manage logs.

The Critical Role of the SRE & Error Budgeting

The role of SRE, Site Reliability Engineer, was first created by Benjamin Treynor in 2003 at Google after he was tasked with ensuring that their websites were available and reliable. The SRE is a multi-disciplined role that needs to have the ability to automate monitoring and observability across hundreds and thousands of complex systems.

Are You Curious? Announcing the Launch of Cribl Curious: A Q&A Site for the Cribl-Inclined

Our amazing user community is growing so fast that we want to give you more resources to learn and share your knowledge and experience with others. So…today we launch Cribl Curious! Curious is a Q&A site for asking and answering technical questions about Cribl Stream, Cloud, Edge, Packs, and AppScope. Goat a question about how something works in Cribl? Come on over to see how your peers have solved similar problems. Checked the docs and it’s just not clicking for you?

What Are The Different Types of Authentication?

The goal of authentication is to confirm that the person attempting to access a resource is actually who they say they are. As you can imagine, there are many different ways to handle authentication, and some of the most popular methods include multi-factor authentication (MFA) and Single Sign On (SSO). However, these methods just skim the surface of the underlying technical complications. In order to implement an authentication method, a business must first establish an authentication protocol.

Using Synthetic Endpoints to Quality Check your Platform

Quality control and observability of your platform are critical for any customer-facing application. Businesses need to understand their users’ experience in every step of the app or webpage. User engagement can often depend on how well your platform functions, and responding quickly to problems can make a big difference in your application’s success. AWS Canaries can help companies simulate and understand the user experience.

Introducing OID Monitor History

Despite everyone’s best efforts, network failures happen. And when downtime means lost productivity, fast troubleshooting becomes an integral part of IT operations. So with the addition of OID (object identifier) monitoring history, Auvik is providing users an archive for troubleshooting, analysis, and planning. When it comes to managing network issues, diagnosing the root cause is the first step. And often, there’s a gap between when an incident occurs and when it’s reported.

Events in MS Windows and Pandora FMS, does anyone give more?

If the spreadsheet was the essential application for accounting and massification of personal computers, MS Windows® operating system was the graphical interface that turned work into something more pleasant and paved the way for web browsers for the Internet as we know it today. Today, in Pandora FMS blog, we discuss.

ASP.NET Core 7 has built-in dark mode for error pages

You may remember Dark Screen of Death, the Chrome extension to bring dark mode to ASP.NET Core exception pages that we launched back in February. I probably should have followed the commits on the aspnetcore repository more closely, since it turns out that ASP.NET Core 7 comes with its own dark mode version of error pages. In this post, I'll share how to enable it and look at the differences between the built-in version and the Chrome extension.

NiCE Active 365 Management Pack 4.0 released

Microsoft 365 services help companies worldwide improve business and revenue by providing a best-in-class digital workspace experience. The NiCE Active 365 Management Pack complements this with advanced M365 monitoring, such as full Teams Call analysis, integrated into Microsoft SCOM. Advanced monitoring and analytics help you reveal unwanted micro-events influencing the health and performance of the system and its users.

Debugging the Java Message Service (JMS) API using Lightrun

The Java Message Service API (JMS) was developed by Sun Microsystems in the days of Java EE. The JMS API provides us with simple messaging abstractions including Message Producer, Message Consumer, etc. Messaging APIs let us place a message on a “queue” and consume messages placed into said queue. This is immensely useful for high throughput systems – instead of wasting user time by performing a slow operation in real-time, an enterprise application can send a message.
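The queue pattern described above can be sketched without a message broker. The example below is not JMS: it uses Python's standard `queue` and `threading` modules as a stand-in, and the message texts and the `consumer` function are hypothetical. It only shows the shape of the idea: a producer places messages on a queue and moves on, while a consumer does the slow work separately.

```python
import queue
import threading


def consumer(q: queue.Queue, results: list) -> None:
    """Take messages off the queue until the None sentinel arrives."""
    while True:
        message = q.get()
        if message is None:
            break
        # Stand-in for the slow operation done outside the request path.
        results.append(message.upper())


q = queue.Queue()
results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()

# The producer enqueues work and returns immediately.
for text in ["order created", "email requested"]:
    q.put(text)

q.put(None)  # tell the consumer to stop
worker.join()
print(results)
```

With a single consumer, messages are handled in the order they were queued. A real messaging system adds what this sketch leaves out, such as persistence, delivery guarantees, and consumers running in other processes.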

How to Monitor Website Performance

Most organizations today have a large digital presence, and some rely significantly on their web applications to provide value to their customers and generate income. Keeping your website up and running 24/7 isn't enough in today's digital world. To provide a better experience, you should optimize your web pages frequently. Slow-loading pages or those that aren't mobile-friendly might cause an increase in bounce rate as well as influence your search engine rankings.

New in the Kubernetes integration for Grafana Cloud: Kubernetes events, Pod logs, and more

The Kubernetes integration for Grafana Cloud helps users easily monitor and alert on core Kubernetes metrics using the Grafana Agent, our lightweight observability data collector optimized for sending metric, log, and trace data to Grafana Cloud. It packages together a set of easy-to-deploy manifests for the Agent, along with prebuilt dashboards and alerts.

Video: How to migrate to Grafana Mimir in less than 4 minutes

Since we launched Grafana Mimir — the most scalable, most performant open source time series database in the world — we have answered many of your questions about our latest open source project, including how to pronounce it. (All together now: /mɪ’mir/.) We have also walked through how we scaled Grafana Mimir to 1 billion active series.

Azure AD Monitoring

As the Azure cloud administrator, you need to know who is accessing your cloud resources, how they are accessing them, what they access, what changed, when they access it, and from where. Azure AD (Azure Active Directory) provides answers to the above by storing the information in two logs. The information stored in them is extremely valuable for troubleshooting, monitoring, and general security-related work. The logs are.

Announcing new simple query options in Cloud Logging

When you’re troubleshooting an issue, finding the root cause often involves finding specific logs generated by infrastructure and application code. The faster you can find logs, the faster you can confirm or refute your hypothesis about the root cause and resolve the issue! Today, we’re pleased to announce a dramatically simpler way to find logs in Logs Explorer.

Measuring RPKI ROV adoption with NetFlow

Resource Public Key Infrastructure (RPKI) is a routing security framework that provides a mechanism for validating the correct originating autonomous system (AS) and prefix length of a BGP route. Route Origin Authorization (ROA) is a cryptographically signed object within the RPKI that asserts the correct originating AS and prefix length of a BGP route. For as long as the internet has existed, the challenge of securing its underlying protocols has persisted.
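To make the validation idea in this excerpt concrete, here is a minimal sketch of route origin validation. It assumes a ROA can be modeled as a prefix, a maximum prefix length, and an origin AS number, and it follows the three outcomes commonly described for ROV (valid, invalid, not-found). The `Roa` class, the `validate_origin` function, and the sample data (documentation-range prefixes and AS numbers) are illustrative assumptions, not anything taken from the article.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Hypothetical, simplified view of a Route Origin Authorization."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int  # longest prefix length the origin AS may announce
    asn: int         # authorized originating AS


def validate_origin(route_prefix: str, origin_asn: int, roas: list[Roa]) -> str:
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    # A ROA covers the route when the route's prefix lies inside the ROA prefix.
    covering = [
        roa for roa in roas
        if roa.prefix.version == route.version and route.subnet_of(roa.prefix)
    ]
    if not covering:
        return "not-found"
    for roa in covering:
        # Both the origin AS and the prefix length must match for a valid route.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


# Sample data using documentation-only address space and AS numbers.
ROAS = [Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
```

With this sample data, a route for `192.0.2.0/24` from AS 64500 is classified as valid, the same prefix from a different AS or a more specific `/25` is invalid, and a prefix with no covering ROA is not-found.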

5 Cloud Migration Planning Tool Misconceptions

By now, most of us know that migrating workloads to the cloud isn’t like simply moving software from one server to another—especially in a complex enterprise infrastructure with many interdependent services and components. There are a lot of things that can go wrong: technical public cloud provisioning issues, security and compliance challenges, lack of cloud skills, wrong public cloud service provider (CSP) selection, unexpected costs, and more.

Why Is Normalizing Log Data in a Centralized Logging Setup Important: Operations & Security

The phone rings. Your email pings. Your marketing team just told you about a flood of messages on social media and through live chat that there’s a service outage. You thought your Monday morning would be calm and relaxed since people are just returning from the weekend. How do you start researching all of these incoming tickets? How do you know which ones to handle first? Is this just a hardware failure, or are you about to embark on a security incident investigation like Log4j?

Java vs Python

Computer science is crucial to our lives today, and programming languages play a fundamental role. These languages act as a programmer’s toolbox. However, choosing a language can be challenging, especially when deciding between the two most popular options. These options include Java and Python. Java and Python are widely used as general-purpose programming languages for desktop and web applications. There are many similarities when comparing the two, but there are also significant differences.

Synthetic Monitoring for CI/CD Pipelines

For DevOps teams, delivering quality software has long required reconciling a major tension: In a perfect world, you’d catch every issue in each new release of your application before you deployed the release into production. But in the real world, doing so is tricky, not least because it’s hard to collect data about application performance before the application is actually deployed.

On-Premises Application Monitoring: An Introduction

In the present age of cloud-native everything, it can be easy to forget that some applications still run on-premises. But they do, and managing the performance of on-premises apps is just as important as monitoring those that run in the cloud. With that reality in mind, here’s a primer on how to approach on-premises application performance monitoring as part of a broader cloud-native performance optimization strategy.

What To Do When Your Shopify Site Goes Down

Shopify downtime can be a real risk to your business. It can cause you unprecedented losses. For example, it can prevent clients from accessing your ecommerce store. Anytime you experience situations like slow browsing or outage, it’s crucial to act swiftly to troubleshoot the issues and fix them. How should you go about it? In this article, we will.

Papertrail Now A DigitalOcean SaaS Add-On

A little over a year ago, we announced a partnership with DigitalOcean to make it easier for users to add log management and monitoring for applications running in DigitalOcean Droplets (VMs). Since then, we’ve created a DigitalOcean Marketplace Listing, provided a direct link to the signup page, and shared many ideas on how to use SolarWinds® Papertrail™ to troubleshoot and optimize applications running in DigitalOcean Droplets.

How SAP built a Dojo Community of Practice to support a cultural shift to DevOps

By Sam Fell, VP, Product Marketing, Observability, Sumo Logic. I love technology, and I’m thrilled to work in a profession where I’m steeped in it! In my career as a developer, consultant and marketeer I've learned it’s not “the cool new tech stack” that helps win the day.

Use formulas and functions in RUM monitors for high-value alerts

Real User Monitoring (RUM) gives you visibility into the behavior of your users and the performance of your applications. You may already be using RUM monitors to automatically notify your team when the number of RUM events—such as pageviews, clicks, or errors—rises above a threshold you define.

Distributed Tracing Best Practices for Microservices

The management of modern software environments hinges on the three so-called “pillars of observability”: logs, metrics and traces. Each of these data sources provides crucial visibility into applications and the infrastructure hosting them. For many IT operations and site reliability engineering (SRE) teams, two of these pillars — logs and metrics — are familiar enough.

SolarWinds IT Management and Monitoring for Government

Each of our products is designed to solve the problems IT professionals face every day. With a continuously expanding portfolio that scales to meet your needs, SolarWinds is revolutionizing the way federal IT manages its operations with our simple, powerful, and secure IT management software. SolarWinds supports every branch of the DoD, nearly every civilian and intelligence agency in the United States, and a majority of National Health Trusts in England and European Union institutions.

eCommerce giant Blinkit's journey from ELK Stack to Grafana Cloud

The promise: Order any groceries and essentials from Blinkit’s mobile app, and they’ll be delivered to your doorstep within 10 minutes. The process: Very difficult with a legacy logging tool. For Blinkit, the instant delivery service formerly known as Grofers that serves millions of consumers across India, their tech stack was beginning to interfere with business operations at a time when the company was hyperscaling due to its popularity.

How to Foster Digital Dexterity in Your Workplace

Digital dexterity is a fundamental attribute within the most successful workforces – and will only become more essential as businesses enter the rapidly approaching future workplace. Yet, many businesses struggle to assess and promote this vital skill among their own employees. The increasing value of digital dexterity is due in large part to the digitization of the workplace, which was well underway prior to the pandemic.

Dashboard Fridays: Sample M365 dashboard

In this latest Dashboard Fridays episode, join our VP of Innovation Adam Kinniburgh as he showcases this M365/O365 dashboard built with SquaredUp. This summary dashboard built for the MP using SCOM Edition gives a clear overview of your M365/O365 health, including: subscriptions, consumed licences, network issues, teams performance, mail flow duration and more. This dashboard pack also includes a bunch of cool perspectives with dedicated views (open and closed alerts, metrics, health) for different components, including licencing, teams, alerts and more.

How Network Monitoring Defuses Hacker Bombs

You know by now that hackers literally never sleep. Chances are your network has been hit before and absolutely will be hit again. Hackers invent new techniques every day and tweak existing ones, many of which are automated—which is why we can say that hackers literally never sleep. Hackers either attack your network directly or attack your infrastructure through your network. Either way, the network itself is your first line of defense.

Explore a centralized view into service telemetry, Error Tracking, SLOs, and more

When your service is undergoing performance issues, it is essential to address them in a timely and frictionless manner. With access to more telemetry and insights, the APM Service Page provides a comprehensive overview of your service and helps you quickly drill down under the hood to diagnose and investigate issues.

Expo 2020 Dubai: How ManageEngine was part of the world's greatest expo over the last five years

In late November 2013, the highly anticipated Expo 2020 Dubai was announced by World Expo. Expos over the years have showcased the world’s greatest innovations, initiated the dialogue for a more progressive future, and encouraged people to embrace the evolving technology-driven shift in society. Expo 2020 Dubai started on October 1, 2021 with over 192 countries participating, hosting a whopping 5,610 official events on its site in the first month.

How to Profit From the IoT (Internet of Things) Revolution

The IoT (Internet of Things) market is growing rapidly with increasing adoption of smart infrastructure to improve efficiency and countries initiating smart city projects. Today I’ll analyze Cisco Systems (CSCO), Zebra Technologies (ZBRA), and TE Connectivity (TEL), which are well-poised to benefit from this revolution. The Internet of Things (IoT) refers to connecting devices to the internet.

What is SAP cyber security?

Recently, we discussed the various security measures SAP takes to mitigate and prevent security threats to their customers’ ERP systems, and how Avantra can help you understand which SAP HotNews releases are relevant to your business-critical applications. But let’s take a step back for a moment to discuss what SAP cyber security is, common threats to your SAP landscape, and what SAP/Avantra products are available to strengthen your S/4HANA or legacy product.

How to Keep DevOps in Sync with Business Needs

If you’re an engineer, it’s probably easy enough to appreciate the technical value of DevOps. DevOps makes software delivery faster, increases agility, improves collaboration and more. That being said, this is likely not the case for business professionals. They don’t always see the value of DevOps as clearly from their perspective. After all, even if you adopt the best DevOps tools and design optimal DevOps processes, there’s no guarantee that DevOps will drive business value.

Top 10 Requirements of Cloud Monitoring Tools

Most organizations are moving applications and workloads to the cloud. Our APM survey found that 88% of organizations had some form of cloud technology deployed already. At the same time, there are several misconceptions about the cloud. There are many who believe they don’t need monitoring tools for the cloud because their cloud provider will take care of all of their performance needs. This is a myth because cloud provider SLAs are mainly around infrastructure availability.

Unlocking self-service monitoring with the Sensu Integration Catalog

Introducing the Sensu Integration Catalog — a marketplace-like UX for simplifying new user onboarding, and deploying production-ready monitoring in a matter of minutes. The Sensu Integration Catalog is also an open marketplace that new and existing users can contribute to by sharing Sensu configurations. Continue reading to learn more!

Committed to Open Source - Sumo Logic Simplifies Infrastructure and Application Monitoring Deployments

REDWOOD CITY, Calif. – April 21, 2022 – Sumo Logic (NASDAQ: SUMO), the SaaS analytics platform to enable reliable and secure cloud-native applications, today announced a new open source offering, the Sensu Integration Catalog. Available today on GitHub, the Integration Catalog is an open, self-service marketplace featuring over 40 turn-key integrations.

The 3-Step Communication Game Plan for a Site Outage (One of Our LEAST Favorite Things)

If those von Trapp Family singers from The Sound of Music collectively woke up in a really, really bad mood and decided to write a song about their least favorite things, then it’s a safe bet that not being able to connect to a website would make the list (alongside airline passengers who tilt their seat back, and clam shell plastic packaging).

New in Grafana 8.5: updated panels, new RBAC features, simplified reporting, and more!

Grafana 8.5 is here! We’ve worked on a variety of improvements that focus on Grafana’s usability, data visualization, and security. For a full list of new features and capabilities, check out our What’s New in Grafana 8.5 documentation. You can get started with Grafana in minutes with Grafana Cloud. We have free and paid Grafana Cloud plans to suit every use case; sign up for free now.

Ask Miss O11y: Baggage in OTel

Miss O11y is delighted to welcome our newest band member: Martin Thwaites! Martin has been a member of the Honeycomb user community practically since its inception. He is a UK-based consultant who specializes in helping teams scale up and tackle challenging business problems, and a long-time contributor to the Azure and .NET communities. We think he looks ✨amazing✨ in a tiara.

We Moved Some Data to S3

When clients make HTTP POST requests to ping URLs, Healthchecks captures and stores request body data. You can use this feature to log a command’s output and have it available for inspection later: Same thing, using runitor instead of curl: You can view the request body data in the web UI: Healthchecks also captures and stores email messages, when pinging by email.
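As a rough illustration of logging a command's output through a ping URL, the sketch below runs a command and sends its combined output as the body of an HTTP POST request. The function names and the `PING_URL` value are placeholders chosen for this example; substitute the ping URL of your own check. It uses only the Python standard library and is not the article's own snippet.

```python
import subprocess
import urllib.request

# Placeholder ping URL (assumption): replace with the URL of your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"


def build_ping(url: str, output: bytes) -> urllib.request.Request:
    """Build a POST request that carries the command output as its body."""
    return urllib.request.Request(url, data=output, method="POST")


def run_and_report(command: list[str], url: str = PING_URL) -> int:
    """Run a command, POST its stdout and stderr to the ping URL, return the HTTP status."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors are captured too
    )
    request = build_ping(url, result.stdout)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example usage (performs a real network request, so it is left commented out):
# run_and_report(["df", "-h"])
```

Keeping `build_ping` separate from the network call makes it possible to inspect the request (method, URL, body) without contacting any server.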

How to Write a Custom Terraform Provider Automatically With OpenAPI

So you’ve just been tasked with creating a Terraform Provider (or maybe upgrading an existing one). As you do your research to prepare for the project, you slowly begin to realize, “well this looks like I’ll just be developing a Terraform specific wrapper for a client of my API.” It’s not particularly difficult, but it seems tedious. Is there a better way to build this? Maybe something that can be modular and automated? A way that’s actually stimulating to implement?

Splunk Operator 1.1.0 Released: Monitoring Console Strikes Back!

The latest version of the Splunk Operator builds upon the release we made last year with a whole host of new features and fixes. We like Kubernetes for Splunk since it allows us to automate away a lot of the Splunk Administrative toil needed to set up and run distributed environments. It also brings a resiliency and ease of scale to our heavy-lifting components like Search Heads and Indexer Clusters.

Two-factor authentication in Pandora FMS

I have been a regular user of Pandora FMS for years and the best I can say about them is that they always have something new to add to my learning. Today, for example, I rediscovered the two-factor authentication in Pandora FMS! (And I did it, in part, through this article already published on their blog.) Although I devote myself to programming (and it is what I like to do the most), I am more of a Web 2.0 person than a Web 3.0 person because I consider that the latter has been abused too much.

SolarWinds launches comprehensive Observability, empowering customers to accelerate digital transformation

Integrated solution enables IT agility, productivity, and actionable intelligence for organisations of all sizes and industries, wherever they are on their modernisation and cloud migration journeys.

Troubleshoot directly from any replay with Browser Dev Tools

Session Replay now includes Browser Dev Tools, a new feature that enables engineers to identify and debug the root causes of issues even faster by exposing key information about a playback session, such as network performance bottlenecks and any console log errors. This wealth of surrounding context will make it easier to trace frontend incidents throughout your application and remediate larger, ongoing issues.

MQTT vs Kafka: An IoT Advocate's Perspective (Part 1 - The Basics)

With the Kafka Summit fast approaching, I thought it was time to get my hands dirty and see what it’s all about. As an advocate for IoT, I heard about Kafka but was too embedded in protocols like MQTT to investigate further. For the uninitiated (like me) both protocols seem extremely similar if not almost competing. However, I have learned this is far from the case and actually, in many cases, they complement one another.

Monitoring vs. Observability: What's The Role of Each For DevOps?

DevOps: Development and Operations joined together in perfect harmony, one feeding the other and vice-versa. That's the dream. But it's easy for the link between the two to be broken. 'Dev' stops talking to 'Ops,' or Ops falls out with Dev, often because of a lack of understanding of each other's goals. That's where Monitoring and Observability come in. They're like the mediators whose job is to make sure the two main players in DevOps keep that metaphorical dialogue open.

Why External Probe Servers Offer the Most Accurate Performance Monitoring

We hear a lot of questions from folks taking their first steps into website monitoring about how the service works and what we offer. One of the more frequently asked questions is why they need us at all. After all, they have metrics from XYZ provider who can tell them if they have consumed too much bandwidth or are overloaded with traffic. Wouldn’t they just know that they were up or down by watching those metrics?

New StackPod Episode: OpenTelemetry - the Future of Observability?

OpenTelemetry has been getting a lot of attention in the observability field. Moreover, in StackState’s latest release, we added support for OpenTelemetry traces. Melcom van Eeden, software developer at StackState, was one of our developer champions who made this possible. In addition to joining us on this episode of StackPod, he wrote a blog post on how to leverage OpenTelemetry with StackState and he recorded a tutorial video about the topic.

Why SolarWinds Is Evolving From Monitoring to Observability

At SolarWinds, our purpose is to enrich the lives of the people we serve. In a world where complexity is expanding, our commitment to offer simple yet powerful IT solutions has served us well throughout our more than 20-year history. Today’s pace of digital transformation means businesses are actively building new applications and leveraging multi-cloud deployments, while reworking their existing ones, to achieve operational excellence.

Simplify Your Digital Transformation With Hybrid Cloud Observability

Today marks a significant milestone for SolarWinds, as we’ve formally announced our organizational direction to evolve from our leadership position in monitoring to become a leader in observability. Today is also significant because it marks the official release of our Hybrid Cloud Observability, and I’m excited to share more details on what we believe expands the scope of observability for the market.

The evolution of network visibility

In the old days, it took a bunch of help desk tickets for an engineer to realize there was something wrong with the network. At that time, troubleshooting meant logging into network devices one-by-one to pore over logs. In the late 80s, SNMP was introduced giving engineers a way to manage network devices remotely. It quickly became a way to also collect and manage information about devices. That was a big step forward, and it marked the beginning of network visibility as we know it today.

Why UX Designers Don't Feel Valued-and Why This Is a Problem for Your Business

It’s time we had a real conversation about why UX designers everywhere are still unhappy, why that elusive “seat at the table” feels so impossibly out of reach to so many (even at companies that embrace design), and how this impacts your business. Design is facing down an epidemic of designers who feel burned out, taken for granted, marginalized, and disrespected. Yes, there is something different about our experience compared to other disciplines.

Sentry's Android Gradle Plugin Updated with Room Support and More

Monitoring performance is a critical part of software development. We just released version 3.0.0 of our Sentry Android Gradle plugin, which brings a handful of auto-instrumentation capabilities to Android developers, featuring Room and SQLite queries performance, File I/O operations performance, and more.

Synthetic Monitoring for Mobile Apps

The customer experience within mobile apps has become mission-critical. Today's consumers have come to expect rich, robust functionality in their mobile applications; with over 2.1 million apps available to Android users and nearly 2 million apps up for download in Apple's app store, mobile phone users have plenty of options when it comes to the applications they download.

Advanced Traffic & Security Analysis in WhatsUp Gold with Progress Flowmon

It’s a simple fact that the more tools an IT team has to use, the less effective they are. That’s why one of the chief values of WhatsUp Gold is putting everything you need to find and fix network issues fast in one simple, easy-to-use application. With that in mind, we’ve simplified advanced network traffic analysis and security threat detection from Progress Flowmon by adding those dashboards to the WhatsUp Gold interface.

Deutsche Telekom Security trusts in Icinga monitoring

We’re proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

Why is Application Performance Monitoring Important?

Picture this: Your on-call engineer gets an alert at 2 AM about a system outage, which requires the entire team to work hours into the night. Even worse, your engineering team has no context of where the issue lies because your systems are too distributed. Solving the problem requires them to have data from resources that live in another timezone and aren’t responsive. All the while, your customers cannot access or interact with your application, which, as you can imagine, is damaging.

Leveraging Elastic to improve data management and observability in the cloud

Two recent studies conducted by Nucleus Research focused on how a global telecommunications provider and a multi-line insurance company realized quantified business value through Elastic. The companies that were studied saw great levels of satisfaction from deploying Elastic Cloud. Through their adoption they were able to increase the maturity of their tech stack and circumvent prior limitations in scalability.

Survey: Complexity and Costs Threaten Health of Strong DevOps Pulse

Oh how quickly the times change. Over the past 5 years, as Logz.io has executed its annual DevOps Pulse Survey and created the accompanying report, the state of “modern” cloud engineering and monitoring practices has advanced at a torrid pace. As a result, so has the project itself. Back in 2017 when we started polling the industry and collecting data, we sought to validate the usefulness of DevOps itself.

SolarWinds Hybrid Cloud Observability - The Next Evolution in Monitoring

In this video, we introduce SolarWinds Hybrid Cloud Observability and detail how our new platform can help organizations of all sizes and industries optimize performance, help ensure availability, and reduce remediation time across on-premises and multi-cloud environments by increasing visibility, intelligence, and productivity.

Introducing SolarWinds Hybrid Cloud Observability

The rise of digital transformation has accelerated opportunities and increased challenges for organizations managing complex, diverse, and distributed environments. SolarWinds Hybrid Cloud Observability is designed to take complex IT deployments and make them easy to manage with proactive end-to-end observability.

Troubleshooting Sources and Destinations in Cribl Stream

This is Part One of a series of blogs around troubleshooting Cribl Stream. Part One will focus on identifying and troubleshooting issues with Sources and Destinations in Stream. I will cover some of the common problems that users face and how you can work through them and find the root cause.

Best PHP Monitoring Tools on the Market

As of April 2022, more than 75% of the internet comprises websites made using PHP (source). That is to say, three out of four websites on the internet have been developed using PHP! That is huge! With that many PHP web applications on the web, there need to be systems that can automatically keep track of how they respond to the hundreds of thousands of clients connecting worldwide.

Tools for Threat Hunting and IT Service Risk Monitoring

Cybersecurity can often seem intimidating for IT teams. After all, things like “threat hunting,” “red teaming,” and “blue teaming” are not used in IT operations. On the other hand, just because these words are terms of art doesn’t mean that they’re activities you don’t do already. You’re probably already using log data as part of your IT operations incident response.

How Grafana Mimir's split-and-merge compactor enables scaling metrics to 1 billion active series

Grafana Mimir, our new open source time series database, introduces a horizontally scalable split-and-merge compactor that can easily handle a large number of series. In a previous blog post, we described how we did extensive load testing to ensure high performance at 1 billion active series. In this article, we will discuss the challenges with the existing Prometheus and Cortex compactors and the new features of Grafana Mimir’s compactor.

C-Suite Perception vs. Employee Experience: How DEX Drives Customer Experience

No matter what market they occupy or products they sell, nearly every business prioritizes customer experience above all else. After all, the customer knows best. But outside of the tried-and-true methods – setting policies around customer service, offering customers thoughtful perks, etc. – how can enterprises best ensure a positive customer experience? The answer: by ensuring a positive employee experience first.

What is Splunk? (2022)

How do you thrive in today’s unpredictable world? You keep your digital systems secure and resilient. And above all, you innovate, innovate, innovate. Splunk is the extensible data platform that processes data from any cloud, any data center and any third party tool. At massive scale. We’re ready to help you accelerate your digital transformation and pave the way for incredible innovation.

Filling in the Gaps: Manufacturing and MSPs

It’s no secret that the world of manufacturing is changing fast. The “fourth industrial revolution”, a.k.a. Industry 4.0, is leading to the digitization of just about every stage in the manufacturing process. Of course, a shift from a primarily analog and mechanical world to one where digital is at the forefront means increased bandwidth requirements, larger potential attack surfaces for cybersecurity exploits, and a drastic increase in the sheer amount of network-connected devices.

Successfully migrate to Azure with the Microsoft Cloud Adoption Framework and Datadog

Migrating your applications from on-prem infrastructure to the cloud comes with a number of benefits, including increased agility, resilience, and scalability, as well as potential cost and IT overhead reductions. But it can be complex, which is why organizations moving to Azure often use Microsoft’s Cloud Adoption Framework for Azure and its strategy for successful migrations.

Service level objectives: How SLOs have changed the business of observability

Forget the latest tech gadgets and the newest products. One of the most talked about trends in observability right now? “SLOs have really become a buzzword, and everyone wants them,” said Grafana Labs principal software engineer Björn “Beorn” Rabenstein on a recent episode of “Grafana’s Big Tent,” our new podcast about people, community, tech, and tools around observability.

The SolarWinds Database Portfolio is Growing

Since acquiring SentryOne in October 2020, the SolarWinds team has been regularly updating our Microsoft® Partners and customers about the upside these combined offerings bring to the IT community and the marketplace. Of course, SentryOne has long been recognized for its SQL Server and Microsoft Data Platform tools, with an impressive portfolio of category-leading solutions.

Troubleshooting Spring Boot applications with Sentry

Some months ago we wrote a quick guide on how to use Sentry with Spring Boot and Logback. Since then, we’ve continued working on improving the development experience, added several features for error reporting and, most importantly, implemented the performance feature in Sentry Java SDK with a dedicated integration with Spring MVC. If you haven’t yet used Sentry in a Spring Boot application - nothing to worry about - you will find all the necessary steps below.

Types of Network Performance Monitoring Tools

Modern networks are complex. Their complexity is increasing daily. As much as this complexity solves modern-day problems, it may also give rise to many. Adding to the complexity is that networks have become the backbone of an organization. With remote working increasing, organizations are relying on networking infrastructure and technologies.

Importance of Networking Performance Monitoring Tools and How Infraon can Help

Network management matters even more now because of increasing network complexity. It has become one of the most crucial aspects of a business. Without effective network management, it is impossible to have a productive workplace. It is also impossible to have zero downtime and network failures. Network management makes the high availability of networks a reality.

Monitor your Redis Enterprise clusters with Datadog

Redis is an in-memory key-value data store that offers fast performance, flexible data structures, and multi-model databases, allowing it to handle a variety of use cases. Redis Enterprise enhances open source Redis with features designed to run distributed applications at scale, such as multi-tenancy, tiered data storage, active-active cluster replication, and support for up to five 9s of availability.

Sponsored Post

Are disconnected RDP sessions ticking time bombs in your network?

I think we’ve all been there before – you log on to a server remotely via RDP, and do the needful – but don’t immediately log off. But then you get distracted by a phone call, an email, a chat, or a good old-fashioned physical interaction with another human being. So when it comes time to clock out for the night, you shut down your computer or log off. Or maybe you’ve been working on a laptop and your VPN got interrupted.

Businesses Must Know About the Best Practices in Asset Monitoring

Many organizations do not pay attention to their assets and equipment: who is using them and where they are located. These are important questions, but they are usually ignored, and as a result assets are lost and cannot be found when they are required. Employees waste time looking for the assets and equipment they need, which also delays production work. Overall, the top line and bottom line suffer.

Ask Miss O11y: Logs vs. Traces

Ah, good question! TL;DR: Trace instead of log. Traces show connection, performance, concurrency, and causality. Logs are the original observability, right? Back in the day, I did all my debugging with `printf`. Sometimes I still write `console.log("JESS WAS HERE")` to see that my code ran. That’s instrumentation, technically. What if I emitted a “JESS WAS HERE” span instead? What’s so great about a span in a trace? Yeah, and so do logs in any decent framework.

Video: Get started with Grafana Mimir in minutes

Since we launched Grafana Mimir — the most scalable, most performant open source time series database in the world — we have answered many of your questions about our latest open source project, including how to pronounce it. (All together now: /mɪ’mir/.) We have walked through how we scaled Grafana Mimir to 1 billion active series. And we will be hosting webinars to showcase cutting-edge features like query sharding and the two-stage compactor.

Implementing OpenTelemetry in Angular application

OpenTelemetry can be used to trace Angular applications for performance issues and bugs. OpenTelemetry is an open-source project under the Cloud Native Computing Foundation (CNCF) that aims to standardize the generation and collection of telemetry data. Telemetry data includes logs, metrics, and traces. Angular is a frontend JavaScript framework that uses HTML and TypeScript. It’s a popular framework used by many organizations for their frontend applications.

The ins, outs, and benefits of using Grafana Loki as a backend logging solution

As organizations have moved from monolithic to microservice-based architectures, there has been an explosion in the volume of logs generated. Most logging solutions create a full index of the logs and use SSD drives, which results in costly compute and storage resources for logs that are mostly write once, read never. We created Grafana Loki to solve these problems. Loki only indexes the metadata of the log lines, relies on inexpensive object storage, and is architected for scalability. In addition, Loki takes advantage of parallelism and sharding, which result in fast query performance. In this session, we will discuss the benefits of using Loki as a backend logging solution.

Analyzing Cardinality in Grafana Cloud and Grafana Enterprise Metrics

Cardinality Analysis of metrics is an enabler to reducing costs and focusing observability on the necessary metrics to identify and investigate where issues are occurring in your services. Grafana has added cardinality management dashboards to Grafana Cloud and Grafana Enterprise Metrics to make this an easy and fast process. In this introductory session, we will provide an overview of the Grafana Cardinality features and offer a set of discovery questions to help you with the process.

When code fails in production - and how to fix it in minutes

All developers know that moment when, after we’ve reviewed everything and completed our sanity tests, our code somehow doesn’t work in production. The question is, how quickly and easily can we discover the problem so that we can continue delivering features faster and more confidently to our users?

Database monitoring - Do's and Don'ts

Enterprises evolve and transform into data-driven businesses, which take valuable insights from the data collected to grow and develop their business. This means that massive chunks of data are collected every second, and companies search for ways to process it faster, more securely, and more accurately. The more data is processed, the smarter the organisation becomes and the greater its potential for data-driven decisions.

Amazon AppStream 2.0 vs Amazon WorkSpaces

Amazon offers two different services, Amazon WorkSpaces and AppStream 2.0, that can be used to deliver apps remotely either streamed via a browser or within a virtual workspace (desktop). Once you understand the differences between the two services the choice is usually clear from the use case. It is in fact common for organizations to use a mixture of both.

Why the Russia-Ukraine war means companies should monitor website changes closely

The Russian invasion of Ukraine has caused seismic ripples around the globe. Economically, businesses with investments and holdings in Russia have faced unprecedented public pressure to withdraw. Various industries have received supply chain impacts as economic sanctions continue to expand in the West.

Understanding Kubernetes pod pending problems

Kubernetes pod pending is ubiquitous in every cluster, even in different levels of maturity. If you ask any random DevOps engineer using Kubernetes to identify the most common error that torments their nightmares, a deployment with pending pods is near the top of their list (maybe only second to CrashLoopBackOff). Trying to push an update and seeing it stuck can make DevOps nervous.

Flowmon 12 - Workflows and UX Improvements

We released Flowmon 12 at the end of February. The new and updated functionality in the latest version has been well received by existing users, and has prompted many new organizations to consider the product. The headline changes in Flowmon 12 are in the blog post Progress Flowmon 12 – Ultimate Enabler of Your Multi-cloud Strategy.

How I Stream: Solving Tricky Security Challenges and Optimizing Splunk

Greetings Criblers! We’re introducing a new series by the Criblers, for the Criblers called How I Stream! Each month (maybe more frequently–you, too can be featured, share your insights here), we’ll share a quick profile from one of our community GOATS (Greatest of All Time Streamers) sharing use cases and lessons learned. Our first guest goes by Hobbit in the community.

A Primer on Cloud Architecture

The cloud is growing more and more popular each day. We are in an era where there is a prominent trend of companies migrating from traditional on-premise systems to more reliable and fast cloud-based systems. However, the conversion is still not rampant on a large scale, primarily due to the lack of awareness in the up-and-coming businesses about the cloud’s fundamentals. However, the cloud has proven to be a sound and worthy option time and time again.

Application Performance Monitoring vs Application Performance Observability

You’ve likely heard the term Observability lately. There’s a fundamental change taking place in the Monitoring space, and Observability is behind it. Observability itself is a broad topic, so in this post we’ll talk about what it means to move from Application Performance Monitoring to Application Performance Observability.

How AgriTech used IoT and Grafana to help industrial hemp farmers hit a new production high

In 2019, Alexander Mann was working in the microchip industry, putting in 12-hour shifts that left no time for him to tend to his large vegetable garden. “I started looking for ways that I could remotely water or check on my plants,” he says. Products that could help him were either too costly for a hobby gardener or required special internet connections, so Mann decided to learn about IoT and create his own setup.

Outage Alert: Top 5 Outages of Q1 2022

By now it’s no secret that system outages and website downtime are more widespread and frequent than ever. In fact, the frequency of outages jumped 9% in just the first week of 2022. This can be attributed to a rapid increase in traffic and reliance on tech infrastructures – resulting in connectivity, server, and other technical issues that are alternately unforeseen and unavoidable.

What Does Observability Mean For You?

The late 1990s were a crazy time in the technology industry. Apple converted a blueberry into a computer, Google still had a “new search engine” smell, and while Y2K loomed over our heads Napster was showing everyone how bad Metallica’s music sounded. Meanwhile, in a garage in Tulsa, Oklahoma, brothers Donald and David Yonce launched a network monitoring software company and named it SolarWinds.

Understanding data analysis and online activity with David Belson

Network AF host and Kentik CEO Avi Freedman discusses data analysis and trends in understanding online activity with David Belson. David is Cloudflare's Head of Data Insight, where he helps the organization communicate information about the internet such as outages and changes in protocol adoption.

Monitoring critical systems at Roblox with Grafana and Grafana Agent

It’s like an obby unto itself: With roughly 100 million global active users, how does an observability team monitor operations and troubleshoot problems that pop up across more than 10,000 servers? In this session, you’ll get an inside look at how the Roblox team evolved their observability platform to combine a multitude of data sources, from low-level hardware health to high-level performance metrics. Grafana Agent plays a key role by replacing many special-purpose pipelines with a single, easy-to-manage tool. Roblox’s observability team has met growing demand to provide actionable insights to hundreds of engineers, covering dozens of data sources and thousands of Grafana dashboards, which all help keep its infrastructure running and ready for play.

Monitoring Hyper-V and ESXi-what should you do?

Over the years, I found that building out monitoring scripts and using them properly has proven to be a challenge. When I look back at my internal IT days using platforms like WhatsUp Gold, PRTG, or N-central, the question always remained the same: how can I monitor efficiently and get alerts that matter? In this blog post, I thought I’d tackle something that is a challenge for a lot of people: monitoring hypervisors.

Kubernetes: Tips, Tricks, Pitfalls, and More

If you’re involved in IT, you’ve likely come across the word “Kubernetes.” It’s a Greek word that means “helmsman” or “pilot.” It’s one of the most exciting developments in cloud-native hosting in years. Kubernetes has unlocked a new universe of reliability, scalability, and observability, changing how organizations behave and redefining what’s possible. But what exactly is it?

Network Alerts-Monitoring and Notifications

When it comes to IT, you can’t do anything with an asset you can’t see. When it comes to your networking, monitoring offers the eyeballs to know what is going on. But IT and network pros don’t spend all day staring at a dashboard waiting for something to happen. Like your local police department, they rely on notifications of trouble. Instead of 911 calls, IT depends on network alerts.

Accelerate incident investigations with Log Anomaly Detection

Modern DevOps teams that run dynamic, ephemeral environments (e.g., serverless) often struggle to keep up with the ever-increasing volume of logs, making it even more difficult to ensure that engineers can effectively troubleshoot incidents. During an incident, the trial-and-error process of finding and confirming which logs are relevant to your investigation can be time consuming and laborious. This results in employee frustration, degraded performance for customers, and lost revenue.

How Site24x7 automates your serverless workflows using the AWS Lambda function URL integration

AWS Lambda is a compute service that lets you run code on high-availability infrastructure without any server provisioning. Lambda performs tasks such as maintenance of servers and operating systems, capacity provisioning, automatic scaling, and code logging and monitoring. When using AWS Lambda, you are just responsible for your code. Lambda manages the resources needed to run your code, like CPU, network infrastructure, and memory.

Working with Cloudflare to mitigate DDoS attacks

The rolling thunder of cybersecurity warnings has built to a crescendo this year. According to HelpNetSecurity, cybercriminals launched over 9.75 million DDoS attacks in 2021. The Cloudflare Attack Trends 2022 Q1 Report published yesterday shows an alarming increase in application-layer DDoS attacks. And our own Doug Madory has been sharing analysis on the impact of cyberattacks, too.

Build a Cypress tests infrastructure for serverless applications

When a startup is in its very early stages, rapid iteration and dynamism are at the top of its priorities. The ability to do so, while maintaining a stable and high-quality product, is a big challenge facing the R&D group. We want to release features as quickly as possible, but this rapid velocity can cause conflicts when writing in-depth, comprehensive tests.

Lights, Camera, Action: Introducing The Fellowship of the Stream

Last week, an article from SiliconAngle came out detailing the challenges facing cybersecurity professionals. Companies are in desperate need of solutions to deal with cloud-native applications that exist in fast-paced environments. The security and IT teams monitoring these applications need scalable and flexible solutions that drive actionable insights. That’s why we built Cribl Stream.

Tackling Your Carbon Footprint with the Sustainability Toolkit for Splunk

Simple questions can be overwhelming and not knowing the answer after a mouse click is no longer an option: Sustainability is top of mind for organizations across all verticals and Splunk can help with the power of data. Our upcoming Sustainability Toolkit based on the Splunk platform equips organizations with capabilities to gain deep insights into their carbon footprint and as such empowers them to take the necessary actions towards their carbon neutrality goals.

Managing Time Series Data in Industrial IoT

The industrial revolution was a watershed period in human history. The shift from piecemeal, cottage-industry work to mechanized manufacturing transformed the way humans work. Since the 18th century, successive waves of innovation, such as the assembly line and the computer, continued to alter and change the nature of manufacturing. Today, we find ourselves in the midst of another industrial transformation.

Use Service Design in Operations Management to Enhance Security

As an IT operations manager, you spend a lot of your time mitigating service outages and service level risks. You worked diligently to get the right people, products, processes, and partners in place to meet your goals. You managed to ensure continued uptime. You’ve reduced the number of tickets and the cost per ticket. And for your efforts, you’re rewarded with managing your company’s cybersecurity program. The problem? You’re not a security specialist.

Log Observer Connect: Leverage the power of Splunk Enterprise data in Splunk Observability Cloud

With Splunk Log Observer Connect it’s easier than ever to correlate all of your metric, trace and log data to deliver better customer experiences! Available now for existing Splunk Enterprise and Splunk Observability Customers. Log Observer Connect lets observability users explore the data they’re already sending to their existing Splunk instances with Splunk Log Observer’s intuitive no-code interface integrated in Splunk Observability, for faster troubleshooting, root-cause analysis and better cross-team collaboration.

What's New with Flowmon ADS 12 and outlook ahead?

Flowmon ADS 12 is here and ready to alert you faster than traditional NDR tools that rely only on blacklists of known malicious domains. With its latest release, Flowmon ADS 12 brings detection of anomalous behavior on the network and allows users to easily tune the detection. And that’s not all! Flowmon ADS 12 also enables you to lower the count of false positives and decreases the time required to investigate detected events.

New in Grafana Enterprise Metrics 2.0: Cross-tenant alerting and recording rules

On the heels of launching our new open source TSDB Grafana Mimir, we are excited to introduce Grafana Enterprise Metrics 2.0. GEM 2.0 is built on top of Grafana Mimir 2.0, our easy-to-operate, high-scale database which we’ve shown can handle upwards of 1 billion active series. That means that GEM 2.0 inherits all of the highlights of Mimir, including easy deployment, native multi-tenancy, high availability, durable long-term metrics storage, and exceptional query performance.

Expert Series: Large MSP Was First to Upgrade to DX UIM 20.4

Upgrading vs Migrating - How Atlassian Scaled Developer Efficiency in One Week

With millions of monthly active users across Bitbucket and Jira, Atlassian relied heavily on the real-time telemetry they got from their open source instance of Sentry. However, because it took the equivalent of two full-time engineers to maintain their out-of-date instance, Atlassian started to explore paths to upgrade or migrate to SaaS, as the risk of things breaking increased with each new release.

How to Use OpenTelemetry to Troubleshoot a Serverless Environment with StackState

Losing track of communication between applications or code has become a problem as the tech world moves toward serverless cloud architectures and leaves it to the developer to maintain, upgrade and update these services. One might say that services and code are becoming more loosely coupled, allowing code to run and execute in silos. Let's take an AWS Lambda function as an example.

How to Surface Relevant Performance Data

Your applications most likely consist of multiple components. These components could be written in different languages, with each individually instrumented with Sentry’s SDK. The goal of our tracing solution is to make sure developers get a full picture of the data captured within their tech stack. Tracing allows you to follow a request from the frontend all the way to your backend application and back.

Benefits of using an Application Performance Monitoring solution in Banks & How can banks optimize their performance using an APM tool?

Monitoring business-critical applications is just as vital as deploying them since their performance directly impacts your ability to meet business goals. When it comes to the banking industry, ensuring superior application performance is crucial to enabling smooth banking operations. But it’s also complex since banks deploy interconnected applications across on-premise and cloud.

NoSQL Database Monitoring with DX UIM

A NoSQL database provides a mechanism for data storage and retrieval, without using the tabular relations associated with relational databases. Originally referred to as "non-SQL" or "non-relational" databases, NoSQL databases are increasingly used in big data and real-time web application environments. NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

Debug JavaScript in Mobile Safari (iOS) in 8 easy steps

Debugging JavaScript is an inevitable part of web development, and not the nicest one. Debugging jobs always seem to pop up when you’re already buried under piles of work, and a teammate pings you about an issue that was overlooked in testing and has been causing frustrations since your last release. That’s why it helps to be prepared for that eventuality, and equipped with the developer tools to help you debug faster.

Human-centric IOT: Why it can be a top priority for business success and branding

I was intrigued by two recent IoT-related surveys and reports. They emphasize use cases about efficiencies and return on investment but, surprisingly, there is no mention of people's safety or productivity. I think it is imperative to place people at the heart of the IoT universe. Today, people-related use cases are greatly understated. We will be doing a great injustice if we continue to ignore the human angle.

Why Icinga?

We have decided to make some short educational videos about Icinga, and today we will be releasing the first one: Why Icinga? In these videos we want to explain the Whys and Whats and Hows around Icinga in a way that is accessible to anyone who is interested. So why do you want to use Icinga? Monitoring is the foundation you want to build your infrastructure on.

Slack's New Metrics Storage Engine Challenges Prometheus

Metrics storage engines must be specially engineered to accommodate the quirks of metrics time-series data. Prometheus is probably the most popular metrics storage engine today, powering numerous services including our own Logz.io Infrastructure Monitoring. But Prometheus was not enough for Slack given their web-scale operation. They set out to design a new storage engine that can yield 10x more write throughput, and 3x more read throughput than Prometheus! In February 2022 Suman Karumuri, Sr.

Elasticsearch Release: Roundup of Change in Version 8.1.0

Elastic released a major version of its platform on February 10, 2022. Version 8.0.0 is the latest major version. There has already been a new minor release to version 8.1.0, and there are anticipated minor and patch releases coming as Elastic rolls out new features and fixes. The latest release is the first significant revision since April 2019, when version 7.0.0 was generally available. Users can find a complete list of release notes on the Elastic website.

What is remote network monitoring?

Remote network monitoring is a technical specialty that was born almost at the same time as networks themselves. Since then, many strategies have emerged when it comes to monitoring network elements. In this article we will talk about the current techniques based on SNMP polling and network statistic collection through Netflow, and we will also mention outdated systems such as RMON. Most techniques are purpose-oriented, so they are especially useful.

Network Nickels and Dimes: Government IT on a Budget

State and local governments have always had to get a little creative when it comes to efficiently managing the government IT budget—they have to ensure they can stretch those dollars as far as they’ll go. However, inefficient network management practices are only adding to the challenges of local government IT modernization, and sucking the life out of their information technology budget.

How to Time Your Data Collection with Telegraf Agent Settings

Many Telegraf and InfluxDB users spend a lot of time finding the right balance between collecting the data they want and writing so much that they end up with unnecessary data in their database. This blog post will give you a better understanding of Telegraf’s data collection settings and help you fine-tune your configuration.

Five Key Monitoring Capabilities for Top Payment Gateway Performance in E-commerce Applications

Payment gateway outages and performance issues have a disruptive effect on your business. When customers cannot complete a transaction, it leaves them frustrated and anxious. Even if it is not an outright outage, customers are wary of a flaky payment experience. They are often reluctant to retry the transaction for fear of being charged twice. This results in abandoned purchases and lost revenue.

Kubernetes 1.24 - What's new?

Kubernetes 1.24 is about to be released, and it comes packed with novelties! Where do we begin? Update: Kubernetes 1.24 release date has been moved to May 3rd (from April 19th). This release brings 46 enhancements, on par with the 45 in Kubernetes 1.23, and the 56 in Kubernetes 1.22. Of those 46 enhancements, 13 are graduating to Stable, 14 are existing features that keep improving, 13 are completely new, and 6 are deprecated features.

The Bird is the Word: Getting Up and Running Fast on Humio, by Crowdstrike

I’ve been in the log data analytics space for years, and I have loved seeing the technology and methodologies change and evolve. One of my favorite changes has been the emergence of index-less solutions, and Humio has a great solution here. If you haven’t heard of Humio, you should check out their index-less log management solution for yourself (free up to 16 GB/day too).

Where are Monitoring Tools Headed? Help from Innovation Insight for Observability by Gartner

Enterprises are getting fed-up with their existing system monitoring tools. Despite decades of investments in monitoring tools, many businesses fail to notice a problem in their digital services until a customer calls to complain about it. So, it’s no surprise that businesses are looking for better solutions, and this has sparked an increasing interest in observability, according to Gartner in its updated report, “Innovation Insight for Observability.”

Observability for K-12 and higher education: Top 4 challenges and how monitoring can help

K-12 and higher education institutions experienced massive changes in 2020 with the shift to online learning. New challenges arose, such as an increase in cybersecurity threats, students and staff requiring 24/7 access to their computers, and the need to update and improve infrastructure and applications. IT infrastructure monitoring allows K-12 and higher education institutions to face common technology challenges both reactively and proactively.

OnCallogy Sessions

Being on call is challenging. It’s signing up to be operating complex services in a totally interruptible manner, at all hours of the day or night, with limited context. It’s therefore critical to have proper on-call on-boarding procedures, offer continuous training sessions, and continuously improve documentation. We also need to make sure people feel safe by providing ways to reduce their stress, and make room for questions to surface all sorts of uncertainties around our operations.

OpenTelemetry Collector - What Is It?

Before we dive into the Collector, let’s cover the components that make up the OpenTelemetry project. If you missed it, our post What is OpenTelemetry gives a high-level introduction to OpenTelemetry and the key components of the OpenTelemetry project. The OpenTelemetry Collector is optional when using a SaaS service like Scout. Even so, it helps to know what the Collector can do and when to use it.

Hot Ones Interview with an Engineering Manager

In this video, we do a spin-off of one of our favorite shows, Hot Ones, where we combine an interview with Web Platform Team Technical Project Manager Steven and our Web Platform Team EM Vladan, and hot sauces. Meet both of them in this video, and watch them suffer as they go from one hot sauce level to the next just like they do in Hot Ones.

Improved visibility for Issue Alerts

As most developers know, alert fatigue is real, and the last thing you want is another feed of notifications. Read on to learn how the new Alert Details view helps you filter out notifications for Issue Alerts you don’t care about, and how it can help you focus on the ones you do. When you get started with Sentry, you’ll likely create an Alert for every Issue.

How-To Obtain Cloud Pricing at Scale

In 2020, we started an internal project to price our data center systems to understand what they might cost to run in the AWS or Azure cloud environments. We initially used some of the online pricing tools available from these vendors, but they were slow and difficult to use in bulk. You can find the public cloud tools here: AWS and Azure. We were forced to add individual systems one at a time and pick options for each because of the nature of these tools.

How OpenTelemetry Works Under the Hood in JavaScript

OpenTelemetry (OTel) is an open source selection of tools, SDKs and APIs, that allows developers to collect and export traces, metrics and logs. It’s the second-most active project in the CNCF, and is emerging as the industry standard for system observability and distributed tracing across cloud-native and distributed architectures. OTel supports multiple languages, like JavaScript, Java, Python, Go, Ruby and C++.

Bolster network monitoring with root cause analysis

If you own an enterprise, then you know the value of a healthy network and how seriously detrimental a network outage is to your business. But network issues are inevitable. The heavy dependence on networks to meet the ever-changing client and internal usage requirements takes a heavy toll on the network. This makes networks vulnerable to common problems such as unplanned, sudden downtime, high resource utilization, and hardware malfunctioning.

Troubleshoot faster with improved Datadog Events

Datadog Events provides customers with a data feed about their infrastructure and applications, delivering an up-to-the-minute history of activity such as code deployments, configuration changes, and triggered alerts. Events collects data from Datadog products and over 100 third-party integrations—including Docker, Jenkins, Kubernetes, Sentry, AWS CloudWatch, and Azure Service Health.

Debugging jsoup Java Code in Production Using Lightrun

Scraping websites built for modern browsers is far more challenging than it was a decade ago. jsoup is a convenient API that makes scraping websites trivial via DOM traversal, CSS selectors, jQuery-like methods and more. But it isn’t without its caveats. Every scraping API is a ticking time bomb.

New in Grafana Loki 2.5: Faster queries, more log sources, so long S3 rate limits, and more!

I’m very excited to tell you all about the latest Grafana Loki installment, 2.5! A huge amount of work, nearly 500 PRs, has gone into Loki between v2.4 and now. The major themes for this release are improved performance, continuing ease of operations, and more ways to ingest your logs. I usually find myself the most excited about performance improvements, so let’s start there.

Welcome to the Auvik family, MetaGeek!

We have some exciting news. Today I’m thrilled to welcome a new member to the Auvik family. The MetaGeek team are experts in wireless and through their 15+ year history, MetaGeek has led the way in creating tools to help network administrators and wireless engineers build, monitor, and troubleshoot Wi-Fi networks. At Auvik, we have an ambitious and aggressive roadmap to deliver a remarkable technology management experience.

Website Performance Monitoring: What Are You Really Paying for?

Have you found yourself confused by the plans and pricing around website performance monitoring? Are you using the features you’re paying for? Finding the right service involves many moving parts. Very often, that journey begins with a quest to find a simple up or down monitoring tool for external verification. It’s only after you take that first step into the market that you begin to notice additional features and expanded functionality.

Modernizing Network Monitoring with InfluxDB and Telegraf

This article was originally published in The New Stack. As the technology landscape continues to change at a rapid pace, enterprise companies are in a rush to catch up and modernize their legacy IT and network infrastructure to capture the benefits of newly developed tools and best practices. By adopting modern DevOps techniques, they can reduce their operational costs, increase the reliability of their services and improve the overall speed and agility at which their IT teams are able to move.

New Honeycomb Whitepaper on Frontend Observability

Big news: I can finally stop pointing anyone who asks about Honeycomb’s story for frontend observability to Emily’s blog post from 2017 on “Instrumenting Page Loads with Honeycomb.” (It was a great post, don’t get me wrong, but I don’t think any of us knew it would bear such weight for so long.) I am ecstatic to announce that we have released a new whitepaper called “Getting Started With Honeycomb Client-Side Instrumentation for Browser Applications,” wri

The Fellowship of the Stream: Unlock Radical Levels of Choice & Control with Observability Data

Cribl Stream is a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure. You’ll finally achieve full control of your data, empowering you to choose how to treat your data to best support your business goals.

Machine learning for infrastructure monitoring and troubleshooting, explained

Learn exactly what machine learning is and how it takes part in the observability, monitoring, and troubleshooting industry. We'll also cover the future of ML trends within the industry, and how Netdata is staying at the forefront of machine learning development.

How We Run Successful Beta Tests with Error Reporting

We’ve recently completed a large beta test for our new product here at Testmo. We build a test management tool, so most of our users are professional software testers. As you can imagine, our customers are a rather critical group of users when it comes to software quality. We’ve learned some important lessons about running a large beta test and we want to share how we benefited from Sentry error reporting to identify, find, and fix issues quickly.

Continuous Performance Regression Testing for CI/CD

Developers strive to produce efficient code. Many times, developers will add code to their repositories and test it to make sure it works, but they are forgetting one very important step: benchmarking! Benchmarking allows developers to see the performance impact on their code output. If properly integrated into a CI/CD pipeline, it could prevent catastrophic drops in performance before any code is shipped/deployed at all.

Grafana Labs EMEA Virtual Meetup - April 2022

Join Grafanistas Jessica Brown, Marcus Olsson, and Mat Ryer as they present at the April 2022 Grafana Labs Virtual EMEA Meetup. Here's a brief look at their talks: “Extend your Grafana experience through plugins” By default, Grafana comes with an impressive set of different visualizations and data source integrations. But that’s not all! Many more panels and data sources are available as plugins, built by the Grafana community. In this session, you’ll learn about the new in-app plugin catalog and a few nifty plugins to get you started using plugins!

How we scaled our new Prometheus TSDB Grafana Mimir to 1 billion active series

Last week, we announced our new open source TSDB, Grafana Mimir, which lets you scale your metrics monitoring to 1 billion active series and beyond. The announcement was greeted with a lot of excitement and interest – and some questions too. Namely: Really, 1 billion? Yes, really!

Building Oh Dear's new design: Project setup

We are currently rebuilding the Oh Dear website and application frontend. The goal is to go next level in aesthetics and user experience. The Oh Dear redesign will be launched later this year. In this post, you'll read more about the project setup and tools used. This is the first blog post of a series that will share progress and the knowledge gained along the way.

Dashboard Fridays: Sample SQLFacts dashboard

The ugly truth when working with SQL Server is that there usually isn’t much money left over for monitoring once you pay for the license. Most DBAs accept this reality by implementing a custom data collection. But what happens when all you want is to visualize that data quickly? Tune into our latest Dashboard Fridays video to see this SQLFacts dashboard built to visualize the powerful suite of free tools.

Achieve End-to-End Network Visibility for SASE

Enterprise networks now extend out over the internet due to cloud migrations, SaaS applications and hybrid work approaches. To support this, organizations are adopting SASE solutions to extend their traditional secure web gateways into the cloud. Having complete network visibility from end-to-end across these distributed environments is critical in safeguarding user experience. AppNeta and DX NetOps from Broadcom Software help you achieve network visibility anywhere, assuring the security and performance users can rely on.

Network AF, Episode 13: Talking networking and PR with Ilissa Miller

In this episode of the Network AF podcast, Avi Freedman connects with Ilissa Miller, network whisperer and PR industry veteran. Ilissa and her team translate technology into business terms by helping clients understand the value and functionality of a company. Avi asks Ilissa how she got into the field, her biggest takeaways that helped launch her own business and what’s important in today’s networking world.

How to upgrade to SCOM 2022 step-by-step

The long-awaited SCOM 2022 is here! If you’d like to know what improvements to expect, check out our blog on SCOM 2022 key highlights. If you’re as excited as we are and would like to proceed with your upgrade, keep reading: this is the detailed step-by-step guide you are looking for. We will take you through the whole process. Stick with us, and by the end of this blog you’ll have successfully upgraded to SCOM 2022!

100% office-based to remote-first: SquaredUp's journey

Back when everyone was talking about the ‘new normal’ and what the future of work would look like post-pandemic, SquaredUp was having a rethink too. SquaredUp has always been a 100% office-based company with a heart for bringing people together. Richard Benwell, CEO, and the executive team had created a company that is truly fun, collaborative, and integrated, largely based on their drive to keep people connected and build an incredible community – in the office.

The DEX Super-Team: 5 Innovative IT Roles for the Experience-Driven Future

Digital Employee Experience (DEX) is no longer a novel concept. Today’s workplaces are increasingly digital – and a majority of business leaders recognize DEX as a top priority. But many leaders are coming to a new realization: managing DEX is too complex an undertaking to remain one priority on an IT team’s long checklist. The modern IT department must evolve – and improving experience requires a dedicated set of DEX-focused IT roles.

Healthcare Innovation Awards Q&A: Making Electronic Health Records Run More Efficiently

My career started with a healthcare company called U.S. Surgical; we provided health systems with stapling instruments for the operating room. So my familiarity with healthcare started from within the hospital.

API Testing: An Introduction

Digital businesses are making a radical change in the way they build and deliver software. Gone are the days of apps that rely solely on in-house tools. Rather, today’s apps are increasingly dependent on external APIs and third-party app providers (which, in turn, are reliant on other APIs and apps). While this type of modularity allows for product flexibility and rapid development, it can be difficult to address any issues that arise.

Q1/2022 Release Roundup: Announcing VictoriaMetrics v1.76 & More

Since the beginning of the year, our team has been busy working with the open source community of VictoriaMetrics users and our customers as we continuously enhance and improve Vicky! Thanks to everyone who has contributed with their feedback, questions, feature requests, bug reports, etc.

The Future of Monitoring: Turning Unknown Unknowns into Known Knowns

To ascertain risk, national security and intelligence professionals have long used concepts such as known knowns, known unknowns, and unknown unknowns. The idea of unknown unknowns was created in 1955 by American psychologists Joseph Luft (1916–2014) and Harrington Ingham (1916–1995). This concept continues to be used today in risk assessments and is applicable to technology. The unknown unknowns are the threats and potential problems that remain invisible until their impact manifests.

OpenTelemetry and Jaeger | Key concepts, features, and differences

OpenTelemetry and Jaeger are both open-source projects under the Cloud Native Computing Foundation. In this article, let us understand the key concepts involved in both projects, their features, and their differences. OpenTelemetry is a vendor-agnostic instrumentation library. It provides a set of tools, APIs, and SDKs to create and manage telemetry data (logs, metrics, and traces). Jaeger is an open-source tool focused on distributed tracing of requests in a microservice architecture.

Monitor your gRPC APIs with Datadog Synthetic Monitoring

gRPC is an open source Remote Procedure Call (RPC) framework developed by Google and released in 2016. Although gRPC is still relatively new, large organizations are adopting it in increasing numbers to build APIs to connect complex microservice meshes that use disparate languages and frameworks. gRPC-based APIs can process requests up to seven times faster than REST APIs, and they also allow customers to easily implement SSL authentication, load balancing, and tracing via plug-in libraries.

Debug issues and automate remediation with Shoreline and Datadog

Shoreline is an incident response automation service that enables DevOps engineers and site reliability engineers (SREs) to quickly debug and remediate issues at scale and develop automated routines for incident management. Using Shoreline’s proprietary Op language, customers can run debug commands across all their hosts simultaneously and then deploy custom scripts via Actions to trigger automated remediations.

How to Measure VoIP Quality & MOS Score (Mean Opinion Score)

VoIP Quality is highly reliant on network performance, which means that many network problems like packet loss, latency, and jitter can cause high levels of VoIP degradation. To avoid embarrassing choppy voice calls, or lagginess during your next client meeting, we’re running you through how to measure VoIP Quality with MOS Score (Mean Opinion Score).

SigNoz - Open-source alternative to New Relic

If you're looking for an open-source alternative to New Relic, then you're at the right place. SigNoz is a perfect open-source alternative to New Relic. SigNoz provides a unified UI for both metrics and traces with advanced tagging and filtering capabilities. In today's digital economy, more and more companies are shifting to cloud-native and microservice architecture to support global scale and distributed teams.

Write an Ecommerce Customer Journey

The ‘Add to Cart’ user journey is one of the most common ecommerce customer journeys used by website owners. For obvious reasons, ensuring potential customers can search, select a product, and check out successfully is a vital function. Unlike ecommerce customer journey map tools, synthetic user journey monitoring can assist in creating a positive customer experience by tracking performance metrics and sending alerts if issues do occur.

4 Common Email Problems Businesses Face (And How To Fix Them)

Staying on top of deadlines, busy schedules, team organization, and customer needs is challenging. If you add some of the common email problems we discuss in this article, your list of challenges can rapidly grow. While it's helpful to have an IT department to deal with technical issues, it's a good idea to know how to fix some of them yourself, so let's get to business.

Introduction of the ansible-collection-icinga

Ansible is a commonly known tool for automating deployments in infrastructures. Its configuration is based on YAML, and it scales to big environments. Icinga 2 provides its own secure agent to monitor hosts, highly available satellite zones and monitoring configuration. To manage this monitoring environment, we introduce you to the ansible-collection-icinga. This collection can install the Icinga 2 server, configure monitoring and deploy Icinga 2 agents in your infrastructure.

How to Identify Memory Leaks

You may not be familiar with thinking about the memory usage of your applications as a software developer. Memory is plentiful and usually relatively fast in today's development world. Likely, the programming language you're using doesn't require you to allocate or free memory on your own. However, this does not mean you are safe from memory leaks. Memory leaks can occur in any application written in any language. Sure, older or "near to the metal" languages like C or C++ have more of them.

Top 5 Takeaways From GDC 2022

The Game Developers Conference (GDC) is a yearly event that brings together leading brands in the gaming industry to talk about trends in development and showcase new features and releases. One of the cool things about the conference is that it’s an excellent opportunity for gaming enthusiasts, aspiring game developers, and industry vendors to connect, network, learn and celebrate the achievements of the industry.

Improve Performance in Your iOS Applications - Part 1

Since its inception back in 2007, Apple's iOS ecosystem has improved drastically, with a plethora of changes and new features added (or removed) over time. At the same time, the size of the applications and the data has consistently grown. This has its own impact on the powerhouse in your hand - the iOS device. Developers strive to design the best experience, often compromising speed and performance.

WP Engine Uses InfluxDB to Power Observability on a Global Scale

The WP Engine platform provides brands the solutions they need to create remarkable sites and apps on WordPress that drive their business forward faster. It hosts over 1.5 million websites, serving over 175,000 customers in more than 150 different countries, and processes 5.2 billion requests per day. In total, WP Engine’s footprint comprises about 8 percent of the entire web.

What's New in StatusGator, April 2022

In the first 3 months of 2022, we’ve deployed 242 updates to StatusGator. As our growth has accelerated, we have become the platform relied upon by thousands to gain visibility into their vendors’ service statuses. Today, we’re announcing several big new features you may have seen trickle in over the last few weeks.

5 Common Step Function Issues

Step Functions is the serverless finite state machine service from AWS. With DynamoDB, Lambda, and API Gateway, it forms the core of serverless AWS services. If you have tasks with multiple steps and you want to ensure they will get executed in the proper order, Step Functions is your service of choice. It offers direct integrations with many AWS services, so you don’t need to use Lambda Functions as glue. This can improve the performance of your state machine and lower its costs.

You want to know whether a dangerous stranger has your passwords?

We already live in a post-apocalyptic future that could rival great franchises like Mad Max or Blade Runner. Proof of this lies in pollution, pandemics and the fact that your most intimate secrets can be violated because your most impenetrable slogans are in a database of leaked passwords. Do you feel that pinch? It’s fear and cruel reality knocking at your door at the same time. But, well, let’s stand by. Just as Mel Gibson or Harrison Ford would do in their sci-fi plots.

Ask Miss O11y: Pls ELI5 TLAs like PRO, SRE, and SLOs!

Dear Acronymically, I'll try to answer without using a single (new) acronym! First things first—"PRO" refers to our Pro plan, rather than being an acronym in and of itself. Honeycomb Pro is our cost-effective offering for professionals like you who are running a few production workloads! And we're hoping that folks will get even more benefit now that they have access to our SLO feature!

Context propagation in distributed tracing: Beyond "Hello World" examples

In our line of work, it’s not uncommon to encounter customers that have custom workflows or architectures that may not always support distributed tracing mechanisms. In development in general, we often have our beliefs on what should happen in theory and we rely on Hello-World examples, only to be surprised by what happens in reality. Our customers too sometimes have to build creative solutions to deal with unique situations – which then motivates us to build creative solutions as well.

Network Monitoring Handbook: Your Complete Guide to Network Monitoring and Management

This guide has been designed to document all the critical elements in designing and managing an efficient, effective and secure network. This handbook gives you complete insights into various aspects of network monitoring and the best practices involved in creating an effective and efficient network.

Troubleshoot end-to-end tests with CI Visibility and RUM

Adding automated testing to your CI/CD pipelines can help you ensure that you deploy changes safely. But as you continue to shift left, the number and complexity of tests are likely to increase, making them slower to run and harder to troubleshoot. Datadog CI Visibility can help you track the performance of your CI/CD pipelines and tests—and now you can also use Real User Monitoring (RUM) to monitor end-to-end (E2E) Cypress tests.

Grafana Labs announces $240 million Series D round led by GIC and welcomes new investor J.P. Morgan

Today, I’m happy to share that Grafana Labs has closed a $240 million Series D round of investment. This is a major milestone for us that goes well beyond the number of dollars. First, we are grateful to our internal team (affectionately called Grafanistas) and our community of users and customers. Without them, we would not have been given this opportunity.

InfluxData Wins Comparably Awards for Best Company Outlook and Best Engineering Team

Annual awards recognize companies based on workplace culture ratings from current employees. SAN FRANCISCO, April 6, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced it has won awards for Best Company Outlook and Best Engineering Team from Comparably, a leading workplace culture and corporate brand reputation platform.

Making Go errors play nice with Sentry

Here at incident.io, we provide a Slack-based incident response tool. The product is powered by a monolithic Go backend service that serves an API powering Slack interactions, serves an API for our web dashboard, and runs background jobs that help run our customers’ incidents. Incidents are high-stakes, and we want to know when something has gone wrong. One of the tools we use is Sentry, which is where our Go backend sends its errors.

Apple outages: A week Apple would like to forget

We all know Apple as a reliable and innovative tech giant that sets endless new trends for its industry. It’s arguably the one company that many aspire to be like and emulate, given the massive global growth it has seen over the past decade. But even this goliath tech company has its bad days and, unfortunately for Apple, it had a whole week of bad luck that tested the patience of its customers across the world.

Why the end-user experience is crucial to achieving business success

This last decade has seen abundant changes in the way businesses operate. Digital transformation is no longer a new term; however, it is still relevant. It has resulted in virtual establishments superseding the popularity of brick and mortar setups, becoming the prime choice of business operations.

AWS Lambda: function URL is live!

AWS announced the release of the Lambda Function URLs feature today. In this post, I describe what it is, how it works, and how you can benefit from it. API Gateway and AWS Lambda are a potent combination and let you build REST APIs without having to worry about the underlying infrastructure. API Gateway offers many powerful features out-of-the-box. Understandably, you pay a premium for these features.

Observability for State and Local Government: Top 3 Challenges and How Monitoring Can Help

State and local governments require a high level of performance and reliability for the millions of people accessing their web applications every day. The talent shortage and lack of resources to train their staff on how to use new technologies prevent them from drawing valuable insights from modern IT environments. IT infrastructure monitoring allows state and local governments to stay ahead of technology trends and face common challenges with intelligence and transparency.

What's New with Flowmon 12?

Flowmon 12 is here and ready for hybrid/multi-cloud deployments! With its latest release, Flowmon 12 brings support for native flow logs from Google Cloud and Azure and improves support for AWS flow logs. And that’s not all! Flowmon 12 brings fully redesigned reporting for better readability, a command-line tool to assess the quality of the flow data, and improvements in workflow and user experience based on customer feedback.

Crossed 6k+ GitHub stars, enabled S3, better dashboards and webhooks - SigNal 11

Our dashboards enable engineering teams to make quick decisions. So, a good design is critical. We have been busy fine-tuning our dashboards to make them more user-friendly. Welcome to SigNal 11 - our monthly product updates where we update you on what we’ve been up to. Last month, we crossed 6,000+ GitHub stars, 700+ Slack community members, jazzed up our graphs, and much more. Let’s see what humans at SigNoz have been up to in the month of March 2022.

What is a Digital Workplace?

The digital workplace is a phenomenon that has grown in leaps and bounds in the wake of the COVID-19 pandemic. The adoption of technologies has increased and changed all aspects of our lives today, including how we work. However, today’s workplace is very different from the traditional workplace. For one thing, its definition is no longer limited to the physical sense.

Introducing Status Overrides

StatusGator is a status page aggregator: We bring together the status of all of your vendors into a single status page you can share with your team. Now, you can override the status of any service on your page to reflect exactly what you want it to. There are two opposite use cases for this feature: When a service is experiencing an outage and it’s not reflected on their status page. Or when an outage posted on their page is not affecting you and your team.

New in StatusGator: Status Page Messages

StatusGator status pages are a unique way to consolidate the status of all of your vendors on a single page. Reduce support ticket volume by publishing your status page to your team or users. Now you can publish a message to the top of your status page for even more effective communication. Use this space for maintenance notifications, highlighting critical outages, or explaining your page to your users.

Data Sovereignty: Everything You Need to Know

“What’s data sovereignty? Is it the same thing as data privacy? What about data sovereignty vs data residency?” These are questions that plague many business owners, especially when you consider the cost of non-compliance. But achieving compliance can be challenging when you aren’t sure what rules you should be complying with.

Who Owns Observability In Enterprises?

It’s common sense. When a logstorm hits, you don’t want to be left scrambling to find the one engineer from each team in your organization that actually understands the logging system – then spending even more time mapping the logging format of each team with the formats of every other team, all before you can begin to respond to the incident at hand. It’s a model that simply won’t scale.

How to Import/Export Orion Custom HTML Widgets

Advanced Orion Platform users are familiar with the power of the Custom HTML widget, but getting started can be difficult. Thankfully, you can download pre-existing widgets directly from THWACK to get you started. After you've crafted some of your own widgets, you can return the love and share yours with the community.

Deliver exception messages through Slack and Webhooks for fast resolution

Building new applications is a lot of fun, but troubleshooting and fixing the crashes that can come with app development is not. While many organizations are fast adopting the DevOps model, there are still some legacy frameworks where developers and operations teams are separate. Developers build and submit apps to their ops team, who in turn deploy and maintain the production stack. A common issue that arises due to this workflow is the time it takes to find and resolve crashes.

Anatomy of an OTT traffic surge: 2022 Men's NCAA Basketball Championship

Last night, Kansas topped the University of North Carolina in a thrilling come-from-behind victory to win their fourth championship in men’s college basketball. It was also notable in how viewers saw the game. Instead of being aired on CBS (network television), the game was carried on TBS requiring viewers to have either a cable TV package or use a streaming service to watch the game. Here’s what we saw.

What is Cloud-Native Monitoring?

Cloud and cloud-based technologies are at their peak today. More and more organizations are turning to intelligent architectures and systems to deploy their apps. And they are not wrong: the cloud has proven to be a great way to optimize performance and cut costs. However, there are issues to address with this growing trend. One of those is monitoring. Monitoring is a vital part of application maintenance.

How to use WebSockets to visualize real-time IoT data in Grafana

Mike Szczys is a Developer Relations Engineer at Golioth. His deep love of microcontrollers began in the early 2000s, growing from the desire to make more of the BEAM robotics he was building. When he’s not reading data sheets, he’s busy as an orchestra musician in Madison, Wisconsin. At Golioth, a commercial IoT development platform, we love using the power of Grafana to easily visualize data from IoT installations where tens, hundreds, or even thousands of devices are reporting back.

Honeycomb Pro: Now With Metrics & SLOs

Honeycomb Pro is about to get even better. Starting today, all Pro accounts have access to Honeycomb Metrics and two Service Level Objectives (SLOs), previously only available to Enterprise accounts. Full disclosure: Later this month, the price of Pro plans will be increasing as well (see below). However, existing Pro customers (including those that sign up before the new pricing goes into effect) will be able to enjoy the new capabilities at existing prices for a full year, until April 2023!

How to Conduct a Server Monitoring Software Comparison

Let’s imagine that you really need a car, and you head off to a car dealership to buy one for yourself. Let’s also assume this is your first car, and you don’t know much about cars. On getting to the dealership, you’ll need to choose which type of car you want. Now, you’re not very knowledgeable about the different drive mechanisms of a car, whether it’s a gasoline-powered or electric vehicle.

The basics: How to use the StatusCake API

We offer an API that provides direct access to the features the platform offers, with each feature providing a set of endpoints to perform operations on resources associated with your account. The StatusCake control panel offers plenty of useful visualisations and alerting systems so you can be in touch with your data, but sometimes there are use cases where we would rather leverage the API. In this blog post, we’re going to see how we can make use of these endpoints using C#.

What to Watch on EKS - a Guide to Kubernetes Monitoring on AWS

It’s impossible to ignore AWS as a major player in the public cloud space. With $13.5 billion in revenue in the first quarter of 2021 alone, Amazon’s biggest earner is ubiquitous in the technology world. Its success can be attributed to the wide variety of services available, which are rapidly developed to match industry trends and requirements.

Sponsored Post

How Network Vulnerabilities Can Impact Operational Resilience

Operational resilience remains the top priority for those in financial services. From the U.S. Federal Reserve's study into "Sound Practices to Strengthen Operational Resilience" and "Principles of Operational Resilience" from the Basel Committee to the Bank of England's upcoming rule changes for financial organizations in the UK, the intent is to create financial services institutions that are geared towards managing digital disruption. The goal is that financial service businesses can continue providing mission-critical services in the event of disruptions such as IT glitches, outages, and cyber-attacks.

Sponsored Post

Show character with Blameless Postmortems (part one)

This is Part 1 of a two-part series on Blameless Postmortems. Today, we'll discuss why blameless postmortems are so important and their implications for your team; the second part will go into detail on how to set them up as a process and make them successful. Somebody wise may have once told you that how we handle adversity shows our character. Being able to acknowledge and admit mistakes is the first step towards learning - it's a key part of success both in personal relationships and in large companies.

Source-Side Queueing: You Down With UDP?

Source-side queueing is a fancy way of saying: You can configure Cribl products to make sure data isn’t lost in the event of downstream backpressure, again. Those familiar with Cribl Stream might be aware of destination queuing or persistent queuing, wherein Stream can write data to the local disk in the event of an issue reaching the destination. Maybe your SIEM is suffering from disk I/O latency. Maybe there is a DNS problem with your load balancer (Hint: It’s always DNS).

Pricing comparison for Managed Prometheus

Observability has become a critical part of many companies and their business. So have the requirements for the systems that collect and store business-critical metrics. Monitoring systems need to be reliable, scalable, fast, and preferably cost-effective. Such features of any monitoring system never come for free or out of the box – you need people, a team of professionals who can build and manage it.

Why Application Performance Monitoring (APM) Tool Is Important?

Modern applications must deliver not only value but also round-the-clock availability, quick replies, and real-time problem-solving in today's digital economy. Since all businesses rely on software applications, their performance is one of their primary worries and frustrations, especially if their applications are the business itself. This is where an Application Performance Monitoring tool enters the scene.

Learn how our Chief Troublemaker transformed infrastructure troubleshooting.

When no available tool could help Costa Tsaousis identify his own infrastructure problem, he invented one that could. Netdata’s founder, CEO, and Chief Troublemaker tells how his invention went viral, how the Netdata Way transforms monitoring and troubleshooting, and how he plans to keep Netdata free, forever.

Logging Best Practices - MDC, Ingestion and Scale

I don’t care about religious wars over “which logger is the best”. They all have their issues. Having said that, the worst logger is probably the one built “in-house”… So yes, they suck, but re-inventing the wheel is probably far worse. Let’s discuss making these loggers suck less with proper usage guidelines that range from the obvious to subtle. Hopefully, you can use this post as the basis of your company’s standard for logging best practices.

Application observability made easier for Compute Engine

When IT operators and architects begin their journey with Google Cloud, Day 0 observability needs tend to focus on infrastructure and aim to address questions about resource needs, a plan for scaling, and similar considerations. During this phase, developers and DevOps engineers also make a plan for how to get deep observability into the performance of third-party and open-source applications running on their Compute Engine VMs.

What is IaC?

I recently had a wonderful opportunity to contribute to the Computer Weekly Developer Network (CWDN) ultimate series on “Infrastructure as Code”, which collected articles and overviews from vendors and experts operating in the IaC space to form a formidable reference on all aspects of IaC. My contribution was to offer some insight into our architecture, which is designed to automatically monitor infrastructure deployed as code, without tedious manual configuration.

Leading provider of digital, cloud and advisory services reduced time-to-resolve issues by 1400% with VirtualMetric

A leading provider of innovative digital and cloud services, part of the Microsoft ecosystem, chose VirtualMetric to get critical insights and complete visibility over their cloud environment. With 38,000 professionals in 24 countries, the company specializes in cloud and application services, managed services, analytics, and AI, and helps companies in various industries implement the latest technologies, leveraging the Microsoft platform.

Public Dashboards Are Now Status Pages

One year ago we launched what would become our most popular feature yet: a page you could publish with your name and logo that aggregated the status of all of your cloud vendors. We called it a “public dashboard” because it did not require a StatusGator account to view, and it published your StatusGator dashboard for your entire team. We’ve now renamed this feature a “status page” and made it even more accessible inside of StatusGator. Why the change? Read on.

A primer to understanding observability

The one certainty you will find in IT, developer, and SRE roles is that things always change! One hot topic in DevOps communities is observability. It's a long word, and you may be wondering what it really means and how you can add it to your skillset. Here’s a quick primer to get you going on your path to observability.

Spring4Shell: Responding to Zero-Day Threats with the Right Data

On March 30th, 2022, rumors began to swirl around a GitHub commit from a researcher containing proof of concept (POC) exploit code. The exploit targeted a zero-day in the Spring Core module of the Spring Framework, and was quickly confirmed against specific versions of Spring Core with JDK 9 and above. Anything running Tomcat is most at risk given the POC was based on Tomcat apps. This threat posture will evolve over time as new vectors and payloads are discovered and distributed.

The Netdata Way of Troubleshooting

Together with you, our fabulous community, Netdata is changing the way the world thinks of high fidelity monitoring – and we are gaining momentum. Our chief troublemaker and CEO, Costa Tsaousis, is the pioneer and architect of this revolution that’s brewing in the monitoring and troubleshooting space. Watch him explain the Netdata way of troubleshooting.

Sponsored Post

Who Can Benefit From Kafka Monitoring Services?

The benefits of Apache Kafka monitoring services are widely appreciated across the industry. Organizations that rely heavily on Apache Kafka for their data streaming needs can benefit greatly from Kafka monitoring services. By keeping track of key performance indicators (KPIs) related to Kafka, such as message throughput and latency, organizations can ensure that their Kafka-based data pipelines are running as smoothly and efficiently as possible.

Why does a business need middleware management?

In order to answer this question, it is necessary to define what middleware is. It is generally accepted that middleware is software that sits between existing systems and processes and helps to smooth integration and interaction between them. Because of the sheer range of middleware and the scope of what it can do and affect in terms of business infrastructure, it is vital that it can be successfully monitored and managed.

Sponsored Post

Optimize Device Refresh with Digital Experience Monitoring

Last year, during our Work Anywhere webinar, Forrester analyst Andrew Hewitt said that 51% of the technology teams focused on providing the right technology to enhance the employee experience. By 2021, the entire IT organization saw the need to support a flexible digital workplace with appropriate devices, wherever employees were located. According to Computer Economics, hardware such as PCs, laptops, and mobile devices age every four years. Therefore, device refresh becomes an important planning activity for support teams in keeping their knowledge workers productive and reducing the total cost of ownership (TCO).

Sponsored Post

AIOps & Observability - Which One Should Enterprises Focus on First?

Organizations today are pressured to keep their IT applications and infrastructure up and running and minimize their downtime. While this has always been a critical goal, it’s become harder to achieve with modern architectures, such as microservices, containerization, edge computing, hybrid-cloud deployments and the newer development methods such as agile DevOps techniques.

Honeycomb Service Level Objectives (SLOs)

In this 3-minute video, you’ll see how Honeycomb’s actionable SLOs can help you get to the source of an issue faster. Using a real production SLO (latency per-event) as an example, we walk you through what exhaustion time alerts are and how to configure them, as well as how to use a heatmap to investigate and take action when things happen.

Work-Life Integration Gaining Traction in the Post-Pandemic Workplace

In the realm of employee experience and company culture, there’s no term more ubiquitous than work-life balance. It’s what every job candidate wants and every business promises. But in the years since work-life balance became a ubiquitous selling point, the workplace has changed drastically, and a new concept is starting to supplant work-life balance: work-life integration. The pandemic and coinciding shifts to remote and hybrid work have spurred a deluge of new approaches to working.

DevOps.JS Workshop: Tracking errors and slowdowns across JS applications using Sentry

Join Simon Zhong, Sentry Sales Engineer, as he goes through setting up Sentry step-by-step to get visibility into our frontend and backend. Once integrated, he will track and triage errors + transactions surfaced by Sentry from our services to understand why/where/how errors and slowdowns occurred within our application code. This workshop took place live at DevOps.JS Conference on March 21, 2022.

Pivoting InfluxDB Series Data into Relational Layouts

Most developers are more familiar with the shape of relational data than the shape of time series data. InfluxDB stores time series data in a way that maximizes its effectiveness. As developers get more familiar with time series data, it may be helpful to view that data in a relational layout. Fortunately, the Flux language makes it easy to present your time series data in the way that's most useful for you.
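
To make the idea of a "relational layout" concrete, here is a minimal sketch in plain Python, not Flux, of the reshaping involved. It takes long-format points (one value per row) and produces wide rows (one column per field). Flux has its own pivot() function for this; the function name pivot_rows, the key names "time", "field" and "value", and the sample readings below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pivot_rows(points, row_key="time", column_key="field", value_key="value"):
    """Reshape long-format points into wide rows, one column per distinct field.

    The key names are illustrative assumptions, not an InfluxDB schema.
    """
    rows = defaultdict(dict)
    for point in points:
        # Group values by timestamp, using the field name as the column name.
        rows[point[row_key]][point[column_key]] = point[value_key]
    # Emit one dict per timestamp, sorted by timestamp for a stable layout.
    return [{row_key: key, **columns} for key, columns in sorted(rows.items())]


# Hypothetical sample data: two fields recorded at two timestamps.
points = [
    {"time": "2022-04-01T00:00:00Z", "field": "temperature", "value": 21.5},
    {"time": "2022-04-01T00:00:00Z", "field": "humidity", "value": 40.0},
    {"time": "2022-04-01T00:01:00Z", "field": "temperature", "value": 21.7},
    {"time": "2022-04-01T00:01:00Z", "field": "humidity", "value": 41.0},
]

wide = pivot_rows(points)
for row in wide:
    print(row)
```

Each resulting row carries a timestamp plus one column per field, which is the shape most developers expect from a relational table.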

A Primer for Monitor as Code: How to use Splunk Observability Cloud with Terraform

Managing the complexities of today’s cloud native infrastructure has resulted in the increased need for observability. As cloud adoption continues to grow, the need to deliver a better customer experience, scale efficiently and increase momentum on innovation has never been more important. Two technologies are helping many organizations deliver on these goals faster: Monitoring-as-Code and Infrastructure-as-Code.

Splunk Embarks on AWS Graviton Journey with Amazon EC2 Im4gn and Is4gen Instances

We are excited to announce that Splunk Cloud Platform is moving to next generation AWS Graviton2 processor hardware to help enable enhanced performance for customers who choose AWS as a provider. This begins a phased transition of our Splunk Cloud Platform indexer tier in a move that will help Splunk operate more efficiently and provide customers with the cutting edge in processing technology.

Anywhere Operations

Anywhere operations has moved to the forefront of the Infrastructure and Operations (I&O) agenda. Not to be confused with Work From Home, anywhere operations refers to an IT operating model that supports customers and enables employees anywhere. It also manages the deployment of business services across a distributed infrastructure. Accelerated by the COVID-19 pandemic and thanks to the rise of mobile, cloud and social, the move to online business has cemented the value of flexible infrastructure and the inherent weaknesses of traditional, structured processes.