August 2022

NiCE Active 365 Management Pack 4.1 released

Microsoft 365 services help companies worldwide improve business and revenue by providing a best-in-class digital workspace experience. The NiCE Active 365 Management Pack complements this with advanced M365 monitoring, such as full Teams Call analysis, integrated into Microsoft SCOM. Advanced monitoring and analytics help you reveal unwanted micro-events influencing the health and performance of the system and its users.

Equip any user to monitor Kubernetes with the Overview Page

Many organizations use Kubernetes to orchestrate their containerized applications. But because Kubernetes is complex, application developers may take some time to ramp up on the intricacies of monitoring a Kubernetes environment. This means that teams often need to create internal documentation and offer hands-on training to bridge the knowledge gap.

How Ekopak Manages Water Treatment Data with InfluxDB

A wide variety of industrial processes rely on water, and before it can be used, it needs to be treated to remove dissolved substances. Minerals have to be filtered out so they don’t form scale on equipment as water is heated and cooled, and bacteria need to be removed in cases involving human health. Ekopak is a Belgian company working to make water treatment more sustainable by using less water and energy where possible.

Monitor Ruby Application Performance with Magic Dashboards

Application teams must understand what their customer experience is like. This is true not only from a general perspective (in terms of usability and responsiveness) but also on a day-to-day, minute-by-minute basis. In particular, when you work with distributed systems, errors are inevitable. Site traffic fluctuates throughout the day, and any one of a system’s dependencies could also encounter an issue at any time.

Debunking 4 Cybersecurity Myths About Machine Learning

Machine learning has infiltrated the world of security tooling over the last five years. That’s part of a broader shift in the overall software market, where seemingly every product is claiming to have some level of machine learning. You almost have to if you want your product to be considered a modern software solution. This is particularly true in the security industry, where snake oil salesmen are very pervasive and vendors typically aren’t asked to vigorously defend their claims.

Observability: You Can't Buy It, You Must Build It!

In Part 1 of this series, we talked about the origins of observability and why you need it. In this blog (Part 2), we will cover exactly what observability is, what it isn’t, and how to get started. Before we can dive into how to approach observability, let’s get one thing clear: You can’t buy a one-size-fits-all observability solution.

Kubernetes Management Pack for SCOM Released by OpsLogix

OpsLogix is excited to announce the initial release of our new product, the Kubernetes Management Pack for SCOM. This product provides comprehensive monitoring for Kubernetes clusters and gives SCOM administrators a single pane of glass from which to monitor all their Kubernetes resources. With this management pack, you can be sure that your Kubernetes environment is always running smoothly and efficiently!

4 Key Reasons Service Virtualization is a Must for Agile Teams

Service virtualization is not new. In fact, the concept and technology were established 20 years ago. At its core, service virtualization offers the ability to simulate behavior, data, and performance characteristics of applications and services. Through service virtualization, teams can ensure they have an on-demand environment to support their testing needs.

11 Best Redis Monitoring Tools [2022 Review]

Redis is an open-source, BSD 3-Clause licensed, highly efficient in-memory data store that can be easily used as a distributed, in-memory key-value store, cache, or message broker. It is known for being extremely fast, reliable, and supporting a wide variety of data structures, making it a very versatile tool widely adopted across the industry. Redis was architected with speed in mind and is designed to keep all data in memory.

Obtaining and Storing Time Series Data with Python

In this tutorial we’ll learn how to use Python to get time series data from the OpenWeatherMap API and convert it to a Pandas DataFrame. Next we’ll write that data to InfluxDB, a time-series data platform, with the InfluxDB Python Client. We’ll convert the JSON response from our API call to a Pandas DataFrame because I find that that’s the easiest way to write data to InfluxDB.
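As a rough illustration of the conversion step described above, the sketch below flattens a JSON payload into one row per timestamp, which is the shape that `pandas.DataFrame(rows)` accepts. The payload layout and the field names (`hourly`, `dt`, `temp`, `humidity`) are assumptions made for illustration, not the documented OpenWeatherMap schema, and the InfluxDB write is left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload; the real API response may be shaped differently.
SAMPLE = json.dumps({
    "hourly": [
        {"dt": 1660000000, "temp": 21.5, "humidity": 60},
        {"dt": 1660003600, "temp": 22.1, "humidity": 58},
    ]
})


def to_rows(raw: str) -> list:
    """Flatten the payload into one dict per timestamp."""
    rows = []
    for entry in json.loads(raw)["hourly"]:
        rows.append({
            # Convert Unix seconds to a timezone-aware UTC timestamp.
            "time": datetime.fromtimestamp(entry["dt"], tz=timezone.utc),
            "temp": entry["temp"],
            "humidity": entry["humidity"],
        })
    return rows


rows = to_rows(SAMPLE)
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)`; using the `time` column as the index is a common next step before writing to a time series store.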

Best Remote System Monitoring Solutions in 2022

In today's competitive setting, companies must monitor their assets and networks effectively, get the best possible results, and react swiftly to problems. However, that is rarely the case for companies that continue to run in a traditional, isolated setting. These companies frequently lack precise procedures for tracking asset performance.

OpenTelemetry Logs, OpenTelemetry Go, and the Road Ahead

We’ve got a lot of OpenTelemetry-flavored honey to send your way, ranging from OpenTelemetry SDK distribution updates to protocol support. We now support OpenTelemetry logs, released a new SDK distribution for OpenTelemetry Go, and have some updates around OpenTelemetry + Honeycomb to share. Let’s see what all the buzz is about this time! 🐝🐝

How adding Kubernetes label selectors caused an outage in Grafana Cloud Logs - and how we resolved it

Hello, I’m Callum. I work on Grafana Loki, including the hosted Grafana Cloud Logs offering. Grafana Loki is a distributed multi-tenant system for storing log data — ingestion, querying, all that fun stuff. It also powers Grafana Cloud Logs.

The Enemy of Efficiency is the Lack of Automation

The Parkview Health team understands that the key to providing value to the business is through intelligent automation. Intelligent monitoring and AIOps capabilities can be combined to derive the much-needed context, an ingredient that has proven to be paramount in enabling intelligent automation to immediately address a broad range of IT issues. Innovative organizations, like Parkview Health, that are focusing their monitoring and analytics approach on developing this elusive context are rapidly becoming able to.

Microservice Application Monitoring Tips and Tricks

Microservices have grown to become one of the most optimal alternatives to monoliths. However, just building your app and releasing it to the public isn’t everything. Monitoring microservices is as important as building and releasing them. You need to maintain it to resolve issues that may occur and also introduce new features from time to time.

Splunk APM Expands Code Profiling Capabilities with Several New GAs

We’re excited to share new Splunk capabilities to help measure how code performance impacts your services. Splunk APM’s AlwaysOn Profiling now supports .NET and Node.js applications for CPU profiling, and Java applications for memory profiling. AlwaysOn Profiling gives app developers and service owners code-level visibility into resource bottlenecks by continuously profiling service performance, at minimal overhead.

Cloud Governance vs. Cloud Management

The cloud has revolutionized the way businesses operate. It allows organizations to access computing resources and data storage over the internet instead of relying on on-premises servers and infrastructure. While this flexibility is one of the main benefits of using the cloud, it can also create security and compliance challenges for organizations. That’s where cloud governance comes in. Cloud governance is a framework for managing and regulating how data is accessed and used in the cloud.

Measuring Cloud Instance Costs for FinOps

Achieving cost savings is one of the main drivers for cloud adoption. But for most companies, controlling cloud spend is much more challenging than anticipated. In a recent survey, 94% of IT decision makers report they are overspending in the cloud. Our own survey on cloud costs revealed 90% of executives say better cloud cost management and cost reduction is a top priority.

10 Ways MSPs Can Punch Above Their Weight

Sometimes we associate fast-growing networks with power and success, but being small doesn’t mean you can’t compete—you just need to be smarter at creating a sustainable competitive advantage. Here are 10 things you can do as a small managed service provider (MSP) to punch above your weight. A documented process is a consistent process. And once you document how a process is carried out, you can then look for ways it can be improved.

Maintaining High-Velocity Feature Development, Without Sacrificing Quality

As in any high-growth environment, expanding your suite of products and capabilities can contribute to a growing backlog of errors, and challenges prioritizing them… a scenario not lost on the team at Airtable, a connected apps platform that more than 300k organizations, including 80% of the Fortune 100, rely on to connect their teams, data, and workflows. To support organizations like Amazon and IBM, Airtable ships new features and updates through multiple deployments a week.

Digital Dexterity Enhances Remote and Hybrid Work

The leap to remote work happened nearly overnight with the pandemic, and it’s clear that hybrid work and digital dexterity are the way of the future. As a result of this shift in the way we work, companies have invested in new technologies to adapt to the growing remote world, with 69% of companies planning to increase their investment in digital tools even more in 2022. Additionally, over 90% of businesses strive to implement hybrid work environments.

How to monitor Couchbase with Google Cloud Ops

You can now easily monitor Couchbase metrics and logs in Google Cloud. All of our logging and monitoring Google Cloud contributions are available through the Google Ops Agent GitHub repository. You can check it out here! The Google Ops Agent uses the built-in Prometheus exporter and receiver to monitor Couchbase sources running Couchbase 7.0. You can find documentation on the Prometheus exporter in the Couchbase documentation.

How to get base URL in ASP.NET Core

You typically don't and shouldn't need to know where a web app is deployed. At least not from within the code of the web app itself. I keep seeing questions related to this, though. There is a range of reasons why this can still be relevant, such as when you want to generate and output an absolute URL in an MVC controller or Razor page. Here's a blog post about how to get the base URL in ASP.NET Core. Let's rewind a bit before we start looking into the code. All websites are deployed somewhere.

Icinga for Windows v1.10.0 - The Next Level

After some setbacks, we are very happy today to finally release Icinga for Windows v1.10.0! Over the past weeks we have spent a lot of time polishing this release, testing many different aspects, and optimizing the user experience in general. Please make sure to read the upgrading docs carefully before upgrading to v1.10.0. Otherwise, your installation might not work as expected, or Icinga for Windows might not load anymore.

Fire up new Browser checks with our new templates

Let’s admit it: end-to-end testing is a technical challenge. How do you make features testable? What testing framework should you use? When should you run your test suite? There are so many things to learn and consider. At Checkly, we want to ease end-to-end monitoring so that you can focus on shipping excellent software instead of figuring out how you monitor and test it. But before getting into our latest feature addition, let me answer the above questions.

Rackspace IT Tool Consolidation with Zenoss

Like most growing enterprises, Rackspace accumulated over 20 point IT monitoring tools through company acquisitions and one-off use cases. This can lead to operational inefficiencies and excess licensing costs. As part of Rackspace's best practices, they assessed the use of these tools' capabilities and decided to get rid of the tool bloat. For over 15 years Zenoss has remained a central part of Rackspace's IT monitoring strategy due to our scaling capabilities and 400+ integrations.

Out Of Office Monitoring Tips for the End of Summer

Employees are returning from vacation, the weather is (finally) cooling down and summer is coming to a close. It can feel a bit overwhelming returning to work and getting back into the swing of things. When it comes to website monitoring, there are simple steps you can take to make sure your transition is as smooth as possible. Take advantage of a few ounces of pre-vacation prevention to save on pounds of post-holiday cure.

Why ITOps should care about monitoring application performance

Managing application performance has evolved, requiring ITOps teams to become more involved in understanding application performance and to become users of APM solutions. Keeping an eye on what affects business outcomes has expanded outside the infrastructure to include the users dependent on applications. Modern ITOps teams can no longer ignore the application performance, status, user experience, and overall health of the application alongside the supporting infrastructure.

Introducing Unified Observability Platform by VMware Aria Operations for Applications

At VMware, we are on a mission to build a comprehensive, extensible, and intelligent monitoring and observability platform to help businesses run seamlessly. Over the past few years, we have evolved our platform to deliver invaluable end-to-end observability across applications and infrastructure.

OpsLogix SCOM Connector for Microsoft Teams: Major New Updates Released!

Since the release of our SCOM Connector for Microsoft Teams, we've been blown away by the positive feedback and uptake from customers. The connector has allowed businesses to connect their Operations Manager environment with Microsoft Teams, giving them a single pane of glass view and anywhere access to their IT operations. We're happy to announce that we've just released a new version of the connector, which introduces some significant updates to the functionality!

How to: Deadman Check to Alert on Service Outage

Whether you’re using InfluxDB to record massive amounts of historical stock market data to analyze the current economic trends or simply to monitor the number of times the lights in your smart home turn on and off to cut down on wasted electricity, a sudden shock or delay in the flow of incoming data can be detrimental to your operation in the majority of scenarios.
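The idea behind a deadman check can be sketched in a few lines: raise an alert when the newest data point is older than an allowed silence window. This is a standalone illustration, not the InfluxDB check API; the function name `is_dead` and the 60-second threshold are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_dead(last_seen: datetime, now: datetime, max_silence: timedelta) -> bool:
    """Return True when no data has arrived within the allowed window."""
    return now - last_seen > max_silence


now = datetime(2022, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = now - timedelta(seconds=30)   # reported 30 seconds ago
stale = now - timedelta(minutes=5)    # silent for 5 minutes
threshold = timedelta(seconds=60)     # hypothetical silence window

alerts = [is_dead(fresh, now, threshold), is_dead(stale, now, threshold)]
```

Only the stale source exceeds the window here, so only it would trigger a notification.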

Anomaly Detection and AIOps - Your On-Call Assistant for Intelligent Alerting and Root Cause Analysis

In this blog, we examine how anomaly detection helps by setting up healthy alerts and providing efficient root cause analysis. Anomaly detection, part of AIOps, guides your attention to the places and times where remarkable things occurred. It reduces information overload, thereby speeding up RCA investigation.

FinOps: Measuring Allocatable Cloud Spend

Cloud services are the number one source of unexpected overspending for companies today. As a result, cloud financial management is a major focus for most organizations. But how do you track the success of cloud efficiency? Full allocation of multicloud costs is a critical component for understanding your actual cloud services usage, establishing cloud cost management ownership, and creating accurate budgets and forecasts at the line of business, project, application and even team levels.

What is FSLogix?

FSLogix is a profile management solution used to apply personalization to user sessions for application and desktop virtualization technologies such as Citrix and Microsoft Azure AVD (Azure Virtual Desktop) and enable “roaming profiles”. It used to be common to copy a profile to and from the network when a user signs in and out of a remote environment. Because user profiles can often be large, sign in and sign out times often became unacceptable.

How Does Unified Monitoring Reduce MTTR?

Give your team a break from the blame game. When IT issues occur, you don't want to spend precious time trying to figure out whose domain is the problem instead of actually RESOLVING the issue. A single, holistic view leads to much faster root-cause identification and reduced downtime from issues in your dynamic environment. The Parkview Health team is able to take these insights and automate a rapid resolution when issues occur.

Aggregations and Chains: Performance Measurement in Cribl Stream Pipelines

In this post, we’ll discuss two functions in the Cribl Stream arsenal: the Aggregations function, which allows you to perform stats and metrics collection in flight, and the Chain function, which allows you to call one Pipeline from within another. The event flow will continue when the Chained Pipeline returns. To demonstrate their use, we’ll answer this question: How long did it take for Cribl to process events using your pipeline?

Enterprise vs. SMB IT: What's the Difference?

Enterprises have always had more money to spend on IT than small-and-medium businesses (SMBs). This has traditionally resulted in a disparity in the technology and services available to each. However, with the rise of cloud-based services and the increasing need for remote working, this gap is starting to close. While there are still some differences between enterprise IT management and SMB IT management, the two are becoming more similar.

Splunk Synthetic Monitoring in Splunk Observability Cloud - Product Demo

Splunk Synthetic Monitoring is now available in Splunk Observability Cloud allowing IT and engineering teams to proactively detect issues impacting web and API performance and end-user experience and troubleshoot and remediate issues in the web browser, the server, or a third-party dependency—all within a single UI. Watch this quick demo to learn more.

Using Splunk Observability Cloud to Monitor Splunk RUM

As a principal engineer on the Splunk Real User Monitoring (RUM) team who is responsible for measuring and monitoring our service-level agreements (SLAs) and service-level objectives (SLOs), I depend on observability to measure, visualize and troubleshoot our services. Our key SLA is to guarantee that our services are available and accessible 99.9% of the time.
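To make the 99.9% figure concrete, the arithmetic below converts an availability target into the downtime it permits. The 30-day window is an assumption for illustration; the post does not say over which period the SLA is measured, and the function name is hypothetical.

```python
def allowed_downtime_minutes(availability: float, window_days: int = 30) -> float:
    """Downtime budget, in minutes, for a target such as 0.999."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - availability)


# Roughly 43.2 minutes of downtime per 30-day window at 99.9%.
budget = allowed_downtime_minutes(0.999)
```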

Middleware 101

In computer science, systems are typically divided into two categories: software and hardware. However, there is an additional layer in between, referred to as middleware, which is a software pipeline—an operation, a process, or an application between the operating system and the end user. This article aims to define middleware and reflect on its necessity, as well as address controversies about when and where it applies.

Monitor your Microsoft Azure VMs featuring Ampere Altra Arm-based CPUs with Datadog

As organizations continue to expand their cloud footprint, managing costs without risking application performance is a priority. Because of this, Arm processors have become popular for their efficient, cost-effective processing power. Microsoft Azure’s new series of Azure Virtual Machines are powered by Ampere Altra Arm-based processors, which provide excellent price performance for scale-out and cloud-native workloads.

Observability: A Concept That Goes Back to the Founding of the Internet

With its market size reaching more than $2 billion in 2020, you’d think that a universal definition of the term observability would have emerged by now. But it turns out that a clear definition of a term or industry isn’t necessarily a prerequisite for the rapid growth of its market size — just ask everyone at your next dinner party to define blockchain for you and see how many different answers you get!

A Quick Guide to Observability vs APM vs Monitoring

The terms observability, APM, and monitoring are often used interchangeably. However, these solutions can actually be quite different depending on the overall needs of the business. In this video, SolarWinds Principal Product Marketing Manager Pete Di Stefano explains the differences between each of these terms and how using intelligence to integrate insights from APM and monitoring into a centralized observability solution is key to gaining a more comprehensive understanding of your entire IT ecosystem.

Get started with Grafana OnCall and Terraform

Managing on-call schedules and escalation chains, especially across many teams, can get cumbersome and error prone. This can be especially difficult without as-code workflows. Here on the Grafana OnCall team, we’re focused on making Grafana OnCall as easy to use as possible. We want to make it easier to reduce errors with your on-call schedules, create schedule and escalation templates quickly, and fit on-call management into your existing as-code patterns.

Elastic Observability helps monitor your Azure workloads on the new Arm-based VMs

Microsoft Azure’s recently launched new Azure Virtual Machines (VMs) feature the Ampere Altra Arm-based processor. These new VMs are engineered to efficiently run horizontally scalable workloads such as web servers, application servers, and open source databases. They deliver excellent price-performance and represent an important addition to Microsoft Azure's portfolio of instance types.

Snowflake DB: Observing a Snowflake From Cloud to Chart

You’ve probably heard something like this before: “It’s a managed service! We don’t need to worry about anything!” But when it comes to your production workloads, database monitoring is imperative. With the new Snowflake Dashboards and Detectors in the Splunk Observability Content Contributors repository you can start seeing the details of individual Snowflakes.

IT monitoring reduces the workload of retailers by about 30%

Food retailers reduce the workload accumulated by their IT departments by almost 30% thanks to monitoring. Managing data and information from across the whole company by controlling, supervising, and organizing everything through a single system reduces response times to errors and failures, improves resource management and organization, and increases the effectiveness of business activity. In addition, monitoring saves costs.

Monitoring outages in your third-party services with LogSnag

LogSnag is the new kid on the block, but we bet that it will become relevant in this space soon. Today we will show you how to monitor outages and get alerts for your third-party services within LogSnag. What is LogSnag? Here's the intro, but you can learn more at the official LogSnag website.

Creating Homebrew Formulas with GoReleaser

We chose to use GoReleaser with our distro of the OpenTelemetry Collector in order to simplify how we build and support many operating systems and architectures. It allows us to build against a matrix of GOOS and GOARCH targets as well as automate the creation of a wide range of deliverables. The ones we have used include tarballs, nfpm packages, Docker images, and Homebrew formulae.

The SRE's Quick Guide to Kubectl Logs

Logs are key to monitoring the performance of your applications. Kubernetes offers a command-line tool called kubectl for interacting with the control plane of a Kubernetes cluster. This tool provides debugging, monitoring, and, most importantly, logging capabilities. There are many great tools for SREs. However, Kubernetes supports Site Reliability Engineering principles through its capacity to standardize the definition, architecture, and orchestration of containerized applications.

Monitor Akamai Datastream 2 with Datadog

Akamai is one of the world’s largest CDN solution providers, helping companies greatly accelerate the secure delivery of content to their users all across the globe. Akamai provides this content delivery through its Intelligent Edge Platform, which is made up of hundreds of thousands of edge servers distributed around the planet.

Network Blind Spots Are Endangering Your Business

Network blind spots are the things you can’t see and don’t know about. They’re dangerous. Just like the blind spots on your car, network blind spots can set you up for deadly crashes. Problems will seem to “come out of nowhere” and hit unexpectedly. Network blind spots create all kinds of serious problems. A major network crash is one. But other problems can pile up too.

The Top 15 Distributed Tracing Tools (Open Source & More)

As distributed environments become more complex, users often use distributed tracing tools to improve the visibility of issues evident within their traces. Throughout this post, we will examine some of the best open-source and other generally popular distributed tracing tools available today.

The unreasonable effectiveness of shipping every day

It's fairly common for folks in tech to dream of quitting their day job and working on their side projects. I find when you ask them how their projects are going, they tend to have 2-3 projects running at the same time, and none of the projects are actually available for potential users to try out. The question they seem to ask me most is "you seem to complete your projects, how do you stay motivated?" My secret? It's a habit. I ship something every day.

Webinar Highlights: Improving Clinical Data Accuracy - How to Streamline a Data Pipeline Using Node.js, AWS and InfluxDB

Given the global health crises the world has faced over the last few years, the need for expeditious but accurate medical trials has never been more important. The faster clinical trial data is validated, the faster medicines get approved and treatments become available. Pinnacle 21’s customers are driving forces behind creating life-saving treatments.

Level Up Your DevOps Strategy with Intelligent Alerting

In the world of DevOps, every second counts. Problems need to be fixed fast, but action should be taken only for a legitimate reason, when something is actually wrong. Continuous monitoring helps with automation and setting up the right kinds of alerts. If the system is going haywire, every moment not acting can make things worse. That’s why intelligent alerting is critical for enabling observability and continuous effective monitoring.
Sponsored Post

Data Value Gap - Data Observability and Data Fabric - Missing Piece of AI/AIOps

A pivotal inhibitor to mitigate these challenges is the Data Value Gap. Data automation and Data Fabric are emerging as key technologies to overcome these challenges. Learn from industry experts about these key technologies and how they create a lasting impact in enterprise IT.

How to choose the right hyperscaler for your SAP solution

If you’re in the market for a new SAP HyperScaler, you may have been wondering what exactly you should look for. Maybe you’re wondering which model is best for your needs. Don’t worry, we’ve got you covered. We’ve done all the research and found the best SAP HyperScaler on the market. You’ll also find a buying guide to help you navigate the process and find the right SAP HyperScale server for your needs.

Schneider Electric consolidates monitoring tools by 83% with LogicMonitor

Schneider Electric consolidated its monitoring tools by 83% after onboarding LogicMonitor's observability platform. Schneider Electric, one of the most sustainable companies on the planet, is always striving to make energy better. This is done with the help of unified observability.

To Dynamically Sample or Not to Dynamically Sample | Snack of the Week

In the monitoring industry there’s a complicated and frustrating conversation that has persisted over the years: how do you deal with the enormous volume of data generated by instrumentation? On one side of the aisle, you will find a cohort of vendors and developers telling you that you have to sample data, followed immediately by another group telling you that sampling will ruin the accuracy of incident analysis. They’re both right.

ManageEngine OpManager MSP is now available in 9 more languages!

ManageEngine OpManager MSP is a reliable solution developed solely to help managed service providers monitor client networks and servers exhaustively. It has a wide range of out-of-the-box features to help MSPs in their multi-client network monitoring journey. The ManageEngine team works hard to make products that cater to the needs of MSPs, which is why OpManager MSP undergoes regular improvements, because even the most feature-rich tools need tune-ups every now and then.

Stop Using TCP Health Checks for Kubernetes Applications

As developers, one of the most important things we can consider when designing and building applications is the ability to know if our application is running in an ideal operating condition, or said another way: the ability to know whether or not your application is healthy. This is particularly important when deploying your application to Kubernetes. Kubernetes has the concept of container probes that, when used, can help ensure the health and availability of your application.

How to Perform Geolocation Testing to Ensure Your Website Works Globally

So, you have launched a website intending to reach a worldwide audience? If you're running a business, this could be the first step to growing your brand. But is your website really ready to go global? After all, just because your website works for a user in the United States doesn't mean it will be accessible to a user in Japan. For one, not everyone speaks the same language. Does your website offer translation for users visiting from different global locations?

Configuring an OpenTelemetry Collector to connect to BindPlane OP

BindPlane OP is the first open source, vendor-agnostic, agent and pipeline management tool. It makes it easy to deploy, configure, and manage agents on thousands of sources, and ship metrics, logs, and traces to any destination. This blog shows you how to configure an existing OpenTelemetry Collector from any source to connect to BindPlane OP without needing to remove or reinstall the collector.

What is Kubernetes CrashLoopBackOff? And how to fix it

CrashLoopBackOff is a Kubernetes state representing a restart loop that is happening in a Pod: a container in the Pod is started, but crashes and is then restarted, over and over again. Kubernetes will wait an increasing back-off time between restarts to give you a chance to fix the error. As such, CrashLoopBackOff is not an error in itself, but indicates that there’s an error happening that prevents a Pod from starting properly.
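The increasing back-off can be sketched as a doubling delay with a ceiling. The values used here (a 10-second initial delay, doubling on each restart, capped at 300 seconds) follow the kubelet behaviour described in the Kubernetes documentation; treat them as assumptions and check them against your cluster version. The function name is hypothetical.

```python
def backoff_delays(restarts: int, initial: int = 10, cap: int = 300) -> list:
    """Delay in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        # Never wait longer than the cap, however many restarts occurred.
        delays.append(min(delay, cap))
        delay *= 2
    return delays


delays = backoff_delays(7)  # [10, 20, 40, 80, 160, 300, 300]
```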

An Introduction to PromQL: How to Write Simple Queries

PromQL is a flexible language designed to make it easy for users to perform ad-hoc queries against their data. By default, Prometheus indexes all of the fields in each metric except for source and target. Prometheus is an open-source tool that lets you monitor Kubernetes clusters and applications. It collects data from monitoring targets by scraping metrics HTTP endpoints.

New in Grafana Alerting: File provisioning

We are happy to announce that file provisioning for Grafana Alerting has arrived in Grafana 9.1. This feature enables you to configure your whole alerting stack using files on disk, as you may already do with data sources or dashboards. The Terraform Grafana provider has also been updated to allow the provisioning of Grafana Alerting resources.

Intro to OEE

Efficient manufacturing is important for saving companies time, money, and energy. Making decisions based on data can improve efficiency, but there’s a lot of data to sort through. Manufacturing equipment contains many sensors, especially in the IIoT space. Overall Equipment Effectiveness (OEE) was first described by Seiichi Nakajima in the mid-twentieth century as part of his Total Productive Maintenance (TPM) method.

ROI Benefits of APM Tools

Software applications have become crucial for business growth and success in today's world. However, as businesses become increasingly competitive, the necessity to provide top-notch software applications is also increasing. Additionally, as organisations gravitate towards developing extensive, feature-rich applications, they are witnessing an increase in software complexity that can often cause things to get out of hand very quickly.

Monitoring - Best Practices for Alerting - New Whitepaper!

When evaluating a monitoring product, it is essential you fully understand its alerting capabilities. Alerting is a responsive action triggered by a change in conditions within the system being monitored. Typically, an alert can be defined by a condition to trigger the alert and an action defining what that alert should do when the trigger condition occurs.
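
The condition-plus-action definition above can be sketched in a few lines. This is an illustration only, not any particular product's API. `AlertRule`, `evaluate`, and the CPU threshold are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    condition: Callable[[float], bool]    # trigger: True when the alert should fire
    action: Callable[[str, float], None]  # what to do when the trigger condition occurs


def evaluate(rule, value):
    """Run the rule's action if its condition holds; return whether it fired."""
    if rule.condition(value):
        rule.action(rule.name, value)
        return True
    return False


sent = []  # stands in for a notification channel
high_cpu = AlertRule(
    name="high_cpu",
    condition=lambda cpu_percent: cpu_percent > 90,
    action=lambda name, value: sent.append(f"{name}: {value}"),
)
evaluate(high_cpu, 42.0)  # below the threshold, nothing is sent
evaluate(high_cpu, 97.5)  # above the threshold, the action runs
print(sent)
```

Separating the trigger from the action lets the same condition page someone, open a ticket, or run a script without changing the rule itself.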

Building a Cost-Effective Full Observability Solution Around Open APIs and CNCF Projects

A full Observability stack has the goal of providing full centralized visibility to Development, Operations and Security teams into all of the Metrics, Logs and Traces generated by the applications and services under their domain. Many companies address these observability needs by buying a complete application performance management (APM) solution from a single vendor, like Datadog.

Monitor 70,000 Managed Resources in a Single Pane of Glass

Longtime customer Rackspace knows a thing or two about operating at scale. As a multicloud solution provider, Rackspace leverages Zenoss to help their customers maximize the benefits of modern cloud by delivering proven multicloud solutions across applications, data and security. With Zenoss, they’re able to monitor the health of over 70,000 managed resources across the globe.

Experience-Driven NetOps: What It Is and Why It Matters

Recent events have changed the world forever. For network operations (NetOps) teams, it means there’s no going back to the way things used to be. Virtually overnight, teams had to adapt to work-from-anywhere (WFA) models. The move to SaaS, cloud, and SD-WAN continues to accelerate and, in the process, fundamentally alters the nature of network environments.

How to always match the correct DOM elements with Playwright's strict mode

Playwright's "page" object provides multiple methods to interact with DOM elements ("click", "fill", etc.). But these methods come with one downside: they're not strict. Learn in this video why strict mode is important and how locator methods help you write better test cases.

Get Alerted to mission-critical issues directly in Slack

You’ve spoken and we’ve listened and as a result, the highly anticipated Slack integration for Raygun Alerting is here! Once integrated with Slack, you’ll receive customized error, crash, and performance alert notifications directly to the channel of your choosing. Reduce your Mean Time to Resolution (MTTR) and resolve issues in your code before your customers even notice.

Monitor your Edgecast CDN with Datadog

Edgecast is a global network platform that provides a content delivery network (CDN) and other solutions for edge computing, application security, and over-the-top video streaming. Using Edgecast’s JavaScript-based CDN, teams can improve web performance by caching static and dynamic content with low latency and minimal overhead.

How much does RPKI ROV reduce the propagation of invalid routes?

Earlier this year, Job Snijders and I published an analysis that estimated the proportion of internet traffic destined for BGP routes with ROAs. The conclusion was that the majority of internet traffic goes to routes covered by ROAs and is thus eligible for the protection that RPKI ROV offers. However, ROAs alone are useless if only a few networks are rejecting invalid routes.

Why APM distributed tracing is not enough for developers

Distributed tracing is a method of tracking requests as they propagate through a distributed system. A trace is built from spans. Each span represents an interaction, like an HTTP request, a DB query, a serverless function invocation, etc. A trace is essentially a tree of spans. Based on the collected span data, a distributed tracing platform can capture all the interactions between the different architectural components and tie them together with a trace ID.
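
As a rough illustration of "a trace is essentially a tree of spans", the sketch below links spans through a parent reference and prints the resulting tree. The field names (`trace_id`, `span_id`, `parent_id`, `name`) and the sample spans are assumptions made for the example, not any tracing platform's schema.

```python
from collections import defaultdict


def build_trace_tree(spans):
    """Return (root span, mapping of span_id -> child spans) for one trace."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span  # the span that started the trace
        else:
            children[span["parent_id"]].append(span)
    return root, children


def render(span, children, depth=0):
    """Return the tree as indented lines, one line per span."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "name": "HTTP GET /checkout"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "DB query"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "serverless function"},
    {"trace_id": "t1", "span_id": "d", "parent_id": "c", "name": "HTTP POST /payment"},
]
root, children = build_trace_tree(spans)
tree_lines = render(root, children)
print("\n".join(tree_lines))
```

Every span carries the same trace ID, which is what lets a platform gather spans emitted by different components into one tree.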

New in Grafana 9.1: Service accounts are now GA

With the Grafana 8.5 release, we introduced the concept of service accounts. Now with the Grafana 9.1 release, we’re making service accounts generally available. This is a project that came out of technical necessity, but it has given us the opportunity to reflect on API tokens and machine-to-machine interaction across Grafana Labs.

How Zenoss Is Key to Efficient Operations in 4G and 5G

As mobile networks evolved to 4G, network elements transitioned from hardware-based, often with proprietary chassis, to virtualized network elements. Communication service providers often implemented such technologies through private cloud infrastructures based on different flavors of OpenStack. On the other hand, 5G introduced microservices-based deployments with a service-based architecture.

Telegraf Tips from InfluxDB University Experts

Telegraf is a very powerful open source plugin-based agent that gathers data from stacks, sensors, and systems and sends it to a database. It collects data from an input and sends it to an output, and gives you the option to transform data with aggregators and processors before it reaches its endpoint.

Authors' Cut-Actionable SLOs Based on What Matters Most

SLOs—or Service Level Objectives—can be pretty powerful. They provide a safety net that helps teams identify and fix issues before they reach unacceptable levels and degrade the user experience. But SLOs can also be intimidating. Here’s how a lot of teams feel about them: We know we want SLOs, we’re not sure how to really use them, and we don’t know how to debug SLO-based alerts. Don’t worry, we’ve got your answer—observability!

What's the Perfect IT Support Staff Ratio?

On a fairly regular basis, users will post to Reddit or Spiceworks or another IT forum to ask about the best support staff ratio of techs to users, and what other companies are finding sustainable. The question usually comes from an overworked tech who’s drowning in tickets and trying to understand what’s considered. The answers can get interesting. On Spiceworks, many people responded with details about the environments they support—and the range was notable.

Building Auvik Into Your MSP's SOP (Video)

Standard operating procedures—more commonly known as SOPs—are written, step-by-step instructions that describe how to perform a routine activity. While you can create an SOP for anything, an MSP SOP that outlines technical procedures is a well-known path to increasing efficiency in your business. Whether you’ve got existing MSP SOPs you’re interested in updating, or you’re looking for some basic steps to build brand-new SOPs around, you’ve come to the right place.

How to Generate Client Referrals as an MSP

I speak to a lot of MSPs every day, and I’m always asked for advice on meeting new prospects. Finding new clients is one of the biggest challenges they all face. “Have you tried asking your existing clients for referrals?” is my stock answer. And for good reason—prospects referred to you by an existing client are four times more likely to buy than any other opportunities. Here are a few of my tips on how to generate client referrals from existing clients.

Welcome to the Experience-Driven NOC

At Broadcom Software, we strive to build the most scalable operational software in the market. We work to ensure that our network monitoring software can track how constant network changes affect user experiences. As a global provider of networking equipment, we understand that there will always be changes happening on today’s enterprise networks, especially the internet. That’s why we build and refine our monitoring software to align with constant change.

Site Reliability Engineering, Site Reliability Engineers and SRE Practices: State of Adoption

Site reliability engineering (SRE) is what you get when you treat operations as if it’s a software problem. The mission of an SRE practice is to protect, provide for and progress the software and systems offered and managed by an organization with an ever-watchful eye on their availability, latency, performance and capacity.

Search all apps - understand the impact of an error across your entire tech stack

One of the most requested features for Crash Reporting has been the ability to perform a search across all of your applications rather than by one application at a time (the default behavior). It’s not hard to see why it’s a popular feature request - rather than manually performing the same search across many applications, it would be super handy to perform one search and understand the impact of the search results across all of your applications immediately.

Find the root cause faster with Datadog and Zebrium

When troubleshooting an incident, DevOps teams often get bogged down searching for errors and unexpected events in an ever-increasing volume of logs. The painstaking nature of this work can result in teams struggling to resolve issues before new incidents appear, potentially leading to an incident backlog, longer MTTR, and a degraded end-user experience.

How the right monitoring tools can bolster operational resilience in finance

The financial services industry has been under increasing pressure during the past several years to view operational resilience and their risk management postures as being symbiotic in the wake of rising operational incidents and increasingly frequent security threats.

Goats on the Road: Getting More Value From Observability Data

The best part of my job is talking with prospects and customers about their logging and data practices while explaining how Cribl focuses on getting more value from observability data. I love to talk about everything they are doing and hope to accomplish so I can get a sense of the end state. That is vital to developing solutions that provide overall value across the enterprise and not just a narrow tactical win with limited impact.

Get the Most Out of Serverless for Fleet Management Apps

You’ve probably seen Rush Hour, a logic puzzle where you have to slide cars and trucks out of the way to steer the red car towards the exit. In real life, when your customers are responsible for tracking hundreds or thousands of data points from dozens of valuable, mission-critical sensors, you’re tracking engine speed, network signal level, distance from the RF, and more—and not just through traffic but across continents.

What Does SASE Mean (for VPN)?

Break out your buzzword bingo cards, it’s time to talk about SASE or Secure Access Service Edge. Pronounced “sassy,” SASE has become one of the hottest topics in networking and security over the last three years. The basic idea is great: all your security and network services are on one platform. The problem comes when you get into the specifics. When does a set of services go from “not SASE” to “SASE”?

What is Azure Advisor?

Azure Advisor analyzes your configurations and usage telemetry and offers personalized, actionable recommendations to help you optimize your Azure resources for reliability, security, operational excellence, performance, and cost. Azure Advisor is a free service and can be accessed via the GUI on the Azure portal where recommendations are collated and can be manually examined. Azure Advisor makes recommendations for potential improvements in several areas.

What's new in Sysdig - August 2022

Welcome to another month of What’s New in Sysdig in 2022! I’m Joshua Ma, a Customer Solutions Engineer based out of sunny Los Angeles. I joined the Customer Success team at Sysdig five months ago. After having my first taste of K8s, containers, and Falco at the North America KubeCon/CloudNativeCon in 2019, I haven’t looked back since!

Sysdig launches Partner Technical Accreditation Program

In the quest for business transformation and digital modernization, organizations have rapidly adopted devops frameworks, microservice architectures, serverless technologies, and containerized infrastructures. However, they have realized that legacy tools cannot adequately address the newer security and monitoring challenges associated with modernization. Sysdig’s mission is to make every cloud platform secure and reliable from source to run.

Announcing Native Collectors: Bringing Native Data Collection to InfluxDB Cloud

Streaming time series data from brokers and services that are on-premises or in the cloud to a cloud-based database is a resource-intensive process requiring third-party software and heavy customizations. Today we’re announcing InfluxDB Native Collectors to make it easy for developers to collect, process, and analyze data by subscribing directly to supported message brokers.

InfluxData Brings Native Data Collection to InfluxDB

SAN FRANCISCO — August 23, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced new serverless capabilities to expedite time series data collection, processing, and storage in InfluxDB Cloud. InfluxDB Native Collectors enable developers building with InfluxDB Cloud to subscribe to, process, transform, and store real-time data from messaging and other public and private brokers and queues with a click of a button.

Application Observability, The Next Step in Application Performance Monitoring

Today’s cloud environments rely on microservices, service meshes, containers, and orchestration tools and are too complex for traditional tools to measure and monitor performance metrics effectively. The number of interdependent services—and the inherently ephemeral nature of cloud workloads—make it challenging to identify which metrics to monitor and issues to troubleshoot down to the root cause.

Why Do You Need Smarter Alerts?

The way organizations process logs has changed over the past decade, from random files scattered amongst a handful of virtual machines to JSON documents effortlessly streamed into platforms. Metrics, too, have seen great strides, as providers expose detailed measurements of every aspect of their system. Traces, too, have become increasingly sophisticated and can now highlight even the most precise details about interactions between our services. But alerts have remained stationary.

5 FinTech Log Analytics Challenges Equifax Solved with ChaosSearch

Global data, analytics and technology companies such as Equifax, and their Engineering teams, depend on log analytics for a variety of operational analytics use cases, from application troubleshooting to streamlining cloud operations and regulatory compliance management. ChaosSearch is uniquely positioned to help companies like Equifax significantly reduce the time, cost, and complexity of log analytics.

Introducing Dynamic Sampling

In the monitoring industry there’s a complicated and frustrating conversation that persisted over the years: how do you deal with the enormous volume of data generated by instrumentation? On one side of the aisle, you will find a cohort of vendors and developers telling you that you have to sample data, followed immediately by another group telling you that sampling will ruin the accuracy of incident analysis. They’re both right.

Top 5 Debugging Tips for Kubernetes DaemonSet

Kubernetes is the most popular container orchestration tool for cloud-based web development. According to Statista, more than 50% of organizations used Kubernetes in 2021. This may not surprise you, as the orchestration tool provides some fantastic features to attract developers. DaemonSet is one of the highlighted features of Kubernetes, and it helps developers to improve cluster performance and reliability.

Monitor your gRPC APIs with Datadog Synthetic Monitoring

gRPC is an open-source Remote Procedure Call (RPC) framework developed by Google and released in 2016. Although gRPC is still relatively new, large organizations are adopting it in increasing numbers to build APIs that connect complex microservice meshes that use disparate languages and frameworks. gRPC-based APIs can perform requests up to seven times faster than REST APIs and enable customers to easily implement SSL authentication, load balancing, and tracing via plug-in libraries.

Understanding monitoring and observability

Roaming in the world of cloud technology not only helps you take a glance at the realm of cutting-edge technology but also helps you get familiar with concepts such as monitoring and observability. This article will cover an introduction to monitoring and the need for monitoring applications. From here, we will look at how you can utilize the data received when monitoring an application. This will allow us to understand how the concept of observability fits in with monitoring.

How to monitor Solr with OpenTelemetry

Monitoring Solr is critical because it handles the search and analysis of data in your application. Simplifying this monitoring is necessary to gain full visibility into Solr’s availability and ensure it is performing as expected. We’ll show you how to do this using the jmxreceiver for the OpenTelemetry collector. You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector.

10Web Booster: Speed Up Your WordPress Site with One Tool

When it comes to a website’s performance, we all know the universal rule: speed matters… a lot. Beyond a good user experience, it’s a key factor in what Google is specifically looking—and testing—for. If you need a refresher, here it is, straight from Google: And what exactly does Google consider fast?

Monitoring Unit Tests with OpenTelemetry in .NET

In this post, we’ll look at how you can use OpenTelemetry to monitor your unit tests and send that data to Honeycomb to visualize. It’s important to note that you don’t need to adopt Honeycomb, or even OpenTelemetry, in your production application to get the benefit of tracing. This example uses OpenTelemetry purely in the test project and provides great insights into our customer’s code. We’re going to use xUnit as the runner and framework for our tests.

Autoscaling Elasticsearch/OpenSearch Clusters for Logs: Using a Kubernetes Operator to Scale Up or Down

When we say “logs” we really mean any kind of time-series data: events, social media, you name it. See Jordan Sissel’s definition of time + data. And when we talk about autoscaling, what we really want is a hands-off approach at handling Elasticsearch/OpenSearch clusters. In this post, we’ll show you how to use a Kubernetes Operator to autoscale Elasticsearch clusters, going through the following with just a few commands.

Grafana Tempo 1.5 release: New metrics features with OpenTelemetry, Parquet support, and the path to 2.0

Grafana Tempo 1.5 has been released with a number of new features. In particular, we are excited that this is the first release with experimental support for the new Parquet-based columnar store. Read on to get a high-level overview of all the new changes in Grafana Tempo! If you’re a glutton for punishment, you can also dig into the hairy details of the changelog.

Rust Object Store Donation

Today we are happy to officially announce that InfluxData has donated a generic object store implementation to the Apache Arrow project. Using this crate, the same code can easily interact with AWS S3, Azure Blob Storage, Google Cloud Storage, local files, memory, and more by a simple runtime configuration change. You can find the latest release on crates.io. We expect this will accelerate the pace of innovation within the Rust ecosystem.

Migrating Monoliths to Microservices in Practice

There have been amazing articles on the subject of migrating from a monolith to a microservice architecture, e.g. this is probably one of the better examples. The benefits and drawbacks of the architectures should be pretty clear. I want to talk about something else though: the strategy. We build monoliths since they are easier to get started with. Microservices usually rise out of necessity when our system is already in production.

Why and How to Monitor AWS Elastic Load Balancing

When building systems that need to scale above a certain number of users, we usually can’t stay on one machine. This is where cloud providers like AWS usually come into play. They allow us to rent VMs or containers for small intervals. This way, we can start a few different machines when more traffic hits, and when it goes down later, we can simply turn off our extra capacity and save money. The question is, how does all this traffic get to our new machines? AWS Elastic Load Balancing!

Sponsored Post

Modern Observability and Digital Transformation

For most businesses, effective digital transformation is a key strategic objective, and as computing infrastructure grows in complexity, end-to-end observability has never been more important to this cause. However, the amount of data and dynamic technologies required to keep up with demand only continues to increase, and current tools are not equipped to handle it, with any discrepancies resulting in rising costs and reduced competitiveness.

Why You Shouldn't Use OpenTracing In 2022

OpenTracing was an open-source project developed to provide vendor-neutral APIs and instrumentation for distributed tracing across a variety of environments. As it is often extremely difficult for engineers to see the behaviour of requests when they are working across services in a distributed environment, OpenTracing aimed to provide a solution to heighten observability.

An introduction to OpenTelemetry Metrics

OpenTelemetry is a collection of APIs, SDKs, and libraries that provide an open source observability framework for instrumenting, generating, collecting, and exporting telemetry data like metrics, traces, and logs. It is incubated under Cloud Native Computing Foundation (CNCF), the same foundation which incubated Kubernetes. OpenTelemetry is quietly becoming the world standard for instrumenting cloud-native applications.

InfluxDB Python Client Library: A Deep Dive into the WriteAPI

InfluxDB is an open-source time series database built to handle enormous volumes of time-stamped data produced by everything from IoT devices to enterprise applications. As data sources for InfluxDB can exist in many different situations and scenarios, providing different ways to get data into InfluxDB is essential. The InfluxDB client libraries are language-specific packages that integrate with the InfluxDB v2 API. These libraries give users a powerful method of sending, querying, and managing InfluxDB.

Product Update - CLI Onboarding Wizard Now Available

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product to meet developers where they are, to ensure their happiness, and accelerate Time to Awesome. This week, we are covering a featured product release that we think will save you time and effort when onboarding to time series and InfluxDB.

The Ultimate OpenTelemetry Guide for Developers

OpenTelemetry is a free and open-source software initiative with the objective of supplying software developers with the means to create distributed systems. OpenTelemetry was developed by engineers at Google, and developers have the ability to utilize it to create a standard foundation for the construction of distributed systems. The goal is to enable developers to write code once and then deploy it in any location of their choosing.

How Denmark's Energinet uses Grafana Enterprise to monitor underwater energy cables - and do detective work

If an energy cable running through the waters surrounding Denmark gets damaged by a passing vessel, does it make a sound? Yes... and it’s the ping of a Grafana alert at the offices of Energinet, an independent public enterprise owned by the Danish Ministry of Climate, Energy, and Utilities.

IT Asset Disposition: An MSP Opportunity That's Anything But Trash

Computers double in power every couple of years, according to Moore’s Law. That’s good news for companies and their people—continually faster and smarter gear can power higher productivity. But there’s also a downside to Moore’s Law: devices go obsolete quickly. The average server lifespan is only two to four years. And network switches often last just three to five years before breakdowns become a common concern.

What is OpenTelemetry?

OpenTelemetry is a collection of tools and APIs for collecting, processing, and exporting telemetry data from software. It is used to instrument applications for performance monitoring, logging, tracking, tracing, and other observability purposes. What is Telemetry? The word is derived from the Greek “tele,” meaning “remote,” and “metron,” meaning “measure.” So, it’s the collection of metrics and their automatic transmission to a receiver for monitoring.

How to Manage a Network: 10 Essential Steps

In a perfect world, understanding how to manage a network would be a breeze. On your first day of managing a network, you’d find tons of documentation on the IT infrastructure waiting for you. Login credentials would be securely recorded and ready for review. Sadly, we don’t live in a perfect world. That’s why managing a new network can be tough—especially if you’re joining a brand new IT team or taking on a new client and aren’t sure what’s been done before.

Getting Started With Syslog in Auvik

When something goes wrong in your network, you often don’t find out about it until your users are affected, and you’re left scrambling to identify the issue and understand its root cause. The faster you find out about a network issue and why it’s happening, the quicker you can implement the right fix and spare your network users from unnecessary downtime.

Choosing the Best Virtual Desktop Infrastructure (VDI) Technology for Your Enterprise

Virtual desktop infrastructure (VDI) is a technology that refers to the use of virtual machines to provide and manage virtual desktops. Users access virtual desktops from their laptops, desktops, thin clients, or mobile devices from anywhere. Virtual desktops are hosted in a data center, on servers, and all the necessary processing is done on the server that hosts the virtual desktops.

Mezmo Named to Inc. 5000's List of Fastest Growing Companies in the Nation

Inc. is shining a light on Mezmo as one of the fastest growing companies in the nation. We are truly honored to be featured alongside innovative brands like Sentry and Calendly, who are building the future of tech. Our position on the list at number 695 reflects our 900% growth in revenue and 300% growth in the size of our team from 2018 to 2021.

That Rogers Outage is Going to be More Expensive Than You Think

On July 8 of 2022, the Canadian telecom company Rogers Communications suffered a major outage that impacted most of Canada for almost two days. This wasn’t completely unprecedented (they’d had an outage in 2021 that impacted their wireless servers for several hours) but the breadth and severity of this one is going to end up costing them far, far more than it seems at first glance.

A preview of our upcoming redesign

Earlier this year, we announced that one of our goals for this year is to bring the UI of Oh Dear to the next level. Behind the scenes, our team is working hard on a complete rewrite of our marketing website and app. We're currently targeting the end of September timeframe to launch our redesign. In this blog post, we'd like to give you a preview of the redesign.

New in Grafana 9.1: Trace to metrics allows users to navigate from a trace span to a selected data source

Traces, logs, and metrics provide inherently different views into a system, which is why correlating between them is important. With features like exemplar support, trace to logs, and span references, you can quickly jump between most telemetry signals in Grafana. With the release of Grafana 9.1, we’re improving Grafana’s ability to correlate different signals by adding the functionality to link between traces and metrics.

How to Supercharge Your Website Monitoring in 5 Minutes or Less

I’m a recent entrant to the Website Monitoring game, but there is one thing I realized straight away: A Monitoring tool is only as good as it’s configured to be. Website monitoring is at its best when it’s reliable, informative, and efficient. When it gives you the information you need, when you need it, and the peace of mind to say “if I’m not being alerted, I know it’s still working.”

Using N-central for Server Hardware Monitoring

While it is fair to say that in recent years we’ve seen a shift to servers being deployed in the cloud through Microsoft Azure or AWS, I’m sure if you’re reading this today you still have a large percentage of physical servers under your management, including Hyper-V and ESXi hosts. N-central’s ESXi monitoring should automatically detect and monitor the hardware in these boxes, but what about the rest?

Shine Some Light on Your SNS to SQS to Lambda Stack

The combination of SNS to SQS to Lambda is a common sight in serverless applications on AWS, perhaps triggered by messages from an API function. This architecture is great for improving UX by offloading slow, asynchronous tasks so the API can stay responsive. It presents an interesting challenge for observability, however, because observability tools are not able to trace invocations through this combination end-to-end. In X-Ray, for example, the trace would stop at SNS.

Goliath Technologies Achieves Record First-Half 2022 Growth

Philadelphia, PA – August 10, 2022 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software for hybrid cloud environments, announced today that they achieved record revenue growth during the first half of the calendar year 2022, up more than 45% YOY 2021.

Concurrent Users vs. Total Users Explained

How many concurrent users do you need to load test your website? The Total Users (aka Total Sessions count) metric is commonly used in performance testing to answer this question. However, it is not as straightforward as it may appear. Learn the difference between Total Users and Concurrent Users from this short video.
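
One common way to relate the two numbers is Little's Law: average concurrency equals the arrival rate multiplied by the average time each visitor stays. The sketch below applies that rule of thumb. It is not necessarily the method the video uses, and the function name and traffic figures are hypothetical.

```python
def estimate_concurrent_users(sessions_per_hour, avg_session_seconds):
    """Average number of users on the site at the same moment (Little's Law)."""
    arrivals_per_second = sessions_per_hour / 3600
    return arrivals_per_second * avg_session_seconds


concurrent = estimate_concurrent_users(sessions_per_hour=6000, avg_session_seconds=180)
print(f"about {concurrent:.0f} concurrent users")
```

The same hourly total can therefore imply very different load depending on how long sessions last, which is why the total alone does not settle the question.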

High-Scale Monitoring: Lessons from Broadcom's DX UIM Deployment

While many people know us from our semiconductor and infrastructure software solutions, few have visibility into what goes on behind the scenes to support Broadcom’s global business. Within the Broadcom Software division, the Broadcom Global Technology Organization (GTO) is responsible for managing an extensive IT infrastructure, one that spans 18 data centers, 100 sites, and 400 R&D labs.

Battle the Ransomware Scourge with Deep Network Insight

Ransomware is the gift that keeps on giving. Old as it is (33 years) ransomware is constantly morphing into new exploits. The reason is simple. Ransomware works and too often cybercriminals walk away with bags of money (or piles of Bitcoin, anyway). “Following the World Health Organization's AIDS conference in 1989, Joseph L. Popp, a Harvard-educated biologist, mailed 20,000 floppy disks to event attendees.

How to Troubleshoot Intermittent Internet Connection

Dealing with an intermittent Internet connection is very frustrating. Does your Internet connection keep disconnecting and reconnecting when you’re watching your favourite Netflix show or chatting with your colleagues on Zoom? In this article, we’re teaching you how to troubleshoot intermittent Internet issues with Network Monitoring.

Stack Trace Line Numbers for Unity Events

In 2018 we launched the Sentry Unity SDK, but at the time, we couldn’t crack how to display stack trace line numbers for C# exceptions with the IL2CPP scripting backend. And until a recent release of Unity, we thought it wasn’t possible. But here at Sentry we often do the impossible… or at least the improbable. Like adding features to our JavaScript SDK while making it smaller at the same time.

Monitor your Dataflow pipelines with Datadog

Dataflow is a fully managed stream and batch processing service from Google Cloud that offers fast and simplified development for data-processing pipelines written using Apache Beam. Dataflow’s serverless approach removes the need to provision or manage the servers that run your applications, letting you focus on programming instead of managing server clusters. Dataflow also has a number of features that enable you to connect to different services.

Networking Automation Software: Pros & Cons

What is network automation software? It’s something we’re going to be seeing a lot of in the near future. Networks worldwide continue to grow at astonishing rates to supply organizations with the access and bandwidth they need to operate. But in a survey by Oracle, 76% of respondents said network complexity was one of management’s biggest challenges. As network growth increases—and complexity along with it—we’ll need efficient and scalable ways to handle it.

How Helios integrates with Cypress to provide backend visibility into your UI testing

Testing web applications from the user interface (UI) is a must for every customer-facing product, from e-commerce portals to cyber security dashboards. Often, a broken or inefficient UI experience can make or break whether end users adopt a product quickly and trouble-free. This is why developers have embraced UI testing as a critical part of their development process.

How does Core Web Vitals impact SEO?

Google announced a new set of metrics known as “Core Web Vitals” which will be used as ranking factors in 2021. The announcement is part of Google’s ongoing effort to make the web more user-friendly and to help site owners improve the experience for their visitors. This is big news for site owners and SEOs who are always looking for ways to improve their site’s ranking on Google.

Honeycomb Play: Test Drive Honeycomb Without Signup or Setup

Honeycomb Play is an interactive sandbox that lets users explore Honeycomb’s data-enriched UI through a guided scenario. The hands-on experience takes a deep dive into how Honeycomb enables you to identify issues, assess their impact, and diagnose their causes for remediation. There is no requirement to sign up—simply dive in and get started right away!

Pros and Cons of Horizontal vs. Vertical Scaling

Have you ever dealt with an application workload suddenly increasing and the demand on SQL Server begins to increase exponentially as a result? Maybe you’ve never had one of those “my solution to X problem just went viral, and everybody’s signing up” moments, but inevitably, at some point in your career, you’ll have to deal with a scalability limitation affecting performance and the business will look to you for a solution.

Getting Started With Azure Database for PostgreSQL

Open-source software (OSS) relational database management systems (RDBMSes) are becoming incredibly popular in the cloud computing world. In this article, I’ll discuss one of the most popular OSS relational databases, PostgreSQL, and the options you have for this database in the Microsoft Azure cloud. It’s worth noting these products are large with many detailed aspects, so consider this article a cursory, high-level overview of each offering.

New in Grafana 9.1: Share your Grafana dashboard with anyone via public dashboards

We’re excited to announce the launch of a new feature we’ve been working on in Grafana 9.1: public dashboards 🎉. The public dashboards feature will allow you to share your Grafana dashboard with anyone, even if they’re not part of your Grafana organization. Historically, the only way that someone could share a dashboard externally was taking a one-time snapshot 📸, or disabling all authorization for their Grafana instance 😬.

Tame the Internet with DX NetOps 22.2 Network Monitoring Software

DX NetOps 22.2 optimizes network operations with industry-leading visibility, scale and modern network coverage beyond the network edge to quickly and easily isolate end-user experience impact of network performance issues. Recent research revealed that 67% of companies cite internet and cloud network paths as monitoring blind spots. Furthermore, 71% of companies say that adoption of new network technologies is delayed by inadequate network monitoring software.

Time Series Forecasting With TensorFlow and InfluxDB

This article was originally published in The New Stack and is reposted here with permission. You may be familiar with live examples of machine learning (ML) and deep learning (DL) technologies, like face recognition, optical character recognition (OCR), the Python language translator, and natural language search (NLS). But now, DL and ML are working toward predicting things like the stock market, weather and credit fraud with astounding accuracy.

InfluxDB's Strengths and Use Cases Applied in Data Science

This article was written by Shane from Infosys. Infosys is a global IT leader, headquartered in India, with over 200,000 employees and a focus on digital transformation, AI/ML, and analytics. Our organization faces challenges when working with data to assist with proactive anomaly detection, triaging incidents to accommodate data and volume growth, and maintaining high availability and SLAs for near-100% uptime.

You've instrumented performance...now what?

Sarah has an app with performance monitoring turned on and a slowdown, but doesn’t know where to start to fix it… Ben has the guide to diving into Performance with Sentry. You have a laptop, computer, or Commodore 64. Join Sarah Guthals, Director of Developer Relations, and Ben Pepper, Solutions Engineer, in this interactive workshop and your future self will thank you.

Are Your Engineers Gonna Need A Bigger Boat?

If you asked your engineering team how well they can handle all of the security and observability data they’re managing, would you get a resounding “Yeah boss, we’re good to go!” in response? Possible, but unlikely. Chances are they feel like they’re stuck on a boat that’s taking on water, spending their day using tiny buckets to scoop some of it out, with no way to plug any of the leaks.

AIOps: Hype vs. Reality

What is AIOps? How does an AIOps platform help your observability practice? AIOps platforms analyze telemetry and events, and identify meaningful patterns that provide insights to support proactive responses. AIOps platforms have five characteristics. The above is Gartner’s definition and is part of the Gartner® “Market Guide for AIOps Platforms.” The Gartner definition is also aligned with our view.

Moving from an IT and Security Data Admin to an Observability Engineer

Join Ed Bailey, Nick Heudecker, and Jordan Perks as they discuss what it means to transition from acting simply as an IT and security data administrator to becoming a true observability engineer. In your role as an observability engineer, you’ll guide an organization on observability data best practices, enhance existing tool functionality, help control cost, and improve overall compliance.

Make Your Hybrid Work Model Successful with Digital Experience Monitoring

According to a new study, the hybrid work model reduces attrition by a third. As the pandemic shifted work to remote, with some companies going fully remote for over two years, many organizations are looking to continue to allow hybrid work environments. In 2022, more than 90% of midsize companies plan to implement a hybrid work structure. One of our customers practices this model and is always interested to know what new features can enhance the digital experience of their distributed workforce.

Introducing OpUtils' IP Request tool

Are multiple IT operators accessing, utilizing, or managing your network address space? If so, then you might have noticed that one of the time-consuming network management tasks you undertake regularly is allocating IP addresses to the IT operators. This is an inevitable task since, as your network scales with new physical components or technology implementations, your operators require new IPs to enable network connectivity.

AIOps Feature Scape: How you can Accelerate AIOps Data Integrations with Insane New Robotic Data Automation Fabric (RDAF)

This is the first Feature Byte in the AIOps series. The idea of the Feature Byte series is to talk about key operational tasks and processes in AIOps, and how CloudFabrix Data-Centric AIOps platform features help implement such tasks. Look for more such feature bytes over the next few weeks.

Three NuGet packages to improve exceptions in .NET/C#

We love exceptions. Not in the oh-no-my-website-crashed kind of way, but all of the possibilities provided by exceptions and a good exception handling strategy. In this post, I'll introduce you to three different NuGet packages that will help you when dealing with exceptions in C#. .NET comes with a set of exceptions as part of the C# language. You've probably tried creating your own exceptions too by extending System.ApplicationException or similar.

Grafana 9.1 release: New Grafana panels, RBAC for plugins, public dashboards, and more!

Grafana 9.1 is here! We’ve made a number of improvements to Grafana’s usability, data visualization, and security. For a full list of new features and capabilities, check out our What’s New in Grafana 9.1 documentation. You can get started with Grafana in minutes with Grafana Cloud. We have a generous free forever tier as well as plans to suit every use case; sign up for free now. Here are some of the highlights in Grafana 9.1.

The Three-Month Fix: How AbbVie Kept Their VDI Users Up and Running

The complexity of today’s workplace technology means that all of our environments are incredibly unique. Two organizations may use the same platforms and applications, but the tactics we use to implement these tools are all unique to our own goals and business needs. But all of us who work in IT and engineering can agree: our companies’ success hinges on our ability to keep our environments running smoothly. I’m a senior engineer at the pharmaceutical company AbbVie.

Top API Metrics for Different Teams That You Should Monitor

Building and utilizing modern applications now essentially requires APIs. They are a crucial component of every company's automated workflow, and as more customers depend on your APIs to power their applications, the demand for them to be trustworthy is growing. Your business will suffer if its performance, availability, or health degrades, thus proactive API monitoring is essential to ensure its dependability. We'll go through the most important API metrics in this article.

Sustainable Technology: Can Clouds Save The Planet?

Sustainable technology has become a hot topic in the tech world over the past few years. From the escalating environmental impact of heat and water-intensive activities like crypto mining and chip production to carbon taxes to sustainability becoming a pillar in the AWS Well-Architected Framework, there are plenty of issues to talk about, but conversations become markedly quieter when you shift to discussing solutions.

Kubernetes 1.25 - What's new?

Kubernetes 1.25 is about to be released, and it comes packed with novelties! Where do we begin? This release brings 40 enhancements, on par with the 46 in Kubernetes 1.24 and 45 in Kubernetes 1.23. Of those 40 enhancements, 13 are graduating to Stable, 10 are existing features that keep improving, 15 are completely new, and two are deprecated features.

Welcome To The Experience-Driven NOC: Track Network Path Deviation From Normal Performance

On your journey to the Experience-Driven NOC, we illustrate the importance of tracking deviation from normal network path performance behavior. This DX NetOps capability provides operations even more clarity into network performance impact on user experiences with focused triage on what matters vs performance blips that do not require immediate attention.

Welcome to the Experience-Driven NOC: Network Path Performance Metrics

On your journey to the Experience-Driven NOC, we make it easy to view and modify the network path metrics collected by AppNeta inside the DX NetOps portal. Metrics like percentiles and projections add observability for capacity planning, among many other insights, and can be used in dashboards and reports.

Centralizing Log Data to Solve Tool Proliferation Chaos

As companies evolve and grow, so does the number of applications, databases, devices, cloud locations, and users. Often, this comes from teams adding tools instead of replacing them. As security teams solve individual problems, this tool adoption leads to disorganization, digital chaos, data silos, and information overload. Even worse, it means organizations have no way to correlate data confidently. By centralizing log data, you can overcome the data silos that tool proliferation creates.

Welcome to the Experience-Driven NOC: Network Path Change Alarms and Drill Down Context Pages

On your journey to the Experience-Driven NOC, DX NetOps 22.2 is enabling network operations centers to utilize their standard operating procedures and workflows to triage the performance of the entire network path of any user experience - over managed or unmanaged networks. As with any workflow, we start with an alarm and enable operations to drill down, in context, to the offending network device, giving operators enhanced visibility into network path performance along with key KPIs for focused troubleshooting on any and all user experience impact. For more info, visit broadcom.com/netops

Welcome to the Experience-Driven NOC: Configure DX NetOps to Collect AppNeta Experience Metrics

On your journey to the Experience-Driven NOC, in this video we show you how to configure the DX NetOps gateway to connect to your AppNeta instance with all the relevant information needed to consume experience metrics into the NetOps portal. For more info, visit broadcom.com/netops

Advanced Debugging and Monitoring for Serverless Backends

Serverless backends have different monitoring challenges when compared with traditional applications, mostly due to the distributed and proprietary nature of serverless. Making monitoring and debugging efficient for serverless requires a unique set of tools and techniques. In this article, we’ll discuss the challenges of debugging serverless backends and how to utilize third party tools to improve the monitoring process.

What's Missing From Almost Every Alerting Solution in 2022?

Alerting has been a fundamental part of operations strategy for the past decade. An entire industry is built around delivering valuable, actionable alerts to engineers and customers as quickly as possible. We will explore what’s missing from your alerts and how Coralogix Flow Alerts solve a fundamental problem in the observability industry.

How to Monitor SAP System Connection Health and Create Alert Mechanism?

As a Middleware Administrator, you may need to monitor the status of SAP because of exceptions you cannot predict. To do this, you need to connect to the SAP system somehow. In this blog post, we’re going to show you how to monitor SAP from the Oracle WebLogic and Java side. You can connect to the SAP system using the WebLogic SAP Resource Adapter.

5 best practices for optimizing IP address management

Ever discovered an IP issue only after an end user reported it? If yes, you are not alone! Many network admins without a proactive monitoring solution in place have to constantly deal with recurring IP issues such as IP conflicts and subnet overutilization. As simple as it sounds, IP address management can be a tricky endeavor without the right strategy and management tools. Managing hundreds of IPs across multiple subnets and supernets can be tedious.

Manage Service Catalog entries efficiently with the Service Definition JSON Schema

The Datadog Service Catalog helps you centralize knowledge about your organization’s services, giving you a single source of truth to improve collaboration, service governance, and incident response. Datadog automatically detects your APM-instrumented services and writes their metadata to a service definition before adding them to the catalog.

How Veterans United Home Loans uses Grafana Cloud to help military families become homeowners

Veterans United Home Loans is the top VA lender for home buyers in the United States and has been making the dream of homeownership a reality for veterans and military families for more than two decades. A big part of making that dream come true is keeping their services – including both internal applications and a robust digital experience for their borrowers – highly performant.

DX NetOps 22.2 Expands Industry-Leading Observability for Fortinet SD-WAN

The latest release of DX NetOps 22.2 continues to expand monitoring coverage of the most popular SD-WAN vendors in the market - now supporting Fortinet SD-WAN. The multi-vendor SD-WAN monitoring solution helps operations teams quickly understand how their Fortinet SD-WAN environment is meeting application delivery SLAs and how current performance compares to baseline norms.

Resiliency As the Next Step in the DevOps Transformation

We’ve reached the point in the DevOps transformation where efficiency and automation are no longer the highest objectives. The next step is engineering past automation and towards fully autonomous, self-healing systems. If you aren’t conversing about building this type of resilience into your systems and applications, there’s never been a better time than now to start.

IT Salaries: Trends, Roles, & Locations for 2022-2023

IT roles have never been more in demand and IT salaries have never been higher, according to recent reports and data sources. Whether you are hiring, looking for a career change, or simply work in tech, it’s important to stay up-to-date on the state of employment in the industry. This blog post will review, round up, and summarize some of the latest trends for IT salaries and demand by role and location (among other variables) to help you get a clear view of the landscape.

State of Cloud Cost Report 2022

Cloud migration efforts continue to grow today as organizations move into a post-pandemic work environment. According to McKinsey & Company, by 2024, most enterprises aspire to have $8 out of every $10 for IT hosting go toward the cloud. In a survey by Morgan Stanley, CIOs say cloud computing will see the highest rate of IT spending growth in 2022.

Monitoring Performance at Moonbeam from Day One

As someone who has seen the devastating effects of poor performance monitoring firsthand, I can attest to the importance of doing it right from the start. If your users are experiencing latency issues and you’re not aware of them, that’s a big problem. At one of my previous jobs, we ended up paying out millions of dollars in SLA violation fees because we didn’t have proper monitoring.

Kubernetes Metrics Server | How to deploy k8s metrics server and use it for monitoring

Modern digital businesses have adopted cloud technology and distributed architectures to enable on-demand scaling of resources. Containerization technologies like Kubernetes and Docker have made it possible to handle customer demands at scale. However, orchestrating a complex microservices architecture with Kubernetes is challenging. Monitoring your Kubernetes cluster can give you insights to better manage your cluster.

Welcome to the Experience-Driven NOC

Experience is the new benchmark for network operations today. That is why we are proud to deliver DX NetOps 22.2, which takes our decades of expertise in network visibility, scale and modern network coverage and expands these capabilities beyond the network edge to home wireless, ISP, cloud and SaaS environments, where the user experience lives now. For more information, visit broadcom.com/netops

Product Update - Task Management at Scale and Invokable Scripts from the Tasks API

Thanks to Vinay Kumar for being a key contributor to this article. We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product to meet developers where they are, to ensure their happiness, and accelerate Time to Awesome. This week, we are covering a featured product release that we think will save you time and effort when building with time series, InfluxDB – and specifically – Tasks.

Authors' Cut-Not-So-Distant Early Warning: Making the Move to Observability-Driven Development

This is how the developer story used to go: You do your coding work once, then you ship it to production—only to find out the code (or its dependencies) has security or other vulnerabilities. So, you go back and repeat your work to fix all those issues. But what if that all changed? What if observability were applied before everything was on fire? After all, observability is about understanding systems, which means more than just production.

Using Grafana and machine learning to analyze microscopy images: Inside Theia Scientific's work

At GrafanaCONline 2022, Theia Scientific President, Managing Member, and Lead Developer Chris Field and Volkov Labs founder and CEO Mikhail Volkov — a Grafana expert — delivered a presentation about using Grafana and machine learning for real-time microscopy image analysis. Real-time microscopy image analysis involves capturing images on a microscope using a digital device such as a PC, iPad, or camera.

Introduction to Network Detection and Response

With cyber threats on the rise each year, ensuring the security of your IT estate and being able to respond quickly to any potential threat is more crucial than ever. This is why Network Detection and Response tools play such a critical role in your security posture. We will introduce you to our Network Detection & Response tools, exploring the benefits of adding them to the solutions already deployed in your network infrastructure.

Network Availability Monitoring Tools

For any digital service provider, there’s nothing worse than remaining in the dark and responding to network issues only when a customer rings in with a complaint. However, the planning and management needed to remain ahead of such network issues requires consistent monitoring. Network availability monitoring is a crucial aspect of network management and administration today. Here’s a detailed guide on network monitoring and network availability monitoring tools.

AWS Monitoring Tool

Businesses today know what a cloud migration brings to the table regarding growth and optimization. AWS is the most popular cloud platform for hosting applications of varying complexity. But, with businesses aiming to upscale faster, the deployments tend to become difficult to track. This is one of the many reasons businesses need AWS monitoring tools. This blog will discuss the industry’s best practices for AWS monitoring.

7 Most Important Key Metrics of Server Monitoring Software

Modern-day servers are robust enough to accommodate as many applications and processes as possible. Still, there is a limit to how much load a server can handle. If your business does not heed the server constraints in time, you are bound to suffer from operational loss due to server downtimes. To closely monitor your server health, you must track specific metrics regularly. Here are some server monitoring metrics that every business should report and analyze.

How Modern Infrastructure is Impacting Application Availability at Scale

The complexity of modern information technology (IT) infrastructures has grown exponentially and changed the way software companies operate and deliver products and services. The days of a single application server and a simple delivery path are long gone. Today’s application development and delivery process can encompass multiple platforms, cloud vendors, code libraries and customer bases.

Track your test coverage with Datadog RUM and Synthetic Monitoring

The modern standards of the web demand that user-facing applications be highly usable and satisfying. When deploying frontends, it’s important to implement a comprehensive testing strategy to ensure your customers are getting the best possible user experience. It can be difficult, however, to gauge the effectiveness of your test suite. For instance, all of your tests may be passing, but they might not cover a specific UI element that is crucial to a critical workflow.

"Hey Avantra, refresh my QA systems"

I remember just about three years ago sitting in a companywide meeting in a conference room at the Museum of Modern and Contemporary Art in Lisbon, Portugal (Museu Coleção Berardo, Lisboa). Our CTO, Bernd Engist, was giving us a presentation about some new features we had recently developed on automating the start/stop process of an SAP system.

Coastal municipalities monitor their beaches to measure the environmental impact of tourism

The Spanish tourism sector is preparing for a record summer in which they hope to emulate the summer periods prior to the coronavirus pandemic. The government has already advanced that 90% of the foreign tourism that visited the country in 2019 will be recovered. That year, 84 million foreigners were received, a figure that is expected to be reached this year, taking advantage of the high temperatures and the national tourist offer.

Citrix Monitoring Masterclass with George Spiers - Q&A

Citrix monitoring refers to the ability to monitor Citrix services end-to-end. It includes the ability to monitor user experience – from logon time to application launch time to screen refresh latency so administrators can easily monitor and track if they are meeting their service levels (SLAs).

Seven Tools to Help You Become a Better Serverless Developer

Serverless technologies let us do more with less effort, time and energy. They let us focus on creating user value and let the cloud handle undifferentiated heavy-lifting like scaling and securing the underlying infrastructure that runs our code. Serverless technologies have allowed me to accomplish tasks as a solo engineer that used to take a whole team of engineers to accomplish, and I’m able to complete these tasks in a fraction of the time and cost to my customers.

Performance Monitoring with API

If you are in a room with 20 engineers and you ask, “explain what an API is to a non-technical person”, you will get 20 different analogies. An API is like the on button to your TV connecting you to a variety of shows and systems, or an API is like a waiter taking your order and serving you from the kitchen. An API is like a library card catalog, or it’s simply a tool that connects you to other tools.

Snooze your alert policies in Cloud Monitoring

Does your development team want to snooze alerts during non-business hours? Or proactively prevent the creation of expected alerts for an upcoming expected maintenance window? Cloud Alerting in Google's Cloud operations suite now supports the ability to snooze alert policies for a given period of time. You can create a Snooze by providing specific alert policies and a time period. During this window, if the alert policy is violated, no incidents or notifications are created.
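The snooze behavior described above can be modeled in a few lines. This is a minimal sketch, not Google Cloud's API: the `Snooze` class, the `should_notify` function, and the policy names are all hypothetical, and the only behavior assumed is what the excerpt states (a snooze names specific alert policies and a time period, and violations inside that window create no incidents or notifications).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snooze:
    """Hypothetical model of a snooze: some alert policies plus a time window."""
    policy_names: frozenset
    start: datetime
    end: datetime

    def covers(self, policy_name: str, at: datetime) -> bool:
        # A violation is snoozed only if the policy is listed AND the
        # violation time falls inside the half-open window [start, end).
        return policy_name in self.policy_names and self.start <= at < self.end


def should_notify(policy_name: str, violated_at: datetime, snoozes) -> bool:
    """Return True if a violation should create an incident or notification."""
    return not any(s.covers(policy_name, violated_at) for s in snoozes)


# Example: snooze the (hypothetical) "high-cpu" policy for a two-hour window.
window_start = datetime(2022, 8, 20, 22, 0, tzinfo=timezone.utc)
maintenance = Snooze(frozenset({"high-cpu"}), window_start,
                     window_start + timedelta(hours=2))

inside = window_start + timedelta(minutes=30)
after = window_start + timedelta(hours=3)
print(should_notify("high-cpu", inside, [maintenance]))   # False: snoozed
print(should_notify("high-cpu", after, [maintenance]))    # True: window over
print(should_notify("disk-full", inside, [maintenance]))  # True: policy not snoozed
```

Treating the window as half-open is a design choice of this sketch; the excerpt does not say how the real service handles a violation exactly at the end of the period.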

Major Hosted VoIP Provider, bravad TI, Choose Obkio for End-User Network Monitoring

Learn how Obkio works with technology agency & major hosted VoIP Service Provider, bravad TI, to create a streamlined network monitoring and troubleshooting process to optimize VoIP performance & improve the end user experience.

GitHub Browser check synchronization goes into public beta

One of our goals here at Checkly is to make it easier for developers to ship excellent software. But let’s face it, getting features out is only a tiny fraction of the story. Fast-moving development teams also break things. And the more things you build, the more things can go wrong. And trust me, they will. This is where API and end-to-end monitoring helps. Define automated test suites that check all your properties constantly and guarantee that everything’s up and running. All the time.

Grafana usage insights: How to track who is sending what metrics to your stack in Grafana Cloud

We are happy to announce the release of the Grafana usage groups feature in Grafana Cloud. This new feature — which is available in the Grafana Cloud Advanced plan — helps centralized observability teams and administrators track and attribute metrics usage back to groups that exist within a single shared Grafana stack. Ultimately, Grafana usage groups can help with governance and cost control.

What's The Best Employee Monitoring Software - ActivTrak or Insightful?

In the sea that is the SaaS market, there is an abundance of software for employee monitoring. This software, as the name suggests, is used for time tracking, monitoring, project management, and implementing safety measures. It is widely used by companies that want to ensure that their employees are actually getting work done while on the clock, but that’s not where the benefits of using these tools end.

Understanding AS relationships, outage analysis and more Network Operator Confidential gems

The objective of Network Operator Confidential is to share our global internet market insights from recent months. Kentik, and our customers, have access to views and analysis of global internet traffic that no one else can match. In our first Network Operator Confidential webinar, I was joined by Doug Madory, Kentik’s director of internet analysis, and Grant Kirkwood, founder and CTO at Unitas Global.

How Site Uptime Impacts SEO (Hint: It's a REALLY Big Deal)

It is arguably the most important 3-letter acronym on the digital marketing landscape. No, it’s not ROI. It’s SEO. Consider that: Clearly, effective SEO is extremely important. And for many businesses — especially smaller companies that are competing against big, established enterprises — it’s a matter of survival. However, for some decision-makers outside of the digital marketing world, the link between SEO and site uptime is less clear. Let’s fix that.

AppDynamics Cloud: Kubernetes Overview

Kubernetes® monitoring with AppDynamics Cloud provides you with the visibility you need into your Kubernetes infrastructure. It provides you the capability to monitor metrics on clusters, namespaces, workloads, pods and ingress controllers. You also have the capability to cross validate metrics from your servers and network interfaces as well as cloud providers. Monitor your pod resources for performance issues or identify needs to scale. Kubernetes, as well as App Service Monitoring, is supported by using an OpenTelemetry™ Operator.

Memory Profiling for Java Applications, a Splunk APM Product Walkthrough

Splunk’s Product Manager Priit Potter walks you through how to identify memory bottlenecks in Java applications, in this detailed product walkthrough. See how Priit troubleshoots his own application, visualizes memory performance problems, and uses flame graphs to detail the line of code responsible for the problem, all with the help of Splunk Application Performance Monitoring.

How Cloud Network Monitoring Is Critical To Business Success

A rising number of businesses are adopting and utilizing cloud services and capabilities with remarkable success. But embracing cloud tools and services often brings unexpected changes for business leaders and IT teams, especially because of the way in which cloud adoption has altered how networks are monitored and managed.

One-stop Open Source Observability is now a reality with Log Management in SigNoz - SigNal 15

Welcome back to our monthly product updates - SigNal! Our team shipped some major upgrades to SigNoz last month. We’re elated to share that SigNoz is now available with log management. It’s a major milestone in our journey of democratizing observability for developer teams of all sizes. We also shipped an intuitive alerts builder, attended conferences, and played new games in our Friday chit-chats 🥳 A lot of new open source contributors also helped us in making SigNoz better.

Alerting: A Key Part of Application Performance Monitoring

In today’s digital world, users expect a seamless experience in their day-to-day applications. To achieve that reliability and stability, developers need information about an application’s health and performance so they can gain insights and fix bottlenecks. One of the best ways to gain such insights into an application is to use a monitoring system.

IT Tool Rationalization and You

If you ask the experts, they will tell you that companies have too many IT tools for monitoring their environments. Monitoring tools for network, infrastructure, application, wireless, endpoint, cloud, etc. proliferate across organizations. According to research by Gartner, more than a third of organizations surveyed have more than 30 monitoring tools. More than half of organizations surveyed have at least 11 tools. Sounds like a good argument for IT tool rationalization. But is this really the case?

Data Legends Podcast with Wes Gelpi: Special 2 Part Series

Leading a team of data and analytics professionals isn’t easy; it takes more than just understanding the goal. It’s about the journey and how the people on the journey collaborate. Wes Gelpi, Director of Research & Development at SAS, joins us in a special 2-part episode. Gelpi has a rich history of taking challenging situations and running with them.

What Is Distributed Tracing

Systems and applications alike have become progressively distributed as microservices, open-source tools, and containerisation have gained traction. In order to actively monitor and respond quickly to issues that arise in our environment, distributed tracing has proven to be vital for businesses such as Uber, Postmates, Hello Fresh and TransferWise. It is, however, important to clarify what distributed tracing actually means.

An Introduction to Database Sharding With SQL Server

One of my favorite consulting questions to ask when dealing with a scalability problem is, “If you could change the system design with the knowledge you have today, what would/wouldn’t you change and why?” Sometimes it’s best to ask this question on a one-on-one basis with different developers, DBAs, report writers, and architects to get honest answers that aren’t intimidated by the other people sitting in a meeting or to avoid potential debating and arguing over a solution.

An Engineer's Bill of Rights and Responsibilities

Power has a way of flowing towards people managers over time, no matter how many times you repeat “management is not a promotion, it’s a career change.” It’s natural, like water flowing downhill. Managers are privy to performance reviews and other personal information that they need to do their jobs, and they tend to be more practiced communicators.

What is Website Uptime? Why is Website Uptime Monitoring Important?

Website uptime is one of the most crucial metrics to monitor, particularly if your website is essential for generating revenue. If no one can access a website, all the work and effort you put into getting it up and running can go to waste. With website uptime monitoring, you can keep track of when and how long your website is unavailable. Website uptime monitoring is a crucial component of website management.

The updated Docker integration in Grafana Cloud now supports logs and metrics

More than 17 million developers use Docker to build, ship, and run applications separate from their infrastructure in order to deliver software faster and more efficiently. With the rising popularity of containerized applications, however, it has become increasingly complex and difficult to observe and monitor applications running across multiple containers.

Bringing business context to network analytics

Kentik brings real-world business context to the telemetry we collect and the analytics we provide. That’s the overarching theme I got from Networking Field Day: Service Provider 2. As I watched and listened to each presentation, it was pretty obvious to me that Avi, Steve, Doug, and Nina, all technical powerhouses, were a little less focused on packets and a little more focused on how we can improve network operations and a service provider’s ability to make smart business decisions.

How to Monitor SAP Hana with OpenTelemetry

SAP Hana monitoring support is now available in the open source OpenTelemetry collector. You can check out the OpenTelemetry repo here! You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector. Below are quick instructions for setting up observIQ’s OpenTelemetry distribution, and shipping SAP Hana telemetry to a popular backend: Google Cloud Ops.

An Introduction to OpenTelemetry and Observability

Cloud native and microservice architectures bring many advantages in terms of performance, scalability, and reliability, but one thing they can also bring is complexity. Having requests move between services can make debugging much more challenging and many of the past rules for monitoring applications don’t work well. This is made even more difficult by the fact that cloud services are inherently ephemeral, with containers constantly being spun up and spun down.

Performance Monitoring in Next.js Applications

Performance monitoring is an essential part of development. It’s usually one of the first things you’d want to do after setting up an existing project or getting started with a new one. Without monitoring performance, it will be challenging to detect post-development (production) issues in your application or work out how to resolve them. You may end up wasting time attempting to fix something that was never broken.

BindPlane OP Reaches GA

Today we’re excited to announce BindPlane OP – the first observability pipeline built for OpenTelemetry – is out of beta and now generally available. You can download the latest version here. Two months ago we released BindPlane OP in beta, and while we were confident we had something special, the response surpassed all of our expectations.

Announcing the Winners of the Cribl Packs Contest

It’s time for the Black Hat conference in the United States, so we’re onsite meeting with customers and prospects looking to untangle their data from the grip of vendors holding their data hostage. We aim to start a rebellion against this lock-in and encourage customers to focus on radical choice and control with their observability data. Pushing back against “The Empire” is challenging, but you can achieve it with Cribl Stream and Edge.

Autoscale your Kubernetes workloads with any Datadog metric

Editor’s note: This post was updated on August 9, 2022, to include a demonstration of how to enable highly available support for HPA. It was also updated on November 12, 2020, to include a demonstration of how to autoscale Kubernetes workloads based on custom Datadog queries using the new DatadogMetric CRD.

Monitoring Rails applications with Datadog

Rails is a Ruby framework for developing web applications. It favors the Model-View-Controller (MVC) architecture and includes generators that create the files needed for each MVC component. Rails applications consist of a database, an application server for running application code, and a web server for processing requests. Rails provides multiple integrations for its supporting database (e.g., MySQL and PostgreSQL) and web server (e.g., Apache and NGINX).

Why AIOps may be necessary for the future of engineering

Machine learning has crossed the chasm. In 2020, McKinsey found that out of 2,395 companies surveyed, 50% had an ongoing investment in machine learning. By 2030, machine learning is predicted to deliver around $13 trillion. Before long, a good understanding of machine learning (ML) will be a central requirement in any technical strategy. The question is — what role is artificial intelligence (AI) going to play in engineering?

OpenTelemetry Architecture: Understanding Collectors

Telemetry data is a powerful tool for understanding the behavior of complex systems. OpenTelemetry provides a platform-agnostic, open-source way to collect, process, and store telemetry data. This post explores the OpenTelemetry collector architecture, specifically focusing on the Collectors component. We'll look at how collectors work and how they can be used to process telemetry data from any system or application. We'll also discuss some benefits of using OpenTelemetry for your telemetry needs.

Rerouting of Kherson follows familiar gameplan

Since the beginning of June this year, internet connectivity in the Russian-held Ukrainian city of Kherson has been rerouted through Crimea, the peninsula in southern Ukraine that has been occupied by Russia since March 2014. As I explain in this blog post, the rerouting of internet service in Kherson appears to parallel what took place following the Russian annexation of the Crimean peninsula.

What is an MSP SLA? Everything You Need to Know

As a managed service provider, your success depends largely on the satisfaction of your customers. If they’re happy, they’ll stay with you and recommend you to others. But if they’re not, they’ll take their business elsewhere. Sometimes, some customers are never satisfied no matter how hard you try. Instead, they’re always finding something to complain about, which can be frustrating.

TL;DR Python Client Library

InfluxDB has over a dozen different client libraries to help developers work with time series data in whatever programming language they like best. The Python client library is one of our most popular options. It’s simple to learn, and working with InfluxDB in a language you’re comfortable with helps you get started doing powerful time series analysis quickly.

Top Tips for NodeJS and Debugging on AWS Lambda (Part 2)

This is the second post in a 2 part blog series on debugging, monitoring and tracing NodeJS Lambda applications. If you haven’t yet seen part 1, check it out here (it’s a great read!) Now let’s get back into our post with one of the most commonly experienced issues when it comes to Lambda functions, Cold Starts.

Video: How to get started with MongoDB and Grafana

MongoDB is one of the most popular NoSQL databases in the world, used by millions of developers to store application metrics from e-commerce transactions to user logins. The MongoDB Enterprise plugin for Grafana — which is available for users with a Grafana Cloud account or with a Grafana Enterprise license — unlocks all of the data stored in MongoDB as well as diagnostic metrics for monitoring MongoDB itself for visualization, exploration, and alerting.

To Observability and Back Again: A Context's Journey

How do you pass context from events that concern Security teams to Development teams who can make changes and address those events? Often this involves a series of meetings and discussions that can take days or weeks to filter down from security event to developer awareness. Compounding the problem, developers generally do not have access to Splunk Core, Cloud or Enterprise indexes used by security teams, and indeed, may use only Splunk Observability for their metrics, traces and even logs.

Improving DevOps Performance with DORA Metrics

Everyone in the software industry is in a race to become more agile. We all want to improve the performance of our software development lifecycle (SDLC). But how do you actually do that? If you want to improve your performance, first determine what KPI you’d like to improve. DORA metrics offer a good set of KPIs to track and improve. They started as research by the DevOps Research and Assessment (DORA) team and Google Cloud (which later acquired DORA) to understand what makes high-performing teams.

Lessons Learned From Building a Company and Raising Kids

When I had my first child almost six years ago, I expected that most of my time would be spent in the role of a teacher rather than a student. I have two kids now — and I’m certainly teaching them as much as I can as they grow and learn to navigate the world — but if someone were keeping score, my kids might end up on top when it comes to who’s taught who more. Another thing that surprised me is how similar building a family is to building a company from the ground up.

Tensu: An Open Source Text UI for Sensu Go

A Two Sigma engineer explains why we built Tensu, an open source TUI (text user interface)-based program for interacting with Sensu Go’s observability pipeline and backend API. In this article we will be putting a spotlight on Tensu, an open source terminal-based dashboard for interacting with and responding to events from the Sensu Go observability pipeline and backend API.

What is Infrastructure as Code?

Cloud services were born in the early 2000s, with companies such as Salesforce and Amazon paving the way. Simple Queue Service (SQS) was the first service to be launched by Amazon Web Services in November 2004. It was offered as a distributed queuing service and it is still one of the most popular services in AWS. By 2006 more and more services were added to the offering list.

Deutsche Bergbau-Museum Bochum

The Deutsche Bergbau-Museum Bochum (DBM), or the German Mining Museum, is one of the premier destinations for those interested in the history of mining. Over its 100+ years of operation, the DBM has evolved its exhibits to use multimedia players and other digital devices connected to its network. However, the museum’s three-person IT staff faced a series of problems, including a lack of insight into where network outages occurred and no alert notifications. After working alongside P&W Netzwerk, DBM deployed Progress WhatsUp Gold to manage the broad network environment cost-effectively.

Proactive Microsoft Teams Monitoring

If your business depends on Microsoft Teams for collaboration and communication, then you must proactively monitor the health of the Teams service, your networks, and everything in between. Only Exoprise can synthetically monitor ALL of Microsoft Teams alongside our low-level real-use packet monitoring for complete coverage in ONE platform.

How Datadog's Technical Solutions team uses RUM, Session Replay, and Error Tracking to resolve customer issues

Organizations across a wide range of industries share a common goal: deploy stable applications that support their customers’ needs. Many of these organizations rely on the Datadog platform to get complete visibility into the health and performance of their applications, and we understand how important it is that our services are reliable. That’s why we leverage our own products to ensure that the platform works as expected.

Sponsored Post

Datadog & Speedscale: Improve Kubernetes App Performance

By combining traffic replay capabilities from Speedscale with observability from Datadog, SRE Teams can deploy with confidence. It makes sense to centralize your monitoring data into as few silos as possible. With this integration, Speedscale will push the results of various traffic replay conditions into Datadog so they can be combined with the other observability data. Being able to preview application performance by simulating production conditions allows better release decisions. Moreover, a baseline against which to compare production metrics can provide even earlier signals on degradation and scale problems. Speedscale joined the Datadog Marketplace so customers can shift-left the discovery of performance issues.

9 Top Frontend Application Monitoring Tools to Catch Errors

Most developers are familiar with the concept of tracking an application's performance. We've all had to do our own performance debugging at some point. It typically happens when there is a significant problem with potential cost or user impact. We rarely take the time to examine the application's performance in various scenarios until after that. Of course, you can and should monitor various components of the application separately.

ALB vs. ELB

Load balancers are critical components in AWS systems, and selecting the most suitable option might prove confusing for some users. Choosing the right option enables users to distribute various tasks across resources, resulting in an optimized process. Operating a network without load balancers may result in significant delays in web services during a spike in user requests.

Monitoring smart city IoT devices with Grafana and Grafana Loki: Inside the Fuelics observability stack

For smart cities of the future, monitoring infrastructure metrics like fuel and water levels is vital to optimizing operations. Fuelics PC designs and deploys battery-operated narrowband IoT (NB-IoT) sensors that monitor fuel, water, waste, and even parking capacity at the edge, then transmit that data to the cloud for easy viewing and monitoring.

The Real Opportunity for Improving Outcomes with Monitoring and Observability

If you were pulled into a meeting right now and asked to give your thoughts on how to achieve better outcomes with monitoring and observability, what would you recommend? Would you default to suggesting that your team improve Mean Time To Detect (MTTD)? Sure, you might make some improvements in that area, but it turns out that most of the opportunities lie in what comes after your system detects an issue. Let’s examine how to measure improvements in monitoring and observability.

Innovate at Speed With DX NetOps 22.2 Network Monitoring Software

Remove any bottlenecks to your network deployments with continuous, end-to-end modern network monitoring software. Modern network architectures like SD-WAN have emerged as game-changing solutions that provide a mechanism for improving traffic management, deployments, and automation.

Quick Bytes - Lumigo Alerts

Lumigo Alerts lets you create customized alerts for anything Lumigo monitors, including event-based alerts (e.g., timeouts) and key metrics (e.g., error rate). The alerts can be delivered via email or through multiple platform integrations such as Slack, Microsoft Teams, and more. With one-click distributed tracing, Lumigo lets developers effortlessly find and fix issues in serverless and containerized environments.

Lumigo for Colorcast: A Game of Metrics

At sports tech company Colorcast, debugging was running out the clock for their developers. They made a game-time decision to automate their tracing and monitoring, knocking out their average time to debug. Watch how they did it with Lumigo.

Monitor Your AWS AppSync GraphQL APIs with Simplicity

TL;DR: Dashbird launched observability for AWS AppSync. In addition to AWS Lambda, SQS, DynamoDB, API Gateway, ECS, Kinesis, Step Functions, ELB, SNS, RDS, OpenSearch, and HTTP API Gateway, you can now get detailed insights and metrics in the Dashbird app for AWS AppSync. Since Facebook released its previously internally used query language GraphQL in 2016, it has seen an outstanding increase in adoption for all kinds of applications.

Monitor 3rd-party outages in PagerDuty

We’ve integrated IsDown with PagerDuty so you can manage alerts in the same place you manage all your other alerts. The PagerDuty integration is part of our strategy to make it easy to monitor all the business dependencies that companies nowadays have. We live in a world where SaaS rules the world, and companies prefer to buy vs. build. But with that comes the problem of monitoring all these dependencies, which are critical to daily operations.

17 Best DevOps Tools to Use in 2022 for Infrastructure Automation and Monitoring

You must adopt proper infrastructure automation if you want to enable your teams to achieve faster application delivery while eliminating human errors. Automation of servers, deployment environments, configuration management, and deployments play a vital role in getting a competitive advantage for your product. Monitoring both the infrastructure and application is equally important as well. In this article, we will discuss top tools for infrastructure automation and monitoring.

What is AWS Kinesis?

Amazon Web Services (AWS) Kinesis is a cloud-based service that can fully manage large distributed data streams in real-time. This serverless data service captures, processes, and stores large amounts of data. It is a functional and secure global cloud platform with millions of customers from nearly every industry. Companies from Comcast to the Hearst Corporation are using AWS Kinesis.

Elasticsearch on Docker Tutorial | Elastic Docker Containers Configuration - Sematext

In this Elasticsearch/Docker tutorial, we will install and run an Elasticsearch cluster on a single Docker host. We will pull an Elasticsearch Docker image (and Kibana), create a Docker network for the cluster, and deploy it on a local host. Containerizing instances of Elasticsearch helps create a scalable and mobile infrastructure, while not sacrificing system performance. Follow along to create and configure a truly open-source Elasticsearch cluster in Docker.

EnerKey Reduces Energy Consumption in Commercial Buildings Using InfluxDB

Commercial buildings produce 16% of carbon dioxide emissions in the US, and the EPA estimates that 30% of the energy used by these buildings is wasted. Energy efficiency in commercial buildings is a vital aspect of the transition to greener systems worldwide to fight climate change.

Goats on the Road: What Customers Are Telling Us

The best part of my job is talking with prospects and customers about their logging and data practices. I love to talk about everything they are currently doing and hope to accomplish so I can get a sense of overall goals and understand current pain points. It’s vital to come up with solutions that provide broad value across the enterprise and not just a narrow tactical win with limited impact.

Datadog acquires Seekret

APIs are integral to the success of modern enterprises across a wide range of industries, such as finance, logistics, and manufacturing. They not only enable developers to build powerful business solutions by integrating with external applications, but also facilitate communication between internal services. This means that the ability to build reliable, highly-performant APIs—and govern their behavior and performance—is more important than ever.

Networking 101: What is a VLAN?

The idea of a VLAN, or Virtual Local Area Network, is simple enough, right? It’s hard to imagine any work in a modern networking/IT space without having encountered, set up, or managed a VLAN. Well if that’s all you needed to know about VLANs, we’d be done here, but there’s a bit more to it. To cover our bases: In most modern networks the primary purpose of the LAN is to provide connectivity to a wider network, most notably the internet.

Icinga Camp Berlin 2022: Meerkat - a new Icinga Dashboarding tool by Dave Kempe

Meerkat is an open source dashboarding tool, written in Go and JavaScript. It allows users to drag and drop Icinga API objects onto a background, play sounds, and even embed videos. Dave will give a tour of its features and a guide on setup and usage, with real-world examples.

Icinga Camp Berlin 2022: Current State of Icinga DB by Eric Lippmann

In recent years, the number of servers, virtual machines, services, applications, etc. that our customers and users monitor with Icinga has increased significantly. For very large environments, the IDO can be a performance bottleneck. With Icinga DB we’ve rethought everything to allow users to monitor massive amounts of data and bring exclusive features that weren’t possible before.

Icinga Camp Berlin 2022: How Companies use Icinga by Blerim Sheqa

During the past months we’ve been in direct contact with enterprises to understand their Icinga story. As a result, we created multiple customer stories that differ in their use case. I want to show how Icinga meets different requirements of organizations and helps them cover their monitoring demands.

How to Overcome Datadog Log Management Challenges

Datadog has made a name for itself as a popular cloud-native application performance monitoring tool, measuring a system’s health and status based on the telemetry data it generates. This telemetry includes machine-generated data, such as logs, metrics and traces. Cloud based applications and infrastructure generate millions (even billions) of logs – and analyzing them can generate a wealth of insights for DevOps, security, product teams and more.

How to deploy Grafana Enterprise Metrics on Red Hat OpenShift

Here at Grafana Labs, we’re always looking for ways to provide our customers with a choice of platforms where they can run Grafana Enterprise Metrics (GEM). As part of that mission, we’re pleased to announce that we’ve added Red Hat OpenShift 4.x support to GEM. GEM, as you may know, is a leading enterprise metrics solution.

What Are Semantic Conventions in OTEL?

OpenTelemetry (OTEL) is a big data platform that enables the collection and analysis of large-scale telemetry data. Many companies have adopted it for use in their products. In this post, we’ll discuss semantic conventions in OpenTelemetry and how they are used to make data processing easier. We’ll also discuss the different types of semantic conventions and their importance.

The Journey to Intelligent Payment Operations

In today’s payments ecosystem, the ability to monitor and use payment data effectively represents a real and essential competitive advantage. Intelligent operations should be a strategic goal for the entire company, and when executed properly, will enable you to build a future-proof payment operations infrastructure.

Data Observability Explained: How Observability Improves Data Workflows

Organizations in every industry are becoming increasingly dependent upon data to drive more efficient business processes and a better user experience. As the data collection and preparation processes that support these initiatives grow more complex, the likelihood of failures, performance bottlenecks, and quality issues within data workflows also increases.

Web Browser Update Problems: How to Monitor Website Performance Anomalies Caused by New Browser Versions

When new web browser versions are released, new bugs are inevitably introduced, which can degrade a website’s performance and increase the overall page load time. This can severely impact a user’s engagement and a business’s bottom line.

Static vs Dynamic Alert Thresholds for Monitoring

Every modern monitoring product will have some capabilities to leverage thresholds of some sort to automatically raise alerts when critical metrics pass a value that indicates something of concern may be occurring, such as a performance slowdown, resource constraint, or availability issue.

AIOps is Dead! Long Live AIOps!

Artificial Intelligence (AI) is all the rage these days. Everywhere you look, companies are promising to solve your ills by applying AI to whatever problem you’re trying to solve. It doesn’t seem to matter what area you are in; medical, research, education, technology, software or anything else. Someone, somewhere is offering an AI-based tool that will solve all your problems.

How a global leader in car audio technology reduced its downtime by 40% in 8 months with Applications Manager

As a pioneer of audio electronics solutions for all in-vehicle communication demands, this organization has delivered an enhanced driving experience to more than 50 million vehicles across the globe. With over 60 years of experience, this organization has been a trailblazer in the audio electronics industry. It has been recognized by Guinness World Records, NASSCOM, and many more for its contributions to the world of audio technology.

Expedite infrastructure investigations with Kubernetes Anomalies

Modern Kubernetes environments are becoming increasingly complex. In 2021, Datadog analyzed real-world usage data from more than 1.5 billion containers and found that the average number of pods per organization had doubled over the course of two years. Organizations running containers also tend to deploy more monitors than companies that don’t leverage containers, pointing to the increased need for monitoring in these environments.

5 Tips to Optimize Your Synthetic Monitoring

Synthetic monitoring is a useful tool that ensures your site is both UP and performs well, and configuration matters. Optimized synthetic monitoring looks for necessary elements along a focused goal pathway. A poorly configured check can add precious seconds to a Transaction and trigger unwanted Global Timeout errors. Today, we’re going to do a deep dive on tips and tricks used by Uptime.com Support and Development teams to improve and optimize the Transaction checks we use every day.

Protect your StatusGator Account with Two-Factor Auth

StatusGator now supports Two-Factor Authentication, often called 2FA, a more secure way of signing into your account. Using an authenticator app such as Google Authenticator, Authy, or a password manager like 1Password, you can now protect your StatusGator account with a second authentication factor, a one-time password (OTP) that you enter after signing in.

Outer Joins in Flux

Joins are a common transformation in any query language, and as part of the effort to make Flux an increasingly valuable tool for our users, the engineers on InfluxData’s query team created, and continue to maintain, two separate join functions. And while these solutions have met some of our users’ needs, they both lack one key feature: support for outer joins.

The Benefits of a More Modern, Agile, & Efficient Operations for MSPs

In today's business and IT environment, marked by constant change and transformation, if you’re a managed service provider (MSP), you are probably struggling to keep up. You’re faced with the need to modernize your operations, but the speed at which you can do so is constrained by years of accumulated complexity and chaos.

Obkio Network Monitoring App Tour

Obkio’s Network Monitoring SaaS app was born from a need within the industry to simplify network performance monitoring for modern, decentralized networks. What are some of Obkio’s features, and how can you use Obkio to troubleshoot network problems? We’re showing you how in this network monitoring app tour - told through screenshots.

Tracking Core Web Vitals with Honeycomb and Vercel

Google’s Core Web Vitals (CWVs) are used to rank the performance of mobile sites or pages. It’s easy to see when your CWV scores are low, but it’s not always clear exactly why that’s happening. In Honeycomb’s new guide, Tracking Core Web Vitals with Honeycomb and Vercel, you can learn how to capture, analyze, and debug your real-world CWV performance using a free Honeycomb account.

What is OpenTelemetry? What OTEL means for Observability in 2022

The growth of technology has led to more efficient and relevant digital experiences, and customers continue to expect more out of those interactions. That’s true no matter their location and no matter which device they choose to use. Companies that cannot provide these kinds of personalized interactions for their customers find themselves falling behind the competition as technology continues to advance.

Founder & Friends: How to get the most out of Raygun (Live Demo)

In our first-ever episode of Founder & Friends, we go straight to our VP of Product, Zheng Li, for best-practice tips and advice for getting the most out of Raygun. Raygun is known for its quick and easy setup, but there are some simple and lesser-known ways to set you and your team up for success. In this live demo, we make sure your account is optimized for your workflow, captures quality diagnostics (without the noise), and quickly alerts you to problematic code.

Synthetic web tests - Moving up the stack

In this short explainer and demo, Kentik's Phil Gervasi shows how Kentik is moving up the stack to monitor application activity on the network. Phil explains the differences between proactive and passive network monitoring and demonstrates three new synthetic tests that relate to app performance monitoring. Kentik's suite of application-focused synthetic tests gives you proactive visibility into application activity on your network. Using the HTTP test, Page Load test, and Synthetic Transaction Monitoring, you can monitor a user's digital experience and troubleshoot problems as they happen.

Splunk Snags Six 'Best of' Awards From Customer Reviews on TrustRadius

Splunk is honored to be the recipient of a series of six new awards from TrustRadius—all based on customer reviews. In this round, TrustRadius grants its “Best of” Awards to the top three products per Best Feature Set, Best Value for Price and Best Relationship in each respective category.

Monitorama 2022: the good, the bad and the beautiful (Part 2)

ICYMI, this year’s Monitorama marks a return to the in-person event following a pandemic hiatus. In Part 1 of this series, I shared what it was like to navigate a tech conference in the post-pandemic world and the most engaging themes of the conference including tracing, SLIs, OpenTelemetry, and more. Now for Part 2, let’s dive into the talks. As usual the Monitorama talk selection team did a bang-up job. Every talk was interesting, but a few jumped out at me for some very specific reasons.
Sponsored Post

The top 12 APM tools for 2022

Application performance monitoring (APM) tools give you insight into the server-side performance of your website or application. From increased uptime and improved user experiences to reduced risks and decreased expenses, they provide an array of business benefits that help you move faster than competitors and deliver more value to users. So it comes as no surprise that, according to analysis by Emergen Research, the global market for APM tools is expected to reach $15B in 2028, an impressive uptick from 2020's $6.54B.

Simulate business critical user journeys with synthetic monitoring

Today’s user journey is more complicated than ever and has completely changed the perception of how business critical functions are managed and maintained to support customer expectations for flawless digital experiences. The journey that users take when visiting your website will vary depending on your business model. An e-commerce site, for example, might involve user interactions that go from product selection to shopping cart to payment transaction.

Automate incident response workflows with Eventarc and Datadog

Eventarc is a Google Cloud offering that ingests and routes events between GCP products, such as Cloud Run, Cloud Functions, and Pub/Sub, making it easy to build automated, event-driven workflows in complex environments. By taking care of event ingestion, delivery, authorization, and error handling, Eventarc reduces the development overhead that is required to build and maintain these workflows and helps you improve application resilience.

Simplify microservice governance with the Datadog Service Catalog

Moving from a monolith to microservices lets you simplify code deployments, improve the reliability of your applications, and give teams autonomy to work independently in their preferred languages and tooling. But adopting a microservices architecture can bring increased complexity that leads to gaps in your team members’ knowledge about how your services work, what dependencies they have, and which teams own them.

Middleware technologies connect the enterprise

The explosion of APIs, devices, applications, and data sources has complicated the task of building connectivity across the enterprise. As organizations are connecting to applications outside of their four walls, they risk becoming fragmented. Moreover, existing on-premise systems, such as AS/400 and ERPs, need to be able to communicate both internally and externally.

Top 15 Best Website Performance Monitoring Tools & Software of 2022

Website performance matters because it directly impacts your business's bottom line, which is why picking the right website monitoring service is crucial. These services perform regular tests and alert you whenever your site is down, making it easier for you to track down issues and solve them. There are lots of options out there, from simple uptime monitoring or transaction monitoring to complex web performance monitoring solutions.

Monitor Alpine Linux ARM Hosts with AppSignal

Today, we're launching ARM support for machines running Alpine Linux. This feature is available for our Ruby and Elixir users! We hope to add support for Alpine Linux ARM to our Node.js package in the future. The ARM CPU architecture is becoming more and more popular. As it powers people's development machines and production servers, we decided to add it to the list of the operating systems we support.

5 best practices for cloud cost optimization that you should never miss - Site24x7 CloudSpend

Before we jump into cloud cost optimization, let us address the elephant in the room. Businesses are moving to the cloud but are struggling with unpredictable cloud bills. If you are a business owner who has moved to the cloud recently, you need to understand each cloud touchpoint and get a transparent view of your cloud services. When it comes to cloud cost optimization, there are many tools and techniques that organizations can adopt. Most of these can only take you so far.

A Complete Guide to Tomcat Monitoring: How to, Metrics & Choosing the Best Tools

The Apache Tomcat is an open-source implementation of the Jakarta Servlet, Jakarta Server Pages, Jakarta Expression Language, Jakarta WebSocket, Jakarta Annotations and Jakarta Authentication specifications, all being a part of the Jakarta EE Platform. That is the official description of Apache Tomcat.

Introduction to Cloud Native

User experience is the pinnacle of cloud technology. With cloud data centers handling 94 percent of all workloads, cloud optimization is vital. Users need fast, agile, scalable, and stable solutions over the long term. But how do you build these solutions? This is where cloud-native technology comes in. Cloud native computing provides the foundation for building, designing, running, and managing applications in the cloud.

How to Import/Export Orion Modern Dashboards

The flexibility available in Modern Dashboards on the SolarWinds Orion Platform is nothing short of amazing. The dashboards are quick and easy to build and share. We'll guide you through the process of how to import a Modern Dashboard from THWACK and how to export your own to share with the SolarWinds community.

How activist engineers use Grafana Cloud to improve global air quality

With climate change and other environmental factors causing pollution rates and ground-level ozone levels to climb, poor air quality is a growing global problem. In fact, fossil fuel air pollution is responsible for 1 out of 5 deaths worldwide, according to a 2021 study conducted by Harvard University.

What is eBPF and what are its use cases

With the recent advancements in service delivery through containers, Linux has gained a lot of popularity in cloud computing by enabling digital businesses to expand easily regardless of their size or budget. These advancements have also brought a new wave of attack, which is challenging to address with the same tools we have been using for non cloud-native environments.

Top Tips for NodeJS Tracing and Debugging on AWS Lambda (Part 1)

In this two-post series, we are going to explore some ways to trace and debug NodeJS Lambda applications, delving into methods for looking further into the resources utilized and for optimizing code. AWS Lambda, an event-driven compute service first introduced roughly eight years ago, changed how we build out cloud applications as an industry.

What Is Database Monitoring, and Why Is It Still Important?

The digital database has come a long way since its infancy in the 1960s. Modern databases do much more heavy lifting than their simpler predecessors and have become sophisticated storehouses for both unstructured and structured data. Businesses still rely heavily on databases, and with advances in database monitoring technology, teams can protect their data like never before.

Introducing Splunk Operator for Kubernetes 2.0

The Splunk Operator for Kubernetes team is extremely pleased to announce the release of version 2.0! This represents the culmination of many months of work by our team and continues to deliver on our commitment to provide a high-quality experience for our customers wishing to deploy Splunk on the Kubernetes platform.

Pros and Cons of Installing the OpenTelemetry Collector

The OpenTelemetry Collector is an application written in Go. The GitHub readme does a great job of describing it: So the OpenTelemetry Collector is a Go binary that does exactly what its name implies: it collects data and sends it to a back-end. But there’s a lot of functionality that lies in between. What a neat service! A local destination for data that handles the final sending of OpenTelemetry information to your back end.

The most complete comparison: Pandora FMS Open Source vs Pandora FMS Enterprise

Pandora FMS Open Source is not freemium software, nor is it bloatware or shareware (*wink for those born before the 80s). Pandora FMS is licensed under GPL 2.0 and the first line of code was written in 2004 by Sancho Lerena, the company’s current CEO. At that time, free software was in full swing and MySQL was still an independent company, as was Sun Microsystems.

Alerting Techniques for an observable platform

Observable and secure platforms use three connected data sets: logs, metrics, and traces. Platforms can link these data to alerting systems to notify system administrators when an event requires intervention. There are nuances to setting up these alerts so the system is kept healthy and the system administrators are not chasing false positive alerts.

Monitor your GitHub Actions workflows with Datadog CI Visibility

GitHub Actions provides tooling to automate and manage custom CI/CD workflows straight from your repositories, so you can build, test, and deliver application code at high velocity. Using Actions, any webhook can serve as an event trigger, allowing you, for example, to automatically build and test code for each pull request. Datadog CI Visibility now provides end-to-end visibility into your GitHub Actions pipelines, helping you maintain their health and performance.

Sponsored Post

Five Reasons to Make the Shift to Service Monitoring with AIOps

Improvements in the performance and accessibility of technology have changed our expectations for how applications should work and, by extension, the way we work. For example, three years ago only 6% of workers were remote. According to the 2021 Upwork "Future Workforce Report," that number is now 22%, and remote workers are expected to reach 28% of the workforce by 2025. As more and more people are let loose from their office tethers, they bring with them a belief that their organization's services and applications should work as they did before. What's more, expectations extend from the workplace to the marketplace.

Sponsored Post

How to Choose a Microsoft Teams Monitoring Solution

When end-users constantly complain about bad network quality or a poor audio and video conferencing experience, you know it's time to shop for a Teams monitoring tool. And your Teams monitoring solution needs to be proactive, provide insight into hybrid work environments, and support real-time diagnosis of network issues for your end-users no matter where they work from. Business leaders rely more than ever on technology teams to deliver a successful company ROI. But problems with complex Teams deployments and unsatisfied workers can lead to increased costs and lower productivity.

Datadog on Informed Product Development

Datadog is an observability and security platform. That means that our users may be in a high stress situation: debugging an issue in production, managing an incident or responding to a security threat. Having a good UX is particularly critical in those cases. User interviews are very helpful, but after a product has been released in production we are able to gather a lot more data to understand how customers interact with it and make decisions about how it can be improved.

How to Leverage Cribl and Exabeam: Parser Validating

Organizations leverage many different cybersecurity and observability tools for different departments. It’s common to see the IT department using Splunk Enterprise, while the SOC uses Exabeam. Both of these tools use separate agents, each feeding different data to their destinations. Normally this isn’t a problem unless you’re talking about domain controllers. Domain controllers only allow a single agent, meaning you can’t feed two platforms with data.

Cribl.Cloud Simplified with Consumption Pricing

One year ago, we launched Cribl.Cloud as a cloud-hosted option for our industry-leading data pipeline product, Cribl Stream. Customers had a choice of either deploying on-premises with a subscription-based tiered license model or opting for our cloud service with a similar tiered billing model. Fast-forward one year, and Cribl is now a multi-product company with several unique observability products (Stream, Edge, AppScope, and soon Search) to offer our customers.

Importance and Tips of Condition-Based Monitoring Maintenance

Usually, maintenance is performed on one of two triggers: after an asset failure has occurred, or on a fixed schedule. In both cases, maintenance is done either while the asset is still in good condition or when it is already too late. This is where condition-based monitoring maintenance comes into play. This maintenance is proactive: assets are serviced when they actually require it. For this purpose, assets are inspected regularly, and several asset tracking techniques are used, such as IoT.

Sponsored Post

New Modern Data Stack for AIOps as a Service

Data lying all around an enterprise’s premises and over the cloud is of no use unless it forms part of a bigger and clearer picture. This is what a data stack does by helping enterprises leverage data to its fullest potential: it turns raw data into insights that can be acted on and lead to business benefits. The complicated modern enterprise of today can no longer make do with obsolete ways of data management.

Grafana Alerting video: How to create alerts in Grafana 9

With the Grafana 9.0 release, we rolled out the new and improved Grafana Alerting experience, which is now the default alerting system across all of our products. Along with introducing significant improvements to Grafana Alerting based on community feedback and more robust alerting documentation to guide our users, we also created easy-to-follow video tutorials to help you get started with creating alerts.

Nexthink Named a Leader in Forrester Wave Report!

We’ve got exciting news: The Forrester Wave™: End-User Experience Management, Q3 2022 report has been released – and Nexthink has been named a leader in End-User Experience Management! In case you’re unfamiliar, this report provides a comprehensive evaluation of the nine most significant end-user experience management (EUEM) providers by one of the world’s leading research and advisory firms.

What Is Helm in Kubernetes?

Helm is a deployment tool that simplifies installing, configuring, and managing Kubernetes clusters. Anyone familiar with writing Kubernetes manifests knows how tedious it is to create multiple manifest files using YAML. Even the most basic application has at least 3 manifest files. The more the cluster grows, the more unwieldy the configuration becomes. Helm is one of the most useful tools in a developer’s tool belt for managing Kubernetes clusters.

Announcing the General Availability of Synthetic Monitoring Within Splunk Observability

Today we’re proud to announce the general availability of best-in-class Synthetic Monitoring capabilities within Splunk Observability Cloud. Now, IT and engineering teams can proactively measure, monitor and troubleshoot their critical user flows, APIs and services, connected across Splunk Observability.

Postcard From .conf22: Customers Inspire Our Latest Release

They say, “What happens in Vegas, stays in Vegas,” but I wanted to highlight the role our customers played at last month’s .conf22, our annual users’ event at the MGM Grand. It was awesome meeting customers in person again, and connecting virtually with thousands more. We had a terrific turnout with 8,200+ customers and partners representing 113 countries and more than 6,500 organizations.

Prometheus Query Tutorial with examples

Monitoring tools are only as good as the range and visibility they offer admins into applications’ performance. Prometheus is a metrics monitoring tool that provides a pull-based system to collect and monitor time-series samples. Once the data is collected and stored, you can use Prometheus Queries to interact with the data: select and aggregate across the provided dimensions. This article takes the reader from concept to content state about the Prometheus Query language.

Get better visibility into DevOps performance in one place with Atlassian integrations

Every company is a software company and every company wants to get better at it. That’s why Sumo Logic built a set of integrations with Atlassian DevOps solutions. Leveraging data from Atlassian, Sumo Logic now enables you to visualize the key, actionable insights behind the DevOps Research and Assessment (DORA) metrics to continuously improve your software delivery performance. Presenting Atlassian data in Sumo Logic’s observability platform brings the following benefits, to name a few.