Operations | Monitoring | ITSM | DevOps | Cloud

September 2022

Monitor Azure Container Apps with Datadog

Azure Container Apps is a serverless platform that enables you to deploy containerized applications and microservices—regardless of their code or framework—without managing any underlying cloud infrastructure or orchestrators. By using serverless containers, Azure Container Apps can automatically scale based on HTTP requests or events supported by Kubernetes event-driven autoscaling (KEDA) in order to accommodate peak demand and meet your budgeting goals.

Run Datadog Synthetic tests in Azure Pipelines

Continuous integration (CI) demands continuous testing: shifting left helps prevent faulty code from spreading, which is one of the core aims of CI. Datadog’s new Azure DevOps extension enables you to seamlessly incorporate integration and end-to-end tests into existing CI/CD workflows on Azure Pipelines, a dedicated CI/CD service that automatically runs builds, performs tests, and deploys your services and applications via cloud-hosted pipelines.

Why OpManager is the perfect tool to detect and troubleshoot Wi-Fi performance issues

Wi-Fi has untangled humanity, quite literally. Those jumbled Ethernet cables with their confusing ports are banished to backstage. Now, all we need is one connection to a Wi-Fi router and voila! Internet. Wi-Fi networks simplify your IT infrastructure. WLAN networks make an office look less like a scene from The Matrix and more like a creative workspace, but they can bring an array of difficulties as well. Unlike the sturdy Ethernet cable, Wi-Fi connections fluctuate.

RabbitMQ vs. Kafka

Processing, storing, and sending data is at the heart of how we communicate and get business done. This involves implementing various applications, software, and mobile devices that together form an intricate web to process data and information. Programmers will often use message brokers to facilitate this constant flow of information. Message brokers and Pub/Sub messaging systems are instrumental in allowing applications, services, and systems to communicate effectively.

GripMatix releases new SCOM MP for Citrix DaaS

We are happy to announce that we have released a brand-new SCOM Management Pack for monitoring Citrix Desktop as a Service (DaaS). Citrix DaaS is a Citrix Cloud service that allows you to securely deliver virtual apps and desktops from on-premise data centers and public cloud platforms in a hybrid deployment.

Create Alerts on Cloud Monitoring

Want to learn about alerts in Cloud Monitoring? Would you like to know how to create metric-based alerts for Google Cloud products through Cloud Monitoring? In this video, we introduce you to alerts in Cloud Monitoring, how they work, and the different types of alerting policies. Watch this video to learn how to create metric-based alerts for Google Cloud products.

InfluxDB is Once Again a Leader in G2's Fall 2022 Reports

G2 has released its Fall 2022 reports, and we are thrilled to share that InfluxDB, the purpose-built time series platform, has once again ranked #1 in the G2 Grid for Time Series Databases. InfluxDB has also held its leading position in the Momentum Grid for Time Series Databases. The Momentum Grid® identifies products that are on a high growth trajectory based on user satisfaction scores, employee growth, and digital presence.

Driving Efficiency with Custom APM Dashboards

Have you ever struggled to have efficient visibility into your APM and log data? Have you ever been called on to display real-time data to your Sales or Marketing department, only to find yourself fumbling over the numbers without a way to display relevant data? Look no further! Retrace collects huge amounts of data about your application’s health and performance, then provides a customizable display in one place – your customizable APM Dashboards.

The Future of Ops Is Platform Engineering

Two years ago I wrote a piece in The New Stack about the Future of Ops Careers. Towards the end, I wrote: I described the second category as “operations engineering minus the infrastructure,” dedicated to evaluating and assembling a production stack of third-party platform providers, enabling software engineers to self-serve their services and own their own code in production. I said: That second category I was describing now has a name. We call those teams "platform engineering."

UC Today Deep Dive: Why Microsoft Teams Performance is Critical

UC Today, the leading online publication for Unified Communications and Collaboration technology, recently did a deep dive on Martello’s Microsoft-recommended Vantage DX solution for monitoring Microsoft Teams and found out that delivering exceptional user experiences and avoiding a Microsoft Teams outage can be easy with the right solution in place.

Unity Exception Handling: A Beginner's Guide

Exceptions are the outcomes you do not usually expect in your application. But as a developer, expecting the unexpected is essential to capture exceptions and handle them appropriately. Exception handling is not only applicable to web development projects but also to Unity applications. This article brings everything you need to know as a beginner to Unity exception handling, including methods to handle exceptions, when to use them, and how to manage exceptions more easily using distributed logging.

Python Performance Testing: A Comprehensive Guide

The following guest post addresses how to improve your service’s performance with Sentry and other application profilers for Python. Check out this post to learn more about application profiling and Sentry’s upcoming mobile application profiling offering. We’re making intentional investments in performance monitoring to make sure we give you all the context to help you solve what’s urgent faster.

Troubleshoot SaaS Experience Packet Loss at the User Edge

In this demo, we showcase Experience-Driven NetOps from Broadcom Software that includes AppNeta digital experience monitoring and DX NetOps network monitoring software. The combined solution provides the most comprehensive visibility for true end-to-end user experience network delivery assurance for today's network operations teams.

10+ Best MongoDB Monitoring Tools and Services [2022 Comparison]

MongoDB is a cross-platform NoSQL database that uses JSON-like documents with optional schema to store data. It was designed for high availability, high performance for high-data persistence use cases, and automatic scaling. Of course, all with the right infrastructure in mind. It is usually a good choice for document-oriented use cases when you need quick prototyping or massive scale. With the massive scale comes massive traffic, though.

3 Trade-offs to Consider When Deploying Apache Kafka in the Cloud

Maximizing the value of streaming data requires carefully navigating operational tradeoffs when developing and managing cloud native applications. Organizations that are rapidly producing and processing high volumes of data, like Netflix, Salesforce, Shopify and even the United States Postal Service (USPS), are constantly applying and testing new methods to manage the complexity of data streaming in the cloud.

Validate your skills with our new Datadog Certification Program

The Datadog platform has evolved to meet the needs of organizations that are investing in cloud-based solutions and modernizations. These organizations need professionals that are highly skilled and understand how to get the most out of the Datadog platform. With our suite of products, features, and tools in mind, we wanted to offer a path for individuals to demonstrate their knowledge of our best-in-class monitoring platform and their understanding of observability best practices.

Key Observability Scaling Requirements for Your Next Game Launch: Part III

So far in our series on scaling observability for game launches, we’ve discussed ways to 1) quickly analyze large volumes of telemetry data and 2) ensure high-quality telemetry data for more effective analysis at lower costs. These blogs outline best practices for scaling observability on game launch day, which is necessary to ensure high performance across all infrastructure components, with no lag, no glitches, and no bugs.

What's new in Sysdig - September 2022

Welcome to the September edition of What’s New in Sysdig in 2022! I’m Ayu Shah, Principal Sales Engineer based out of the San Francisco Bay Area. I joined Sysdig a little over six months ago and it has been an exciting journey to say the least! I have worn many hats in my career, from Software Engineering to Sales and everything in between. I am excited to share some updates to What’s New in Sysdig for this month!

Understanding Domain-Agnostic v. Domain-Centric AIOps Platforms

No matter what we do, we’ll always be surrounded by choices. Do I save money and take the bus, or do I spend money filling up my gas tank? Do I make dinner at home, or do I eat dinner out? Whatever the outcome, it’s our needs – what we require and what we can afford – that help guide us to where we should go. Technology is no exception. Especially in AIOps.

Interlink Enterprise AIOps App - Visualize and manage operational health in a single app.

The power of Enterprise AIOps at your fingertips. The Interlink Software Enterprise AIOps mobile app meets the performance and usability levels of consumer apps, delivering single pane visibility of the operational IT health of your organization to a wide variety of personas.

How to Identify Unused, Wasted and Orphaned Azure Resources and Reduce Azure Cost?

Every article you find about reducing Azure cost will invariably mention ensuring any unused (especially PAYG) resources are identified and shut down/deleted. However, very few will actually tell you “HOW” to do this. eG Enterprise has added new functionality to enable you to identify unused, wasted and orphaned Azure resources and services without the need to resort to hacking up KQL queries or PowerShell scripts.

Digital Enterprise Journal (DEJ) Names Checkly a Leader in Monitoring for Cloud Native Environments

The demand for continuous innovation and faster delivery requires a fresh approach to monitoring modern apps and APIs. As development environments grow increasingly dynamic and complex, monitoring performance through a platform that is fully programmable, handles app and API-testing, is optimized for developers, and integrates with existing tools and workflows becomes increasingly critical.

Microsoft Teams Lagging? How We Fixed the Issue in 10 Minutes.

When was the last time you had to ask someone to repeat something during an MS Teams call, or had to restart the app, just because the call quality dropped? Sounds familiar? Frustrating, isn’t it? Now imagine hundreds, if not thousands, of employees feeling this frustration at work. Think of the time lost for them, for the organization, and—more importantly—for those who’ll need to fix it. As EUC professionals, we cannot let that happen.

Papertrail and AWS SNS

Operating a modern web application requires having a good handle on what’s going on with your application’s log data. But this doesn’t mean you need to keep the logs pulled up all the time. Sometimes, you want the flexibility to send important logged events to any number of places. The integration of SolarWinds® Papertrail™ and AWS Simple Notification Service (SNS) can help you make this happen.

FAQ: SquaredUp Cloud

SquaredUp Cloud has been in development for over two years (we first previewed it at SquaredUp Live, Spring 2021). It continues our mission to unlock and summarize data – think of it like “BI for engineering”. In building SquaredUp Cloud, we drew upon what we’ve learned with our Microsoft solutions over the last ten years, and built a solution independent of any one tool, like SCOM.

Getting Started with Apache Kafka and InfluxDB

The number of applications and services increases every day as more application architectures move towards microservices or serverless structures. You can process this increasing amount of time series data with real-time aggregation or with a calculation whose output is a measurement or a metric. These metrics need to be monitored so that you can solve issues and make relevant changes in your system quickly. A change in a system can be captured and observed in many ways.

Sixth Street Breaks Down Silos and Deploys a Streamlined Logging Solution with ChaosSearch

As high-scale Financial Technology (FinTech) companies build on their data-driven capabilities and establish product-market fit, they often experience rapid growth that creates challenges for IT and DevOps teams.

Why and How to Monitor Amazon OpenSearch Service

Some time ago, AWS forked ElasticSearch, the most popular search engine on the planet. They had some struggles with the maintainer of ElasticSearch and decided it was time to part ways. So, with OpenSearch, there is now a new kid in town. Well, not new, but at least some kind of alternative.

Transform Data into Actionable Insights to Empower Digital Experiences

Today we are living and working in a world that is digital-first and hybrid by design, with cloud, SaaS and legacy technologies working together, and employees working from everywhere. In this world, a click is everything. That action comes with intent and expectation—of a flawless digital experience. These experiences are the heartbeat of the fierce and competitive landscape we all work in.

Flask Monitoring and APM Benefits

Many renowned businesses use the well-known web framework Flask. Flask is quite famous among developers for making small and full-fledged applications. It is known for being a straightforward framework to learn, which is why it is popular among established organizations. Monitoring your Flask application can be a challenge. Many important operations are happening inside it, and if anything goes wrong, it can cause some damage.

Make Alerts Meaningful Again! Minimizing Alert Noise with Netreo

Alert noise, as well as false positives or too few alerts, undermine the effectiveness of any monitoring solution. Inaccurate alerts condition users to draw poor conclusions. Too many alerts contribute to serious alerts going undetected. Too many false positive or non-actionable alerts cause the significance of all alerts to diminish over time. And too few alerts can lead to misreading system performance and missing critical problems.

Exciting News About the Cribl Certified Observability Engineer Program!

At Cribl, we want to make it as easy as possible for anyone to learn about our products. Whether you’re a potential future customer, a new user at an existing customer, or a partner, we believe knowledge about our products should be free and easy to consume, convenient to access at any time and at a pace desired by the learner. We're excited to announce that we've issued our 1000th certification!

Share your status dashboards with your team and customers

We’ve just released a new feature! Presenting Public Dashboards. It has been one of the most requested features of the last 6 months. People wanted to have an easy way to share the dashboard with their team and customers. Now it’s possible! And the best news is that it is available on every plan.

Introducing Logz.io's New Metrics Integration for HashiCorp Consul with OpenTelemetry

HashiCorp Consul began as an open-source project for service discovery. It has evolved to provide other valuable functionality like secure service mesh to help secure microservice architectures based on service identity, but also the ability to achieve repeatable application deployment lifecycles via Network Infrastructure Automation and control access to the service mesh via Consul API Gateway. These features are considered the four core pillars of Consul service networking.

The Complex But Elegant Relationship Between AIOps and Observability

Digital transformation requires organizational evolution. Constant demand for rapid delivery of upgrades and new products forces change. Surely, the old days of managing monolithic applications housed in private servers are over. Applications consist of virtualized, containerized, and serverless code that’s networked via APIs across a hybrid infrastructure of public and private clouds.

Practical Guide on Setting up Prometheus and Grafana for Monitoring Your Microservices

Observability is a very important aspect of software that’s often taken for granted. You need to have visibility into what your application is doing at different levels to better understand an issue when it occurs. There are multiple open-source tools and initiatives to help you achieve improved visibility. When we talk about observability, there are three parts to consider: logs, traces and metrics.

New video: How to visualize your traces - tools and new ideas

In microservices, distributed tracing is a method for aggregating all the operations that occur in your distributed systems that were triggered by a specific request. If these traces are visualized, developers can gain insights into how their service behaves when it’s run with other services, which helps them understand why errors occur.

Anatomy of an OTT traffic surge: Thursday Night Football on Amazon Prime Video

This fall Amazon Prime Video became the exclusive broadcaster of the NFL’s Thursday Night Football. This move continued Prime Video’s push into the lucrative world of live sports broadcasting. While they had previously aired TNF, as it is known, this is the first season Amazon Prime Video has exclusive rights to broadcast these games. As you can imagine, airing these games has led to a surge in traffic for this OTT service.

Creating Custom Functions With Tips from InfluxDB University

Flux is InfluxDB’s functional data scripting language. It’s made to query, process, analyze, and act on data. It’s very powerful and is built and optimized for time series. There are so many things you can do with Flux it can be hard to know where to start. This August, InfluxDB University launched a free Intermediate Flux course taught by experts that can take your Flux skills up a notch.

What is an Observability Engineer?

What is an observability engineer? Is it your SIEM admin? How about your application performance monitoring admin? Neither? Both? Observability engineering is more than administering a tool. There is more to it than data onboarding, writing parsers, and getting data in. As an observability tool admin, you work with data producers and consumers to get data in a human-readable and searchable format from the source to the analytics system.

Getting Started with OpenTelemetry: Three Companies Check Into OTel Observability

Comprehensive observability starts with good instrumentation. OpenTelemetry, aka “OTel,” sets a unified standard, enabling you to instrument your applications once, then send that data to any backend observability tool of choice. OpenTelemetry’s standard for generating and ingesting telemetry data is slated to become as ubiquitous as current container orchestration standards. Because of this, development teams are increasingly adopting OpenTelemetry in their applications.

Defining and measuring your SLIs and SLOs

Customers expect that online services are available all the time. The truth is that outages happen to almost everyone because providing 100% service availability is challenging and costly. Creating a reliable and profitable service means, amongst other things, finding the balance between application availability, costs and time to market. Faster feature delivery means less availability as constant changes to production may cause issues and introduce bugs.

Investigate critical alerts on the go with the Datadog mobile app

The Datadog mobile app provides real-time visibility into critical alerts, incidents, and application performance metrics across your entire environment, helping you troubleshoot directly from your mobile device. On-call engineers can quickly evaluate the conditions that triggered an alert, determine its urgency, and decide the next course of action—anywhere, anytime.

Inside the migration from Consul to memberlist at Grafana Labs

At Grafana Labs we run a lot of distributed databases. These distributed databases all make use of a hash ring in order to evenly distribute workloads across replicas of certain components. For a more detailed description of the architecture of our projects, check out our Mimir architecture docs.

Code-level Application Monitoring for Every Developer

The monitoring, tooling, and observability space is crowded. It’s hard to keep track of what most tools in this category originally set out to do, but if we had to guess, they were probably built to support monolithic architectures with complex systems, to give Ops and IT a way to minimize the impact of an outage.

How I monitor cloud application costs in one simple but powerful dashboard

Although there are many great tools out there to get on top of application monitoring, there’s one vital metric that’s often overlooked by us technical folks – cost. In the days of running apps on servers in private datacenters, the kit was a one-time purchase that the systems team had to deal with. But running apps in public clouds is a different story. Whether you’re running on VMs, containers in Kubernetes, or entirely serverless, execution of your code adds to the bill.

Network Log Archiving = Perfect Backwards Visibility

Network monitoring is ideal for getting a real-time view of your connected environment, and with reports, you can look back in time too. Logs are key to this rear-view mirror look, as they contain all the data for all the elements you are monitoring. But without network log archiving, you can only look back so far. Did you know that according to an IBM/Ponemon study, it takes an average of 287 days to discover and contain a data breach?

Benefits of Apache web server monitoring tools

Incorporating Apache web server monitoring into your IT infrastructure management strategy can help identify performance bottlenecks preemptively. This proactive monitoring approach provides data necessary to ensure that your web server is up to the task and make optimizations if needed. Guaranteeing your customers a smooth and hassle-free user experience could go a long way toward cementing their trust in your organization.

Optimize your .NET application performance with the Datadog Continuous Profiler

.NET is a framework built by Microsoft that simplifies the complexities of developing cross-platform applications. Using .NET, developers can create powerful applications with rapid response times and more. We’re excited to announce that the Datadog Continuous Profiler now provides general support for .NET applications, including .NET Framework, .NET Core, and .NET 5+.

8 Best Real User Monitoring Tools and How to Choose One [2022 Review]

Staying in control of your users’ digital experience and their level of satisfaction is the most important thing you can do as a software-based business. Yet, that’s impossible to do without a monitoring strategy and tools that enable you to visualize how customers interact with your app or website from their perspective.

How to Monitor Azure Virtual Desktop (AVD) Technology

One question many administrators are asking is: How can I effectively and efficiently monitor Azure Virtual Desktop technology? There are several options for monitoring Azure Virtual Desktop technology and this blog will cover some of the most popular ones. In this blog, we will focus on multi-session, native Azure virtual desktops.

Channel 4 Suffers Website Outage During The England Vs Germany Game

Last night the Channel 4 website and app suffered an outage, leaving football fans unable to stream the game. The highly anticipated game between England and Germany took place at Wembley Stadium on Monday, September 26. Channel 4 had the rights for this match and streaming was exclusively on their website and app. Kick-off was at 7:45pm, with coverage from 7:00pm. The first Channel 4 website issues reported on Down Detector came in at 7:32pm and issues continued throughout the night.

Part 6: Observability Maturity Model Summary

For decades, IT operations teams have relied on monitoring for insight into the availability and performance of their systems. But the shift to more advanced IT technologies and practices is driving the need for more than monitoring – and so observability evolved. With infrastructures and applications that span multiple dynamic, distributed and modular IT environments, organizations need a deeper, more precise understanding of everything that happens within these systems.

Virtual CISO Services: A New Revenue Stream for MSPs?

As you look to optimize your MSP’s growing business, it’s going to become more and more important to maintain an acceptable return on your investment. To do this, you’ll need to find services that increase your gross margin on every client engagement. Virtual CISO services can greatly help in this function. While solutions like Auvik already help MSPs have visibility inside of client networks, we want to take this a step further with the addition of vCISO services.

Demystifying Observability and Making it Work for You

This article is the final installment in a series that demystifies observability. The first three focused on the history of observability, dispelling myths around observability, and what observability is and what it can offer. In this last article of the series (Check out part 1), I want to offer a complete definition of observability.

Sense and Signals

Complex, distributed software systems are chatty things. Because there are many components interoperating amongst themselves and with things outside their bounds like users, those components and the systems themselves emit many information signals. It’s the goal of monitoring, logging, and observability (o11y) tools to help the systems’ “stewards,” those developers and operators tasked with maintaining and supporting them, make sense of those signals.

How to build machine learning models faster with Grafana

Armin Müller is the co-founder of ScopeSET. ScopeSET specializes in R&D work to build and integrate tools in the model-based systems engineering domain, with a track record of more than 15 years of delivering innovative solutions for ESA and the aerospace industry. Training machine learning models takes a lot of time, so we’re always looking for ways to accelerate the process at ScopeSET. We use open source components to build research and development tools for technical companies.

UiPath Robotic Process Monitoring for Splunk - Demo Walkthrough

This video provides a walkthrough of the out-of-the-box dashboards that come with the Splunk App for RPM. Once you have configured the data inputs, you can quickly get value out of the app for monitoring your UiPath Robotic Process Automation (RPA) deployment. There is also a built-in Splunk Alert Action that allows you to take action in the UiPath API based on data in the Splunk indexes.

Introducing Cloud Logging - Log Analytics, powered by BigQuery

Logging is a critical part of the software development lifecycle allowing developers to debug their apps, DevOps/SRE teams to troubleshoot issues, and security admins to analyze access. Cloud Logging provides a powerful pipeline to reliably ingest logs at scale and quickly find your logs. Today, we’re pleased to announce Log Analytics, a new set of features in Cloud Logging available in Preview, powered by BigQuery that allows you to gain even more insights and value from your logs.

How to Scale Your Alerts Beyond PromQL with Coralogix Flow Alerts

When building alerts, engineers aim to create accurate, timely, and actionable alerts. In pursuit of this goal, many engineers will leverage PromQL throughout their careers. PromQL is the query language used by Prometheus and Alertmanager to query metrics and define alerting rules. While PromQL works very well for simple use cases, as infrastructure scales, architectural patterns grow more complex, engineering practices accelerate, and alerting use cases become more multivariate.

Deploy your Next.js application on Vercel using Sentry and GitHub Actions

Thanks to the power of open source tooling and cloud services, shipping an application to production has never been easier. In this blog post, we are going to go from bootstrapping a Next.js application to deploying it on Vercel. We will use GitHub Actions to handle the Continuous Integration and Sentry to monitor the application once it is deployed, so that we are warned of any problems as soon as they arise.

Why You Need Synthetic Monitoring

Synthetic monitoring can be one of the most powerful tools in your DevOps team’s toolkit, especially for the SRE, yet it is one that is often overlooked by people building out a reliability mindset. Synthetic monitoring permits you to simulate any transaction or interaction users can have on your website or app, from places around the world, as often as you’d like.

An Open Source Observability Platform | SigNoz

Cloud computing and containerization have brought many benefits to IT systems, like speed to market and on-demand scaling. But they have also increased operational complexity. Applications built on dynamic and distributed infrastructure are challenging to operate and maintain. A robust observability framework can help application owners stay on top of their software systems. In this article, we will introduce SigNoz, an open source observability platform.

The Complete Kubectl Cheat Sheet [PDF download]

Kubernetes is one of the most well-known open-source systems for automating and scaling containerized applications. Usually, you declare the state of the desired environment, and the system will work to keep that state stable. To make changes “on the fly,” you must engage with the Kubernetes API.

How to get complete CI/CD pipeline observability

It's not like it used to be back in the day! Before CI/CD, we were building on-premises, service-oriented products following system style architecture and we were able to map out the build system and end-to-end process in a PowerPoint or Visio document. Although time-consuming and inefficient, it was relatively straightforward and the build pipeline was unlikely to change drastically. But that's no longer the case.

PostgreSQL Monitoring Upgrade

Netdata for PostgreSQL monitoring just got a huge upgrade, collecting 100+ PostgreSQL metrics and displaying these across 60+ different composite charts. You can check the reference documentation for the full list of metrics, and see them running live in the demo space. If you are using PostgreSQL in production, it is crucial that you monitor it for potential issues. And the more comprehensive the monitoring the better!

Featured Post

The Economic Crunch is Here: Time to Get AIOps Right

Economic warning signs are flashing, and organisations of all sizes are balancing the need for fiscal discipline and efficiency while fighting to retain customers, when a single negative interaction can send them running to a competitor. Business digital operations are more complex than ever, and the problem is compounded by companies still adapting to remote work and pandemic-driven digitisation. Our recent report confirms that delivery teams are facing increased pressures, unreasonable business demands, and higher rates of burnout.

Nastel Recognized As Top 10 Banking Tech Solution

Banking is about financial transactions. These are executed by sending payments and instructions over middleware. If you control the middleware, then you control the business. If the middleware fails, then the company fails. If you can see and analyze the transactions going through the middleware, you can see the business itself. And if you have real-time analytics of that data and it’s automatically actioned, then you can innovate and accelerate the company’s development.

What is Distributed Tracing and How to implement it with Open Source?

Distributed tracing helps you track requests across services and understand issues affecting your application performance. In distributed cloud architecture, debugging performance issues is complicated. Distributed tracing gives visibility to teams on how a user request performs across different services, protocols, and cloud infrastructure. Let’s start with a brief overview of distributed tracing.

What is AIOps? The Importance of Artificial Intelligence for IT Operations

Modern IT environments are so complex, dynamic, and expansive that humans alone cannot effectively manage and maintain them. As a developer and operator, I have had to deal with failed servers and containers, running out of storage space, slow or unreliable network links, bugs in code, and unpredictable workloads in some applications.

What's the Sharpest Tool in Your Security Shed?

How easy is it to work with your security tools? So easy that you’re telling all your family and friends and singing their praises from the occasional rooftop? Well, we sure hope so. Security tools, like any other tool, should help you save time, not waste it. Nobody would have invented a drill if screwdrivers were fast enough, but it’s also up to you to make sure you are using your drill and all the other power tools available in the modern world.

Troubleshoot ISP, Cloud and SaaS Network Delivery Issues with Experience-Driven NetOps

In this demo, we showcase Experience-Driven NetOps from Broadcom Software that includes AppNeta digital experience monitoring and DX NetOps network monitoring software. The combined solution provides the most comprehensive visibility for true end-to-end user experience network delivery assurance for today's network operations teams.

Observability 101: a chat with Jude Bakeer

We recently sat down with Jude Bakeer, one of LogicMonitor’s Solutions Engineers, to talk about the future of IT and Observability. Part of Jude’s role requires her to talk to customers and enterprises every day. Over the years, she’s gathered unparalleled insights into key trends across these industries and segments – from ops teams to C-level executives.

Sentry Dynamic Sampling Improvements - Transaction Breakdown & Latest Release

In this video, Sentry software engineer Riccardo goes through the user experience improvements we worked on for our new performance monitoring feature, Dynamic Sampling. These improvements give the user a more holistic view of transaction volume connected to services and allow for more comprehensive configuration options for sampling rates.

Digital Experience Management (DEM) and Your Salesforce.com: Everything You Need To Know

The post-pandemic world is more digital. Covid-19 fundamentally impacted how we live, work, interact, and shop. And while some of the structural trends (such as the rise in remote work or shifting consumer preferences) were already underway, the pandemic accelerated them by a significant number of years. The competitive landscape is digital, and the winners and losers are defined through the customer experience they provide.

Sky Rocket APM Performance with Log Analytics

Application Performance Monitoring (APM) is great for tracking the health and performance of your software tool. APM helps you understand what's happening inside your application by monitoring various parameters such as CPU/memory stress, internal network throughput, and more. However, mixing in log analytics can take your APM game up a notch. Almost all software tools generate logs when they run.

Pandora FMS becomes an M81 partner to make IBM i system monitoring easier

Pandora FMS, an international benchmark in system monitoring, becomes a technology partner of the French company M81. The French platform has had extensive experience in the sector since its creation in 1988 and is currently considered one of the great experts when it comes to monitoring IBM i systems. Thanks to this agreement, Pandora FMS will bring this type of solution to Spanish companies to improve IT management and the efficiency of the different work areas.

Incident Severity Levels 1-5 Explained

The question isn't whether an incident will happen: it's when it will happen. Systems will crash. Software will fail. Vendors will suffer an outage of their own. It's your job to be prepared for these problems, and incident severity levels are one of the tools you need. Incidents have varying impacts on your business and customers. Incident severity levels are how you classify their impact and manage your response.

Dashboard Studio: It's the Little Things

It's always interesting to hear what feature requests dashboard users share with our product team. Sometimes it's big things — such as being able to set tokens on drilldowns — and sometimes it's little things. In Splunk Cloud Platform 9.0.2208, we've included a handful of Dashboard Studio "little things" updates.

Top 6 services to Integrate with MetricFire

At MetricFire, we focus on integration with your infrastructure. As one of our business values, MetricFire strives to ensure you can integrate your infrastructure with our Hosted Graphite monitoring service easily. Our engineers are committed to going above and beyond in finding solutions to get our customers the insights they need.

AIOps Role in Improving the Telecom Customer Experience

Communication service providers are finding themselves at a crossroads – growing competition is exerting pressure on revenue and costs, while the need to revolutionize operations is pressing. The only way for providers to set themselves apart from competitors is to deliver top-notch services and incorporate innovative technologies that will ultimately lead to growth and operational efficiency.

Challenges Faced by SaaS Companies Regarding Continuous Integration and Deployment

Software as a Service (SaaS) companies are constantly striving to ensure customers have an exceptional user experience when utilizing their service. The fierce competition in the modern business puts more pressure on companies to meet their customers’ expectations or risk losing them to a market rival. The success or failure of a solution and the underlying organization can rest on the ability to create an engaging and innovative user experience.

Product Update - Adaptive Zoom now live

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product to meet developers where they are, to ensure their happiness, and accelerate Time to Awesome. This week, we are covering a product release that helps all InfluxDB Cloud UI users get more from their graphs.

A NetOps Guide to DDoS Defense

Join Kentik and Cloudflare as we discuss and analyze the latest in DDoS attack trends. (We’re seeing some really interesting patterns in our data!) Back by popular demand: Doug Madory, Kentik’s Director of Internet Research, will walk through how BGP monitoring can determine if DDoS mitigations are actually effective. What you’ll learn.

How to Run Solr Cloud on Docker Containers | Setup Tutorial for Beginners - Sematext

Solr is one of the most powerful and popular open-source search engines. And being able to put Solr in Docker is an absolute must for anyone looking to get into DevOps. In this video tutorial, we will discuss the benefits of putting Solr in containers, cover the two types of architecture Solr can use, and containerize SolrCloud in Docker.

Troubleshoot in less than 60 seconds with Grafana: Inside NOS's observability stack

It may seem like ancient history, but there was a time when telecommunications companies only had to worry about connecting customers over landlines. Today, their businesses depend on vast cellular networks to not only provide strong wireless phone coverage in countless locations, but also handle the demands of tablets, computers, and machine-to-machine communications.

How to Reduce Data Costs with OpenTelemetry and BindPlane OP

Data costs fill a large column in many organizations’ accounting sheets. Data pipeline setup and management is a large time sink for DevOps, IT, and SRE. Setting up telemetry pipelines to reduce unwanted data often takes even more time that could better be spent creating value rather than reducing costs. This blog will show you one way to quickly set up your data pipeline to filter unnecessary telemetry data.

So You Received an Alert. Now What?

Your phone buzzes with an incoming text message right when you’re about to start dinner. Inconvenient, but better than a 3 am call. It’s an Uptime.com Alert, and if you want to clear it before your dinner gets cold you need the right tools for investigation… If that scenario sounds familiar to you, then you’re in good (if tired) company.

The Difference Between Monitoring and Observability and Why It Matters

Organizations are adopting cloud native and multi-cloud architectures to drive innovation, achieve faster time to market, improve yield, and deliver exceptional experiences to their customers. However, for all the business benefits of modernizing, the process does not come without challenges.

GripMatix and SquaredUp join forces

We know there are a couple of dashboard solutions for which an integration with SCOM exists or could be created. However, none of them integrates better and more easily than SquaredUp Dashboard Server for SCOM. It integrates with the SCOM platform out-of-the-box, leverages the SCOM object model to the fullest and makes it very easy to create dashboards. For most use cases you do not even have to create dashboards, as the product comes with many out-of-the-box community dashboard packs.

Digital experiences - Our second nature

I wanted to take a moment to reflect upon a significant milestone and celebration on September 20, 2022, in which I had the honor and privilege of representing our Sumo community to ring the Nasdaq bell to commence trading. It was a meaningful and emotional moment for me personally because it represented how far we’ve come to build a company, culture and community to enable and accelerate digital transformation for companies around the world.

The Sentry Remix SDK is Now Available

Sentry has made it a priority to support frontend JavaScript developers, regardless of the framework they use. This is why we have SDKs for React, Angular, Vue, Ember, NextJS and more! There’s one more SDK joining that list now - our brand new Sentry Remix SDK for the Remix framework. Remix is a new full-stack JavaScript framework that helps you build web applications with React, with a focus on following web standards and optimizing for performance.

Software Project Managers: get total visibility of all your tools

Project managing global software projects is always a challenge, contending with multiple time zones, tools, and teams. In these environments, the day-to-day life of a Project Manager is filled with status collections, project reporting, and little time for much else. While the detail will always matter – like team bug data, feature status, and build progress – there is a better way than collecting and reporting on all this data manually. Does this scenario sound familiar?

Take Network Monitoring to the Extreme with WhatsUp Gold/Flowmon Duo

Network monitoring is the key to efficient, reliable operation, as well as performance and security. The deeper and more broadly you can monitor (yes, you want to do both), the better your network operates. What if you could combine a superstar in network infrastructure monitoring with the champion of network flow monitoring? You can. Progress, owner of WhatsUp Gold, recently acquired Kemp and their market-leading Flowmon solution.

Sponsored Post

The Importance of Observability for Site Reliability Engineers (SREs)

Site reliability engineers (SREs) play a crucial role in ensuring the reliability of systems. From creating software to improving system reliability in production, responding to incidents, and fixing issues, SREs are responsible for guaranteeing the health of applications. Observability supports SREs in this work: an observable system allows them to identify and fix issues promptly, leaving SREs better equipped to fast-track development cycles.

Sponsored Post

What's a Status Page Aggregator and why you need one

Welcome to the future! SaaS (Software as a Service) rules the world. Whereas just a few years ago businesses were buying software and installing it in-house, now they're renting it. There's a SaaS for everything. Actually, multiple SaaS for the exact same problem! Even technology companies with expert engineering teams are choosing to use off-the-shelf components (now in the form of SaaS) instead of developing in-house. It makes complete sense to buy something that would cost 100x more to develop in-house.

An MSP's Guide to Quarterly Business Reviews

Quarterly business reviews (QBRs) are one of the best tools you can use when it comes to being transparent and keeping your clients updated on all the work you’re taking care of for them. Otherwise known as technical business reviews, or semi-annual business reviews, QBRs serve as an excellent opportunity to touch base with clients, highlight the value of the services you provide, and create a strategic agenda moving forward.

8 Secret Struggles Today's MSPs Are Dealing With

In the managed services industry, there’s no shortage of support. Whether online or in person, there are countless ways to connect with other MSPs, share stories, and provide tips and tricks to help you build your business—and overcome MSP struggles and biggest challenges. And while MSPs are willing to share industry trends they’re taking advantage of, they’re not always willing to talk about the ones they’re losing sleep over.

TL;DR InfluxDB Tech Tips: Joins

If you’re an InfluxDB user you’ve almost certainly used the join() function. The join() function performs an inner join of two table streams. It’s most commonly used to perform math across measurements. However, now it is deprecated in favor of the join.inner() function which is part of the new join package. With the addition of the join package, Flux now has the ability to perform the following types of joins:
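To make "an inner join of two table streams" and "math across measurements" concrete, here is a minimal sketch in plain Python rather than Flux. The `inner_join` helper, the `mem_used` and `mem_total` sample rows, and the `left_value`/`right_value` output columns are all hypothetical; only the `_time` and `_value` column names are borrowed from Flux conventions. It illustrates the join semantics, not how InfluxDB implements them.

```python
from collections import defaultdict


def inner_join(left, right, on=("_time",)):
    """Return one row per pair of left/right rows whose `on` columns are equal.

    Rows without a partner on the other side are dropped (inner join).
    """
    # Index the right-hand rows by their join key for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[col] for col in on)].append(row)

    joined = []
    for l_row in left:
        key = tuple(l_row[col] for col in on)
        for r_row in index.get(key, []):
            out = dict(zip(on, key))
            out["left_value"] = l_row["_value"]
            out["right_value"] = r_row["_value"]
            joined.append(out)
    return joined


# Hypothetical sample data: two measurements sharing some timestamps.
mem_used = [{"_time": 1, "_value": 2.0}, {"_time": 2, "_value": 3.0}, {"_time": 3, "_value": 4.0}]
mem_total = [{"_time": 1, "_value": 8.0}, {"_time": 2, "_value": 8.0}, {"_time": 4, "_value": 8.0}]

# "Math across measurements": percentage of memory used at each shared timestamp.
percent_used = [
    {"_time": row["_time"], "_value": 100.0 * row["left_value"] / row["right_value"]}
    for row in inner_join(mem_used, mem_total)
]
print(percent_used)
```

Timestamps 3 and 4 appear in only one input, so the inner join drops them. Only timestamps 1 and 2 survive into the result.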

VictoriaMetrics Monitoring

VictoriaMetrics is a monitoring solution. It was designed to collect and process telemetry from many systems, provide a retrospective view, and forecast metrics for capacity planning. But what about monitoring VictoriaMetrics itself? One software development approach is called Observability Driven Development (ODD). In a nutshell, it means that developers should always keep in mind that software needs to be transparent to the person who uses it. Does your software make backups?

Synthetic Monitoring of Microsoft AVD Service Levels

We recently announced a FREE ready-to-go standalone SaaS logon simulator tool for Microsoft AVD, available here. This joins our fully featured logon simulator available for use on-premises, as SaaS or in the cloud within our synthetic monitoring suite. A logon simulator, whilst a useful tool, should be used in conjunction with other synthetic and real user monitoring tools for a truly proactive IT strategy designed to prevent real users encountering issues and raising support tickets.

Key Observability Scaling Requirements for Your Next Game Launch: Part II

In Part I of our series outlining best practices for scaling observability, we reviewed the data analysis capabilities that can help engineers troubleshoot faster in high-pressure situations during a game launch. Nobody wants lag time or crashes in their game launch. Similarly, no one wants terminated sessions, or gamers logging off to play a competitor’s game.

Troubleshooting SaaS User Experiences with AppNeta from Broadcom Software

Many organizations are moving to SaaS-based hosting environments but how do you monitor the user experience of these apps when they no longer exist within the four walls of your data center? In this demo, we are going to reveal how network operations teams can gain true visibility into the user experience with AppNeta from Broadcom Software - even when they do not own any of the network infrastructure delivering that experience.

What can be learned from recent BGP hijacks targeting cryptocurrency services

On August 17, 2022, an attacker was able to steal approximately $235,000 in cryptocurrency by employing a BGP hijack against the Celer Bridge, a service which allows users to convert between cryptocurrencies. In this blog post, I discuss this and previous infrastructure attacks against cryptocurrency services. While these episodes revolve around the theft of cryptocurrency, the underlying attacks hold lessons for securing the BGP routing of any organization that conducts business on the internet.

How to monitor OpenShift with Sysdig Monitor

Monitoring Red Hat OpenShift brings challenges that a vanilla Kubernetes distribution does not. Discover how Sysdig Monitor, and its exclusive features in OpenShift, will help you monitor and troubleshoot your issues quickly and easily. OpenShift builds many out-of-the-box add-ons into its Kubernetes foundation, for example the OpenShift API server, Controller Manager, Ingress, and Marketplace ecosystem. This creates a more complex environment that can cause you to struggle.

Released: Better Uptime Integration

StatusGator has a wide variety of use cases: from education to help desk to IT and managed services and DevOps, too. All corners of an organization depend on cloud services and StatusGator gives you visibility into the status of all of your vendors. We’ve heard over and over from our DevOps users that alerts and notifications for their teams are already centralized into a single incident management platform such as OpsGenie, PagerDuty, or FireHydrant.

How to convert a mini-arcade machine into a Grafana dashboard display with Raspberry Pi

When COVID-19 hit, Yonatan Mevorach faced an unexpected challenge, which required an unexpected solution. The Infrastructure Team Lead at Wix, the popular website building platform, was accustomed to looking at multiple monitors on the walls of the software company’s offices in Tel Aviv, Israel. These monitors cycled through Grafana dashboards to help the team keep tabs on Wix’s many services.

Just How Bad is a Down, Slow, or Dysfunctional Website? It's Worse than You Think!

Have you ever watched a movie (*cough* Godfather III) and said to yourself: “wow, this is so incredibly bad — I don’t think this can get worse!” But then it does. Much, much worse. Well, having a down, slow, or dysfunctional website is similarly nightmarish — just when you think the reputation devastation is finally over, there’s more on the horizon. With apologies to Shakespeare: hell hath no fury like a customer scorned. Not convinced?

Understanding the Observability Maturity Model

Based on research and conversations with enterprises from various industries, StackState created the Observability Maturity Model. This model defines the four stages of observability maturity. The ultimate destination is level four, Proactive Observability with AIOps. However, even moving from level one to two, or from level two to three, is a huge improvement in your ability to get essential insights into your IT environment.

Part 5: Proactive Observability With AIOps- Level 4

Level 4, Proactive Observability With AIOps, is the most advanced level of observability. At this stage, artificial intelligence for IT operations (AIOps) is added to the mix. AIOps, in the context of monitoring and observability, is about applying AI and machine learning (ML) to sort through mountains of data looking for patterns.

Flask Monitoring and APM Benefits

Many renowned businesses use the well-known web framework Flask. Flask is popular among developers for building both small and full-fledged applications. It is known for being straightforward to learn, which is why it is popular among established organizations. Monitoring your Flask application can be a challenge. Many important operations happen inside it, and if anything goes wrong, it can cause some damage.

Schedule Cronjob for the First Monday of Every Month, the Funky Way

The crontab man page (“man 5 crontab” or read online) contains this bit: What does it mean precisely? If you specify both the day of month and the day of week field, then cron will run the command when either of the fields match. In other words, there’s a logical OR relationship between the two fields.
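The OR relationship is easier to see on a calendar. The sketch below is a simplified model of the day-matching rule only, not cron's actual implementation, and the function names `cron_day_matches`, `matching_days` and `is_first_monday` are hypothetical. Passing `None` stands for `*` in a field. It shows that restricting the day of month to 1-7 and the day of week to Monday also matches every other Monday of the month.

```python
from datetime import date, timedelta


def cron_day_matches(d, dom=None, dow=None):
    """Simplified model of cron's day matching.

    dom: set of allowed days of month, or None for '*'.
    dow: set of allowed cron weekdays (0 = Sunday ... 6 = Saturday), or None for '*'.
    When both fields are restricted, a match on EITHER is enough (logical OR).
    """
    cron_dow = d.isoweekday() % 7  # isoweekday(): Monday=1 ... Sunday=7, so Sunday becomes 0
    if dom is not None and dow is not None:
        return d.day in dom or cron_dow in dow
    if dom is not None:
        return d.day in dom
    if dow is not None:
        return cron_dow in dow
    return True


def matching_days(year, month, dom=None, dow=None):
    """List the days of the given month on which the job would run."""
    d = date(year, month, 1)
    days = []
    while d.month == month:
        if cron_day_matches(d, dom, dow):
            days.append(d.day)
        d += timedelta(days=1)
    return days


def is_first_monday(d):
    """What was actually intended: a Monday that falls within the first seven days."""
    return d.weekday() == 0 and d.day <= 7


# Day of month 1-7 combined with day of week 1 (Monday), for September 2022.
print(matching_days(2022, 9, dom=set(range(1, 8)), dow={1}))
# Days that are really the first Monday of that month.
print([day for day in range(1, 31) if is_first_monday(date(2022, 9, day))])
```

For September 2022, the first list contains days 1 through 7 plus the Mondays 12, 19 and 26, while the second contains only the 5th. This is why combining the two fields naively does not give a "first Monday only" schedule.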

Java application performance metrics 101

Since it first emerged, Java has had a phenomenal rise in usage and popularity. Its ability to be robust and platform-independent has enabled it to rule the application development world by providing internet solutions for businesses across industries. Any organization that runs its mission-critical applications on Java shouldn’t be turning a blind eye towards understanding the importance of application performance monitoring.

The Leading Dynatrace Alternatives & Competitors

The Dynatrace platform is a software-driven monitoring platform that simplifies enterprise cloud complexity and accelerates the pace of digital transformation for enterprises. There are many top solutions on the market, but Dynatrace is one of the most popular for its ability to monitor both mobile apps and real users simultaneously. By offering a shared platform for understanding metrics, Dynatrace also helps developers, operations, and business teams improve performance.

Quick tip: How to create Icinga reports with custom branding

With Icinga Reporting you can create custom SLA reports for hosts and services that are monitored with Icinga. The module for Icinga Web fetches existing data and takes planned downtimes into account. You can filter for certain hosts and services as well as set custom thresholds to highlight unmet SLAs visually. You decide if you want to create the reports either manually or automatically, for example every week or month. The report then is sent to you as a PDF via email.

The future of OpenTelemetry | Q&A

OpenTelemetry is an open-source project under the Cloud Native Computing Foundation (CNCF) that aims to standardize the generation and collection of telemetry data. The telemetry data helps developers, DevOps and IT teams keep a check on their application health. The telemetry data collected by OpenTelemetry consists of logs, metrics, and traces. Together, they are used for performance monitoring and observability in distributed systems. At SigNoz, we are building an OpenTelemetry native APM.

StackState 5.0 UI; Gain a Rapid Understanding and a Speed up Discovery

Do you experience this: Your brain seems to explode because there is so much you try to fit into “working” memory? It can happen on a Friday afternoon, after a busy work week. Or on a Monday, looking at your calendar while figuring out how to fit in all those meetings and still get real work done.

Beat the holiday rush with Elastic Observability

September is here, and that means many retailers have already begun preparing for the upcoming holiday season. One weekend in particular tends to be the real-life stress test that companies have come to develop a love-hate relationship with: Cyber Weekend. Or more specifically, Black Friday, Cyber Monday, and the weekend in between.

OpsRamp Featured in Six Gartner Hype Cycles

As another autumn fast approaches, any look back at the summer would be incomplete without a review of the latest round of Gartner Hype Cycle reports. The Gartner Hype Cycles are an annual summer ritual, a series of reports published every July (and sometimes August) that track the maturity, adoption and business impact of technology innovations.

8 Real-World MQTT Use Cases

MQTT is becoming the standard protocol for applications that operate in environments where network connectivity is intermittent or unreliable, reducing bandwidth usage is a priority, or where hardware resources are limited. In this post you will learn about some specific use cases where businesses are seeing value from making MQTT part of their tech stack.

Current state of OpenTelemetry and how it fits in the DevOps ecosystem | Q&A

OpenTelemetry is an open-source project under the Cloud Native Computing Foundation (CNCF) that aims to standardize the generation and collection of telemetry data. The telemetry data helps developers, DevOps and IT teams keep a check on their application health. The telemetry data collected by OpenTelemetry consists of logs, metrics, and traces. Together, they are used for performance monitoring and observability in distributed systems. At SigNoz, we are building an OpenTelemetry native APM.

The IT company Pandora FMS, the first in Spain to offer a monitoring system integrated with IBM i

The technology company Pandora FMS, which specializes in system monitoring, takes a further step in its growth by becoming the first Spanish technology company able to offer monitoring solutions both for IBM i systems (formerly known as AS400) and for much more modern systems. After several decades, this development enables the full coexistence of IBM i with more recent systems.

Microsoft Teams Monitoring Use Cases

Microsoft Teams is a cloud-based software within the Microsoft 365 application suite. The tool creates a highly collaborative and engaging environment with multiple organizational business units. As a modern unified collaboration (UC) platform, Microsoft Teams uses cloud-based calling, video meetings, messaging, chat, screen sharing, and third-party app integration. However, to ensure a great employee experience, IT must develop and execute Microsoft Teams monitoring use cases.

Happy IT Professionals Day! A huge shoutout to all the IT pros

It’s the coolest day of the year again—time to honor and celebrate the people who work behind the scenes to keep our businesses running: IT professionals! Established in 2015, IT Professionals Day is celebrated each year on the third Tuesday of September to celebrate the unsung heroes of the IT world. Join us as we wish all IT pros worldwide a very happy IT professionals day! Kudos to the IT pros!

Modernising applications? Integration with APIs is the way to go

Today’s enterprises can relate to this logic, especially concerning digital transformation projects. They have broken systems that need fixing, but they also spend a lot of effort fixing systems that don’t necessarily need those levels of intervention. Specifically, companies often face a dilemma: digital modernisation requires apps and systems to be upgraded and operate in the new technology norms. Yet those same systems often exist for good reasons.

2022 IT Pro Day survey examines state of the tech job market amid labour shortages and hiring challenges

SolarWinds survey: Two-thirds of IT pros say they're confident in their tech career despite a potential economic slowdown, but only half say their company has been adequately staffed amid the "great resignation".

Part 4: Causal Observability - Level 3

It’s not surprising that most failures are caused by a change somewhere in a system, such as a new code deployment, configuration change, auto-scaling activity or auto-healing event. As you investigate the root cause of an incident, the best place to start is to find what changed. To understand what change caused a problem and what effects propagated across your stack, you need to be able to see how the relationships between stack components have changed over time.

Relational Databases vs Time Series Databases

Databases are often the biggest bottleneck when it comes to application performance. Over the years a number of new database designs have emerged to help with not only basic scalability and performance but also to help improve developer productivity and make building certain types of applications easier. That isn’t to say these new databases are magical — there are always trade-offs being made and certain things are sacrificed for gains in other areas.

Introducing Nexthink Infinity

Today is one of the most special days in Nexthink history. I personally believe that founding and growing a tech company is mainly about developing amazing technologies which have the potential to change how people work. With the launch of our new Infinity platform, I feel we are truly transforming how digital workplace teams get their jobs done—not only for themselves but for all the employees in their companies.

Setting Up and Tuning Amazon S3 as a Cribl Stream Destination

Everybody is starting to look more at object storage to deliver on data lake initiatives, and S3, specifically Amazon S3, is the gold standard for that. In addition, we’ve heard from many of you that setting up S3 as a destination is a must when starting with Cribl Stream. So in this article we’ll walk you through the setup.

How Acquia helps customers deliver "moments that matter" with observability

Acquia is the open source digital experience company that empowers the world’s most ambitious brands to embrace innovation and create customer moments that matter. In this session, Acquia shares its journey to observability, moving from a fragmented and siloed state of monitoring to a sustainable model for observability. Learn how Acquia worked with Sumo Logic to improve the user and developer experience by

Monitoring 10,000 clouds with HashiCorp

While there are many advantages to cloud computing, the cloud is crazy complex. Organizations need to be able to see exactly what’s happening in a variety of cloud environments in real-time, to get a clear picture of the health and operational status of relevant cloud-based components and devices. In this fireside chat, learn how HashiCorp puts Sumo Logic at the center of its security monitoring strategy to help monitor and secure thousands of public and hybrid cloud environments for its customers.

A fresh perspective: Rethinking TCO and expanding visibility to fuel growth

Over the past nine years, Freshworks has experienced record growth using Sumo Logic to ensure a reliable experience for its SaaS customers. Through continuous optimizations, the SRE team has improved their Total Cost of Ownership (TCO) and gained better visibility and insights into the logging patterns of their applications. Developers investigate issues more quickly using features like Live Tail, Anomaly Detection and Time Comparison, maximizing service reliability. Learn how Freshworks is doubling down on Sumo Logic to achieve full-stack visibility.

Quick Bytes - Getting started with ECS monitoring

Lumigo provides visibility into your ECS clusters and the underlying services and tasks in real-time by leveraging out-of-the-box dashboards and turn-key integrations with AWS. All the key metrics you need to monitor your clusters, services and tasks are displayed with easy access to corresponding traces. With one-click distributed tracing, Lumigo lets developers effortlessly find and fix issues in serverless and containerized environments.

Grafana alerts as code: Get started with Terraform and Grafana Alerting

Alerting infrastructure is often complex, with many pieces of the pipeline that often live in different places. Scaling this across many teams and organizations is an especially challenging task. As organizations grow in size, the observability component tends to grow along with them. For example, you may have many components, each of which needs a different set of alerts. You may have several teams, each with a different channel where notifications should be delivered.

Harness Continuous Observability to Continuously Predict Deployment Risk

In my previous blog, I discussed how continuous observability can be used to deliver continuous reliability. We also discussed the problem of high change failure rates in most enterprises, and how teams fail to proactively address failure risk before changes go into production. This is because manual assessment of change risk is both labor intensive and time consuming, and often contributes to deployment and release delays.

Software Delivery Platforms to Benefit DevOps Practices

In an era where applications are taking over the world, delivering your service to customers with scalability and security is of the utmost importance. A software delivery platform helps manage data flow, traffic, and data security on both sides of the application. If you are studying software delivery platforms, you have most likely heard of the Codefresh software delivery platform for continuous integration and continuous deployment of applications.

How to be a top online retailer in 2023? Start with a top ecommerce search engine

Search is the key to improving the customer experience–and business outcomes. Retailers that weathered the global pandemic now face new challenges: emerging shopping patterns, competitive upstarts, and economic uncertainty.

Contextual Autocomplete: Why Coralogix is Focused on Developer Productivity

In the observability toolchain, all of our efforts go into data storage and analysis, and the usability of our system becomes a second-class citizen. Autocomplete is a crucial usability feature that significantly improves the developer experience. It is ubiquitous amongst engineering tools from IDEs to CLIs. Autocomplete has long been a feature of many observability tools, but they all miss a crucial detail – optimizing for developer productivity.

Monitoring CrowdSec with Bleemeo

CrowdSec is open-source software that allows you to detect peers with malicious behaviors and block them from accessing your systems. It benefits from a global community-wide IP reputation database. Attackers can then be prevented from accessing your resources by deploying bouncers. They are in charge of acting upon actors that triggered alerts: they can block the attacking IP, serve a 403 Forbidden response, and much more.

Make Your MSP a Recession-Proof Business

Many believe we’re either in a recession or on the brink of one. It’s a familiar cycle: high inflation, international strife, supply chain challenges, and tightening monetary policies are all driving fears of a downturn. While the IT industry has exceeded expectations for the past few years, it isn’t immune to a recession. As a result, many IT MSPs are thinking about what they can do to ensure a recession-proof business over the next year and beyond.

How to Monitor IIS Performance

Are you looking to monitor the performance of your IIS web server? This guide will break down everything you need to know about monitoring and troubleshooting Microsoft’s popular web server. IIS performance monitoring helps you to proactively monitor the health of your IIS web server by collecting logs, traces, and metrics of the application performance and the underlying IT infrastructure.

An overview of Monitoring Azure Kubernetes Services

Modern applications are increasingly built using containers and microservices packed with their dependencies and configurations. Kubernetes is open-source software for deploying and managing those containers at scale, and its name is also the Greek word for a ship’s helmsman or pilot. Monitoring the health and performance of your Azure Kubernetes Service (AKS) cluster is critical to ensure that your applications are up and running as expected.

Integration with Apache Kafka

You can integrate Edge Flow Manager (EFM) with Apache Kafka and forward agent heartbeats to defined Kafka topics. Learn how to perform the integration with Apache Kafka. To integrate EFM with Kafka, you need to configure Kafka and EFM properties. EFM supports the forwarding of agent heartbeats and acknowledges messages exchanged on the C2 protocol between the EFM server and MiNiFi agents.

Product Spotlight: Logz.io Service Performance Monitoring

We believe that one of the most powerful capabilities added to the Logz.io Observability Platform in recent months is our new Service Performance Monitoring (SPM) feature set. As you may have seen earlier this year, Logz.io was named a Visionary in the 2022 Gartner® Magic Quadrant(™) for Application Performance Monitoring and Observability. To that end, SPM is a cornerstone for our related solutions.

How To: Connecting Azure Blob to Cribl Stream to Replay Observability Data

One of the core features of Cribl Stream is our Replay capability. We pride ourselves on giving customers choice and control over their data. The ability to archive data in cheap object storage, and then reach back into that same object storage, is one example of this. It’s safe to say that S3 and AWS have become synonymous with the term object storage. It’s like a modern day Kleenex, or Band-Aid.

Internet Availability Threats Following the Russian invasion of Ukraine

Since Russia’s invasion of Ukraine on February 24, 2022, denial-of-service (DoS) attacks impacting availability have been rising. The attacks aren’t only affecting Russia and Ukraine either. Public and private organizations in multiple industries have been impacted, and several nations—including the U.S. and U.K.—have issued warnings about cyberthreats from Russia.

How to easily configure Grafana Loki and Promtail to receive logs from Heroku

Heroku is a cloud provider well known for its simplicity and its support out of the box for multiple programming languages. When thinking about consuming logs from applications hosted in Heroku, Grafana Loki is a great choice. But in the past, shipping logs from Heroku to any Loki instance required ad-hoc scripts to fiddle with Heroku’s logs format and send them. This can be a time-consuming experience.

How to Make Your Status Page Stand Out From The Rest

If you’ve ever had a website or service go down as you were using it, then you’ll understand the irritation of a generic error message and a plea to “Be patient!” (if you’re lucky). It’s almost like they know they’re not telling you the full story. The companies that are on top of their outage game will have a prepared link or redirect to their Status Page (or at least, have one prominently displayed on their pages and social media) for times like these.

Take control of LM Envision with the LogicMonitor REST API

We are living in an era where everything depends upon data: any application you are using or creating is going to either consume or create data. The LogicMonitor REST API makes it easy for customers who prefer to use APIs to access features without performing activities manually via a UI. The LM API allows end users to perform several activities with added security and reduced effort. As of today, we offer v1 and v2, and now have v3 as the base versions of the API and SDK going forward.

5 reasons why accurate data and checkpoint selection matter

There’s an old saying that “knowledge is power” and no matter how you choose to interpret its meaning, nothing could hold more true when it comes to the quality and quantity of information derived from your website monitoring service. When data fails to meet expectations for quality, validity, and consistency, it can have widespread negative effects on company operations and key strategies.

5 common Java performance problems and how to avoid them using java monitoring tools

Java is one of the most widely used programming languages and it’s often used by back-end developers as a server-side language. It’s used by über-famous applications like Spotify, Twitter, Signal, and Cash App. Java has evolved immensely over the years and in addition to being easy to write, compile, and debug, it’s also more secure, portable, and effective in memory management compared to other languages.

What is DevOps? A Comprehensive Guide

The term DevOps is a combination of the words “development” and “operations.” In practice, DevOps is a collaborative approach to the work that is performed by an enterprise’s IT operations staff and their application developers. Collaboration and communication between these two teams, who might otherwise function separately, are meant to increase the speed and quality of product or application releases.

Grafana Cloud Metrics: A guide to what metrics to monitor and best practices

Metrics are the cornerstone of an observable system – they tell you a system’s measured outputs, granting visibility into what your customers are experiencing and when there’s a problem. However, not all methods for recording and saving metrics from a system’s output are alike. The best method for shipping your system’s metrics to Grafana Cloud depends on many factors, varying from the source of your metrics data to your familiarity with observability tools.

GitOps your service orchestrations

GitOps takes DevOps best practices used for application development (such as version control and CI/CD) and applies them to infrastructure automation. In GitOps, the Git repository serves as the source of truth and the CD pipeline is responsible for building, testing, and deploying the application code and the underlying infrastructure. Nowadays, an application is not just code running on infrastructure that you own and operate.

Measuring cloud cost efficiency for FinOps

Public cloud can deliver significant business value across infrastructure cost savings, team productivity, service elasticity, and DevOps agility. Yet, up to 70% of organizations are regularly overshooting their cloud budgets, minimizing the gap between cloud costs and the revenue cloud investments can drive.

4 Ways to reproduce issues in microservices

Let’s say we have an issue in production. We’ve all been there, right? The first thing we want to be able to do is reproduce the issue. By reproducing, we can confirm it’s a recurring issue, rather than a sporadic one, and that it requires a fix to ensure that our product is working properly. When shifting from a monolith to microservices, reproducing issues becomes more of a challenge.

An Introduction to GitOps and Argo

In an ideal world, developers would be able to release new products and features from development environments into production extremely fast while also not having to stress about breaking prod. Achieving this combination of development speed while also maintaining software reliability requires having the right toolchain and automation in place.

PostgreSQL Monitoring with Netdata

PostgreSQL is a popular open source object-relational database system designed to work for a wide range of workloads from single machines to data warehouses to web services with many concurrent users. PostgreSQL runs on all major operating systems and is used by teams and organizations across the world, including Netdata. If you are using PostgreSQL in production, it is crucial that you monitor it for potential issues. And the more comprehensive the monitoring the better!

How Rush Capped Time to Resolution by Integrating Sentry With Their CI/CD Pipeline

Rated as the top order tracking and revenue generation app on Shopify, Rush lets businesses build and personalize their own dashboards to manage the post-sale process with real-time data, custom product recommendations, and user feedback. Their business model focuses on low touch and user-centered design (UCD), which leaves little room for issues impacting how people interact with the platform.

Operator Connect vs Direct Routing: Getting the Most from Microsoft Teams PSTN Functionality

Organizations already heavily reliant on Teams for its collaboration and communication capabilities are using either Phone System, Operator Connect or Direct Routing functionality for PSTN connectivity.

DX UIM 20.4, Cumulative Update 4: What's New and Why Upgrade

At Broadcom Software, we’re constantly trying to speed value delivery and minimize upgrade efforts for our customers. Toward that end, the DX Unified Infrastructure Management (DX UIM) team releases cumulative updates every calendar quarter. In addition to quality fixes, these cumulative updates include performance improvements, feature enhancements, and expanded platform support. Recently DX UIM 20.4, cumulative update 4, was released for both Operator Console and Server Core packs.

Key Observability Scaling Requirements for Your Next Game Launch: Part I

After months–or potentially, years–of hard work by teams across a gaming enterprise, when the day arrives for a game launch, the last thing your enterprise needs is slowdowns, glitches, outages or poor performance. It’s the death knell for any game, because for your avid gaming customers, there’s always something else (read: a game that isn’t yours) to check out.

3 Lessons from a DNS Resolution Failure Incident

Whether you are a Site Reliability or Network Engineer, or simply involved in monitoring a digital service, you know by now that if DNS is not working properly – your users are experiencing an outage. However, despite its importance in ensuring the resilience and availability of the web, DNS is often not monitored correctly, which can mean undetected outages and any associated ripple effects on your business.

Masking and Truncating Fields in Cribl Stream

In Cribl Stream and Cribl Edge, you can operate on your observability event data in flight, all the way down to the field level. Instead of writing complex regex to wrangle JSON and other structured formats, use Cribl’s built-in functions and extensibility to get the results you want. You’ll see formerly complex situations become easier to address and manage over the long term. In this blog, we’ll cover two troublesome use cases.

The 4 Biggest Pain Points of Monitoring Cloud Service Providers

The cloud is limitless – a key factor in its appeal – but it also introduces a new set of problems. Enterprises are looking to innovate at rapid speed, to achieve business outcomes, and deliver superior user experiences. But the majority of enterprises need to continue to maintain complex hybrid infrastructures as well as diversify their cloud strategies across multi-cloud environments.

Current State of AIOps Technology

Nowadays, most organizations are highly dependent on APM technology, but many are shifting to automated monitoring, which is generally referred to as AIOps technology. AIOps is considered the future of operations management for organizations. As we already know, AI is considered the next revolution in the history of mankind. AIOps uses advanced AI to support day-to-day operational tasks so that employees are spared needless manual effort.

Managing Cloud Cost Anomalies for FinOps

Cloud cost anomalies are unpredicted variations (typically increases) in cloud spending that are larger than expected based on historical patterns. Misconfiguration, unused resources, malicious activity or overambitious projects are some of the reasons for unexpected anomalies in cloud costs. Even the smallest of incidents can add up over time leading to cost overruns and bill shock.
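
As a rough illustration of "larger than expected based on historical patterns", the sketch below flags a day's spend when it deviates from the historical mean by more than a few standard deviations. The function name `is_cost_anomaly`, the threshold `k`, and the sample figures are assumptions made for this example, not part of any vendor's product; real FinOps tools typically use more sophisticated, seasonality-aware models.

```python
from statistics import mean, stdev

def is_cost_anomaly(history, today, k=3.0):
    """Return True if today's cost deviates from the historical mean
    by more than k sample standard deviations (hypothetical rule)."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two historical points
    return abs(today - mu) > k * sigma

# Illustrative daily costs in dollars (made-up numbers).
daily_costs = [100.0, 102.0, 98.0, 101.0, 99.0]

print(is_cost_anomaly(daily_costs, 103.0))  # small variation
print(is_cost_anomaly(daily_costs, 150.0))  # large spike
```

Note that a perfectly flat history has zero standard deviation, so under this simple rule any change at all would be flagged; a production check would likely add a minimum absolute threshold.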

Welcome to Splunk Secure Gateway 3.0

Splunk Mobile puts the power of Splunk in your hands. But with great power, comes great responsibility. That’s why this year with the release of Splunk Enterprise 9.0, we’ve shipped Splunk Secure Gateway (the backend service that powers Splunk Mobile) with even more features and tools to help you responsibly manage your mobile fleet.

What is AIOps? A beginner's guide

Artificial Intelligence for IT Operations (or AIOps for short) continues to be a hot topic among developers, SREs, and DevOps professionals. The case for AIOps is especially crucial given the expansive nature of today’s observability efforts across hybrid and multi-cloud environments. As with most observability platforms, it all starts with your telemetry data: metrics, logs, traces, and events.

Graduate From Old School Tools: Leverage AIOps to Empower Teams and Boost Revenue

IT operations can become chaotic as businesses become increasingly digital and infrastructure sprawls. And chaos means cost when manageability and observability headaches develop. Multi-cloud management, incident response, technology debt, and IT workloads are challenges across all industries and often hold organizations back from achieving their core business objectives.

Simplify Your Agency's Digital Transformation

From complex IT infrastructures with enormous numbers of devices, applications, services, and tools to trying to make sense out of massive amounts of disparate data - government agencies face unique challenges in moving forward with digital transformation. To become truly agile in the increasingly complex hybrid IT environment, forward-thinking agencies are evaluating the potential of AIOps.

How to use OpenTelemetry for Kafka Monitoring

Apache Kafka is a high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is in essence a massively scalable pub/sub message queue designed as a distributed transaction log. It can be used to process streams of data in real-time, building up a commit log of changes. Kafka has strong ordering guarantees that enable it to handle all sorts of dataflow patterns including very low latency messaging and efficient multicast publish / subscribe.

AIOps for Real: Characteristics of a Platform That Add Value and Drive Change

When you’re investing in automation solutions, ultimately, tangible results need to follow quickly. Getting a return on investment (ROI) out of an automation project after two years is something that would have been OK in the not-so-distant past but is no longer acceptable nowadays. With the current speed of change, where new technologies come and go and existing ones evolve at lightning speed, IT teams require much faster time to value on automation investments.

IT Monitoring for Government

Today’s blog comes from Kevin Howell, CEO of UK partner Howell Technology Group (HTG), about their work supplying secure cloud technologies and remote working solutions to government and regulated customers. HTG are a trusted industry leader in the UK, who offer virtual desktops, managed services and efficient modern workplace solutions. Their solutions are also available with the UK Government’s Digital Marketplace under the G-Cloud Framework.

How to Build and Maintain a Winning Culture of Success - Steve Smith (DecisionPoint Systems)

The binding element for any company is its culture and how that culture translates into growth, profits and success for employees, customers and investors. In his thought leadership session, CEO of DecisionPoint Systems Steve Smith reveals his individual formula for sales success and the expectations customers have of individual employees, as a group and as a company.

Is your plugin compatible with Grafana? There's a tool for that!

Here at Grafana Labs, we’re always striving to reduce the amount of effort needed to maintain plugins across different versions of Grafana. That is why we’re excited to provide you with a tool to check the compatibility of your plugin with the latest Grafana plugins API. We know that it can be frustrating for developers to find out people can’t use their plugins. Over the past few months, we’ve been working on detecting the breaking changes as soon as they happen.

5 Ways to Report on Your SLA Obligations

Service Level Agreements are designed to foster trust between your customers and your business. They help define the maximum amount of downtime your team finds acceptable. While they can have legal repercussions, SLAs are fundamentally about trust. Your customers use your service because you’re the best at what you do. They remain loyal because they trust you to do what they need. Retention in SaaS is very fickle, and competition in certain spaces is quite stiff.

Making the Most of MQTT - Native Collector or Telegraf?

When it comes to IoT data, MQTT is a superstar. With so many IoT devices generating data out in the world, developers need ways to access it. After all, data lies at the heart of every application. But data doesn’t just magically manifest itself into your datastore, and building the right data pipeline can make or break an application. Data collection is not a one-size-fits-all problem to solve.

How Cribl Stream Helps Enterprises Handle UDP Syslog Challenges

Syslog is a very common method for transmitting data from network devices and open systems servers to analytics platforms like Elastic and Splunk. As adaptable as syslog is, it still has significant constraints, which is a pain for the many companies that lack the resources to scale their syslog infrastructure.

Which is More Important: Observability or AIOps?

In the last half-decade, AIOps and observability have arguably been the hottest two topics in IT operations management. Gartner first mentioned AIOps—Artificial Intelligence for IT Operations—in 2016, defining it as using big data and machine learning to automate IT operations processes, such as event correlation, anomaly detection and causality determination.

Introducing WebPageTest by Catchpoint's First Free Web Performance Course

This week over at WebPageTest, we’re excited to be taking our first steps into the performance education space with our new completely free online course, Lightning-Fast Web Performance by WPT Senior Experience Engineer, Scott Jehl. This course is for you, whether you’re a DevOps engineer curious to learn more about WebPerf or you’re part of a front-end and QA team responsible for website optimization and lowering latency.

AudioCodes SBC Used for Microsoft Teams Direct Routing - Leveraging Performance Metrics

Microsoft Teams Phone has reached about 12 million users, with many Enterprises now in the process of considering and then deploying this functionality that comes with their E5 subscription. Moving to Teams Phone involves important changes in the IT and UC environment but enables a seamless experience for business lines reliant on Microsoft services.

8 reasons why network observability is critical for DDoS detection and mitigation

Distributed denial-of-service (DDoS) attacks have been a continuous threat since the advent of the commercial internet. The struggle between security experts and DDoS attackers is an asymmetrical war where $30 attacks can cost companies millions of dollars in downtime and breaches of contract. They can also be a smokescreen for something worse, such as the infiltration of malware.

Running the OpenTelemetry Collector in Azure Container Apps

In this post, we’ll look at how to host the OpenTelemetry Collector in Azure Container Apps. There are a few gotchas with how it’s deployed, so hopefully this will stop you from hitting the same issues. If you don’t care about the details and just want to run a script, I’ve created one here.

User Monitoring on Heroku-Based Apps

In today’s fast-paced business environment, tech startups are on the rise. Several small- and medium-scale businesses are competing with established companies to showcase their unique digital services. Heroku is a favorite among these businesses and their development teams, as its PaaS provides a ready-to-serve platform for application setup. Developers can deploy the platform easily without investing too heavily in the infrastructure and DevOps setup.

Application Experience Depends on Your Network Experience

The network is designed to connect the organization’s users, partners, customers and visitors, but those connections are useless without software. While applications run on internal servers, end points and the cloud, the performance of the network in large measure defines the performance of the application, and this performance is what user experience and application experience (AX) is based on.

Greater Self-Service Private Apps on Cloud with New AppInspect Tags

We're excited to announce that starting with the new Splunk Cloud Product 9.0.2205 release, it's easier to create, manage and use private apps. Although Splunk is great by itself, we can all agree that the real value of Splunk comes from all the applications that Developers, SplunkTrust folks and Splunkers build.

What is Distributed Tracing vs OpenTelemetry?

There are a few key differences between distributed tracing and OpenTelemetry. One is that OpenTelemetry offers a more unified approach to instrumentation, while distributed tracing takes a more granular approach. This means that OpenTelemetry can be less time-consuming to set up, but it doesn’t necessarily offer as much visibility into your system as distributed tracing does.

IBM LinuxONE and Sysdig: Building cyber resilient systems in hybrid cloud environments

On September 13, 2022, IBM announced the latest IBM LinuxONE Emperor 4, a highly secured and sustainable Linux-based enterprise server designed for companies of all sizes. Sysdig with IBM LinuxONE provides unified visibility across workloads and cloud infrastructure through a single cloud-native monitoring and security platform.

Part 2: Monitoring - Level 1

The first level of the Observability Maturity Model, Monitoring, is not new to IT. But as reliable IT system operation becomes more and more critical, the importance of monitoring continues to increase. A monitor tracks a specific parameter of an individual component in the system to make sure it stays within an acceptable range; if the value moves out of the range, the monitor triggers an action, such as an alert, state change or warning.
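
The monitor described above can be sketched in a few lines. This is a minimal illustration only: the class name `RangeMonitor`, the metric name, and the thresholds are hypothetical, and a real monitoring system would add scheduling, state tracking, and notification routing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RangeMonitor:
    """Tracks one parameter of one component against an acceptable range."""
    name: str
    low: float
    high: float

    def check(self, value: float) -> Optional[str]:
        # Within range: nothing to do.
        if self.low <= value <= self.high:
            return None
        # Out of range: trigger an action (here, just build an alert message).
        return f"ALERT {self.name}: {value} outside [{self.low}, {self.high}]"

# Hypothetical CPU monitor with an acceptable range of 0 to 90 percent.
cpu = RangeMonitor("cpu_percent", 0.0, 90.0)
print(cpu.check(42.0))
print(cpu.check(97.5))
```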

Explaining Performance to Non-Technical Stakeholders

Whether you’re an e-commerce company, a SaaS provider, or a content publisher, understanding the performance of your website is important to everyone on the team—not just the developers. Performance is a huge part of the user experience and directly tied to how well your website achieves its goals. But web performance is often measured in very technical terms, like Largest Contentful Paint, that cause most business folk’s eyes to glaze over.

Securing the DX NetOps Development Lifecycle with DevSecOps

Recent, high-profile cybersecurity exploits, such as Sun Burst and Log4j, demonstrate that every enterprise is only a stone’s throw from a software vulnerability. This becomes especially critical when security is breached in a network monitoring component that has privileged access to core enterprise systems. In the case of Sun Burst, a well-known monitoring software provider made international headlines.

Database Decision-Making for Observability, from Simple to Complex

A goal of open-source observability is unifying several different signals to provide the observability everyone wants. It’s always interesting to speak to people on this journey, and how they try to provide it through open-source projects, and the challenges they can face. I was thrilled to host Pranay Prateek on the most recent episode of the OpenObservability Talks podcast.

Distributed Tracing Observability in Microservices

Have you ever tried to find a bug in a multi-layered architecture? Although this might sound like a simple enough task, it can quickly become a nightmare if the system doesn’t have proper monitoring. And the more distributed your system is, the more complex it becomes to analyze the root cause of a problem. That’s precisely why observability is key in distributed systems. Observability can be thought of as the advanced version of application monitoring.

How to download files from ASP.NET Core MVC

I have been implementing a couple of features lately that allow users to download files. During this process, I have visited various namespaces and possibilities with ASP.NET Core. In an attempt not to forget what I have learned and in the hope that this knowledge can be used by others, here is a blog post about downloading files from ASP.NET Core 😊 This post will use an ASP.NET Core MVC application as an example since that is what I am using.

Improve your application monitoring by reducing overhead of managing and updating alert rules

Just about every organization today relies on key applications running on complex multi-cloud environments to transact business and enable users to work. It is critical to ensure that those applications are running optimally. A solid monitoring and alerting system is required to know when an issue needs attention. But having a robust monitoring system is not enough.

How to get maximum value from Service Level Objectives (SLOs)

A reliable digital customer experience is critical to the success of digital-first businesses. Each minute of downtime can result in the loss of revenue, unsatisfied customers, and damage to reputation. However, as your uptime gets closer to 100%, it gets exponentially harder to improve and often comes at the cost of speed of innovation. A good balance between innovation (i.e., new feature releases) and maintaining an acceptable level of reliability is key to success in the digital world.
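
To make the cost of each extra "nine" concrete, the following sketch converts an uptime target into the downtime it allows over a period. The function name `allowed_downtime_minutes` and the 30-day window are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(slo_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` for a given uptime target."""
    total_minutes = days * 24 * 60
    return (100.0 - slo_percent) / 100.0 * total_minutes

# Each additional nine cuts the allowed downtime by a factor of ten.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per 30 days")
```

Because every step toward 100% shrinks the remaining allowance tenfold, the room left for risky releases shrinks just as quickly, which is the trade-off between reliability and speed of innovation that the paragraph describes.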

How to drive better decision-making with reliability management

Almost every organization is going through digital transformation. According to IDC, direct digital transformation investment is growing globally at a compound annual growth rate of 15.5% and is expected to approach $6.8 trillion by 2023. Customers quickly embrace the benefits of a customer experience reshaped by technology. However, they have little patience when that technology doesn’t work as expected.

New capabilities: Sumo Logic expands Real User Monitoring (RUM)

Monitoring the digital experience of users is a must-have these days. Ensuring the end clients are satisfied is difficult though. People are not keen to provide feedback; they just change the vendor without explanation. It is, therefore, crucial to build enough observability into a front-end application (a web page or a mobile UI) so it can tell the story of how well the user perceived the interaction within the application.

Scaling Syslog: The Challenge That Never Goes Away

At this point, you already know how powerful syslog is (and if you don’t, check out “Introduction to Syslog”). But here’s the thing: Scaling your systems to consume high volume syslog is like fighting zombies. Weird unexpected behavior and no easy solutions. Before you fight zombies, though, you have to understand them. So, here are the challenges for scaling syslog one by one.

Why Website Uptime Monitoring Is Crucial For Preventing Downtime

Website uptime monitoring is crucial for any business that depends on its website. But for companies whose whole service is online, it is essential. If your site isn't reliably serving users when they need it, your competitors are just a Google search away. So you can't just check your site is running now and then - you need a tool to check it as frequently as possible.

The Who, What and Where of Microsoft Teams Call Quality

Microsoft Teams is the world-leading collaboration and productivity tool for today’s hybrid workforce, but your users’ experience with it is only as good as the network and IT environment it operates in. There is a critical visibility gap when it comes to delivering a stellar Microsoft Teams user experience to your users. Organizations lack an end-to-end picture of what problems are happening, what is causing the problems and who is affected.

Cloud purchasing strategy KPIs: RIs, SPs, Spot, CUDs

One of the key advantages of cloud services versus on premise deployments is the wide range of purchasing options and pricing models. While it’s an attractive advantage, it can be complicated for organizations to determine the best blend of service pricing models. The ability to define the organization’s blend of purchasing strategies and display the target versus actual performance is critical for optimizing cloud cost management efforts.

Intro to Grafana Incident

In this video, you’ll learn how Grafana Incident offers a complete incident management process out of the box in Grafana Cloud, so you can save time and focus on what’s important when things go wrong. Grafana Incident is available to all free and paid Grafana Cloud users. If you’re not already using Grafana Cloud — the easiest way to get started with observability — sign up now for a free 14-day trial of Grafana Cloud Pro, with unlimited metrics, logs, traces, and users, long-term retention, and premium team collaboration features.

Synthetics 8.4.0 - Elastic Observability

Elastic 8.4.0 was released in August. In this video we cover what's new in 8.4.0 Observability, and go through demos of the newest features in monitor management, data retention, and the public beta. Join Synthetics Tech Lead Andrew Cholakian, and Synthetics Engineer Justin Kambic for the latest in Elastic Synthetics.

Web Endpoint Monitoring

In today’s world, a significant fraction of a software business’s reputation depends on its web application and its speed. It all comes down to how fast your server responds to client requests (assuming your application is reliable and reasonably user-friendly). Therefore, you could argue that the server endpoint is the centerpoint of all the server-side action — the operations here primarily determine the performance of your application.

Accurately Forecasting Cloud Costs for FinOps

Companies are investing heavily in the cloud for the operational and financial benefits. But without a robust cloud cost management strategy in place, the complexity of cloud services and billing can lead to overspending and unnecessary cloud waste. Being able to accurately predict future cloud spend is one way to optimize cloud spend and inform budgets.

Online Learning: a Novel Approach to Applying Machine Learning in Splunk

Most classical, batch-oriented machine learning systems follow the paradigm of “fit and apply”. In an earlier blog post, I discussed a few patterns on how to better organize data pipelines and machine learning workflows in Splunk. In this blog, we’ll review how you can organize your machine learning model in a new way: online learning.

Linux server monitoring: Long story short

Servers are almost inseparable from any IT infrastructure. Linux is the most compatible, open source operating system for servers because of its flexibility, consistency, and security. Most Linux servers are set up with any of these variants of Linux OS: Red Hat Enterprise Linux (RHEL), Debian, Fedora, openSUSE, CentOS, SUSE Linux Enterprise Server (SLES), or Ubuntu. Basic troubleshooting of a Linux server’s primary metrics can be easily done using the built-in commands.

Track your carbon footprint with Hardware Sentry's offering in the Datadog Marketplace

As we enter a critical period in the effort to mitigate climate change, organizations are facing mounting regulatory pressure—along with a biological imperative—to reduce their carbon footprint. And for those that maintain significant on-prem infrastructure, energy costs associated with operating hardware components can significantly affect their bottom line.

Top AIOps (Artificial Intelligence for IT Operations) Tools/Platforms in 2022

Artificial intelligence (AI) and associated technologies, such as machine learning and natural language processing (NLP), are used for daily IT operations tasks and activities. AIOps supports IT Ops, DevOps, and SRE teams working smarter and faster to identify digital-service issues earlier and address them quickly, preventing disruptions to business operations and customers. This is accomplished through algorithmic analysis of IT data and Observability telemetry.

An Introduction to Syslog

Syslog is an event logging standard that lets almost any device or application send data about status, events, diagnostics, and more. It’s commonly used by network and storage devices to ship observability data to analytics platforms and SIEMs in order to support and secure the enterprise. Syslog is an excellent lightweight protocol to get telemetry from small scale devices.
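
To make the "lightweight" point concrete, here is a minimal sketch of how a single syslog line can be put together. It assumes the RFC 5424 layout, where the priority value is the facility times 8 plus the severity. The hostname, app name, and message text are made up for illustration, and nothing is sent over the network.

```python
from datetime import datetime, timezone

# Standard syslog numeric codes: facility local0 is 16, severity "warning" is 4.
FACILITY_LOCAL0 = 16
SEVERITY_WARNING = 4


def syslog_priority(facility: int, severity: int) -> int:
    """Combine facility (0-23) and severity (0-7) into the PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be between 0 and 23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return facility * 8 + severity


def format_message(facility: int, severity: int, hostname: str,
                   app: str, text: str, when: datetime) -> str:
    """Build one RFC 5424-style line; PROCID, MSGID and structured data are left as '-'."""
    pri = syslog_priority(facility, severity)
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {text}"


# Hypothetical device and event, for illustration only.
line = format_message(
    FACILITY_LOCAL0, SEVERITY_WARNING, "sw-01", "portmon",
    "link down on port 7", datetime(2022, 9, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(line)
```

The whole event fits in one short text line, which is a large part of why small devices can afford to emit it.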

Quick Bytes - Observing Lambda with SNS & SQS

You can see and understand Lambda → SNS → SQS → Lambda transactions as one complete trace, end-to-end. This view gives you critical context into one of the most common–but unobservable–serverless architectures so you can troubleshoot it faster and easier. Make sure to subscribe so you don't miss out on any new livestreams and observability content! With one-click distributed tracing, Lumigo lets developers effortlessly find and fix issues in serverless and containerized environments.

Managing the hidden costs of cloud networking - Part I

Technologies like virtualization and containerization have gained significant traction over the last decade as foundational tools for modern application development. As companies like Amazon (AWS), Microsoft (Azure), and Google (Google Cloud) started to invest in the hardware and software infrastructure required to support access to these virtualized resources, “the cloud” was born.

Papertrail + Slack: Keeping On Top of Your Most Important Logs

One of the most important pieces of operating any application stack is making sure you’re aware of the logging events occurring on your running systems. If your development team is like most, then you collaborate with your colleagues using Slack. Being successful while building and operating software includes looking for ways to implement high-visibility operations within your team communication platform.

Engineering Building Blocks for a Digital Organization (Peter O'Donovan)

OneDigital started as a new organization within an existing enterprise, with an aggressive timeline to launch. Learn how they bootstrapped their engineering organization, set guidelines for communication and collaboration, and balanced the tradeoffs of their technology choices.

Analyze Pacemaker logs in Cloud Logging

As an SAP system administrator, you've probably asked yourself: why did my Compute Instance restart? Why did Pacemaker restart my instance? Why did/didn’t my SAP system failover? By streaming Pacemaker logs into Cloud Logging, you can now find the answers to these questions by using a Cloud Logging query template to filter out the noise generated by Pacemaker logs.

Sentry Performance Monitoring with Dynamic Sampling

This is a sneak peek of Sentry's new performance monitoring feature, Dynamic Sampling. With Dynamic Sampling, you can customize your sampling logic for different parts of your application without having to deploy a new release. These are actual videos submitted by Sentaurs for our monthly Show-N-Tell. We have not edited them except for obscuring personal information that may appear in screenshots. Some videos may include screenshots that contain fictitious usernames or email addresses for illustrative purposes.

Building Grafana dashboards for a large-scale deployment in a tight timeline: Inside Cisco Live

How many Marvel movies’ worth of Internet traffic do 28,000 conference goers create during a five-day Cisco Live event? There’s a Grafana dashboard for that. Cisco Live is the network industry’s largest annual event, delivering education and inspiration to technology innovators worldwide with a week’s worth of programming keynotes, product announcements, entertainment, and more.

How a Product Studio Mitigates User Friction with Performance Monitoring

Panenco is a Product as a Service studio with more than 50 software development, product management, data science, design, and marketing experts building and hosting applications spanning multiple industries, including healthcare, fintech and education. For them, customer success isn’t just a business function; teams focus heavily on providing a fast, frictionless app experience.

Monitoring and Debugging Python Apps on AWS Lambda

As a developer, Python for me is a heavy-lifting and versatile language. I’ve used it for building APIs, internet of things projects, file and data conversions, machine learning and (of course) web development. Like with any modern, commonly used language, the functionality behind the application is only as good as the infrastructure that it is deployed onto.

A Guide to MQTT Messaging Brokers and Client Software

MQTT is a machine-to-machine communication protocol. Devices publish messages to a broker under specific topics, and other devices subscribe to those topics to receive information. It’s popular because it doesn’t take up a lot of bandwidth, so IoT devices with limited network connectivity can use it. MQTT works because of brokers. Each device sending and receiving data can communicate with potentially millions of other devices while only connecting to one broker.
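
The publish/subscribe flow described above can be sketched with a toy, in-memory broker. This is not an MQTT implementation: there is no network, no QoS, and no wildcard matching, and the class name and topic strings are hypothetical. It only shows that publishers and subscribers each talk to the broker and never to each other.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic and the payload of each delivered message.
Handler = Callable[[str, str], None]


class ToyBroker:
    """In-memory stand-in for a broker: exact-match topics only."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every message on this topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every subscriber of the topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


received = []
broker = ToyBroker()
broker.subscribe("sensors/kitchen/temperature",
                 lambda topic, payload: received.append((topic, payload)))

# One message reaches the subscriber; the other has no subscribers and goes nowhere.
delivered = broker.publish("sensors/kitchen/temperature", "21.5")
ignored = broker.publish("sensors/garage/temperature", "9.0")
print(delivered, ignored, received)
```

A real deployment would use an actual broker and a client library instead, but the shape of the interaction is the same.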

Synthetic Monitoring for Windows, VDI & 2FA | 2 Steps

We relate synthetic monitoring to a heart rate monitor for application performance. It fits hand in glove with real user monitoring and provides insights into how real customers and employees are experiencing the application. Historically, we’ve seen limitations with real user monitoring. If nobody is on the network or using the application, then you're not receiving any performance data. Also, the performance data that you typically receive is from a panel rather than every single user.

Middleware Software Market Is Expected To Expand at Significant CAGR Over 2022-2028

The global Middleware Software market gives a better understanding of the market picture of the local and international markets. It helps the key market players understand the market and product strategies better that can help them survive the global market. The study report gives information on the market volume, market value, predictions of the market share size, and statistics of the international and national level industries.

Grafana Mimir and VictoriaMetrics: performance tests

Grafana Labs Mimir is a new time series database under the AGPLv3 license. The engineering team did a great job by taking the best from Cortex TSDB, reducing its complexity and improving scalability at the same time. According to tests by Grafana Labs, Mimir can scale to a billion active time series and 50 million samples/s ingestion rate.

TL;DR InfluxDB, the IoT Stack, and MQTT

The Internet of Things (IoT) describes devices with sensors and computational ability which let them collect, exchange, and act on data. IoT is a broad category that includes uses from smart home thermostats to industrial manufacturing equipment. Sensor data is time series data, and IoT is a common use case for InfluxDB because it can handle the huge amounts of data IoT sensors create.

Deeper Insight into the Factors Impacting Microsoft Teams Call Quality

The new release of Vantage DX takes discoverability of Microsoft Teams call quality to the next level, with at-a-glance insight into the end-to-end environment supporting Microsoft Teams calls, including the network and session border controller (SBC). Check out the Top 10 Questions to Ask about the Microsoft Teams User Experience and read on to understand how Vantage DX provides the answers.

FinOps: Measuring Cloud Waste

Cloud spend — which research shows makes up 51% of IT budgets — is a prime candidate for company cost savings initiatives with the potential to make a huge difference in gross margins. It’s also an area that has grown dramatically in the last few years due to digital transformation and a rise in cloud demand during the pandemic.

Pandora FMS Named An Emerging Favorite In Capterra Shortlist For Server Monitoring Software and APM

The work is hard and there are high expectations, but we don’t give up! The fight goes on and every day we are happier with what we achieve and what we mean to our users. Therefore, Pandora FMS is proud to announce its mention as an Emerging Favorite in 2022 Shortlist for Server Monitoring Software and APM by Capterra, a free online service that helps organizations find the right software.

Raygun Real User Monitoring and User Privacy

Those who have been paying close attention might have recently noticed small changes in sessions, browsers and platforms lists on Raygun. If you’re using RUM to diagnose user experience issues in different browsers and operating systems down to the detailed versions you will have seen that those details are not always available, with browser versions listed more generally as Firefox 102.0 or Chrome 103.0, for example.

Send Amazon VPC flow logs to Amazon Kinesis Data Firehose and Datadog

Amazon Virtual Private Cloud (Amazon VPC) is an isolated and secure virtual network in which you can deploy resources, such as Amazon Elastic Compute Cloud (EC2) and Amazon Relational Database Service (RDS) instances, while restricting their exposure to the internet. As part of your monitoring strategy, you can collect and analyze VPC flow logs, which record network traffic flow between VPC components.

IBM Patches Severe Vulnerabilities in MQ Messaging Middleware

IBM this week announced patches for high-severity vulnerabilities in IBM MQ, warning that attackers could exploit them to bypass security restrictions or access sensitive information. Messaging and queuing middleware, IBM MQ provides enterprise-grade messaging between applications, enabling the transfer of data between programs and the sending of messages to multiple subscribers. Two security issues were resolved in IBM MQ this week, both residing within the libcurl library.

5 Success Tips for a Winning MSP Business Plan

Looking to scale your MSP network monitoring business this year? Maybe you’re new to the game and still wondering how to write an MSP business plan. Maybe you think the answer is a vCIO? No problem! Here are five tips from the experts at the MSP Growth Summit on how to craft and optimize your MSP business plan.

Why Your MSP Needs a Network Infrastructure Tool?

As a new or growing MSP, discovering tools and processes that will let you do more with less is a prime focus. You’re always looking for ways to boost that bottom line by cutting costs, driving more revenue, and improving overall efficiency. If you have more than a handful of employees, you’ve likely already implemented a good MSP network monitoring tool and PSA (professional services automation) tools.

Catchpoint's New Node to Node Feature Removes Corporate Blind Spots

At Catchpoint, we release new features approximately seven times annually. Catchpoint’s previous release, Andante, includes a variety of updates and new features, including a powerful new smartboard redesigned to be more intuitive and help our users find solutions faster, additional metrics to strengthen our unique RUM solution, and the ability to generate alerts for workforce experience based on issues your Digital Experience Score identifies.

How to deploy the Grafana stack using Podman

You may be asking yourself: What exactly is Podman? Podman is short for Pod Manager and is a daemonless, open source container engine alternative to Docker that allows for rootless containers. Podman is available for Linux, Mac, and Windows operating systems. It only requires a simple and easy install on RPM-based Linuxes, such as Red Hat Enterprise Linux, CentOS, Rocky, or AlmaLinux.

Introducing Kubernetes control plane metrics in GKE

An essential aspect of operating any application is the ability to observe the health and performance of that application and of the underlying infrastructure to quickly resolve issues as they arise. Google Kubernetes Engine (GKE) already provides audit logs, operational logs, and metrics along with out-of-the-box dashboards and automatic error reporting to facilitate running reliable applications at scale.

Akka License Change: The Impact of Akka's Move Away From "Open Source"

Akka’s license change has surprised many of us, but it didn’t come out of nowhere. Lightbend recently announced that Akka will be transitioning from an “Open Source” license to a “Source available” license called BSL 1.1. Let’s unpack this to understand what it all means.

Automate Anomaly Detection for Time Series Data

This article was originally published in The New Stack and is reposted here with permission. Hundreds of billions of sensors produce vast amounts of time series data every day. The sheer volume of data that companies collect makes it challenging to analyze and glean insights. Machine learning drastically accelerates time series data analysis so that companies can understand and act on their time series data to drive significant innovation and improvements.

Sending NGINX Logs to Honeycomb is Darn Easy

Written by Andrew Puch and Brian Langbecker. You use NGINX as a proxy for your application, and you want to leverage your favorite features in Honeycomb to help make sense of the traffic data. Have no fear: Honeycomb is more than capable and ready to help! Things you will need: Before you start with the instructions, let’s discuss a lightweight tool called Honeytail. This utility will tail log files, parse the various formats, and send the data to Honeycomb.

APM correlations in Elastic Observability: Automatically identifying probable causes of slow or failed transactions

As a DevOps engineer or SRE, you are often faced with investigating complex problems — mysterious application performance issues that happen intermittently or to only certain portions of your application traffic — that impact your end users and potentially your company’s financial targets. Sifting through hundreds or even thousands of transactions and spans can be a lot of tedious, manual, and time consuming investigative work.

What Does OpenTelemetry Mean for Companies Trying to Change?

Big data experts already agree that the amount of generated data is growing exponentially and forecast that it will reach 175 zettabytes by 2025. That projection is predicated upon current realities, which include a growing number of internet users and the billions of embedded systems and connected devices around the world. Even conceptualizing that amount of data is daunting — but then consider how best to manipulate and export it.

Industry Only End-User Experience Scorecard!

Goliath calculates an overall end-user experience score for your organization based on the critical IT metrics that in combination indicate what the user population is experiencing. Color-coding provides a quick visualization of performance against Citrix industry best practices.

The NetOps Expert - Episode 6: Welcome to the Experience-Driven NOC

Jeremy Rossbach, Head of DX NetOps Product Marketing and Jason Normandin, DX NetOps Product Management discuss the integrations of AppNeta digital experience monitoring and DX NetOps network monitoring to deliver the industry-first Experience-Driven Network Operations Center.

Using Observability with Kubernetes to Automate Site Reliability Engineering

In this video, Anthony Evans, solution architect, explains how the StackState topology-powered observability platform can help SREs to automate site reliability, putting their organizations on the path to becoming a zero-downtime enterprise. See how StackState helps to unify and correlate data across your stack, visualize your entire IT environment, instantly pinpoint root cause, reduce alert storms and with AIOps capabilities, even prevent problems proactively. It's all here!

10 Essential Cloud DevOps Tools for AWS

Building, testing, and monitoring applications in the cloud is a unique challenge. While many organizations have embraced a DevOps methodology, their DevOps machine is still not at the level of maturity they might like it to be. According to a recent survey, 53% work on a team with a 'low level' of DevOps based on maturity factors.

Simplify infrastructure and reduce costs with VPC Flow Logs ingest via Amazon Kinesis Data Firehose into Sumo Logic

Sumo Logic is proud to announce that, in collaboration with AWS, we now fully support Virtual Private Cloud (VPC) Flow Logs ingestion via Amazon Kinesis Data Firehose. Customers can now simplify log delivery to Sumo Logic which is natively integrated with Kinesis Data Firehose. You can also simplify your toolchains for aggregating, transforming and enriching VPC Flow Logs using Kinesis Data Firehose.

Splunk Data Manager Enables Google Cloud Platform Data Onboarding

I'm excited to announce that Splunk Data Manager now supports onboarding of Google Cloud Platform (GCP) data sources, effective immediately. With this launch, you can now get the benefits of Splunk data analysis for the high-value events generated by Google Cloud when you onboard GCP data sources into Splunk using Data Manager.

Reports, Sharing and More! What's New in Splunk Mobile This Summer

Hot summer days mean beautiful weather for picnics, pool days, and trips with the family. While you’re out this summer enjoying the sun, leave your laptop and backpack behind, because with Splunk Mobile, you’ll always be ready to access dashboards or receive alerts no matter where you are. The new features announced this year at .conf22 let you do even more from the comfort of your pool chaise!

Streamline Your Amazon VPC Flow Logs Ingestion to Splunk

Amazon Web Services (AWS) recently announced the ability to publish VPC Flow Logs directly to Amazon Kinesis Data Firehose. For Splunk customers, this feature helps to optimize the architecture to send VPC Flow Logs directly to Splunk Enterprise or Splunk Cloud Platform. With a fully managed service like Amazon Kinesis Data Firehose, users don’t have to worry about scaling, and can optionally transform their data in near real-time and enjoy the cost-effective, reliable service.

Network Monitoring & eBPF

I’m not going to lie, I have a strong hatred towards the Berkeley Packet Filter (BPF). There are a lot of reasons mainly having to do with having to support BPF on a network monitoring tool. There’s also the challenge of writing BPF filters and the weird way they work. So when I first heard about eBPF, I was more than a little reluctant to be excited. As I dug in further, I became much more excited about the technology and the benefits it can bring. So, what is eBPF then?

Debugging Node.js HTTP Requests

HTTP is the backbone of all API-centric, modern web apps. APIs are the place where the core business logic of an application lives. As a result, developers spend a lot of time optimizing the API business logic. This article addresses a Node.js developer’s dilemma while debugging an HTTP API request. We take a sample Node.js/Express.js-based HTTP service to demonstrate a new way of debugging Node.js applications using the Lightrun observability platform.

What is Hybrid Cloud Monitoring?

Cloud computing is becoming extremely popular. As a result, the trust in hybrid Cloud platforms is also increasing day by day. By combining the benefits of a Cloud platform with legacy systems, the hybrid cloud model offers something for every kind of business. But, without a strategy to manage operations on a hybrid Cloud model, it is challenging to extract maximum value from it. This blog will examine the best practices for hybrid Cloud management.

Dashboards that Replace your Release Manager

Back in my day, our offices used to have an “open concept” layout – just rows of desks. And at the end of every row was a 720i LCD TV showing 4 to 5 key metrics we’d watch after every release with great concern. While those wallboards sure were beautiful, we rarely had a clear view on how a release was trending. With our latest update to Dashboards, we’re joining form and function with Release Health widgets and a new release filter.

Top 10 Logging Frameworks Across Various Programming Platforms

A logging framework is a software tool that helps developers output diagnostic information during the execution of a program. This information is used to debug the program or monitor its performance. There are many different logging frameworks available, starting with simple logging libraries to full-fledged logging and observability platforms.

Why is Network Monitoring and Network Log Management So Crucial?

Without Network Monitoring, there is no good way to get a real-time view of your connected environment. But with Network Monitoring reports, you can look backwards to spot problems and trends. Just as vital are logs that deepen this rear-view mirror look, as they contain all the data for all the elements you are monitoring.

Swift Package Manager for Raygun4Apple

Raygun4Apple can now be added to your project using the newer Swift Package Manager, instead of CocoaPods. Raygun4Apple provides Crash Reporting and Real User Monitoring for iOS, tvOS & macOS. Users of Raygun4Apple can now add it to their project by adding a reference to our GitHub repository. If you are not familiar with Swift, you might also want to check out Apple’s guide on how to add Swift packages in Xcode first.

3 ways to implement Zero Trust in a legacy environment

Trust is a very fickle partner to rely on in the IT sector primarily due to the incessant barrage of security threats from both external and internal actors. This is why government, enterprise, and other types of organizations hold cybersecurity as a top priority as hackers discover ever more ingenious ways to stay under the radar.

Exoprise Customers Achieve Significant ROI With SaaS and Digital Experience Monitoring

Exoprise end-user experience management solutions for Microsoft 365, SaaS, and collaboration apps deliver cost savings, increased productivity, and elevated employee experience in times of growing hybrid/mobile work.

Monitor user-facing bugs with LambdaTest's subscription in the Datadog Marketplace

As your products and client base scale, maintaining effective test suites and providing rapid response to user-facing issues becomes increasingly challenging. Without thorough testing, bugs are more likely to go undetected, which creates poor user experiences and slower release cycles. LambdaTest is a cloud-based platform that supports real-time and automated testing for over 3,000 browsers, real devices, and operating systems.

Business-to-Business Middleware (B2B Integration) Market is Booming Worldwide with Microsoft, Oracle, IBM, etc.

The latest updated report published by Data Lab Forecast of COVID-19 titled “global Business-to-Business Middleware (B2B Integration) market analysis and forecast 2022-2030” includes information regarding the market share, industry’s growth prospects, scope, and challenges. The study comes up with the research objectives, detailed overview, import-export status, market segmentation, market share, and Business-to-Business Middleware (B2B Integration) market size evaluation.

25 AWS Monitoring Tools And Best Practices For 2022

Cloud computing offers several advantages over legacy on-premises systems, including cost, scalability, and performance. Today, Amazon Web Services (AWS) offers over 200 cloud services that can integrate seamlessly with your existing workflows, making it one of the most popular public cloud platforms. AWS strives to make its tools easy to use, but managing resources and services can be challenging.

New in Grafana Mimir: Introducing out-of-order sample ingestion

Traditionally the Prometheus TSDB only accepts in-order samples that are less than one hour old, discarding everything else. Having this requirement has allowed Prometheus to be extremely efficient with how it stores samples. And in practice, it hasn’t really been much of a limitation for users because of the pull-based model in Prometheus, which scrapes data at a regular cadence off of the targets being observed. Several use cases, however, need out-of-order support.
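
The traditional acceptance rule described above can be illustrated with a toy model. This is not Prometheus or Mimir code: the class and method names are hypothetical, and treating a repeated timestamp as rejected is a simplification made here. It only models "in order and less than one hour old, otherwise discard".

```python
from dataclasses import dataclass, field
from typing import Dict

# The one-hour window mentioned above, in milliseconds.
MAX_AGE_MS = 60 * 60 * 1000


@dataclass
class ToySeriesStore:
    """Remembers only the newest accepted timestamp per series."""

    newest_ms: Dict[str, int] = field(default_factory=dict)

    def accept(self, series: str, timestamp_ms: int, now_ms: int) -> bool:
        """Return True if the sample is kept, False if it is discarded."""
        if now_ms - timestamp_ms >= MAX_AGE_MS:
            return False  # too old
        last = self.newest_ms.get(series)
        if last is not None and timestamp_ms <= last:
            return False  # out of order (or a repeated timestamp, in this toy model)
        self.newest_ms[series] = timestamp_ms
        return True


store = ToySeriesStore()
now = 10_000_000
print(store.accept("cpu", now - 5_000, now))           # in order, recent
print(store.accept("cpu", now - 1_000, now))           # newer than the last sample
print(store.accept("cpu", now - 3_000, now))           # older than the last sample
print(store.accept("cpu", now - MAX_AGE_MS - 1, now))  # more than an hour old
```

A scraper that pulls on a fixed cadence rarely trips either check, while a device that buffers readings and uploads them late can hit both.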

Retrace - Much More Than Your Average APM Tool

Finding the root cause of application performance issues is the crux of every app troubleshooting exercise. So why don’t more APM tools provide great root cause analysis? The trouble is – pun intended – the cause of a slow application can come from different sources, many outside the application itself. So APM tools that focus on the ‘A’ may never truly find the root cause.

InfluxDB Cloud Native Collectors, Enterprise and Industrial IoT Examples - Part 1

Learn how to deploy InfluxDB Cloud’s Native Collectors with Kepware and The Things Network. Did you hear about the new feature that just dropped to InfluxDB Cloud? Native Collectors! Starting with MQTT. There will be plenty of content to get you started with Native Collectors. So this blog series covers connecting two popular IoT-based platforms to InfluxDB Cloud using Native Collectors: one enterprise use case and one industrial use case.

Augmenting APM with InfluxDB for Faster Issue Resolution

An enterprise IT company hosted a large industry event that drew attendees from all around the globe, including key technology leaders. Organizers knew that their IT offerings needed to be top notch to ensure attendees were happy when it came to event experience. The event application allowed attendees to browse and register for sessions at the event. So, organizers needed to be able to identify issues in real-time and fix them quickly.

Honeycomb Announces Major Updates to PagerDuty Integration

Today, we’re announcing major new updates to Honeycomb’s PagerDuty integration. These updates put more of the information you need into PagerDuty notifications and allow for greater configurability. These enhancements are available to all users who leverage Honeycomb Triggers and Burn Alerts to send notifications via PagerDuty.

Top 10 Questions to Ask about the Microsoft Teams User Experience

For more than a million organizations around the world, Microsoft Teams is at the core of communications and productivity in today’s hybrid workplace. Having made the investment in Teams, organizations have realized a challenge when it comes to Microsoft Teams user experience: visibility into the many factors that can impact Teams performance, from networks and ISPs to misconfigured routers.

Replay Data from Azure Blob with Cribl Stream

One of the core features of Cribl Stream is the Replay capability. We pride ourselves on giving customers choice and control over their data. The ability to archive data in cheap object storage, and then reach back into that same object storage, is one example of this. It’s safe to say that S3 and AWS have become synonymous with the term object storage. It’s like a modern-day Kleenex, or Band-Aid. However, it’s important to remember that there are other, equally featured object storage options available. In this video, we’ll walk through an example of Replay with Azure Blob, and view logs within Humio.

Three New Standards Compound Security Engineering Challenges

A recent ESG/ISSA survey highlighted that security professionals are overwhelmed with competing proprietary data standards and integration challenges. Today’s security landscape often comprises dozens of tools, each with its own unique format. Even if the format is defined and widely adopted, like Syslog, implementations vary widely from tool to tool, or even from release to release for the same tool. How big of a problem are these differing data formats?

SignalFlows to SLOs

How are you tracking the long-term operation and health indicators for your micro and macro services? Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are prized (but sometimes “aspirational”) metrics for DevOps teams and ITOps analysts. Today we’ll see how we can leverage SignalFlow to put some SLO error budget tracking together (or easily spin up the same with Terraform)!
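
For readers new to the term, the arithmetic behind an error budget is small enough to sketch. The example below is plain Python, not SignalFlow or Terraform, and the function name and request counts are invented for illustration. It assumes an event-based SLO, where the budget is the share of events allowed to fail.

```python
from fractions import Fraction
from typing import NamedTuple


class ErrorBudget(NamedTuple):
    allowed_bad: Fraction  # bad events the SLO tolerates in the window
    remaining: Fraction    # tolerated bad events not yet used up
    consumed: Fraction     # share of the budget already spent (1 means exhausted)


def error_budget(slo_target: str, total_events: int, bad_events: int) -> ErrorBudget:
    """Compute an event-based error budget; the target is a decimal string such as '0.999'."""
    target = Fraction(slo_target)  # exact arithmetic, no float rounding
    if not 0 < target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed = (1 - target) * total_events
    return ErrorBudget(allowed, allowed - bad_events, Fraction(bad_events) / allowed)


# Hypothetical window: 1,000,000 requests, 250 of them failed, 99.9% target.
budget = error_budget("0.999", total_events=1_000_000, bad_events=250)
print(budget.allowed_bad, budget.remaining, float(budget.consumed))
```

With a 99.9% target, 1,000 of those requests may fail, so 250 failures spend a quarter of the budget.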

Changes are Observability's Biggest Blind Spot

Classically, the space of observability lies within layers of information on a dashboard. It operates by using the fundamental trio of data — metrics, logs and traces — from each layer of the environment to assess the health of an IT infrastructure. However, a time component is critical, making the stack observable at any point in time. Gathering reliable data and insights into your IT infrastructure remains the primary role of observability tools and services.

Status dashboards: Get visibility across teams and services all in one view

Applications are built and run by many people and made of many components: infrastructure, code pipelines and end users to name a few. Understanding the status of those components and teams is never straightforward. In this blog, we will be unpacking the problem faced by most organizations and taking a look at how SquaredUp can empower you and your organization with status visibility across different teams / components / services – all in one view.

Introducing Netdata Source Plugin for Grafana: Enhanced high-fidelity troubleshooting data source for the Open Source community!

The open-source community is about to benefit greatly from Netdata’s new Grafana data source plugin, which makes use of a powerful data collection engine. This new plugin maximizes the troubleshooting capabilities of Netdata in Grafana, making them more widely available. Some of the key capabilities provided to you with this plugin include the following.

Getting started with the Node.js client library in InfluxDB

If you use Node.js, then the Node.js client library allows you to interact with the InfluxDB platform quickly, using a familiar language. Here, Zoe Steinkamp discusses some of the features of the Node.js client library to help you get started building awesome applications with InfluxDB even faster.

What Is An Observability Data Pipeline?

Have you ever wondered how to get your organization's data into one place so you can easily monitor and troubleshoot your systems? If so, you're not alone. This is a common challenge faced by many organizations. The solution is an observability data pipeline. To better understand what this is and how it works, we've put together a brief overview.

SAP hyperautomation: SAP and the future of robotic process automation

In a post-COVID landscape, the business world is increasingly focusing its attention on task automation and digital-first processes that save time, money and valuable business resources. As one of the key technology trends of 2022, hyperautomation is quickly becoming the main way organizations are achieving these goals.

How to Monitor Aerospike with OpenTelemetry

With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily monitor Aerospike. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here. In this blog, the Aerospike receiver is configured to monitor metrics locally with OTLP–you can use the Aerospike receiver to ship metrics to many popular analysis tools, including Google Cloud, New Relic, and more.

The 5Ws (and 1H) of InfluxDB's Native MQTT Collector

MQTT is a messaging protocol used widely in the IoT space. Its publish/subscribe model is ideal for IoT because multiple devices can send data to a single MQTT broker, which MQTT clients can then access. The MQTT integration for InfluxDB’s native collector feature is a fast and easy way to create cloud-to-cloud data pipelines. Here’s the lowdown on using the Native MQTT collector. Who: Users and companies that rely on MQTT to provide time series data for their applications.
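The publish/subscribe pattern described above can be sketched with a toy in-memory broker. This is not the MQTT protocol or any real client library: the `ToyBroker` class, its methods, and the topic names are hypothetical, and the sketch leaves out networking, QoS levels, and wildcard topics entirely.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class ToyBroker:
    """In-memory stand-in for a broker: routes payloads to topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # A client registers interest in one exact topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        # Deliver the payload to every subscriber of the topic and
        # return how many subscribers received it.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


if __name__ == "__main__":
    broker = ToyBroker()
    broker.subscribe("sensors/temp", lambda t, p: print(f"{t}: {p}"))
    # Several devices can publish to the same broker without knowing the readers.
    broker.publish("sensors/temp", "21.5")
    broker.publish("sensors/temp", "22.1")
```

The point of the design is decoupling: publishers only know the broker and a topic name, so devices and consumers can be added or removed independently.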

Logs Management made available in the latest release, podcasts, office hours & more - SigNal 16

Welcome back to our monthly product updates - SigNal! The latest release of SigNoz is now available with Logs management. You can use SigNoz as your one-stop open source observability solution with logs, metrics, and traces under a single pane of glass. We also participated in podcasts, held office hours, and got featured as one of the top promising startups in the DevOps ecosystem. Let’s see what humans at SigNoz were up to in the month of August 2022!

Authors' Cut-No More Pipeline Blues: Accelerate CI/CD with Observability

It’s no secret that CI/CD pipelines make the lives of engineering and operations easier by accelerating the feedback loop for higher quality code and apps. They build code, run tests, and safely deploy new versions of your application. But just like any aspect of development, poor integration, invisible bottlenecks, and bugs can plague your pipelines. And debugging them? Well, it’s complicated.

Real World Insights - My Take on the Observability Maturity Model

A prelude to our upcoming six-part Observability Maturity Model Fundamentals blog series. By Lodewijk Bogaards At StackState, we have spent eight years in the monitoring and observability spaces. During this time, we have spoken with countless DevOps engineers, architects, SREs, heads of IT operations and CTOs, and we have heard the same struggles over and over.

Goliath Technologies Announces KLAS Arch Collaborative Membership

Philadelphia, PA – September 6, 2022 – Goliath Technologies, the Health IT Standard with respect to improving end user client satisfaction with EHR applications, announced its membership in the Arch Collaborative, a KLAS Research initiative. The Arch Collaborative aims to improve the Electronic Health Record (EHR) experience through shared performance data and collaboration among providers, vendors, and industry leaders.

See how your end-users' experience compares to industry best practices

Goliath calculates an overall end user experience score for your organization based on the critical IT metrics that in combination indicate what the user population is experiencing. Color-coding provides a quick visualization of performance against Citrix industry best practices.

How do flow states work?

Work free from distractions can be hard to find, especially with more of us working from home. A flow state is the most productive place you can be, and humans have recognized the benefits of flow for thousands of years. But in order to achieve a flow state, you need to be free of distractions. In today's digital environment, it's up to companies to architect spaces where their employees can thrive.

What is N+1 query problem and how distributed tracing solves it?

The N+1 query problem is a problem in database retrieval where the related entities of an object are queried individually from a database, leading to O(n) queries where n is the number of related entities of the object. Mouthful of words, I agree 🙂 Let’s take an example to illustrate what it means.
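As one possible illustration, the sketch below reproduces the pattern with Python's built-in `sqlite3` module. The `authors` and `books` tables, their rows, and the function names are all hypothetical; the first function issues one query for the parent rows plus one query per parent, while the second fetches the same data with a single JOIN.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a small in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO books VALUES
            (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers'), (4, 3, 'Kernels');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection):
    """One query for the authors, then one extra query per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn: sqlite3.Connection):
    """Fetch the same data with a single JOIN query."""
    rows = conn.execute(
        """
        SELECT a.name, b.title
        FROM authors a JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


if __name__ == "__main__":
    conn = build_db()
    print(titles_n_plus_one(conn))
    print(titles_single_join(conn))
```

Both functions return the same mapping, but the first needs four round trips for three authors, and that count grows with every additional author. In a trace, this tends to show up as a long run of near-identical database spans under a single request.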

Top 15 Key Categories of Monitoring Metrics in Kubernetes and OpenShift Environments

Over the last couple of years, Kubernetes (often called K8s) has become the most popular and well-known container orchestration system for automating application deployment, scaling, and management. Scheduling containers at scale in a cloud-native ecosystem is central to the technology. Kubernetes itself is an open-source project, and as such presents challenges for many enterprises especially in regulated industries with strong security requirements and formal SLA commitments.

Get the Most Value from Your Observability Investment by Building for the Future

Technically speaking, observability offers visibility into the data being generated by your infrastructure devices, systems, and applications — but in reality, it offers the opportunity to see what’s happening. There’s no guarantee that you’ll get what you want; you have to set things up in a way that makes it possible for you to get the insights you need.

New Features in the Content Pack for Monitoring and Alerting

The 1.7 release of the Splunk App for Content Packs comes with a slew of new awesomeness for the Content Pack for ITSI Monitoring and Alerting designed to bolster your IT operations team’s visibility and AIOps posture! Previous versions of the content pack focused on making it easy for you to create and group Notable Events from ITSI Services and third-party monitoring tools.

Instantly Diagnose a Database Outage with Flow Alerts

Databases are stateful, commonly monolithic, and absolutely fundamental to system design, and the quality of your database administration and operation is a key determinant of your overall success. They are the cornerstone of modern architecture, requiring constant effort, investigation, and iteration to get the most out of them. This makes it all the more terrifying when an outage occurs.

Data Collection Strategies for Infrastructure Monitoring - Troubleshooting Specifics

Monitoring and troubleshooting: unfortunately, these terms are still used interchangeably, which can lead to misunderstandings about data collection strategies. In this article we aim to clarify some important definitions, processes, and common data collection strategies for monitoring solutions. We will specify the limitations of the described strategies, as well as key benefits which can potentially also be used for troubleshooting needs.

How to Integrate GitHub with Sentry to Increase Speed to Resolution

Toolchains are complicated these days - developers and engineering managers are working with more tools than they probably care to count. In order to work efficiently in today’s world, it is essential to have smart integrations in place that bridge the gap between your tools to get you what you need, faster.

Sponsored Post

Core Web Vitals e-commerce analysis: part one

In 2021, Google introduced Core Web Vitals, three criteria to measure if a website is fast, stable, and responsive enough to give visitors a good digital experience. These factor into search ranking and have a powerful influence on customer behavior. But while Google has been urging the web performance community to get on board for more than two years, many are still falling short. We pulled data from the Chrome User Experience Report to conduct our own Core Web Vitals analysis, finding that even some of the largest e-commerce brands aren't passing these thresholds.

Sematext | Front End Tools and Monitoring Solutions

Monitoring multiple sites and web applications can be difficult without the correct toolset. In this video, we will look at two of Sematext's front-end monitoring solutions, Sematext Synthetics and Sematext Experience. Sematext Synthetics is a synthetic monitoring tool that allows you to monitor your website's availability, your own and third-party APIs, and business-critical web transactions with robust synthetic monitoring and testing tools. Create an HTTP monitor or use a custom script to track and monitor your system's performance.

7 Critical Considerations for Evaluating Infrastructure Monitoring Platforms

I remember how excited I was to build my first Network Operations Center (NOC). It was a new idea at the time (yes, I know I’m dating myself), and boy, did we feel like we were cutting edge. The mere idea that we needed a place and a set of tools to monitor our entire infrastructure (because it’s never really been about just the network) was a big transition at the time. How things have changed.

Dashboard Design: Getting Started With Best Practices (Part 1)

Every day, dashboards are viewed more than 500,000 times at Splunk. They’re what make the sea of data intelligible and help tell a story when working with a team. However, constant net-new dashboard creation is not necessarily a value-add activity — it’s a workflow to rapidly turn data into doing.

Sponsored Post

How Is Machine Learning Used In AIOps?

When we think of computers, we typically think in terms of exactness. For example, if we ask a computer to do a numeric calculation and it gives us a result, we are 100% sure that the result is correct. And if we write an algorithm and it gives an incorrect result, we know we have coded improperly and it needs to be corrected. This exactness, however, is not the case when dealing with Machine Learning. As a matter of fact, it is par for the course that Machine Learning will be incorrect a percentage of the time.

Sponsored Post

How to Get Real-Time Network Insight into Microsoft Teams Call Quality

A recent Exoprise customer survey found that 60-70% of application problems occur within the enterprise environment or home network/ISP. So, if you need to resolve Teams call quality problems, it's best to investigate your network before you try to point the finger at Microsoft. In today's article, we see how this applies to Exoprise when team members work from home or in a hybrid work setting. Last Friday, at about 10:00 am EST, I jumped on an impromptu video call with one of my sales colleagues to discuss an ongoing marketing project. Although I am based in the Northern Virginia area, my comrade (as they say in British English!) is from Boston.

5 Best Practices of Network Security Monitoring

According to Accenture’s “State of Cybersecurity Resilience 2021” report, security attacks increased 31% from 2020 to 2021. This statistic suggests that many organizations lack a robust security plan and continuous network monitoring, resulting in security loopholes. Efficient network infrastructure is crucial for the success of your enterprise.

How to monitor Vault with Google Cloud Platform

Monitor Vault in Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub, and makes it easy to collect and ship telemetry from dozens of sources directly to your Google Cloud Platform. You can check it out here! Below are steps to get up and running quickly with observIQ’s Google Cloud Platform integrations, and monitor metrics and logs from Vault in your Google Cloud Platform.

August Monthly Product Update - InfluxDB Native Collectors, Improved Tasks, CLI Onboarding, and a New OSS Distribution

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product in sync with developer needs to ensure their happiness and accelerate Time To Awesome. This month is special. We have many features that we think you will love when onboarding or continuing to use InfluxDB. We launched Native Data Collectors this month.

Archive Query 3 minute Overview

Archive Query is a revolutionary approach to long-term storage of observability data. Instead of hiding your data away, Coralogix Archive Query, built on our Streama technology, keeps it easily accessible through one of the fastest archive query mechanisms on the market. This simple, intuitive approach will directly save you money and time, while ensuring that all of your data is available to you, whenever you need it, either via the Coralogix UI or in your very own cloud storage.

IDC Analyst Brief: Why You Need to Proactively Monitor and Optimize UC&C

Rich Costello, Senior Research Analyst at IDC, has recently developed an Analyst Brief that looks at the growing need for proactive service quality management and optimization tools for communication and collaboration environments. As the world pivoted online during the COVID-19 pandemic, few companies were ready to support a fully virtual workforce that still needed constant connection and had to remain as productive as possible.

How to reduce MTTR with Grafana Loki and Grafana Tempo: Inside the Houzz observability renovation

Houzz is where millions of homeowners and home improvement professionals go to seek inspiration and supplies for their remodeling projects. But to continue as the leading platform for home remodeling and design, the Houzz tech stack needed a renovation of its own as the company scaled. In response, the Houzz team began by revamping their monoliths into microservices.

The Theme Park Workplace: A Modern Approach to IT Operations

IT teams in modern workplaces are no longer spending the bulk of their time troubleshooting and break/fixing issues. As in any service industry in the consumer world, IT service workers are now expected to deliver a great experience to their consumers – the employees. Managing the workplace has become much more like managing a theme park, where every aspect of its real estate should exhibit interest, joy, and fun; everything that makes up a great experience.

The Scaling Limitations of Graphite and Solutions to Overcome Them

Graphite is a free open-source software (FOSS) tool that monitors and graphs numeric time-series data. Graphite was originally a project developed internally at Orbitz in 2006, which eventually grew to be their foundational monitoring tool. In 2008, Orbitz allowed Graphite to be released under the open source Apache 2.0 license. Graphite made it possible to know more than simply if applications were up and running.

Tools for Time Series Data Science Problems with InfluxDB

This article was originally published in The New Stack and is reposted here with permission. You might need to perform anomaly detection or forecasting if you’re working with time-series data. The first step before working on your time series is finding the right data store. To effectively detect or forecast your data, you will require a data store that can handle a large volume of data at a high ingest rate. Therefore, you might want to look at using a purpose-built time-series database.

Feature Focus: August 2022

It’s already September! Time flies by when you’re getting things done, and we’ve been a busy bunch of bees here at Honeycomb. 🐝 We’re excited that we’ve gotten to share some of those changes with you already, like our relaunched interactive sandbox and the beta release of our OpenTelemetry log support and Go distribution, but that’s just the tip of the iceberg.

Measuring Cloud Unit Costs for FinOps

Cloud adoption has been on an upward trajectory for over a decade with no signs of slowing down. As widescale migration becomes the norm, organizations are realizing cloud financial management — also referred to as FinOps — is critical to creating long term value in the cloud. Building a culture of financial discipline requires visibility and a strategy for measuring success along the way.

How to add a Golden Signal to a service in Gremlin RM

In this video, we show you how to add a Golden Signal to a service. Gremlin uses your Golden Signals to ensure your services are still healthy and responsive during reliability tests. You can configure Golden Signals to use an existing monitor in your observability tools, such as Datadog, New Relic, or Prometheus. We recommend adding all four Golden Signals to each of your services to ensure comprehensive coverage.

Building a one-stop Open Source Observability Platform | OpenObservability Podcast

Pranay, one of the co-founders at SigNoz, was recently invited as a guest speaker by Jonah Kowall, CTO at Logz.io, on his OpenObservability Podcast. In the podcast, Pranay talks about the mission behind SigNoz: unifying traces, metrics, and logs in a single platform and interface. He also shares anecdotes about the evolution of SigNoz since its inception, the community adoption, and its contribution to SigNoz.

What is a Data Warehouse? Benefits and Tips

A data warehouse (DW) is a centralized repository of data integrated from multiple systems. This data is often cleansed and standardized before being loaded. Designed to support analytical workloads, a data warehouse can help organizations better leverage both current data and historical data to improve decision-making through the analysis of business processes and outcomes.

THWACK Livecast: Automating Your Way Beyond Simple Incident Management

Presented by: Kevin M. Sparenberg (KMSigma) and David Russell (david.russell.CSM). It’s time to take your service desk solution to the next level with automation rules. Built on the framework of simple rules, you can improve efficiency, refine standardized processes, and transform the way your organization runs. This THWACK© Livecast is all about how automation rules in SolarWinds© Service Desk can help lighten the load and allow your teams to focus on those big picture projects which actually improve the business. Let's stop getting bogged down in the minutiae and manual interaction of incident management and instead look at ways to lighten your load.

Ask Miss O11y: How Can I Convince My Organization to Invest in Instrumenting for Observability?

We recently hosted a Twitter Space, and a question came in regarding speaking to executives about instrumenting for observability. It’s a great topic we love expanding on. Here’s the answer we provided.

Tracing and Observing AWS ECS

It’s no secret that application containerization has revolutionized the digital world as we know it by providing a transient gateway into elastic infrastructure that can scale and grow as needed. Where traditional virtualization was all about creating a single homogenous entity, containers are self-contained units of software, able to run in just about any environment, making them extremely portable.

How to get started with the new Grafana Ansible collection for Grafana Cloud

More than 20,000 companies around the world use Ansible as their Infrastructure as Code and configuration management tool. With the rising popularity of managing infrastructure using IaC and config management tools, Ansible is one of the best open source tools to choose from. That is why we are excited to announce a new Grafana Ansible collection available to all Grafana Cloud users, including those in the generous free tier.

How Netdata's Machine Learning works

Following on from the recent launch of our Anomaly Advisor feature, and in keeping with our approach to machine learning, here is a detailed Python notebook outlining exactly how the machine learning powering the Anomaly Advisor actually works under the hood. Or, if you’d rather watch a video walkthrough of the notebook, check it out below. Try it for yourself: get started by signing in to Netdata and connecting a node.

Automatically Convert Grafana Dashboards from InfluxQL to PromQL with a New Open Source Tool

It’s monitoring time. We all collect metrics from our systems and applications to monitor their health, availability and performance. Our metrics are essentially time-series data collected from various endpoints. They are then stored in specialized time-series databases and visualized in the metrics graphs we all know and love.

The What, Why, and How of Time Series Databases

This article was written by Thamatam Vijay Kumar. Scroll down for author bio and photo. Modern-day websites are filled with dashboards featuring enriched charts, line graphs, radar as well as multigraphs. The world is fascinated with such charts and graphs, which deliver much value to millennial web applications. There are many such chart libraries which provide interactive visualization and deliver data insights for users. The charts plot the lines using data points.

The Cost of Not Modernizing Your Infrastructure Monitoring

Did you know that nearly 15% of IT budgets go towards maintaining legacy systems? Recent data suggests that these systems cost hundreds of millions annually to operate and maintain. For that reason, many companies continue to invest in moving their systems away from expensive on-premises to the cloud, where maintenance is more cost-effective and consistent thanks to modern monitoring capabilities.

What Are Spans in Distributed Tracing?

Distributed tracing is an essential process in the modern world of cloud-based applications. Tracing tracks and observes each service request an application makes across distributed systems. Developers may find distributed tracing most prevalent in microservice architectures where user requests pass through multiple services before providing the desired results.

7 Ways to Slam Dunk Your Next Network Assessment Using Auvik

So, you’re an MSP that’s recently won a new client. As part of the deal, you’ve promised a network assessment: a look at the overall network to determine if there are any glaring issues that will interfere with day-to-day operations. The topology is vast, spanning multiple sites and dozens of switches and firewalls. How do you act quickly? And what should you be looking for? Here are 7 network assessment tasks Auvik can help you easily complete.

FastAPI and Starlette Sentry Integrations Have Arrived

FastAPI is known for building REST APIs, middleware services, and simple integration for adding authentications and more. And it’s known for doing all of that…you guessed it, fast. It’s used by tech giants and scientists alike, and according to Stack Overflow’s Developer Survey 2022, more developers are using FastAPI than Ruby on Rails.

Device Onboarding with Netreo's Auto Configuration

Wouldn’t it be great if there was some attribute you could query and set on a device? Then you could automagically configure that device based on the attribute you just set, for fully automated device onboarding! Welcome to Dynamic Device Attribute Pollers and Auto-Configuration Parameters! Rolls right off the tongue, doesn’t it?

How Netdata's machine learning works

In this video we will walk through the Netdata Anomaly Advisor deep-dive Python notebook. The aim of this notebook is to explain, in detail, how the unsupervised anomaly detection in the Netdata agent actually works under the hood. No buzzwords, no magic, no mystery :) Try it for yourself: get started by signing in to Netdata and connecting a node. Once initial models have been trained (usually after the agent has about one hour of data, zero configuration needed), you'll be able to start exploring in the Anomaly Advisor tab of Netdata.

Cloud Providers Health Report - August 2022

Check out the August 2022 health report on the top 10 most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. Please take the information with a grain of salt: each provider has its own framework for reporting incidents, and one provider having more outages in no way means that it is worse than another. We are also comparing providers of very different sizes and market share.

Higher Education and the Persnickety Network: How Monitoring Comes to the Rescue

Colleges and universities face immense IT challenges. The end user base is regularly turning over with students coming and going. And residential students are just part of the problem. Increasingly, schools support extensive distance learning, which only gained ground thanks to COVID. Now that remote work and distance learning are the new mandates, there are even more difficult challenges for Higher Education IT to deal with.

Leveraging the Power of AIOps for IT Modernization in Government

Government agencies want to modernize their ITOps, but technology and operations issues such as limited technology budgets and complicated government procurement processes are impacting their ability to transform. Download this eBook today with our compliments to get a detailed analysis of how AIOps can help government agencies modernize their IT.

AIOps Means Business: IT Innovation for Business Advantage

With organizations requiring more technology to support the shift to a hybrid workforce, IT is overtaxed. And digital transformation requires a skilled staff, but most organizations are struggling to find IT employees with the right skill set, halting digital transformation initiatives. Thankfully, there's a solution: AIOps. In "AIOps means business: IT innovation for business advantage," EMA digs deep into the meaning of AIOps and how it has evolved to mean AI + automation.

Monitor Physical Servers and Virtual Machines with WhatsUp Gold

In today's world, monitoring your servers is more important than ever before. With proper monitoring, you can understand system resource usage and identify performance-related issues like utilization, downtime, and response time. As a network or system admin, you know how vital server uptime is. Conversely, you know how detrimental downtime can be to your business and your staff's productivity, resulting in lost sales and, therefore, lost revenue.