October 2023

The role of cron job monitoring in preventing business disruptions

In today's digital landscape, businesses rely heavily on various online operations to deliver services, communicate with customers, and maintain a competitive edge. Behind the scenes, cron jobs play a pivotal role in automating critical processes such as data backups, report generation, and system maintenance. However, ensuring the reliability of these cron jobs is often overlooked until a disruption occurs.
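One common way to catch a cron job that has silently stopped running is a heartbeat check: each job records when it last finished, and a separate watcher flags any job whose last heartbeat is older than its expected interval plus a grace period. The sketch below is illustrative only. The job names, intervals, timestamps, and the `find_overdue` helper are assumptions made for the example, not something taken from the article.

```python
from datetime import datetime, timedelta

def find_overdue(last_seen, expected_interval, now, grace=timedelta(minutes=5)):
    """Return the sorted names of jobs whose last heartbeat is too old.

    last_seen: dict of job name -> datetime of the last successful run
    expected_interval: dict of job name -> timedelta between scheduled runs
    """
    overdue = []
    for job, interval in expected_interval.items():
        seen = last_seen.get(job)
        # A job that has never reported a heartbeat counts as overdue.
        if seen is None or now - seen > interval + grace:
            overdue.append(job)
    return sorted(overdue)

# Hypothetical jobs and timestamps, for illustration only.
now = datetime(2023, 10, 31, 12, 0)
expected_interval = {
    "nightly-backup": timedelta(hours=24),
    "hourly-report": timedelta(hours=1),
}
last_seen = {
    "nightly-backup": datetime(2023, 10, 31, 2, 0),  # 10 hours ago, within 24h
    "hourly-report": datetime(2023, 10, 31, 9, 30),  # 2.5 hours ago, too old
}
print(find_overdue(last_seen, expected_interval, now))  # ['hourly-report']
```

The grace period is a design choice: it keeps a job that finishes a little late from raising an alert on every run.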

Hybrid Cloud Strategies - AWS Launches Dedicated Local Zones with Singapore Government as First Customer

Amazon Web Services (AWS) recently released a new cloud deployment option, “Dedicated Local Zones”, targeted at public sector, government and regulated industry use cases. Many customers still rely on on-premises infrastructure to meet regulatory and compliance requirements such as data localization. You can read the details of the AWS announcement, including which services Dedicated Local Zones will support, here: Announcing AWS Dedicated Local Zones (amazon.com).

Replay messages in Azure Service Bus dead-letter queue

When working with asynchronous messaging patterns (in this case, using Azure Service Bus), you will find many scenarios, depending on the requirements, in which you need to reprocess messages. Sometimes a message failed because a system was offline for a certain period or because there was a bug in the service, and specific messages need to be resent; there are many other reasons as well.

Meeting the SEC's New Cybersecurity Rules: How Flowmon Empowers Public Companies To Comply

The much-anticipated cybersecurity rules by the U.S. Securities and Exchange Commission (SEC) for public companies have arrived, signaling a significant step forward from the proposed rules released in March 2022. These final rules, effective July 26, 2023, introduce new obligations that public companies must adhere to, promising a more secure and transparent corporate landscape. However, these regulations bring significant compliance challenges and litigation risks.

Why digital experiences will make or break the holiday season for retailers in 2023

Survey results reveal online shopping is set to soar over the holiday season and retailers must ensure their applications are ready. Research published today by Cisco AppDynamics reveals that consumers around the world are planning to do more of their holiday season shopping online than ever before this year. On average, consumers expect that 59% of their spending on key shopping dates such as Black Friday and Cyber Monday will be online this year versus in-store, compared to 53% last year.

AppSignal Monitoring Available for Python Applications

We're happy to announce that AppSignal now offers monitoring tools for Python projects. AppSignal helps you get the most out of your Python application's monitoring metrics, with additional support for multiple Python frameworks and packages such as Django and Celery. In this article, we'll walk you through some of our core features to show you how to power up your Python application with AppSignal.

How to configure OpenTelemetry .NET Automatic Instrumentation with Grafana Cloud

For those who have limited experience with OpenTelemetry, it can be intimidating to instrument .NET applications. But the OpenTelemetry community created a welcome shortcut with the first stable release of .NET Automatic Instrumentation. It simplifies the process of collecting metrics, logs, and traces from your .NET applications, without applying any changes to the source code or adding any dependencies to the OSS project.

Create a logs app plugin with Grafana Scenes and Grafana Loki

Grafana’s plugin tools help developers extend Grafana’s core functionality and create plugins faster, with a modern build setup and zero configuration. Grafana Scenes, meanwhile, is a new front-end library, introduced with Grafana 10, that enables developers to create dashboard-like experiences — such as querying and transformations, dynamic panel rendering, and time ranges — directly within Grafana application plugins.

Got Ghosts in Your Enterprise Network?

Shining a light on the dark corners of the new enterprise network doesn't have to be as scary or overwhelming as some think. While “ghost issues” typically lurk in these sometimes unexplored places on the internet or in cloud environments, during this Halloween season your network operations teams can gain the confidence to not only uncover these network ghosts, but compel and cast them out forever.

Using Cribl Search to Aid in Threat Hunting by Enriching Data in Motion

Cribl Search is reshaping the data search paradigm, empowering users to uncover and analyze data directly from its source. Cribl Search can easily reach out and query data already collected in Amazon S3 (or S3 compatible), Amazon Security Lake, Azure Blob, Google Cloud Storage, and more. By searching data where it lives, you can dramatically speed up your search process by avoiding the need to move data before analyzing it.

Achieving observability in Heroku applications with Sumo Logic

Are you one of the many companies harnessing the power of Heroku to build, deliver and scale your applications seamlessly? If so, you're likely aware of the need for robust observability to ensure your Heroku environment runs smoothly. Let’s delve into the world of Heroku monitoring and explore how Sumo Logic, a leading observability platform, can provide invaluable insights into your Heroku infrastructure and application logs.

What is Network Monitoring?

Today, more than ever, as IT environments become more diverse and complex, the need for an effective network monitoring solution has become paramount. However, the digital environment is constantly evolving, so these tools must keep pace with the changes to remain effective for users diagnosing issues and identifying bottlenecks within their network.

The Driving Force of Community at All Things Open 2023

The most noticeable takeaway from All Things Open 2023 was how visibly and demonstrably people were there for the event itself. Not to check a box or browse the swag but to be together, show their support of open source, and glean every last bit of knowledge they could.

Top 6 Benefits of AIOps You Must Know

AIOps, a term coined by Gartner back in 2017, has emerged as one of the hottest talking points in the realm of ITOps. Since its inception, its adoption by large enterprises has gone up from 5% to 30% in 2023. This impressive 6X growth is a clear indicator that AIOps is here to transform how businesses manage their IT operations. But what is it that makes AIOps special? What are the benefits of AIOps that businesses are looking to leverage?

ManageEngine's Android Enterprise Gold partnership: Empowering your organization

We are thrilled to announce that ManageEngine has achieved the prestigious gold badge in Google’s Android Enterprise partner program. This recognition stands as a testament to our commitment to excellence and our dedication to providing top-notch solutions for your organization. In this blog, we’ll explore the Android Enterprise program, the benefits of our partnership with it, and how it can empower your organization in the digital era.

Sponsored Post

SIEM Logging for Enterprise Security Operations and Threat Hunting

Today's enterprise networks are diverse and complex. Rather than the simple network perimeter of old, bad actors can attack through multiple entry points, including cloud-based applications. Not to mention, these networks generate massive amounts of transactional data. Because enterprise networks have become larger, they're more difficult to secure and manage. As a result, IT operations teams and security analysts seek better ways to deal with the massive influx of information to improve security and observability.

PromCon Recap: Prometheus Ecosystem Updates

In the first part of our 2023 PromCon recap, we spent OpenObservability Talks exploring the Perses open source project. We found heavy users of open source Grafana who found themselves grappling with issues arising from managing a vast number of dashboards, and the need to manage dashboards as code in a GitOps fashion.

IT Operations Management (ITOM): The Basics

What is ITOM? Information technology operations management (ITOM) is the administration and management of an organization’s hardware, network, applications and technology needs. Generally regarded as the true meaning of “tech support,” it is a service-centric approach to IT infrastructure, IT support operations, IT networking and end user support.

Top 10 Distributed Tracing Tools For Your Success

In the intricate web of modern software systems and full-stack observability, knowing how requests flow and interact across distributed components is paramount. Distributed tracing tools can help you. To help you better understand how distributed tracing works and what benefits it brings, here’s our selection of top distributed tracing tools to choose from.

Troubleshoot and Monitor LogStash using Cribl Edge and Cribl Search

I have worked as a helpdesk specialist, cyber security analyst, information systems security engineer, professional services consultant, etc. At this point in my career, I have seen enough to relate with anyone in the IT world. Let’s narrow our focus and chat about monitoring system health and troubleshooting. Tool sprawl is the standard.

How to integrate a Spring Boot app with Grafana using OpenTelemetry standards

Maciej Nawrocki, Senior Backend Developer at Bright Inventions, is a backend developer focused on DevOps and monitoring. Adam Waniak, Senior Backend Developer at Bright Inventions, is a backend developer with a keen interest in DevOps. Bright Inventions is a software consulting studio based in Gdansk, Poland, with expertise in mobile, web, blockchain, and IoT systems. At Bright Inventions, we always prioritize app optimization when we develop software solutions for our clients.

The Significance of Event-Driven Architecture in IT Monitoring

In today's fast-paced digital world, the reliability and performance of IT infrastructure are critical to business success. Monitoring technology plays a pivotal role in ensuring that systems and networks operate seamlessly, and one such technology is Zenoss. This blog provides an in-depth look at Zenoss technology and sheds light on the importance of event-driven architecture in modern monitoring.

Plan new architectures and track your cloud footprint with Cloudcraft by Datadog

In a rapidly expanding, highly distributed cloud infrastructure environment, it can be difficult to make decisions about the design and management of cloud architectures. That’s because it’s hard for a single observer to see the full scope when their organization owns thousands of cloud resources distributed across hundreds of accounts. You need broad, complete visibility in order to find underutilized resources and other forms of bloat.

The Biggest Ecommerce Challenges this Black Friday

We recently featured in Ecommerce Age. If you missed the write up, you can catch up in full, here… As ecommerce continues to outdo the high street, Black Friday sales are becoming as much of a tradition as Christmas dinners. But shoppers are very influenced by external factors, from the economy to website experiences. We outline the key ecommerce challenges this Black Friday…

Interlink's Service Chain Mapping solution: Helping Banking & Finance Organizations Strengthen Operational Resilience and Meet Regulatory Requirements

Operational resilience is an increasing area of focus and scrutiny for regulators of the banking and financial services industry. In the European Union, the Digital Operational Resilience Act (DORA) looms on the near horizon - with equivalent regulatory frameworks slowly but surely rolling out across the globe.

VMware performance monitoring: Importance, benefits, and best practices for optimal VMware performance

Virtualization involves creating multiple virtual instances on a single physical server, allowing for efficient utilization of hardware resources and isolation of workloads. Businesses prefer a virtual environment as it can be tailored to meet specific security and performance requirements, and it provides numerous customization options. The concept of virtualization became accessible after the emergence of VMware, a cloud computing virtualization platform for hosting complex architecture effortlessly.

Discovering Zero Days: Why configuration management wins

“Zero Days” may be one of the most recognizable cybersecurity terms, other than hacker of course, for good reason. Zero Day Vulnerabilities are notoriously challenging for defending security teams to identify. Because of delays between active exploit and discovery, they are one of the worst examples of “Known Unknowns” in cybersecurity (other than user behavior, of course). It’s important to understand that Zero Days are not really brand-new vulnerabilities.

Azure Functions Distributed Tracing

In today’s cloud-centric, serverless computing landscape, applications are increasingly distributed and complex, composed of numerous microservices, functions, and external dependencies. Azure Functions, a serverless compute service offered by Microsoft Azure, plays a pivotal role in building scalable, event-driven applications.

A Guide to Docker Adoption

Whether you’re a developer or a security analyst, you probably already know the name Docker. Developers use Docker’s open-source platform to build, package, and distribute their applications. Since the application and all dependencies sit in the container, it runs consistently across different operating systems and environments. As with everything technology, Docker adoption is a good news/bad news story. Good news: DevOps teams can ship applications faster.

How to Detect, Fix & Reduce Network Overload

Network overload is a modern-day nemesis we've all had to grapple with at some point. It's that heart-pounding moment when your network decides to take an unscheduled coffee break, right in the middle of that critical video conference or a major data transfer. But fear not, for this blog is your trusty guide to help you navigate the treacherous waters of network congestion with the finesse of a digital detective.

Secret to Flawless Deployments: Real-Time Canary Deployment tracking with Argo CD & Levitate!

Most of your outages are probably caused by a change, and having observability around that will make a lot of difference. Dive into this walkthrough, where we showcase tracking Canary deployments in Argo CD, correlating events and metrics seamlessly with Levitate. For Site Reliability Engineers, DevOps engineers, Software Engineers, and Product Managers seeking to elevate their observability and ensure smooth deployments every time.

Real user monitoring with Grafana (Grafana Office Hours #17)

Have you heard of Grafana Faro and how it can help in providing insights into your real user experience? Developer Advocates Nicole van der Hoeven and Marie Cruz chat with Software Engineers Kostas Pelelis and Marco Schaefer to discuss what Grafana Faro is, why it's important to have a real user monitoring solution, and how to get started with Grafana Faro.

Netdata vs Prometheus: Performance Analysis

In an era dominated by data-driven decision making, monitoring tools play an indispensable role in ensuring that our systems run efficiently and without interruption. When considering tools like Netdata and Prometheus, performance isn't just a number; it's about empowering users with real-time insights and enabling them to act with agility.

Monitor your multi-client network with OpManager MSP's agent-based monitoring

Monitoring a network for its uptime and peak performance is crucial. By tracking network performance, organizations can better understand their network requirements, gain in-depth visibility, identify mishaps quickly, and roll out remediation measures. However, this is easier said than done. The complexity only increases when the network is an MSP’s.

How do you troubleshoot a network problem?

Murphy’s Law states, “Anything that can go wrong, will go wrong.” This is an old adage that can be applied to IT networks everywhere. Organizations and IT admins can perfect their networks to the best of their powers; however, network issues of varying severities can still pop up. These network issues need immediate responses and resolutions. If such issues go unresolved for an unreasonably long time, the damages to both the network and the organization can be costly.

Sponsored Post

Streamlining SAP Kernel upgrades with Avantra

Picture the SAP Kernel as the heartbeat of the system, vitalizing the core programs upon which the fundamental functionality of SAP applications relies. It's the life force pulsing through the application server, executable programs, database, and operating system, rather than merely encompassing them within itself. SAP Kernel upgrades refer to updating the system's current executables with upgraded versions. These upgrades are essential to patch security vulnerabilities and fix bugs. Besides bug fixing, SAP Kernel upgrades improve hardware compatibility, boost speed, and enhance stability.

10+ Best API Monitoring Tools: Free & Paid Services [2023 Comparison]

As APIs play a crucial role in connecting modern cloud applications, monitoring their availability and performance is a must if you want to provide a top-notch experience. A good API monitoring tool will help you build reliable APIs by identifying and resolving the issues before they reach your users. If you’re interested in such a solution, look no further. In this article, we reviewed some of the best API monitoring tools and services available today, both open source and commercial.

Webinar Recap: Build an Edge-to-Cloud Architecture Using MQTT and InfluxDB

Industrial IoT (IIoT) machines and sensors generate valuable time series data. It’s impossible to derive the insights necessary to inform decisions as a company to produce or operate more efficiently without sending operational technology (OT) data to informational technology (IT) systems.

Grafana panel title generator, interactive visualizations, and more

Dive deep into the enhancements in Grafana 10.2. From embedding interactive buttons within Canvas visualizations to utilizing AI for dashboard titles and descriptions, this release brims with features designed to optimize your Grafana experience. In this video, discover how to: Stay with us through this playlist to delve deeper into each addition and maximize your Grafana 10.2 experience.

Anomaly Detection for Time Series Data: Anomaly Types

Welcome to the second chapter of the handbook on Anomaly Detection for Time Series Data! This series of blog posts aims to provide an in-depth look into the fundamentals of anomaly detection and root cause analysis. It will also address the challenges posed by the time-series characteristics of the data and demystify technical jargon by breaking it down into easily understandable language. This blog post (Chapter 2) is focused on different types of anomalies.

DevOps & DORA Metrics: The Complete Guide

In order to achieve DevOps success, you must measure how well your DevOps initiatives work. Tracking the right DevOps metrics will help you evaluate the effectiveness of your DevOps practices. In this article, I’ll explain many DevOps metrics, including their significance, the key metrics for various goals, and, best of all, tips for improving the score of each DevOps metric discussed here.
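As a rough illustration of what measuring these metrics can look like, the sketch below computes the four DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) from a handful of deployment records. The record layout, the sample data, and the `dora_metrics` helper are assumptions made for this example; real pipelines would pull these values from CI/CD and incident tooling.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (commit time, deploy time, caused a failure?, time to restore)
deployments = [
    (datetime(2023, 10, 2, 9), datetime(2023, 10, 2, 15), False, None),
    (datetime(2023, 10, 3, 10), datetime(2023, 10, 4, 10), True, timedelta(hours=2)),
    (datetime(2023, 10, 5, 8), datetime(2023, 10, 5, 12), False, None),
    (datetime(2023, 10, 6, 9), datetime(2023, 10, 6, 11), True, timedelta(hours=4)),
]

def dora_metrics(records, period_days):
    """Summarize deployment records over a reporting period of `period_days`."""
    if not records:
        raise ValueError("at least one deployment record is required")
    lead_times = [deployed - committed for committed, deployed, _, _ in records]
    failures = [restore for _, _, failed, restore in records if failed]
    restores = [r for r in failures if r is not None]
    return {
        "deployment_frequency_per_day": len(records) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(records),
        # Mean time to restore is undefined when nothing needed restoring.
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }

for name, value in dora_metrics(deployments, period_days=7).items():
    print(f"{name}: {value}")
```

The median is used for lead time here so that a single slow change does not dominate the figure; a mean would be an equally valid choice.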

What Is ITSM? IT Service Management Explained

ITSM, which stands for IT service management, is a strategy for delivering IT services and support to an organization, its employees, customers and business partners. ITSM focuses on understanding end users’ expectations and improving the quality of both IT services and their delivery. In the early days of computers, employees relied on the company IT department for help whenever a computer issue arose.

Use Datadog Dynamic Instrumentation to add application logs without redeploying

Modern distributed applications are composed of potentially hundreds of disparate services, all containing code from different internal development teams as well as from third-party libraries and frameworks with limited external visibility. Instrumenting your code is essential for ensuring the operational excellence of all these different services. However, keeping your instrumentation up to date can be challenging when new issues arise outside the scope of your existing logs.

Continuous profiling: The key to more efficient and cost-effective applications

Recently, Elastic Universal Profiling™ became generally available. It is the part of our Observability solution that allows users to do whole system, continuous profiling in production environments. If you're not familiar with continuous profiling, you are probably wondering what Universal Profiling is and why you should care. That's what we will address in this post.

Set up Microsoft Teams alerts when a website changes

Website monitoring has grown in importance over the past decade for individuals and businesses all around the globe – and for different purposes. It became even more important in 2020 during the COVID-19 pandemic. As travel, events, and offices around the globe shut down rapidly, people relied on different tools and features to be kept up to speed regarding the ongoing situation.

Fighting DDoS at the Source

For decades, the scourge of distributed denial of service (DDoS) attacks has plagued the internet. Join Doug Madory, Director of Internet Analysis at Kentik, and Aaron Weintraub, Principal Engineer at Cogent Communications, as they explain how organizations can identify customer networks sending the spoofed traffic that leads to DDoS attacks.

A Brief History of BGP Incidents

Kentik internet analysis expert, Doug Madory, discusses the most notable and significant BGP incidents in the history of the internet, from traffic-disrupting leaks to recent crypto-stealing hijacks. Stretching back to the AS7007 leak of 1997, this webinar uses a historical perspective to explore the questions: what progress has been made and what is the path to finally securing BGP?

WAN Monitoring for Turbocharging WAN Performance

In an era defined by the relentless pace of digital transformation, the Wide Area Network (WAN) has emerged as the unsung hero of connectivity. With organizations expanding globally, remote work becoming the norm, and data flowing like a digital river, the WAN is the backbone that keeps the modern world interconnected. Yet, as the demand for high-performance, reliable, and secure WANs skyrockets, so do the challenges that network administrators face.

Unleashing the Full Potential of FinOps: Going Beyond the Cloud

FinOps is beginning to take the enterprise by storm, but many enterprise IT leaders may be taking too narrow a view and risk falling into a trap. There’s a good reason for this upswing in FinOps attention: runaway cloud costs are becoming a significant challenge for enterprise IT leaders, particularly as they move legacy workloads to the cloud in earnest.

OpenSearch vs. Elasticsearch: Which is Better?

Following its release under the open-source Apache 2.0 license in 2010, Elasticsearch rose to prominence as the world’s most popular enterprise search engine. Elasticsearch is frequently deployed alongside Logstash and Kibana, a combination known as the ELK stack, to enable log analytics use cases that include application observability, security log analysis, and understanding user behavior.

HTTP Monitor Overview: What It Is, Why & How to Create One [Tutorial]

The World Wide Web’s transmission system is built on HTTP. To ensure an application that uses the HTTP transmission works, you must monitor it constantly. This is where an HTTP monitor comes in. In this tutorial, we’ll cover the fundamentals of HTTP monitors, including what they are, why they matter, and how to set one up.
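At its simplest, an HTTP monitor sends a request on a schedule and judges the response by status code and latency. The following is a minimal sketch of a single probe using only Python's standard library. The `check_http` name, the thresholds, and the rule that 2xx and 3xx responses count as healthy are assumptions for illustration, not the approach described in the tutorial.

```python
import time
import urllib.error
import urllib.request

def check_http(url, timeout=5.0, max_latency=2.0):
    """Probe a URL once and return (is_healthy, status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx or 5xx status.
        status = err.code
    except (urllib.error.URLError, OSError):
        # No usable answer: DNS failure, refused connection, or timeout.
        status = None
    elapsed = time.monotonic() - start
    healthy = status is not None and 200 <= status < 400 and elapsed <= max_latency
    return healthy, status, elapsed

# Example use (the URL is a placeholder):
#   healthy, status, elapsed = check_http("https://example.com/health")
```

A real monitor would run this probe repeatedly, ideally from more than one location, and alert only after several consecutive failures so that one dropped packet does not page anyone.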

Solr Monitoring Tools

Solr is widely adopted by startups and enterprises alike. It’s powerful and open-source, so it’s very appealing to just about everyone looking for a search platform to build off of. Because Solr is so easily accessible, many people overlook the importance of monitoring it. Even when that importance is put into question, a lot of people continue with the trend and use an open-source tool for their monitoring needs.

Elastic's contribution: Invokedynamic in the OpenTelemetry Java agent

As the second largest and active Cloud Native Computing Foundation (CNCF) project, OpenTelemetry is well on its way to becoming the ubiquitous, unified standard and framework for observability. OpenTelemetry owes this success to its comprehensive and feature-rich toolset that allows users to retrieve valuable observability data from their applications with low effort. The OpenTelemetry Java agent is one of the most mature and feature-rich components in OpenTelemetry’s ecosystem.

Effortless Engineering: Quick Tips for Crafting Prompts

Large Language Models (LLMs) are all the rage in software development, and for good reason: they provide crucial opportunities to positively enhance our software. At Honeycomb, we saw an opportunity in the form of Query Assistant, a feature that can help engineers ask questions of their systems in plain English.

PromCon Recap: Unveiling Perses, the GitOps-Friendly Metrics Visualization Tool

In the vibrant atmosphere of PromCon during the last week of September, attendees were treated to a plethora of exciting updates from the Prometheus universe. A significant highlight of the event was the unveiling of the Perses project. With its innovative approach of dashboard as code, GitOps, and Kubernetes native features, Perses promises a revolutionary experience for Prometheus users, which gained a lot of traction at the conference.

AI Explainer: Glossary of Artificial Intelligence Terms

I speak with customers and partners pretty much every week about artificial intelligence. The knowledge levels can differ quite dramatically — some are quite AI savvy while others find the jargon bewildering. This is quite understandable as AI is a rapidly evolving field with its own set of specialized terminology. This blog post is purely meant to provide a beginner-friendly reference for some essential AI terms to make it easier to navigate conversations and articles on the topic.

Evil in Automation: A Haunting Tale This Halloween

In the spirit of Halloween, imagine a world where goblins, ghosts, and ghastly waits lurk in the shadows of your website monitoring. In the world of automation, nothing is more terrifying than the sneaky presence of ‘waits’. They may seem harmless and solve problems in the short term but in reality, they can distort the very essence of monitoring.
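A fixed wait such as `time.sleep(5)` always adds the full delay to a measurement, however quickly the page was actually ready. One widely used alternative, shown here as an assumption rather than as the article's own recommendation, is to poll for the condition and stop as soon as it holds. The `wait_until` helper and the simulated "element" below are hypothetical.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns the seconds actually waited; raises TimeoutError on expiry.
    """
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)

# Simulated page element that becomes ready about 0.3 seconds from now.
ready_at = time.monotonic() + 0.3
waited = wait_until(lambda: time.monotonic() >= ready_at, timeout=5.0, interval=0.05)
print(f"element ready after {waited:.2f}s")
```

Because the helper returns the time actually spent waiting, the monitoring script records something close to the real readiness time instead of a padded constant.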

AppDynamics Talks Optimized Self-healing with Full-stack Observability, Auto-remediation

From an IT perspective, technologists generally agree that the ability to monitor and have visibility into the IT stack across every one of their applications is essential with the now-permanent remote and hybrid work models. It also stems from the fact that digital transformation and IT growth have accelerated by seven years since the pandemic in 2020, analysts say.

Auto-Instrumentation with the AWS CDK

As a developer, I love automation, whether it’s orchestrating a smart home or optimizing developer toolchains. Automation injects efficiency into my daily routine, simplifying intricate processes and eliminating repetitive tasks. Especially when it comes to toolchains, I’m constantly on the lookout for ways to boost coding workflows. Cloud Development Kits (CDKs) brought with them a new era of streamlined developer toolchains.

Proactively secure your business with new Cisco Secure Application enhancements to protect cloud environments

Map, prioritize and act on security issues found in cloud environments with the newly expanded security offering from Cisco AppDynamics. Welcome to the October edition of the What’s New in Security series — and — happy security awareness month!

Our DNS check can now monitor hidden CNAME records

Besides monitoring your site's uptime, Oh Dear offers many other checks to monitor all kinds of aspects of your web app. One of those checks is our DNS check. Whenever we detect problems with your DNS records or when one of the DNS records changes, we can notify you. By default, we only monitor the DNS records of the domain you are monitoring. So when you're monitoring example.com, we'll only monitor the records of that hostname. A CNAME record is a special kind of DNS record.

OKRs, KPIs, and Metrics: Understanding the Differences

In the world of business management and performance tracking, OKRs, KPIs, and Metrics are common terms thrown around. Each plays a distinct role in helping organizations define their vision, measure their progress, and improve their performance. Let's dive deep into understanding the nuanced differences between these three concepts.

What is Observability? An Introduction

Simply put: Observability is the ability to measure the internal states of a system by examining its outputs. A system is considered “observable” if the current state can be estimated by only using information from outputs, namely sensor data. More than just a buzzword, the term “observability” originated decades ago with control theory (which is about describing and understanding self-regulating systems).

Importance of Network Monitoring

In today’s fast-paced digital world, the importance of network monitoring is growing by leaps and bounds. With real-time network monitoring, businesses can transform their user experiences and drive unstoppable growth. Whether you run a small business or a large corporation, incorporating network monitoring tools can help in the smooth functioning of your networks. Join us as we unveil the top benefits of network monitoring in the modern business landscape.

How to navigate alerting insights in Grafana Cloud

Navigate your alerting systems in Grafana Cloud with the new Alerting Insights feature. This enhanced landing page provides a comprehensive view of your alerting data, from Grafana managed rules to Mimir managed rules, and highlights critical trends in your organization's alert management performance.

Countering Internet Complexity with AI Powered Internet Performance Monitoring

Organizations’ technology strategies are coming in for a major overhaul in 2024 and beyond. The rate of innovation in IT in general and in security in particular is accelerating steadily. Meanwhile, the revolutionary advances in artificial intelligence of the last 18 months are upending technologies across the board -- and requiring a serious rethink for technology professionals and managers. In this on-demand webinar, we explain how enterprise companies can counter Internet complexity with Catchpoint's AI-powered Internet Performance Monitoring.

What's new at Catchpoint! Fall 2023 Product Launch Event

Catchpoint is constantly evolving! Find out how Catchpoint’s newest products and capabilities accelerate time to detection, improve automation capabilities, and further expand our Global Observability Network. Watch the webinar recording for a demo of our 2 new products: Catchpoint Tracing – Extending the reach of IPM beyond the traditional Internet Stack to visualize request journeys through backend application components for more efficient and effective troubleshooting.

Best practices for Elasticsearch on Kubernetes | Kubecon

In this talk, Radu will delve into the world of Elasticsearch and OpenSearch within Kubernetes. In this informative snippet, we uncover the best practices for deploying, managing, and optimizing these powerful search and analytics engines in your Kubernetes environment. Whether you're a seasoned developer, a DevOps enthusiast, or a data-driven professional, this presentation offers invaluable insights that will enhance your Elasticsearch and OpenSearch deployment strategies.

The Grafana developer portal: your gateway to enhanced plugin development

As Grafana continues to evolve, we remain dedicated to improving the experience for Grafana users, as well as the developers building applications on top of the platform. Today, we are delighted to introduce the next step in that evolution, the all-new Grafana developer portal — a central hub of curated resources for developers who want to extend Grafana’s capabilities.

Sponsored Post

Microsoft Windows 365 on VMware Horizon Cloud

The world of virtualization has been evolving at a rapid pace, transforming the way organizations manage and deliver desktop computing solutions. Windows 365, Microsoft's cloud-based operating system, is a game-changer in this space. Combined with VMware Horizon Cloud, it opens up a world of possibilities for businesses seeking flexibility, scalability, and enhanced security in their desktop infrastructure. In this blog post, we'll explore the synergy between Windows 365 and VMware Horizon Cloud, showcasing the benefits and features of this powerful combination.

Prioritize and promote service observability best practices with Service Scorecards

The Datadog Service Catalog consolidates knowledge of your organization’s services and shows you information about their performance, reliability, and ownership in a central location. The Service Catalog now includes Service Scorecards, which inform service owners, SREs, and other stakeholders throughout your organization of any gaps in observability or deviations from reliability best practices.

Querying Arrow tables with DataFusion in Python

InfluxDB v3 allows users to write data at a rate of 4.3 million points per second. However, an incredibly fast ingest rate like this is meaningless without the ability to query that data. Apache DataFusion is an “extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.” It enables 5–25x faster query responses across a broad range of query types compared to previous versions of InfluxDB that didn’t use the Apache ecosystem.

Why infrastructure monitoring is important

In today’s technology-driven world, businesses rely heavily on their digital infrastructure to operate efficiently and serve customers effectively. With the growing complexity of these infrastructures, ensuring their stability and performance has become paramount. “But how can I realize that?” is a question I often hear. And this is where infrastructure monitoring steps in.

Releasing Icinga Director v1.11

You may have noticed that, during the last few weeks, we released a bunch of new versions for different components of our stack. It’s a very exciting time of the year for us, since we’re currently finishing work that we have done over the last months. Today, we’re announcing another release: The general availability of Icinga Director v1.11! This new version ships with nice new features, which have been requested by many users. Check out the full changelog for all details.

Kubernetes + Cribl Edge: Because Logging and Metrics Shouldn't Be a Mystery Novel!

To fully utilize the capabilities of Kubernetes, it’s crucial to have a reliable system for gathering and organizing logs, metrics, and events. With the complex nature of container orchestration, it’s crucial to understand the significance and process behind the data generated in a Kubernetes environment at scale. Cribl Edge works seamlessly with Kubernetes and can cater to various needs.

Building the Future of Data for IT and Security

Today, Cribl surpassed $100 million in annual recurring revenue (ARR), becoming one of the fastest companies to ever reach this milestone in under four years––an incredible achievement on our journey to building a generational company. Reaching $100 million in ARR so quickly shows that our unique approach and steadfast focus on IT and Security continues to be validated by the market.

6 Best Azure FinOps Tools for Cost Optimization

FinOps is an evolving concept increasingly practiced in cloud computing organizations to manage and optimize their infrastructure cost. It requires team collaboration among Finance, Engineering and IT Operations to gain a deep understanding of the expenditure, take financial accountability, and make informed decisions to maximize the business performance.

Unlocking IT Transformation: Synergizing ITIL and AIOps for Enhanced Monitoring and Observability

In the ever-evolving landscape of IT operations, staying competitive and efficient is a paramount concern for organizations. This session aims to be your compass on this transformative journey, offering strategic insights that resonate with both C-suite executives and IT team leaders. Join UnityTech CEO & Founder Jesus Cordoba for a dynamic exploration of how the ITIL framework, DevOps practices, and AIOps technologies can synergize to provide your teams and executives with unprecedented superpowers in the realm of monitoring and observability.

How to Calculate Uptime? And 5 Tips for Achieving 99.9%

The lifeblood of any online business lies in its accessibility. When your site or application is accessible, your customers are happy, your brand reputation remains intact, and revenues keep flowing. Because of this, understanding and calculating uptime becomes crucial. Let’s face it: any downtime translates to lost revenue.
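
As a rough illustration of the arithmetic behind an uptime target, the sketch below computes an uptime percentage and the downtime allowance implied by 99.9%. The function names and the 30-day window are assumptions made for this example and are not taken from the article.

```python
def uptime_percent(total_seconds: float, downtime_seconds: float) -> float:
    """Share of the period during which the service was available, in percent."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def downtime_budget_seconds(target_percent: float, total_seconds: float) -> float:
    """Maximum downtime that still meets the target over the period."""
    return total_seconds * (1.0 - target_percent / 100.0)


# Assumed measurement window: 30 days, expressed in seconds.
THIRTY_DAYS = 30 * 24 * 60 * 60

budget = downtime_budget_seconds(99.9, THIRTY_DAYS)
print(f"99.9% over 30 days allows about {budget / 60:.1f} minutes of downtime")
print(f"Uptime after a one-hour outage: {uptime_percent(THIRTY_DAYS, 3600):.3f}%")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30-day window, so a single one-hour outage is enough to miss it.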

How OpsRamp Closes the Complexity Gap with Distributed Tracing

As distributed, interconnected microservices have replaced monolithic applications, application monitoring has had to evolve to support these modern, complex architectures. Rather than monitoring a single application and code base, organizations need to monitor the performance and network connectivity of multiple services that interact with each other.

Kubernetes Deep Dive: Key Features, Visibility and Optimization

Kubernetes or K8s is an open-source production-grade container orchestration system for automating, scaling, and managing containerized applications. A container is a lightweight, standalone, executable ready-to-run software package that contains everything needed to run an application. It includes the runtime, code, libraries, systems tools, and default values for any essential settings.

Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0

This article coins the term “FDAP stack”, explains why we used it to build InfluxDB 3.0, and argues that it will enable and power a generation of analytics applications in the same way that the LAMP stack enabled and powered a generation of interactive websites (by the way we are hiring!).

The benefits of web performance testing with Grafana k6 browser and Grafana Faro

Performance tests created with the Grafana k6 browser module use lab data collected from pre-defined environments, devices, and network settings. Lab data allows you to repeatedly reproduce performance results, making it useful for detecting and fixing performance issues early. Lab data, however, doesn’t account for one very important testing component: real user experience.

Using Cisco Meraki Solution for SD WAN? ScienceLogic Was Built for That.

Since its acquisition by Cisco in 2012, Meraki has taken off as one of the most valuable tools for simplifying networking in the cloud era. Organizations using Meraki to install and configure software-defined networking (SDN) and software-defined wide area networking (SD-WAN) devices across their IT estates can attest to the fact.

Availability: A Beginner's Guide

Availability is the amount of time a device, service or other piece of IT infrastructure is usable — or if it’s available at all. Because availability, or system availability, identifies whether a system is operating normally and how effectively it can recover from a crash, attack or some other type of failure, availability is considered one of the most essential metrics in information technology management. It is a constant concern.

What's New in Flowmon 12.3 Release

The new release offers exciting new features such as a new navigation menu and visual comparison of historical trends. Flowmon ADS 12.2 brings new IDS event visualisation, AI-assisted analysis, Threat Score, additional insight into application and platform, and more. In this webinar, product experts Martin Skoda and Filip Cerny will showcase, in a live demonstration, the new workflows and user experience improvements that Flowmon 12.3 and Flowmon ADS 12.2 bring.

What Is Adaptive Flowspec and Does It Solve the DDoS Problem?

Managing modern networks means taking on the complexity of downtime, config errors, and vulnerabilities that hackers can exploit. Learn how BGP Flow Specification (Flowspec) can help to mitigate DDoS attacks through disseminating traffic flow specification rules throughout a network.

3 Strategies for Becoming a Network Visionary

As businesses become more digitized, IT leaders must consider the ability of the network to transform business operations in every functional area. Many organizations are already doing this today, demonstrating best practices in how the network can be an innovation partner in marketing, service delivery, customer service, and logistics, among many other areas. However, those who rest on their laurels today may find themselves playing catchup in the not-so-distant future.

SD-WAN Performance and User Experience: Gaining Unified Visibility with DX NetOps

As the use of SD-WAN continues to expand, benefits and challenges may seem to be proliferating in equal measure as well. In this post, we look at some of the advantages and obstacles presented by SD-WAN, and we detail how DX NetOps by Broadcom delivers the visibility teams need to monitor and manage their SD-WAN and legacy network environments.

Head Based Sampling using the OTEL Collector

This is part three in a series where I learn OpenTelemetry (OTEL) from scratch. If you haven't seen them yet, part 1 is about setting up auto-instrumented tracing for Node.js and part 2 is where I initially implemented the OTEL collector. Today we are going to begin experimenting with sampling. We need to sample traces because we capture so much data! It would be impractical to process and store it all (in most cases).
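
To make the idea of head-based sampling concrete, here is a minimal sketch, assuming a simple hash-based scheme: the keep-or-drop decision is made once, at the start of a trace, from the trace ID alone. The function name `keep_trace` and the use of SHA-256 are assumptions for illustration. This is not the OTEL Collector's implementation.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Head-based decision: derived only from the trace ID, before any spans are seen."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical trace IDs, used only to show the approximate keep ratio.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(t, 0.25) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces at a 25% sample rate")
```

Because the decision depends only on the trace ID, every service that sees the same ID reaches the same verdict, which keeps sampled traces complete. The trade-off is that the decision is made before anyone knows whether the trace will turn out to contain an error or a slow request.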

Stream your Google Cloud logs to Datadog with Dataflow

IT environments can produce billions of log events each day from a variety of hosts and applications. Collecting this data can be costly, often resulting in increased network overhead from processing inefficiencies and inconsistent ingestion during major system events. Google Cloud Dataflow is a serverless, fully managed framework that enables you to automate and autoscale data processing.

Accessible Home Diagnostic Services for Seniors: Tailored Solutions for Well-Being

As we age, health monitoring becomes increasingly important, and for many seniors, the convenience of accessible home diagnostic services can make all the difference. These specialized services are designed with the unique needs of older adults in mind, offering a range of diagnostic assessments, from osteoporosis screening to cognitive health evaluations. In this article, we'll explore the significance of these home diagnostic solutions for seniors and how they contribute to overall well-being.

API Monitoring: A Complete Introduction

At the most basic level, application programming interface (API) monitoring checks to see if API-connected resources are available, working properly and responding to calls. API monitoring has become even more important (and complicated) as more elements are added to the network and the environment evolves, including multiple types of devices, microservices as a key part of application delivery, and, of course, the widespread move to the cloud.

Monitoring vs Observability: What Engineers Need to Know

As systems increasingly shift towards distributed architectures to deliver application services, the roles of monitoring and observability have never been more crucial. Monitoring delivers the situational awareness you need to detect issues, while observability goes a step further, offering the analytical depth to understand the root cause of those issues. Understanding the nuanced differences between monitoring and observability is crucial for anyone responsible for system health and performance.

Grafana 10.2 release: Grafana panel title generator, interactive visualizations, and more

Grafana 10.2 is here! As always, the latest version of Grafana includes a ton of dashboard and data visualization improvements. You can add interactive buttons to your Canvas visualizations, auto-generate dashboard panel titles and descriptions using AI, and zoom in on specific y-axis values in your time series.

Challenge Met: Adopting Intelligent Observability Pipelines

Over the last year or so, the unavoidable topic of overwhelming cost has emerged as the number one issue among today’s observability practitioners. Whether it is in conversations among end users, feedback from customers and prospects, industry chatter or the coverage of experts including Gartner, the issue of massive telemetry data volumes driving unsustainable observability budgets prevails.

Start with Traces, not with Logs: How Honeycomb Helped Massdriver Reduce Alert Fatigue

Massdriver is a cloud operations platform that makes it easier for engineering teams to build, deploy, and scale cloud-native applications. While many companies use this lofty language to make similar promises, Dave Williams, CTO and co-founder at Massdriver, means it. Before Massdriver, Dave worked in product engineering where he was constantly bogged down with DevOps toil. He spent his time doing everything except what he was hired to do: write software.

What is Service Desk and how does it help IT departments?

Skilled IT leaders understand that consistent and sustainable growth comes with a number of significant challenges. Developing IT growth requires a cohesive strategy that aligns perfectly with the engine of that operation: the Service Desk, the undisputed main character of this journey.

Kubernetes Unpacked: Driving Enterprise Success with Cloud-Native

Supercharge your production timelines by watching "Kubernetes Unpacked: Driving Enterprise Success with Cloud-Native Architecture." In this video, Andreas Prins, StackState's CEO, investigates why Kubernetes has emerged as the leading OS of the cloud and delves into why businesses worldwide are choosing it as their container orchestrator.

Using the Cribl API - Part 1

Cribl’s interface is Super Neato: Reactive, beautiful, and easy to use. But sometimes you need to access settings and configurations programmatically. The good news is that interactive API docs are baked into your Cribl instance. The better news is that everything that happens in the GUI is making API calls. With your browser’s developer mode, you can easily take a peek behind the curtain to see exactly how the API was called and what the payload looked like.

How to Monitor Internet SLAs (Service-Level Agreements)

Whether you're streaming a movie, conducting a crucial Zoom meeting, or managing a data-intensive enterprise, a reliable Internet connection is non-negotiable. The Internet is expected to be fast, always available, and capable of handling your specific needs. But how can you ensure that your Internet service provider (ISP) is meeting these expectations? The answer lies in Internet Service Level Agreements (or Internet SLAs).

Limitations of API-only testing: Why it shouldn't be your sole testing strategy

A spicy article hit my inbox the other day. It came with a bold claim — “API testing is better than UI testing”. Absolutes like “A is better than B” rarely hold in the software world. “It depends” is the answer to most tech questions for a reason. Let’s compare API and UI testing and discuss why one isn’t better than the other. The frenemies are “just different”, and always will be. And that’s a good thing.

Leveraging Tines and Cribl Search for Security Automation

At Cribl, we have the privilege of helping our customers achieve their strategic data goals by giving them visibility and control over all of their observability data. The reality today is that data is commonly stored across many places. Whether intentional (such as using Cribl Stream to create a security data lake) or unintentional (because of silos and tool sprawl), organizations desire the ability to access and analyze all of this information at any time.

Live Debugging for Critical Systems

Live debugging refers to debugging software while running in production without causing any downtime. It has gained popularity in modern software development practices, which drives many critical systems across businesses and industries. In the context of always-on, cloud-native applications, unearthing severe bugs and fixing them in real time is only possible through live debugging. Therefore, live debugging becomes an integral part of any developer’s skill set.

What is Infrastructure as Code? An Introduction to IaC

Infrastructure as Code, or IaC, is the practice of automatically provisioning and configuring infrastructure using code and scripts. IaC allows developers to automate the creation of environments to generate infrastructure components rather than setting up the necessary systems and devices manually.

10 Best WordPress Plugins for Website Monitoring

WordPress is the content management system that has found its place in the hearts of innumerable web developers and users around the world. You might be shocked to learn that almost 45% of all websites on the Internet are built on WordPress. The question is, “How do users monitor their WordPress website and check their critical metrics?” Any site needs to be monitored these days. There are countless online services that offer tools for external website monitoring.

How to Parse JSON With BindPlane

About observIQ: observIQ brings clarity and control to our customers' existing observability chaos. How? Through an observability pipeline: a fast, powerful and intuitive orchestration engine built for the modern observability team. Our product is designed to help teams significantly reduce cost, simplify collection, and standardize their observability data.

Frontend vs Backend Performance: Which is Slower?

Kent C Dodds made a claim on Twitter (X) that the “biggest performance problems are probably backend, not frontend related.” Is this true? Some websites have slow backends, for sure. Others have slow frontends. A few unfortunate sites are slow in both. But as of today, right now in 2023, which is the bigger performance problem for most teams, the frontend or the backend? I wanted to explore it with some real data from the web.

Gain the Visibility Needed to Hold Last-Mile ISPs Accountable: How AppNeta Can Help

In relatively short order, the adoption of cloud services and hybrid work models went from exception to ubiquity. This has fundamentally changed the nature of the networks users rely upon—and created an entirely new set of challenges for IT and network operations teams. More than ever, business services and interactions are reliant upon network connectivity that spans a diverse mix of the public internet and third-party networks.

12 Best Practices to Improve Incident Management

Today’s fast-paced digital world can lead to system breakdown and disruptions that strain organizational resources. What truly distinguishes successful organizations is their response when problems occur. Incident management serves this function. At its core, incident management involves teams managing unexpected disruptions quickly with minimal impact to users or business operations. The process is like a safety net that prevents further problems from developing into trust issues.

Optimize your infrastructure with CloudNatix and Datadog

CloudNatix is an infrastructure monitoring and optimization platform for VMs, containers, and other cloud resources. Customers can use CloudNatix’s Autopilot feature to automatically configure and run infrastructure optimization workflows that allocate and run their resources more efficiently. CloudNatix can take action to auto-size Kubernetes and VM workloads, defragment Kubernetes clusters, and create harvest pods from unused VMs, among other key optimizations.

Risks for Health IT Running Outdated Citrix Receiver / Workspace App

Over the past couple of years, I’ve carried out several audits on healthcare customers’ Citrix Virtual Apps and Desktops or DaaS deployments, and one of the checks that consistently stands out is the use of older Citrix Receiver and Workspace app versions connecting to the environment.

Transforming Observability with Elastic AI Assistant: A Proactive, AI-Driven Approach

Discover how Elastic AI Assistant is transforming the world of observability by offering proactive, AI-powered insights that help SRE teams manage complex systems more efficiently. Say goodbye to manual, reactive processes and hello to a proactive, AI-driven approach with the Elastic AI Assistant for Observability.

Sept 13, 2023: SF Python Meetup - API Documentation: How Sentry Designed Custom Tooling

On September 13, 2023, Sentry hosted SF Python for a developer meetup in San Francisco. In this talk, Josh Ferge, Senior Software Engineer at Sentry, shared his experiences and insights on Sentry's journey of API documentation for their Django application. He talked about the various things they’ve tried, including: Schema / Example generation using dynamic tests; Writing OpenAPI JSON manually; Django Rest Framework & autodoc tooling around it; Problems with DRF serializers & performance, leading to Sentry custom implementation of schema generation using Python typing.

A Proactive Approach to Network Performance Management

In our digitally-driven world, networks serve as the backbone of every organization's operations. Whether you're a business, a healthcare facility, an educational institution, or a government agency, the efficiency and reliability of your network infrastructure can make or break your daily operations.

How to deploy a Hello World web app with Elastic Observability on Azure Container Apps

Elastic Observability is the optimal tool to provide visibility into your running web apps. Microsoft Azure Container Apps is a fully managed environment that enables you to run containerized applications on a serverless platform so that your applications scale up and down. This allows you to accomplish the dual objective of serving every customer’s need for availability while meeting your needs to do so as efficiently as possible.

What Are The Benefits Of Investing In Tracking Apps?

In the times that we live in, owning a fleet of vehicles for your business is commonplace. From small businesses to large corporations, the use of fleets has become an essential part of running operations smoothly. However, managing and tracking these fleets can be a daunting task without the help of technology. This is where fleet tracking apps come into play. These apps are designed to track and monitor your vehicle's location, speed, and other important data in real-time. But what are the benefits of investing in fleet-tracking apps? Let's explore.

Getting Started with Kubernetes | Start learning Kubernetes in 2023

Ready to dive into the world of Kubernetes? Join us in this beginner-friendly tutorial where we break down Kubernetes infrastructure, explore its fundamental components, and understand how it all fits together at a high level. Whether you're a developer, sysadmin, or just curious about Kubernetes, this video has you covered.

Cribl Stream Demo with Max Weber

Join Cribl's Ed Bailey and Max Weber, Senior Detection Engineer, for a fun discussion about the challenges of detection engineering and how Max is solving these problems every day. We will discuss the current state of detection engineering, why data engineering is a prerequisite for better detection engineering, and what Max would like to see to help drive better outcomes. Max will demo Cribl Stream and show how his data engineering skills drive better detections.

Improving Trace Data For Azure Virtual Desktop

The Microsoft infrastructure makes collecting network traces more complicated. Network traces (tracert) inside and out of the Azure Virtual Desktop virtual machine are valuable when diagnosing support issues or when an end-user calls up complaining of poor Azure Virtual Desktop performance. The following information should help improve the collection of network traces and trace data, which will aid in diagnosing Virtual Desktop Infrastructure connectivity.

How to Monitor MySQL Using OpenTelemetry

MySQL is the trusted open-source database management system for many desktop, mobile, web, and cloud applications. Monitoring the performance of MySQL is critical but as the applications expand over multi-cloud, cloud-native, and hybrid cloud, monitoring also grows in complexity. Continuous monitoring and scaling help applications take advantage of MySQL’s capabilities such as reliability, security, flexibility, availability, and performance scalability.

Datadog Pricing - Beware These Surprises in 2023

Datadog has a huge product footprint with a sophisticated user experience, but any discussion of its usefulness must include a consideration of its significant costs. Datadog pricing is complex and has a lot of SKUs that a customer needs to understand. If you're not careful, you might end up blowing up your Datadog bill. It’s likely that your business isn’t at a scale where it will generate a $65 million bill, but it is possible to generate bills that rival your operations bills.

Understanding Request Latency with Profiling

It can be hard to figure out why response times are high in Java applications. In my experience, when engineers investigate this type of issue, they typically use one of two methods: They either apply a process of elimination to find a recent commit that might have caused the problem, or they use profiles of the system to look for the cause of value changes in relevant metrics.

How Flexcity used Grafana Cloud to help balance the national power grid in France

Last winter, Flexcity — a market leader in electric flexibility — faced an unprecedented challenge: Help stabilize the French national power grid, in the midst of a widespread energy crisis that loomed over Europe. As a byproduct of the Russian invasion of Ukraine, energy prices in the EU soared in 2022. And France, meanwhile, faced a nuclear power outage that winter that threatened to significantly disrupt its energy supply and increase the risk of electricity shortages.

Choosing Azure Database Services - What are the options?

Microsoft Azure offers a choice of relational and non-relational database services to support a wide range of application needs and demands. Built-in intelligence helps automate management tasks like high availability, scaling, and query performance tuning to provide users with services that ensure applications are always available and performant. Many services offer essentially limitless database scale and SLAs (Service Level Agreements) usually range between 99.9-99.999% availability.

How To Profile and Optimize Telemetry Data: A Deep Dive

We recently had the privilege of presenting our telemetry data pipelining platform at Cloud Field Day. Today, we'd like to share a recap of our demo with you. In this demo, we explore the transformative potential of data profiling, telemetry pipeline optimization, and incident response. Foundationally, we follow an Understand, Optimize, and Respond workflow.

ServiceNow Integration

In this video, we'll be covering the benefits of the ServiceNow integration. We'll also go through the configuration process allowing you to follow along and quickly configure this integration. The integration will allow you to automatically create and resolve incidents based off of alarms triggered in Exoprise. With incidents being sent to ServiceNow, you can begin automatically assigning incidents to the correct teams based off of criteria or automatically escalating issues that have not been addressed yet.

Common Nagios Errors and What to Do about Them

Nagios is an open-source monitoring system that has become indispensable for system administrators and DevOps teams across the world. However, like any other software, you’re bound to come across errors with Nagios. In this article, we’re going to take a look at some common errors and how to solve them, along with the pros and cons of Nagios, and why MetricFire is the perfect alternative for monitoring.

SigNoz + Tracetest: OpenTelemetry-Native Observability Meets Testing

What is the hidden potential of OpenTelemetry? It goes a lot further than the (awesome) application of tracing and monitoring your software. The OpenTelemetry project is an attempt to standardize how performance is reported and how trace data is passed around your microservice architecture. This context propagation is a superpower for those who adopt OpenTelemetry tracing.

Coffee Talk with SURGe: The Interview Series featuring Michael Rodriguez

Join Mick Baccio and special guest Michael Rodriguez, Principal Strategic Consultant for Google Public Sector, for a conversation about Michael’s career path into cybersecurity, the origin of his nickname “Duckie,” and his work as a cybersecurity subject matter expert for Google Space.

OpenTelemetry Logs - A Complete Introduction & Implementation

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) incubating project aimed at standardizing the way we instrument applications for generating telemetry data (logs, metrics, and traces). OpenTelemetry aims to provide a vendor-agnostic observability framework that provides a set of tools, APIs, and SDKs to instrument applications.

See How I&O Leaders Can Monitor Strategy for Their Organizations' Specific Needs

Organizations must always be ready to pivot on a dime and adjust their business goals when the market—or their customers—demand it. Whether driven by industry changes or developments in market trends, when goals shift at the top, the teams who execute against them must follow suit. Since network infrastructure is becoming increasingly complex to fit business needs, IT teams are part of these initiatives.

Why Can't Network Teams Have Nice Things?

Let me tell you something you already know: Networks are more complex than ever. They are massive. They are confounding. Modern networks are obtuse superorganisms of switches, routers, containers, and overlays; a hodgepodge of telemetry from AWS, Azure, GCP, OCI, and sprawling infrastructure that spans more than a dozen timezones.

Monitor and optimize your modern, AI-powered applications with Cisco AppDynamics

Learn how Cisco AppDynamics OpenAI API monitoring provides comprehensive insights that enable application owners and operations to optimize cost and monitor performance of OpenAI integrations. The rapid advancement of generative artificial intelligence (GenAI) has reshaped various industries and transformed the way we interact with technology. Companies across diverse sectors have fully embraced the power of GenAI to such an extent that it is now an integral part of the digital experience.

Turbo-charging AI Ops with the Elastic Observability AI Assistant: ElasticON AI

Elastic Observability experts Bahubali Shetti and Gagan Singh take a deep dive into how the Elastic Observability AI Assistant can help you get deeper contextual insights into telemetry, troubleshoot issues more effectively, reduce time to resolution, and streamline operations.

Introducing Lumigo Webhook Alerts

Webhooks, those wonderful little lifelines connecting one application to another, have become an essential part of our app notification world. They help keep your systems in the loop, notifying them immediately when events of interest occur. This real-time communication ensures that your applications remain responsive, adaptive, and always up-to-date with the latest information.

Why Real-Time Debugging Becomes Essential in Platform Engineering

Platform engineering has been one of the hottest keywords in the software community in recent years. As a natural extension of DevOps and the shift-left mentality it fosters, platform engineering is a subfield within software engineering that focuses on building and maintaining tools, workflows, and frameworks that allow developers to build and test their applications efficiently.

A detailed guide on Azure Reservations

Organizations that invest in cloud technologies like Microsoft Azure might notice that cloud costs can easily get out of control. When cloud services use the Pay-as-you-go payment model, small amounts must be paid each time a cloud service is used. However, when you have deployed hundreds of cloud resources, these small charges can add up to a much higher monthly bill than expected. By optimizing your cloud costs, you lower your monthly Azure bill and gain cost efficiency and predictability.

Kubernetes cost optimization: tips for a more efficient operation

Kubernetes offers unparalleled flexibility and scalability for containerized orchestration. However, this dynamism can also lead to unexpected costs if you don’t efficiently manage your corresponding cloud resources. In this blog, we’ll outline a series of best practices for Kubernetes cost optimization that will help you keep your infrastructure running smoothly while staying within your budget.

New in Grafana k6: The latest OSS features in v0.47.0 and more efficient performance testing in Grafana Cloud k6

Grafana k6 v0.47.0 has been released, featuring gRPC’s binary metadata support, new authentication methods, and tons of other improvements for Grafana k6 OSS. Here’s a quick overview of the latest features in Grafana k6 v0.47.0, as well as some other exciting updates related to Grafana Cloud k6 and the k6 community.

Network Infrastructure Monitoring: Getting Started

The rapid evolution of technology profoundly impacts network infrastructure monitoring. New technologies such as containerization, microservices, and serverless computing introduce complexities that require monitoring solutions to adapt. The shift to DevOps practices, where development and operations teams collaborate closely, emphasizes the need for real-time monitoring and feedback loops to ensure continuous integration and delivery of applications and services.

Viewing Audit Logs in BPOP Enterprise

About ObservIQ: observIQ brings clarity and control to our customers' existing observability chaos. How? Through an observability pipeline: a fast, powerful and intuitive orchestration engine built for the modern observability team. Our product is designed to help teams significantly reduce cost, simplify collection, and standardize their observability data.

4 Ways to Reduce Your Mean Time to Resolution

Dealing with a high MTTR in your network? Auvik Network Management is a comprehensive network monitoring and troubleshooting solution. With over 50 pre-configured alerts, it keeps you informed about critical network events. Users have the flexibility to customize these alerts and control notification frequency so that they have all the essential context to be able to fix issues.

[Webinar] Optimize your critical workflows by managing multi-step synthetic transactions

Are your critical workflows performing well? Or, are issues hampering your site’s user experience? Watch our webinar to learn how important it is to track the performance of your workflows and how Site24x7’s synthetic monitoring tool can help you offer a perfect user experience. The webinar also touches upon the various metrics that will help you in debugging the performance issues in your critical workflows or transactions.

13+ Best Kubernetes Monitoring Tools: Free, Open Source & Paid [2023 Comparison]

While Kubernetes revolutionized distributed orchestration, it also added complexity to logging and monitoring. To keep up with the challenges of working with Kubernetes clusters, you need to adapt your monitoring strategy. This includes changing the tools you use. To help keep your Kubernetes environment healthy, we made a list of the best Kubernetes monitoring tools. This list includes both open-source and commercial tools.

Unleash optimal IT network performance with OpManager's Windows service monitoring capabilities

Windows services are the unsung heroes of Windows machines. This is because they act as critical components of the Windows operating system that run in the background to keep your computer running smoothly and securely. They are responsible for a wide array of tasks, including system startup and shutdown, security, performance, and application support.

Jaeger vs Prometheus - Side by Side Comparison [Updated for 2023]

Both Jaeger and Prometheus are popular open-source application performance monitoring tools. While Jaeger is an end-to-end distributed tracing tool, Prometheus is used as a time-series database for monitoring metrics. Let's dive in to explore their key features and differences. Application performance monitoring is the key to keeping your system's health in check. In today's digital economy, no business can afford to have failed or delayed completion of user requests.

OpenTelemetry vs. OpenTracing - Decoding the Future of Telemetry Data

OpenTelemetry and OpenTracing are open-source projects used to instrument application code for generating telemetry data. While OpenTelemetry can help you generate logs, metrics, and traces, OpenTracing focuses on generating traces for distributed applications. If you’re thinking of choosing between OpenTelemetry and OpenTracing, go for OpenTelemetry. OpenTracing is now deprecated, and users of OpenTracing are advised to migrate to OpenTelemetry.

OpenTelemetry MongoDB | Monitor and visualize your MongoDB database calls

OpenTelemetry libraries can be used to monitor MongoDB interactions. In this tutorial, we will learn how we can monitor MongoDB with OpenTelemetry libraries to analyze query execution and identify performance bottlenecks. Most modern applications have distributed architecture thanks to cloud and containerization. In cloud-native applications, it is necessary to track user requests across services and components like databases.

OpenTelemetry UI - See What's Possible With OpenTelemetry data

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project aimed at standardizing the way we instrument applications for generating telemetry data (logs, metrics, and traces). However, OpenTelemetry does not provide storage and visualization for the collected telemetry data. For visualizing OpenTelemetry data, you need an OpenTelemetry UI. The data collected by OpenTelemetry can be sent to a backend of your choice, which can then be visualized.

8 Steps to Create a Successful Cloud Strategy

In today’s digital world, performance and agility are essential for running a successful business, and the technology arena is no exception. Over the past few years, there has been a significant rise in the use of cloud computing technology as more companies seek scalable resources, the flexibility to experiment with new technologies, and lower costs from eliminating the need to invest in on-premises hardware.

What is remote access? An open door to productivity and flexibility

What is remote access and how has it transformed work dynamics around the world? Let’s dive in, explore and discover together how this innovative practice has reshaped conventional work structures and opened up a whole range of possibilities!

Top 9 Port Monitoring Tools of 2023

We’ve compiled a list of 9 top-notch port monitoring tools that will help you stay ahead without breaking the bank. So strap in as we explore the world of ports through these beneficial applications and tell you about the best options on the market today! In this guide, you’ll learn… Looking for more amazing monitoring tools? Check out our post about the top 7 ping monitoring tools in 2023.

Your Guide to Prometheus Observability

Imagine you’re piloting a spaceship through the cosmos, embarking on a thrilling journey to explore the far reaches of the universe. As the captain of this ship, you need a dashboard that displays critical information about your vessel, such as fuel levels, navigation data, and life support systems. This dashboard is your lifeline, providing you with real-time insights about the health and performance of various systems within your ship, so you can quickly make critical decisions.

Anomaly Detection for Time Series Data: An Introduction

Welcome to the handbook on Anomaly Detection for Time Series Data! This series of blog posts aims to provide an in-depth look into the fundamentals of anomaly detection and root cause analysis. It will also address the challenges posed by the time-series characteristics of the data and demystify technical jargon by breaking it down into easily understandable language. This blog post (Chapter 1) is focused on.

The Advantage of Cold Storage in InfluxDB

Imagine, if you will, having hundreds of devices that you need to monitor. All these devices generate data at sub-second intervals, and you need all that high fidelity data for historical analysis to feed machine learning models. Storing all that data can get really expensive, really fast. When that happens, you must decide what’s more important: keeping all your data or sacrificing insights and analysis. It may not be a big stretch of the imagination for many readers.

Quix Community Plugins for InfluxDB: Build Your Own Streaming Task Engine

With our plans for InfluxDB 3.0 OSS laid out, the rest of the DevRel team and I have been actively searching for ecosystem platforms that would be logical integrations for the future of InfluxDB. One of these platforms is Quix! Quix is a comprehensive solution tailored for crafting, launching, and overseeing event streaming applications using Python. If you’re looking to sift through time series or event data in real-time for instant decision-making, Quix is your go-to.

Troubleshooting Cloud Native Applications at Runtime

Organizations are moving to micro-services and container-based architectures because these modern environments enable speed, efficiency, availability, and the power to innovate and scale more quickly. However, when it comes to troubleshooting distributed cloud native applications, teams face a unique set of challenges due to the dynamic and decentralized nature of these systems.

Use embedded AI to find performance problems

The root cause of a performance transaction can be complex to troubleshoot, but it does not have to be. By using AppDynamics’s built-in machine learning capabilities, we can quickly identify Health Rule violations triggered by transaction response times deviating from their baseline and then combine those with diagnostic capabilities that get us to the specific cause. We are able to drill down into the relevant snapshots to see which method and specific line of code is to blame.

Latest breakthroughs in vector search for Elasticsearch and Lucene: ElasticON AI

Elastic experts Jim Ferenczi and Ben Trent discuss key Elasticsearch and Lucene improvements, including intuitive vector search support, multi-threading, RRF, and hybrid search with filtering and doc-level security. Plus, hear what they are working on next!

How to easily retrieve values from a range in Grafana using a stat panel

Grafana is an open source visualization and monitoring solution for correlating and analyzing data from various sources. From time series graphs to heatmaps to 3D charts, it gives you lots of ways to untangle complex datasets. And while that’s incredibly powerful for observability, sometimes you’re looking for something fairly straightforward.

How AppNeta Passive Monitoring and Deep Packet Inspection Speeds Troubleshooting

In recent months, we’ve talked a lot about how AppNeta by Broadcom offers active monitoring capabilities, and how they enable teams to rapidly troubleshoot issues across both internally managed networks and those managed by third parties, such as ISPs and cloud providers.

How to Parse with Regex in BindPlane

About ObservIQ: observIQ brings clarity and control to our customers' existing observability chaos. How? Through an observability pipeline: a fast, powerful and intuitive orchestration engine built for the modern observability team. Our product is designed to help teams significantly reduce cost, simplify collection, and standardize their observability data.

DEX Show Live from London: The Future Workplace

Tim and Tom hosted a very special DEX Show episode LIVE at an incredible, standing-room-only Experience Everywhere event in London. They were joined for the occasion by two amazing guests - Darren Wright (Vice President, Information Technology, Honeywell) and former F1 driver and entrepreneur Robert Doornbos. It was an amazing conversation in front of a huge crowd – touching upon data, Formula 1, adaptation, technology, employee engagement and much more.

Datadog on Kubernetes Node Management

Datadog, the observability platform used by thousands of companies, runs on dozens of self-managed Kubernetes clusters in a multi-cloud environment, adding up to tens of thousands of nodes, or hundreds of thousands of pods. This infrastructure is used by a wide variety of engineering teams at Datadog, with different feature and capacity needs.

Internet Sonar: A Game-Changer for Incident Detection

When outages cost you tens of thousands of dollars each minute, pinpointing the source of disruptions as quickly as possible becomes mission-critical. This is not a time for finger-pointing and hastily assembled war rooms searching for that needle in the haystack. You need simple, intelligent, trustworthy Internet health information to expedite your incident detection.

SASE Monitoring: How to Monitor & Optimize A SASE Architecture

We've all heard the buzz about SASE (Secure Access Service Edge) and how it's revolutionizing the way we handle network security and connectivity. But let's face it, keeping your SASE architecture running like a well-oiled machine isn't a walk in the park. It's more like a continuous sprint, with multiple moving parts, countless devices, and the never-ending quest for optimal performance. Today, we're delving into the art and science of SASE monitoring.

Top 9 Port Monitoring Tools of 2024

We’ve compiled a list of 9 top-notch port monitoring tools that will help you stay ahead without breaking the bank. So strap in as we explore the world of ports through these beneficial applications and tell you about the best options on the market today! In this guide, you’ll learn… Looking for more amazing monitoring tools? Check out our post about the top 7 ping monitoring tools in 2024.

Enhancing Microsoft Intune With Digital Experience Monitoring

Organizations today rely on mobile device management (MDM) solutions to secure and manage corporate resources across a diverse range of devices. One such platform is Microsoft Intune, which offers security, patch, and access control features for mobile and desktop devices. Today, we look at Microsoft Intune and how Exoprise Service Watch helps customers ensure Intune is working well.

Our key takeaways from the 2023 Gartner Market Guide for Unified Endpoint Management Tools

The 2023 Gartner Market Guide for Unified Endpoint Management Tools is here! This year’s report has been highly anticipated since Gartner retired the Magic Quadrant™ for unified endpoint management (UEM) tools last year. Organizations and IT professionals can use this research to understand the core capabilities to look out for when selecting a UEM solution, plus they’ll get insights into the future of the UEM market so they can align their investments accordingly.

Maximize Sales with Uptime.com this Holiday Season

Uptime checker and website monitoring tools are vital to ensure carts and CTAs are working effectively for Black Friday & Christmas sales. Black Friday online sales reached almost $10 billion last year, and getting a share of that action can make or break a retailer’s year. With that kickoff to the holiday shopping season fast approaching, businesses must be ready with website security and online monitoring to ensure their online infrastructure is secure.

Optimize cloud costs while delivering flawless digital experiences

Learn how new modern application optimization modules built on the Cisco Full-Stack Observability Platform can help you solve cloud cost and resource optimization challenges for your Kubernetes workloads. Controlling unpredictable cloud cost and resource utilization has become critical for organizations to ensure the profitability of modern workloads.

Saved Filter Notifications

Alerts and notifications have been part of TrackJS since the very beginning. Our standard notification options reflect our desire to keep things simple. Over time though, our customers have asked to customize their alerts and fine tune them to specific scenarios. To support that use case, we’re releasing a new kind of notification we’re calling “Saved Filter Notifications”.

Why is API Testing Important and How to do it Right

Whether you're a business that relies on Amazon reviews and seller feedback or planning a much-needed holiday through your favorite travel comparison site, everyday activities like these wouldn't be possible without APIs. An integral part of app development, API technology is interwoven into a rich tapestry of popular applications that companies and consumers use daily. Without them, there would be no smartphones, social media, or instant messaging.

Query Smarter, Not Harder

We're excited to share an update to our Analyze package: the RQL AI Assistant, a natural language AI assistant to help you write your RQL queries. If you've ever been frustrated by the complexity of Rollbar Query Language (RQL) or the time it takes to get your data, this feature is the solution you've been waiting for. We understand that working with RQL has involved a steep learning curve for many.

Adding automation to monitoring: Azure troubleshooting simplified

The transition from traditional on-premises IT infrastructure to the public cloud has brought substantial relief to IT decision-makers and sysadmins. Since many organizations use Microsoft Windows as their preferred operating system, Microsoft Azure has become the public cloud provider of choice automatically owing to a familiar GUI and Active Directory sync.

Reproducing and testing distributed system failures with xk6-disruptor

Distributed systems, such as modern microservices-based applications, are highly scalable, but also highly complex. Dependencies and unexpected interactions between services are a common cause of incidents, and these incidents are also notoriously hard to test for. xk6-disruptor — an extension that adds fault injection capabilities to Grafana k6, the open source reliability and load testing tool — can help overcome these challenges.

Microsoft Teams Issues: How to Pinpoint & Highlight Performance Problems

See the connection between your car's warnings and IT alerts? Just as your car warns you of problems, monitoring tools do the same for your business. But what about services you don't own, like Microsoft Teams? Discover how Vantage DX transforms Teams calls and meetings into monitoring insights, helping IT teams automatically detect and prevent issues, boosting user satisfaction and productivity.

Breaking Through the Observability Wall: Scaling Your Telemetry Architecture

In today's digital landscape, Observability and telemetry data play a crucial role in ensuring the performance, reliability, and security of modern applications and services. However, as data volumes explode due to the proliferation of micro-services, cloud-based applications, and connected devices, existing architectures are hitting a scalability wall.

Introducing Mezmo Edge

Mezmo Edge enables users to deploy telemetry pipelines and process data in their own environment. A significant advancement in Mezmo’s capabilities, Edge is especially useful when working with sensitive medical or financial records. Organizations that need to comply with PCI, GDPR, or CCPA or that generally work with PII will benefit from Edge’s secure approach to data protection. Edge also provides the telemetry data optimization benefits of a pipeline without cloud data egress charges.

Home Assistant Hardware: Requirements and Recommendations

With the smart home revolution in full swing, choosing the proper hardware for platforms like Home Assistant can be overwhelming. Whether you’re new to home automation or a seasoned pro, the hardware you select can make or break your experience. But fear not! This comprehensive guide will demystify the requirements, delve into the various options, and help you make an informed decision. From the compact Raspberry Pi to the powerful Intel NUC, we’ve got you covered.

Industry Cloud Platforms, Explained

Cloud computing changed the way enterprise IT works. Investments in public cloud technologies are forecasted to grow by 21.7% to reach the $600 billion mark by the end of this year. The trend is driven by two major factors: Business organizations view these capabilities as an imperative for digital transformation, especially the domain-specific IT services that solve problems unique to their industry verticals.

Maturity Models for IT & Technology

Setting meaningful goals for your technology investment decisions requires an understanding of your requirements. Primarily, that’s… Measuring your IT maturity is one way to advance your IT performance — in a way that aligns with your organizational goals and minimizes the risk of failure. You can compare your current situation to a group of peers or competitors and also to industry benchmarks. Let’s take a look.

Partner Watch: CI/CD Build Systems for Embedded Development

To excel in embedded development in 2023, it is essential to have a solid understanding of build systems, continuous integration, and deployment strategies. This workshop by Percepio training partner Jacob Beningo aims to provide a comprehensive primer on these practices, equipping participants with the knowledge and skills necessary to tackle complex firmware projects with confidence.

What is Prometheus Alertmanager?

Prometheus Alertmanager is a powerful tool designed to handle various alerts generated by Prometheus. It plays a vital role in the overall monitoring ecosystem, acting as a centralized hub for managing alert notifications. With Prometheus Alertmanager and its robust notification management capabilities, you can efficiently define alert routing and notification policies. This empowers you to take timely actions and mitigate potential issues before they impact your service availability.

Unpacking the Hype: Navigating the Complexities of Advanced Data Analytics in Cybersecurity

The cybersecurity industry is experiencing an explosion of innovative tools designed to tackle complex security challenges. However, the hype surrounding these tools has outpaced their actual capabilities, leading many teams to struggle with complexity and extracting value from their investment. In this conversation with Optiv’s Randy Lariar, we explore the potential and dangers of bringing advanced data analytics and artificial intelligence tools to the cybersecurity space.

Monitoring Policies: Enabling Scalable, Hands-Free Monitoring

AppNeta by Broadcom will soon offer monitoring policies that streamline monitoring setup and maintenance. Now available for preview, these capabilities will significantly reduce the time and effort required for ongoing operations, especially for customers with large-scale and dynamic sets of monitoring points.

3 Ways to Build a Network That Supports Business Innovation

In recent years, it has often been said that the network is a “utility” much like heat or hot water. It’s a necessity to the needs of daily life for much of the world’s population. In fact, the United Nations has called for universal internet access to be accepted as a basic human right by 2030. But as the network becomes more ubiquitous, there’s risk of it being taken for granted. Organizations that do this, however, will be putting themselves at a disadvantage.

Monitoring CPU Temperature with Hosted Graphite

Monitoring CPU temperature is crucial for ensuring the smooth and efficient functioning of computer systems. As processors become more powerful, they generate more heat, which can lead to performance issues, system instability, and even hardware damage. Overheating is a common problem faced by many computer users, especially those who engage in resource-intensive tasks like gaming or running complex software.

Grafana and Graphite Best Practices

Efficient monitoring and visualization of performance metrics are paramount for ensuring seamless user experiences and reliable system operations. Grafana and Graphite, two powerful open-source tools, form an unbeatable combination when it comes to monitoring and analyzing time-series data. Grafana provides a robust and flexible platform for visualizing data, while Graphite acts as a scalable and efficient backend for storing and retrieving metric data.

Network Utilization Monitoring: How to Check Network Usage

Whether you're a small startup or a large enterprise, the ability to access and share information quickly and reliably is essential for productivity, customer satisfaction, and overall business performance. However, as networks become increasingly complex and interconnected, it's crucial for businesses to gain deep insights into their network utilization.

San Francisco OpenTelemetry Meetup - Pranay from SigNoz on choosing OTel from Day 1

More about SigNoz: SigNoz is an open-source alternative to DataDog, New Relic, etc., backed by Y Combinator. It helps developers monitor applications and troubleshoot problems in their deployed applications, using distributed tracing to gain visibility into the software stack.

VictoriaMetrics Long-Term Support (LTS): Current State

We release VictoriaMetrics several times a month, including at least one major update. However, because these new releases often introduce new features, they may be less stable. That’s why we also regularly publish Long-term support releases (LTS) alongside our regular releases. These LTS versions focus exclusively on bug fixes without new features and performance improvements. We committed to publishing LTS versions every six months and supporting them for one year.

Traceroute Troubleshooting: How To Identify Network Issues with Traceroutes?

Whether you're a seasoned network administrator or a curious tech enthusiast, the ability to diagnose and resolve network issues is an essential skill. When it comes to troubleshooting network problems, few tools are as powerful and revealing as the humble traceroute. In this blog post, we'll delve into the world of traceroute troubleshooting, uncovering its inner workings and demonstrating how it can be your secret weapon in identifying and resolving network issues.

On-premises infrastructure management in the age of the cloud

The cloud has revolutionized the possibilities of managing IT infrastructure. However, not all organizations are ready to make the move to the cloud. In this blog, we will discuss why on-premises infrastructure management solutions are still relevant in the cloud age.

Azure API Management monitoring

Azure API Management is a Microsoft Azure cloud-based solution that helps businesses effortlessly create, publish, secure, and analyze APIs (Application Programming Interfaces). APIs are the building blocks of any business and play an essential role in data exchange. Azure API Management monitoring is vital to enable the business to function seamlessly. It helps with early problem detection, resource optimization, and data-driven decision-making to increase the quality of the API ecosystem.

Field Data vs Doc Values | Understanding Elasticsearch Performance Issues

🚀 Dive into the world of Elasticsearch performance with our expert at Sematext! In this insightful conference talk, we explore the crucial differences between Doc Values and Field Data, shedding light on the best practices for optimizing your Elasticsearch clusters. Discover how the choice between Doc Values and Field Data can significantly impact your Elasticsearch queries, indexing, and overall system efficiency. Gain the knowledge and insights to supercharge your Elasticsearch deployments.

Introduction to Grafana Plugins

Grafana is a powerful open-source platform for monitoring and observability, but what truly makes it shine are its plugins. For technology engineers looking to expand Grafana's capabilities, plugins are the way to go. In this post, we'll dive into the world of Grafana plugins and offer some unique tips to get the most out of them.

Unlocking Observability - Dive into OpenTelemetry's Top Use Cases

OpenTelemetry can be used for generating and collecting telemetry signals like logs, metrics, and traces. The advantage of using OpenTelemetry for observability is that it is open-source and frees you from vendor lock-in. You can use OpenTelemetry for multiple use cases. OpenTelemetry is an open-source project which has emerged as the standard for achieving comprehensive observability in modern applications.

Monitoring systemd logs with Netdata using the systemd journal Function

The systemd journal plugin by Netdata makes viewing, exploring and analyzing systemd journal logs simple and efficient. It automatically discovers available journal sources, allows advanced filtering, offers interactive visual representations and supports exploring the logs of both individual servers and infrastructure-wide journal centralization servers.

Streamlining Kubernetes Operations with Enterprise Workload Automation

Kubernetes integrations are now available for AutoSys, dSeries, and Automic Automation. It wasn’t that long ago that teams in many organizations started dipping their toes into the world of containers and microservices. It didn’t take long for this approach to application development and orchestration to take hold, and for Kubernetes to emerge as a dominant, broadly used technology.

Troubleshooting Azure Virtual Desktop (AVD) Sessions - Key User Experience and Graphics Metrics to Monitor

For Azure Virtual Desktop (AVD) sessions, Microsoft exposes a set of user experience and graphics performance counters that eG Enterprise monitors out-of-the-box. These performance counters for Azure Virtual Desktop and Remote Desktop Protocol (RDP) / RemoteFX sessions can be used to troubleshoot AVD problems.

Coralogix vs Grafana Cloud: Pricing, Features and More

While Grafana is one of the better known names in the industry, Coralogix offers a full-stack observability platform. Despite the popularity of the Grafana brand, the cloud based solution lacks in some key areas. This article will go over the differences between Coralogix and Grafana Cloud, from features, customer support, pricing and more.

The Power of Data Correlation: Troubleshooting Made Easy

As software engineers, we all know that troubleshooting often involves sifting through heaps of data points — scanning metrics, reading logs, checking resource status and analyzing events. We manually connect the dots, and if we're experienced enough, we might spot an issue that's about to become a problem. At StackState, we've faced these same challenges.

9 ChatOps tips your team should adopt today

Pandora FMS is an excellent monitoring system that helps collect data, detect anomalies, and monitor devices, infrastructures, applications, and business processes. However, more than monitoring alone is needed to manage the entire incident lifecycle. ilert complements Pandora FMS by adding alerting and incident management capabilities. While Pandora FMS detects anomalies, ilert ensures that the right people are notified and can take action quickly.

How Nexthink Helps You See and Diagnose Issues at Scale

Digital Employee Experience is vital to attract and keep the best employees. A solid DEX strategy also increases employee productivity and motivation, a selling point for business leadership across the board. Yet, IT teams have less and less ability to control the factors that make up the digital employee experience. Although IT still owns the devices, most of the network and applications are managed by third-party vendors.

Can You Use the ELK Stack as a SIEM? A Fresh Take

A SIEM system (Security Information and Event Management) is often used by security operations centers (SOCs) for real-time detection of suspicious activity and security events. While some teams choose to adopt a purpose-built SIEM, others rely on the same DevOps tools they are already using for tasks like troubleshooting and operational log data analysis.

Elasticsearch to OpenSearch Migration Facilitated by Sematext Cloud

OK, so you’ve decided to move from Elasticsearch to OpenSearch. Maybe our comparison helped you decide and maybe you’ve checked our guide on how to perform the migration. But how do you know if your new OpenSearch performs as well and functions as correctly as the existing Elasticsearch? Even when comparing old with new versions, upgrades don’t always translate into better performance.

Is a $1 million Datadog bill worth it?

In a recent Reddit thread, I got into a conversation about justifying the cost of observability. It got to a really basic question about running a tech company: how do you know that any cost is justified? While a small number of expenses have clear and direct business value, a bunch of other costs, I would even say most costs, just aren’t that clear cut.

Staying Ahead of Threats with Continuous Security Monitoring Tools for DevOps

According to the latest CrowdStrike report, in 2022 cloud-based exploitation increased by 95%, and there was an average eCrime breakout time of 84 minutes. Just as significantly, in 2021, the Biden administration issued an executive order to improve the nation’s cybersecurity standards. There are also upcoming laws like DORA in the European Union. So, increased cyber attacks and legislative pressures mean you need to (a) actively protect against threats and (b) prove that you are doing so.

What Is Continuous Security Monitoring Software?

Many DevOps teams work proactively to meet security and compliance standards. They consider security best practices when developing software with open source components, scanning code for vulnerabilities, deploying changes, and maintaining applications and infrastructure. Security is a key feature of many of the tools they’re using, and the policies and industry standards they’re following.

Introducing Item Snooze

We are introducing a new Snooze option for items. When snoozing an item, the user defines how long the item will stop sending notifications. Once that period expires, the item returns to normal and begins sending notifications again. Currently, setting an item to a status of Muted prevents notifications from being sent until somebody changes the status back to Active.
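The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustrative model, not the product's implementation: the `Item` class, its `snoozed_until` field, and the `should_notify` method are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    """Hypothetical model of an item that can be Muted or snoozed."""
    status: str = "Active"                    # "Active" or "Muted"
    snoozed_until: Optional[datetime] = None  # set only while snoozed

    def snooze(self, now: datetime, duration: timedelta) -> None:
        # Snooze expires on its own once `duration` has elapsed.
        self.snoozed_until = now + duration

    def should_notify(self, now: datetime) -> bool:
        # Muted stays silent until someone sets the status back to Active.
        if self.status == "Muted":
            return False
        # Snoozed stays silent only until the chosen time has passed.
        if self.snoozed_until is not None and now < self.snoozed_until:
            return False
        return True
```

In this sketch a snoozed item needs no follow-up action to resume notifications, whereas a Muted item stays silent until its status is changed by hand.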

Migrating 1 billion log lines from OpenSearch to Elasticsearch

What are the current options to migrate from OpenSearch to Elasticsearch®? OpenSearch is a fork of Elasticsearch 7.10 that has diverged quite a bit from it lately, resulting in a different set of features and also different performance, as this benchmark shows (hint: it’s currently much slower than Elasticsearch).

Why Cloud Unit Economics Matter

In our first blog post, we introduced the concept of cloud unit economics—a system to measure cost and usage metrics. It helps maximize cloud value for better outcomes per dollar spent. We reviewed what cloud unit economics is, why it’s crucial to FinOps success, and how it enables organizations to unlock the full business value potential of cloud computing.
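As a rough illustration of "outcomes per dollar spent", a unit-cost metric simply divides cloud spend by the number of business units delivered in the same period. The function names and the choice of "requests served" as the unit below are assumptions made for this sketch, not definitions taken from the post.

```python
def unit_cost(total_cost: float, units: float) -> float:
    """Cost per unit of business output (e.g. dollars per request served)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


def units_per_dollar(total_cost: float, units: float) -> float:
    """Inverse view: how much output each dollar of spend produced."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return units / total_cost
```

Tracked over time, a falling unit cost suggests spend is growing more slowly than the output it supports, even if the total bill is rising.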

An Overview of the Essential Observability Metrics

Metrics are closely associated with cloud infrastructure monitoring or application performance monitoring – we monitor metrics like infrastructure CPU and request latency to understand how our services are responding to changes in the system, which is a good way to surface new production issues. As many teams transition to observability, collecting metric data isn’t enough.

Predictive Maintenance: A Brief Introduction

Predictive maintenance is a maintenance strategy that uses machine learning algorithms trained with Industrial Internet of Things (IIoT) data to make predictions about future outcomes, such as determining the likelihood of equipment and machinery breaking down. Using a combination of data, statistics, machine learning and modeling, predictive maintenance is able to optimize when and how to execute maintenance on industrial machine assets.

What Happens to DevOps when the Kubernetes Adrenaline Rush Ends?

Kubernetes has been around for nearly 10 years now. In the past five years, we’ve seen a drastic increase in adoption by engineering teams of all sizes. The promise of standardization of deployments and scaling across different types of applications, from static websites to full-blown microservice solutions, has fueled this sharp increase.

Visualizing MongoDB with Grafana Cloud

Learn to connect your MongoDB data and other data sources to a single dashboard to provide impactful insights into your business and improve visibility across teams. Vijay Tolani will demonstrate how to translate and transform your existing application and business metrics into flexible and versatile charts within dashboards using a wide array of visualizations, such as panels, bar gauges, geomaps, and more. Also query and alert on MongoDB and MongoDB Atlas data in real time without having to migrate or ingest it.

Grafana Labs' new AI-powered chatbot

Got questions about Grafana? Just ask Grot, Grafana's new AI-powered chatbot (still in beta). Built in partnership with Pal, a company that creates AI assistants for businesses, and inspired by our bulbous dino mascot, Grot the chatbot has been trained with large language models (LLM) on Grafana Labs’ own content. It can help you easily answer just about any question about our Grafana LGTM Stack, our open and composable hosted Grafana Cloud platform, and more — regardless of how narrow or broad the query might be or what language is set in your browser.

The Quest to Process Microsoft Windows Event Logs in Snare Format with Cribl

One of the things I really love about working for Cribl is the ability to help our customers optimize their data. Microsoft Windows Event Logs are something I have always looked to as a proverbial Rosetta Stone to help translate semi-structured, classic-style events into something more efficient and less resource-intensive to search. Extracting field values requires a large number of regular expressions to parse the events, which isn’t ideal.

Kafka Monitoring Using Prometheus

In this article, we are going to discuss how to set up Kafka monitoring using Prometheus. Kafka is one of the most widely used streaming platforms, and Prometheus is a popular way to monitor Kafka. We will use Prometheus to pull metrics from Kafka and then visualize the important metrics on a Grafana dashboard. We will also look at some of the challenges of running a self-hosted Prometheus and Grafana instance versus the Hosted Grafana offered by MetricFire.

Complete Guide To Grafana Dashboards

Grafana is one of the most popular dashboarding and visualization tools for metrics. Grafana dashboards are a very important part of infrastructure and application instrumentation. In this post, we will deep dive into Grafana dashboards. We will create a Grafana dashboard for a VM’s most important metrics, learn to create advanced dashboards with filters for multiple instance metrics, import and export dashboards, learn to set refresh intervals in dashboards and learn about plugins.

Understanding & Reducing Network Round-Trip Time (RTT in Networking)

In the dynamic realm of modern business operations, the heartbeat of connectivity relies on the seamless flow of information across networks. Network administrators and IT professionals, entrusted with the pivotal responsibility of maintaining these vital lifelines, understand the significance of every nanosecond.

Exploring systemd journal logs with Netdata

Today, we released our systemd journal plugin for Netdata, allowing you to explore, view, search, filter and analyze systemd journal logs. Like most things about Netdata, this is a zero-configuration plugin. You don’t have to do anything apart from installing Netdata on your systems. This is a key design direction for Netdata, since we want Netdata to be able to help even if you install it mid-crisis, while you have an incident at hand.

Helpful Tools for Starting a Transportation Business

Starting a transportation business can be a rewarding yet challenging endeavor. Considering options like the cheapest gas in Melbourne can help save costs and enhance efficiency. Here, we will discuss the most useful software for a transportation startup. These tools can help you ensure you are running your business as efficiently as possible. Keep reading!

A Conversation on Smart Infrastructure Management

In a recent episode of the Millennium Live Podcast, Galileo partner Charles Araujo, an industry analyst, author, and recognized authority on digital transformation, joined the host, Conor Tuohy, to delve deep into the world of smart infrastructure management. The interview provides valuable insights into the evolving landscape of IT operations, the challenges posed by digital transformation, and the role of Galileo Suite in revolutionizing infrastructure management.

Delivering Distributed Transaction Tracing Across Integration MESH

Distributed transaction tracing (DTT) is a way of following the progress of message requests as they permeate through distributed cloud environments. Tracing the transactions as they make their way through many different layers of the application stack, such as from Kafka to ActiveMQ to MQ or any similar platform, is achieved by tagging the message request with a unique identifier that allows it to be followed.
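The tagging idea can be sketched in plain Python, without any real messaging client. Everything here is hypothetical and chosen for illustration: the `trace_id` header name, the dictionary message shape, and the in-memory `trace_log` that stands in for a tracing backend.

```python
import uuid

TRACE_HEADER = "trace_id"  # hypothetical header name, not a platform standard


def tag_message(payload, headers=None):
    """Attach a unique identifier to a message unless it already has one."""
    headers = dict(headers or {})
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return {"headers": headers, "payload": payload}


def forward(message, hop_name, trace_log):
    """Record that a hop saw the message, then pass it on with headers intact."""
    trace_log.append((message[TRACE_HEADER_KEY][TRACE_HEADER], hop_name))
    return {"headers": dict(message["headers"]), "payload": message["payload"]}


def follow(trace_log, trace_id):
    """Reconstruct the ordered path one message took through the hops."""
    return [hop for tid, hop in trace_log if tid == trace_id]


TRACE_HEADER_KEY = "headers"  # key under which message headers are stored
```

Because every hop copies the headers forward unchanged, filtering the log by one identifier yields that message's path, for example through hops named after Kafka, ActiveMQ and MQ, while other messages' records are ignored.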

CapEx vs OpEx for Cloud, IT Spending, & More

Capital expenditures (CapEx) and operational expenditures (OpEx) are two ways organizations categorize their business expenses. Every organization has a variety of expenses, from office rent to IT infrastructure costs to wages for their employees. To simplify accounting, they organize these costs into different categories, two of the most common being CapEx and OpEx.

Container Orchestration: A Beginner's Guide

Container orchestration is the process of managing containers using automation. It allows organizations to automatically deploy, manage, scale and network containers and hosts, freeing engineers from having to complete these processes manually. As software development has evolved from monolithic applications, containers have become the choice for developing new applications and migrating old ones.

Elasticsearch and Arduino: Better together!

An easy way to communicate with Elasticsearch and Elastic Cloud using Arduino IoT devices. At Elastic®, we are constantly looking for new ways to simplify search experience, and we started to look at the IoT world. The collection of data coming from IoT can be quite challenging, especially when we have thousands of devices. Elasticsearch® can be very useful to collect, explore, visualize, and discover data — for all the data coming from multiple devices.

Are there any alternatives to OpenTelemetry worth considering?

Are you looking for an OpenTelemetry alternative? Then you've come to the right place. There are no good alternatives to OpenTelemetry if your use case involves generating and collecting different types of telemetry signals like logs, metrics, and traces. In certain use cases, like monitoring only metrics or time-series data, you can use a tool like Prometheus. If you’re sure you want an OpenTelemetry alternative, then let me point you to these three here.

Aggregation mapping pattern in BizTalk to Azure Integration Services migration

Let’s embark on a new journey as we begin a series of blog posts dedicated to the migration of BizTalk Server to Azure Integration Services. I’d like to highlight that when I mention the migration to Azure Integration Services (AIS), I’m making a clear distinction from Logic Apps. This differentiation is important because, contrary to what some consultants and salespeople may suggest, migrating BizTalk Server entirely to Logic Apps is not a viable path!

Send Lambda traces to Grafana Cloud with OpenTelemetry

AWS’s serverless technologies are popular because they provide cost effective scaling and great separation of concerns. However, observing serverless architectures like Lambda is challenging due to their transient nature and abstracted infrastructure. Unlike traditional systems with consistent hosts, serverless functions are ephemeral, often scaling rapidly and operating in isolation.

Dependency Redundancy Groups in Icinga 2.14

Icinga 2.14 introduced a new feature that lets you better model complex dependencies between your hosts and services: redundancy groups. Let’s take an e-mail server as an example. In order to deliver outgoing messages, it has to look up the addresses of the destination servers and relies on DNS for doing so. For incoming messages, it has to know which accounts exist, and in a corporate environment this typically means looking up user accounts in a directory service like LDAP.

SLA vs. SLO vs. SLI: What's the Difference?

When it comes to managing services effectively, terms like SLA, SLO, and SLI are often thrown around like confetti at a parade. They’re in meetings, in documents, and even in casual office conversations. But if you’re new to the field or simply haven’t had the chance to dig into these acronyms, they can feel like a bewildering alphabet soup. And they can’t be missing on an uptime monitoring blog such as ours! So, what do these terms really mean?

Getting Started with Infrastructure Monitoring

This article was originally published on The New Stack and is reposted here with permission. By taking advantage of monitoring data, companies can ensure their infrastructure is performing optimally while reducing costs. While building new features and launching new products is fun, none of it matters if your software isn’t reliable. One key part of making sure your apps run smoothly is having robust infrastructure monitoring in place.

A Vicious Cycle: Data Hidden Behind Lock and Key

Understanding production has historically been reserved for software developers and engineers. After all, those folks are the ones building, maintaining, and fixing everything they deliver into production. However, the value of software doesn't stop the moment it makes it to production. Software systems have users, and there are often teams dedicated to their support.

Ending Saint Helena's Exile from the Internet

Just after midnight on October 1, 2023, the remote island of Saint Helena in the South Atlantic began passing internet traffic over its long-awaited, first-ever submarine cable connection. In this blog post, we cover how Kentik’s measurements captured this historic activation, as well as the epic story of the advocacy work it took to make this development possible.

Getting Started with the OpenTelemetry Collector

In the previous article I covered how to set up auto-instrumented tracing for a Node.js app using OpenTelemetry (OTEL). We then sent the spans directly to the open source tracing tool Jaeger. I recommend you give that a read first before walking through this guide because we're going to re-use the instrumentation we set up last time. Today we're going to take things a step further by introducing the OpenTelemetry Collector.

Why collaboration is vital for mature security practices and how to achieve it

Learn how collaboration fueled by business risk observability can help your teams protect what matters most. According to IDC, 750 million cloud native applications will be created globally by 2025, underscoring the seismic shift to cloud native application environments to harness the scalability and agility of the cloud.

Elevating IT Support for VIPs: The Power Of Proactive Solutions

VIPs can be hard work, but in many ways, that’s for good reason. Whether it’s your C-suite that carries the responsibility of the company on their shoulders, or your top-shelf customers that form a big part of your business, you really need to look after them all. You know that, but from an IT perspective, how can you support them while making your life easier? You need to quit being reactive. Easier said than done… but here’s how to start making it happen.

Troubleshooting Unknown Unknowns with the Tier Metric Correlator

Tier Metric Correlation allows for fast root cause analysis by tying together business transaction performance outliers, nodes/servers, and key metrics that indicate a path to problem identification. In this example, we navigate through a blue/green deployment to identify a broken pipe/database issue. From here, we can drill down into the call graph and root cause.

Grafana vs. Zabbix

Grafana is a visualization tool that allows you to see and analyze all of your metrics in one unified dashboard. Grafana can pull metrics from any source, display that data, and then enable you to annotate and understand the data directly in the dashboard. Grafana dashboards are designed to allow you to visualize information in a ton of ways, from histograms and heatmaps to world maps. Grafana also has an alerting feature that can communicate with you through Slack, PagerDuty, and more.

Visualize user interactions with your pages by using Scroll Maps in Datadog Heatmaps

When developing modern applications, product managers, designers, and website developers need to understand how users interact with web pages in order to guide those users through their desired journeys. For example, teams need to know if users ever see the content near the bottom of the page, where to place CTAs to ensure they are in high-traffic areas, and how to compare different pages based on user engagement.

Organize and analyze related session replays with Playlists in Datadog RUM

Datadog Session Replay in Real User Monitoring (RUM) enables customers to capture and visually replay the web and mobile experience of their end users. With Session Replay, customers can quickly find and address UX errors by seeing precisely what actions an end user took, the point where they got stuck, and the outcome encountered as a result. Session Replay allows for easier troubleshooting and debugging because it delivers visible, insightful context into frontend errors.

Agentless monitoring for Prometheus in Grafana Cloud (Grafana Office Hours #15)

Did you know you could do agentless monitoring for Prometheus in Grafana Cloud? Senior Software Engineer Matt Nolf talks about the new Metrics Endpoint integration and how you can use it to collect Prometheus metrics from publicly addressable hosts-- WITHOUT using Grafana Agent. He is joined by Developer Advocates Nicole van der Hoeven and Paul Balogh.

Top OpenTelemetry Tools Most Suited for OpenTelemetry Data

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project aimed at standardizing the way we instrument applications for generating telemetry data (logs, metrics, and traces). OpenTelemetry lets you export the data it collects to any backend of your choice. In this article, we will discuss some of the top OpenTelemetry tools that are tailored to support OpenTelemetry data, offering valuable insights into the functioning and optimization of applications.

Taking the Work Out of Workflows with Nexthink Flow's Low Code Visual Designer

How automated are your automations? You (or your expert engineers) are probably spending hours on complicated PowerShell coding – writing, testing, reviewing, signing, and updating. What if there were a better way to coordinate your automations with workflows? Orchestrate multi-layer automated detection, communication, integration, and action.

3 Nexthink Flow Integrations to Help You Automate Anything, Anywhere

Nexthink’s ability to integrate with anything has long been one of the most popular aspects of our products. From your ITSM tool to Azure AD, Nexthink data can be connected across your environment – so you can act on Nexthink insights to build the best possible digital employee experience. Nexthink Flow continues our dedication to integrations by allowing for the sharing of data and actions in both directions for several key integrations.

5 of Our Favorite Use Cases for Nexthink Flow

Automation and orchestration present huge opportunities for business efficiency, time optimization, and cost savings. With Nexthink Flow, EUC teams now have the power to orchestrate full end-to-end automations that eliminate repetitive manual work and drive employee productivity. But with so many opportunities at your fingertips, it can be hard to know where to begin. Flow is a powerful orchestration engine, with incredible flexibility designed to fit diverse business needs.

Deploying Java Spring Boot with OpenTelemetry Faster than Docker Build

In our journeys as developers, we frequently encounter the need for speed and efficiency. But often, integrating development tools can feel like a time-consuming venture, more so than our usual build processes. If you’ve ever found yourself delving into Java logs looking for needles in logstacks, you’ll appreciate the beauty of this 1-click OpenTelemetry.

Releasing Icinga Ansible collection v0.3.0

This release of the collection will feature a whole set of possibilities to deploy a complete Icinga 2 environment. Before diving deep into the collection, here is a quick recap of all roles which were available and which are included in the current release v0.3.0. New roles in v0.3.0: To further enhance the Icinga 2 installation process via Ansible, these roles are vital for a successful deployment. Icinga DB is the future backend of Icinga 2; it can be handled with our icingadb and icingadb_redis roles.

Revolutionizing Data Strategy: Achieving 99.94% Cost Savings and Accelerated Performance with Cribl Search

Imagine sending logs to cost-effective storage, converting them into efficient metrics, and forwarding only essential data for analysis. This change can slash ingest and long-term storage expenses by an order of magnitude! Enter Cribl Search—an ingenious solution that skillfully navigates storage, transforms logs into actionable metrics, and seamlessly channels vital data to your analysis systems. The result? Over 99.94% reduction in volume, enhanced efficiency and substantial cost savings.

The future of SCOM monitoring: An agile approach to service delivery

As modern technology continues to evolve, adaptability and responsiveness become a necessity for all organizations. Throughout our years of experience, we have developed a unique approach, leveraging agile methodologies, to meet these evolving needs. Because of this, we are now able to offer a SCOM monitoring service unlike any other.

Introducing Nexthink Flow: IT Orchestration for Efficient IT

Go beyond automation and bridge the gap between monitoring and management. Real-time, AI-powered data is combined with a powerful, low-code orchestration engine that continuously optimizes complex workflows, monitors progress, handles exceptions, and ensures that all tasks are completed in the right sequence and with the right dependencies. By automating repetitive and manual tasks, EUC teams can optimize resource allocation, capitalize on opportunity costs, shift left, and effortlessly maintain desired state.

What is Infrastructure Monitoring : Comprehensive Guide

Tired of being a firefighter in your IT department who is always battling issues after they erupt? Well, you’re not the only one. With IT infrastructure becoming more complex day by day, IT managers across the globe face the same challenge. But not anymore. Infrastructure monitoring tools are here to save the day by empowering IT managers to not only predict but also prevent IT infrastructure issues. But how?

Ingest OpenTelemetry logs with the Datadog Agent

OpenTelemetry (OTel) is an open-source, vendor-neutral observability solution that provides a suite of components—including APIs, SDKs, and a data collector—that enable teams to collect and communicate telemetry data from cloud-native applications and services. OTel also defines the OpenTelemetry Protocol (OTLP), a standard for the encoding and transfer of telemetry data.

Ingesting and analyzing Prometheus metrics with Elastic Observability

In the world of monitoring and observability, Prometheus has grown into the de-facto standard for monitoring in cloud-native environments because of its robust data collection mechanism, flexible querying capabilities, and integration with other tools for rich dashboarding and visualization.

Logic App Best Practices, Tips, and Tricks: #37 How to handle special characters inside Logic Apps actions?

Today, I will speak about another useful Best practice, Tips, and Tricks that you must consider while designing your business processes (Logic Apps): How to handle special chars inside Logic Apps actions.

How to embed Grafana dashboards into web applications

Grafana dashboards are powerful and flexible tools for observing applications and infrastructure, so it’s no surprise we get a lot of questions from the community about how to embed them into their web applications. Over the past few releases, we’ve developed a lot of options for how to do this in Grafana, but there can be confusion about how they work, and when to use each approach.

Grafana Agent v0.37: Feature parity between Static and Flow mode, easy migration configs, and more

Grafana Agent v0.37 is here! This new release brings a lot of exciting new features and marks the pinnacle of a year-long effort to achieve feature parity between Grafana Agent Flow mode and Grafana Agent Static mode. We also extended our config converter to ease the migration from Static to Flow mode and we added the possibility to split your Flow configuration into multiple files. Please make note of some breaking changes in this release.

Debugging Modern Applications: Advanced Techniques

Today’s applications are designed to be always available and serve users 24/7. Performing live debugging on such applications is akin to doctors operating on a patient. Since the advent of the “as a service” model, software is like a living, breathing entity, akin to an anatomical system. Operating on such entities requires more dexterity on the developer’s part, to ensure that the software application lives on while being debugged and improved continuously.

Unlocking Network Capacity Monitoring & Planning: From Overload to Overdrive

In the world of networking, where connectivity and data flow are at the heart of operations, a network hiccup can spell disaster. The digital transformation wave has engulfed businesses of all sizes, and as companies become increasingly reliant on their networks to serve customers, collaborate with teams, and manage critical processes, the demand for robust and efficient network performance has never been higher.

Centralized Logging & Centralized Log Management (CLM)

Centralized logging provides visibility into the system by consolidating all the log data in a single all-in-one source. It supports two particular enterprise needs: Once all the data is ingested in a central location, you can seamlessly identify the problems in systems and troubleshoot them. But with ease comes challenges, too. For example, your team members may struggle with locating their desired details from this sea of data.

Monitoring Kubernetes costs with OpenCost and VictoriaMetrics

Control over operational costs is pivotal in Kubernetes' deployment and management. Although Kubernetes brings power and control over your deployments, it also necessitates thorough understanding and management of costs. OpenCost, specifically designed for Kubernetes cost monitoring, combined with VictoriaMetrics, an efficient time series database, offers a comprehensive solution for this challenge.

Webinar Recap: Introducing InfluxDB Clustered

Time series data is foundational in almost all applications and services. Even if time series isn’t the focus, like in an IoT sensor data centered application, it appears in monitoring data as metrics, logs, and traces. Because of time series data’s unique characteristics, it’s best served in a time series database. InfluxDB is purpose-built to handle the high volume and velocity of time series ingestion, and perform real-time analytics, alerting, and anomaly detection at scale.

A Long Time Ago, on a Server Far, Far Away...

This article was originally published on The New Stack and is reposted here with permission. Here is a brief case study that explores the logistics and motivations that would lead a successful company to spend time and resources completely rewriting the core of their flagship product in Rust. Calling a programming language Rust almost seems like a misnomer. Rust is the brittle byproduct of corrosion — not something that would typically inspire confidence.

(Crowd)Strike While the Data Is Hot: Getting Started with CrowdStream, Powered by Cribl

In today’s landscape, what’s considered security data has expanded to encompass more diverse data types like network data, behavioral analytics, and application metrics. These sources are now essential for a comprehensive security strategy, and visibility into all that data makes proactive threat detection possible. That said, organizations often struggle to process data from various vendors and merge telemetry sets to gain a complete view of their environments.

systemd journal logs: A Game-Changer for DevOps and Developers

“Why bother with it? I let it run in the background and focus on more important DevOps work.” - a random DevOps Engineer at Reddit r/devops. In an era where technology is evolving at breakneck speeds, it's easy to overlook the tools that are right under our noses. One such underutilized powerhouse is the systemd journal. For many, it's a mere tool to check the status of systemd service units or to tail the most recent events (journalctl -f).

OpenTelemetry Webinars: Logs in OpenTelemetry

Join Nočnica and Nityananda Gohain in an exploration of the best way to send logs with OpenTelemetry. More about SigNoz: SigNoz - Monitor your applications and troubleshoot problems in your deployed applications, an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator. SigNoz helps developers monitor applications and troubleshoot problems in their deployed applications. SigNoz uses distributed tracing to gain visibility into your software stack.

Use Grafana to Monitor Flask Apps With Graphite

Monitoring the performance and health of web applications is paramount for ensuring a seamless user experience. Flask offers developers the flexibility to build dynamic applications. However, as applications grow in complexity, so does the need for efficient monitoring solutions. This is where Grafana and Graphite come into play.

Distributed tracing with Grafana Cloud k6 (Grafana Office Hours #14)

Senior Software Engineer Łukasz Gut talks about a new feature: Distributed Tracing with Grafana Cloud k6. He discusses what distributed tracing is, why it matters, and how it can help teams find reliability issues faster. He is joined by Developer Advocates Marie Cruz and Nicole van der Hoeven.

Evolution of The Windows Experience Index and Reliability Monitor

As technology advances, operating systems play a vital role in providing a seamless user experience. Microsoft’s Windows OS has been at the forefront, constantly introducing innovative features over time. Two features related to improving the end-user digital experience are the Windows Experience Index (WEI) and Reliability Monitor. These measurements have become instrumental to Digital Experience Monitoring (DEM) in assessing system capabilities and measuring stability.

OpenShift monitoring: Five crucial elements to look out for

Most IT firms build their empire on Kubernetes, for its amazing flexibility and super scalability. Red Hat OpenShift Container Platform (formerly OpenShift Enterprise) is a hybrid cloud application platform powered by Kubernetes, which initially operated only on-premises and has been in service for more than nine years.

Local US government slices MTTD by 50% within a year using Applications Manager

Since its incorporation in 1852, the government administration has grown into a regional hub for marketing, processing, packaging, and distributing agricultural commodities for trade areas. The agency aims to deliver municipal services that meet its residents’ vital health, safety, and general welfare needs to sustain and improve their quality of life.

Cotiviti witnesses 82% improvement in application uptime and proactively prevents outages with Applications Manager

Established in 1979, Cotiviti is a leading analytics company that aids healthcare organizations, such as payers and providers, in enhancing their financial performance and operational efficiency through advanced data analytics, technology, and consulting services. Apart from working with more than 180 healthcare players, the organization also supports the retail industry with audit and recovery services in order to increase efficiency and maximize profitability.

How to Monitor SQL Server with OpenTelemetry

At observIQ, we've seen growing interest in observing the health of Windows systems and applications using OpenTelemetry. Requests on the SQL Server receiver continue to garner the most interest, so let's start there. Below are steps to get up and running quickly with the contrib distribution of the OpenTelemetry collector. We'll be collecting and shipping SQL Server metrics to a popular backend, Google Cloud.

AI is not intelligence: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Bugs in NASA's codebase: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Top 15 Azure Cost Management Tools in 2023

Azure, Microsoft’s cloud platform, has become an essential part of modern businesses, offering a vast array of services and resources. However, effective cost management in Azure is crucial to avoid unexpected expenses and optimize spending. While Azure provides its native tools for cost management, several third-party solutions offer advanced features and capabilities to help you make the most of your Azure resources. This blog will explore the top 15 Azure cost management tools.

Manage API performance, security, and ownership with Datadog API Catalog

Today’s modern applications are made up of thousands of loosely connected private and publicly exposed APIs, each serving a specific function. This dynamic API landscape, in combination with the decentralized nature of microservice development, can be overwhelmingly challenging to manage—let alone govern or secure adequately. API sprawl is often created as a result, leading to fragmented or nonexistent internal API documentation, knowledge bases, and toolsets.

Improve your API test coverage with Datadog Synthetic Monitoring

As your applications grow, your teams may be faced with managing a complex, expanding mesh of potentially thousands of loosely connected APIs—each one a new point of failure that can be difficult to track and patch. API sprawl comes naturally in rapidly expanding, distributed applications, and the difficulty of maintaining centralized knowledge and toolsets for your APIs creates friction when teams need to leverage APIs they don’t own.

The Hidden Costs of Website Outages and How Uptime.com Has Your Back

Businesses lose potential revenue, trust, and brand reputation every moment your website is down. Some of those things can never be earned back. Website outages sting whether you’re a blossoming startup or a seasoned enterprise. How often do they happen, and what’s the actual cost? That is exactly what we will explore together today!

Evolution at the Core: LogicMonitor's Transformative AI Empowers the Future

New integrations provide superior digital developer and user experiences. Our most recent release of Dexda demonstrates our commitment to harnessing the transformational power of Artificial Intelligence (AI) for hybrid observability. Using a variety of advanced machine learning and AI techniques, Dexda dramatically reduces alert fatigue. Our use of intelligence and automation continues to change the way IT teams work.

Predictive Network Technology in 2024

IT networks generate large volumes of information in the form of security, network, system and application logs. The volume and variety of log data makes traditional network monitoring capabilities ineffective — especially for monitoring use cases that require proactive decision making. These decisions are based on things like: All of this makes large-scale and complex enterprise IT networks a suitable use case for advanced AI and machine learning capabilities.

Tutorial: Monitoring MySQL Server Performance with Prometheus and sql_exporter

Databases in one form or another are an almost inseparable part of modern applications. A popular one is MySQL, on which this article will focus. But how do you monitor MySQL? This article gives an introduction to the topic.

Prometheus Dashboards

Prometheus is a very popular open-source monitoring and alerting toolkit originally built in 2012. Its main focus is to provide valid insight into system performance by providing a way for certain variables of that system to be monitored. Prometheus displays the performance of these variables as a graph to allow its users to see their system’s performance at a glance.

Monitor Azure Resource Events with LogicMonitor Logs

The integration of Azure’s event-driven model with LogicMonitor’s monitoring capabilities offers businesses a robust solution for real-time IT infrastructure monitoring. LogicMonitor’s cloud-based platform provides a comprehensive overview of an organization’s IT infrastructure, both on cloud and on-prem.

Optimize Kubernetes Monitoring Costs with Dynamic Property Filtering

LogicMonitor’s LM Container is an excellent choice for users who wish to effectively monitor Kubernetes environments running mission-critical business applications. We have now introduced a new cost optimization feature called dynamic property filtering, which offers added flexibility and customization to users. In this blog post, we will guide you through this exciting new feature with a step-by-step example of selectively monitoring a persistent volume based on its state.

Cloud Imperium Games moves ELK stack with ChaosSearch.

Cloud Imperium Games (CIG) is a prominent video game development company known for its ambitious project, Star Citizen, which aims to be an open-world, massively multiplayer online space simulation game. As a result of the game's popularity, all the metrics, events, and logs, generated to track every single action during gameplay, also experienced explosive growth in terms of volume and also in diversity (a consequence of the dynamic and fast-paced development environment).

Troubleshooting Common Kafka Conundrums

This is the third blog in our series on Kafka, where we continue to explore the nuances of deploying Kafka for scale. In our previous blogs, Essential Metrics for Kafka Performance Monitoring and Auto-Instrumenting OpenTelemetry for Kafka, we laid the foundation for understanding Kafka’s performance and monitoring aspects. Now, as we explore further into the Kafka ecosystem, we’re here to tackle the common challenges that can arise during deployment and scaling.

7 Best Azure Service Bus Monitoring Tools in 2023

Azure Service Bus is a cloud messaging service that transfers information between services running in both the cloud and on-premises. So, it becomes essential to ensure the performance and availability of Service Bus as it might be used in applications and integrations for transferring business-critical messages. To help you with that, we have listed and compared the top Azure Service Bus monitoring tools with their features.

Telemetry 101: An Introduction To Telemetry

Understanding system performance is critical for gaining a competitive advantage. Telemetry provides deeper insights into the system, helping business owners make better decisions. This article takes a comprehensive look at the topic of telemetry. We’ll look at its functionality and telemetry types. We’ll also look at all the things telemetry data can help you with — plus the challenges companies with telemetry systems might face.

Listen, Learn and Adapt: The Keys to a Nimble Customer Experience Strategy

In celebration of Customer Experience Day 2023, this post is part of a series on customer experience and the ways that Splunk strives to deliver superior customer experience at every level. Any resilient customer experience (CX) team knows that in order to create superior customer experiences, listening is the first step. This is made apparent when you consider that 73% of customers expect companies to have a firm grasp on their unique needs and expectations.

Unify and query private network data in Grafana Cloud: Private Data Source Connect is now GA

You may be ready to make the move to Grafana Cloud, but securely querying private data has been a blocker. If you wanted to query a network-secured data source like a MySQL database or an Elasticsearch cluster that is hosted in an on-premises private network or a Virtual Private Cloud (VPC), you needed to open your network to inbound queries from a range of IP addresses.

How to securely send your telemetry to Grafana Cloud using AWS PrivateLink

Using Grafana Cloud to manage and monitor even your most sensitive data from your AWS services just got easier. If your organization’s workloads are hosted in AWS and you are using a Grafana Cloud instance that’s also hosted in AWS, you can now use AWS PrivateLink to establish a secure connection between your virtual private cloud (VPC) network and Grafana Cloud for all your data.

September Product Updates for Sentry

It’s official, summer is over. So grab yourself a pumpkin-spiced food item of choice and check out what the Sentry team has been up to this past month. From introducing new features, product improvements, and integrations, we can objectively say we made Sentry at least a smidge better this month. Keep reading to see how the latest developments can make your debugging experience less painful.

Full Stack Observability Guide - Examples and Technologies

As modern software systems become increasingly distributed, interconnected, and complex, ensuring production reliability and performance is becoming harder and more stressful. Seemingly nondescript changes to our infrastructure or application can have massive impacts on system uptime, health, and performance, all while the cost of production incidents continues to grow.

Warning: 3 Reasons Why You Shouldn't Pay a "Setup Fee" When Buying a Website Uptime Monitoring Solution

As you may have already discovered (or will soon encounter), many vendors that offer uptime monitoring solutions charge a setup fee. But instead of seeing this as a legitimate cost, you should view it as a stop sign. There are three reasons why.

Why Do Monitoring Service Thresholds Overlap?

Although the title of this blog poses the question “Why do Monitoring Service Thresholds Overlap?”, really the question should be: “In Remote Monitoring and Management Solutions, Why Do Some Monitoring Service Thresholds Overlap?”. That’s a bit of a mouthful, but it’s what I’m going to look at in this blog. Here’s why overlapping thresholds in remote monitoring matter.

AWS PrivateLink for Grafana Cloud

Navigating secure connectivity and managing data transfer costs are vital to optimizing your cloud-based services. AWS PrivateLink offers a solution to improve security and reduce costs, especially when integrating with Grafana Cloud. In this tutorial, we delve into the problems solved by PrivateLink, its working mechanism, and a step-by-step guide on configuring it to connect to Grafana services.

Upgrade to DX UIM 20.4 CU9 to Leverage New Features and Security Updates

DX Unified Infrastructure Management (DX UIM) is a powerful solution that enables comprehensive infrastructure observability across your digital ecosystems, including private, public, and hybrid clouds. With DX UIM, you can proactively and efficiently manage the performance and availability of your IT infrastructure and applications. DX UIM 20.4 is the current main branch of the solution. This release offers a number of significant capabilities that weren’t available in earlier versions.

Future-Proof Your Observability Strategy With CrowdStrike and Cribl

Traditional logging tools are struggling to keep up with the explosive pace of data growth. Data collection isn’t the most straightforward process — so deploying and configuring all the tools necessary to manage this growth is more difficult than ever, and navigating evolving logging and monitoring requirements only adds another layer of complexity to the situation.

Writing code with empathy: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The job of a backend dev: Build good ACs: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Losing customers because of bad software: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

What you do in practice is what you do in a game: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Bugs in NASA's codebase and importance of QA in engineering: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Auto Optimize Your Observability with a Time-Based Collection Strategy

Observability has become one of the largest line items in the IT budget, second only to cloud costs. A main reason for this is teams are often stuck collecting significantly more data than they need. This is where Circonus Passport helps. Rather than filter data after it’s collected like current observability data pipeline management tools, Passport is used to filter data before it’s collected.

Our Favorite Grafana Dashboards

Grafana is an open-source visualization and analytics tool that lets you query, graph, and alert on your time series metrics no matter where they are stored. Grafana dashboards provide telling insight into your organization. All data from Grafana dashboards can be queried and presented with different types of panels ranging from time-series graphs and single stats displays to histograms, heat maps, and many more.

Monitoring your infrastructure with StatsD and Graphite

Collecting metrics about your servers, applications, and traffic is a critical part of an application development project. There are many things that can go wrong in production systems, and collecting and organizing data can help you pinpoint bottlenecks and problems in your infrastructure. In this article, we will discuss Graphite and StatsD, and how they can help form the basis of monitoring infrastructure.

How to Monitor Hybrid Networks for End-to-End Visibility: Hybrid Network Monitoring

Hybrid networks, which combine on-premises infrastructure with cloud-based services, have become the backbone of modern operations. While they offer numerous advantages, they also present unique challenges when it comes to network monitoring and management. Maintaining the health and security of a hybrid network requires a comprehensive understanding of its intricate architecture and real-time visibility into its performance.

How to identify and fix Render-Blocking Resources

Render-blocking resources are JavaScript and CSS files that prevent the web page from rendering until they are downloaded and processed. These might be critical resources that don’t get loaded immediately, or non-critical resources that are being loaded at the very beginning. Fixing render-blocking JavaScript and CSS helps improve page load times so sneakerheads don’t bounce to your competitor’s site while waiting for the images of the latest drop to load.
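
As a rough illustration of the idea above, the following Python sketch scans a page's head for scripts and stylesheets that are likely to block rendering. It is a simplified heuristic rather than the method used in the linked article: the class name, function name, sample markup, and file paths are all hypothetical, and real browsers apply more rules than the few checked here (async, defer, module scripts, and the stylesheet media attribute).

```python
from html.parser import HTMLParser


class RenderBlockingFinder(HTMLParser):
    """Collects <script src> and stylesheet <link> tags inside <head>
    that carry no async/defer/module hint (a simplified heuristic)."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif not self.in_head:
            return  # this sketch only inspects resources declared in <head>
        elif tag == "script" and "src" in attr:
            # async, defer, and type="module" scripts do not block parsing
            deferred = "async" in attr or "defer" in attr or attr.get("type") == "module"
            if not deferred:
                self.blocking.append(attr["src"])
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower().split():
            # Assumption: a stylesheet limited to e.g. media="print" is not counted
            if (attr.get("media") or "all").lower() in ("all", "screen"):
                self.blocking.append(attr.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_render_blocking(html):
    """Return the URLs of likely render-blocking resources in an HTML string."""
    finder = RenderBlockingFinder()
    finder.feed(html)
    return finder.blocking


# Hypothetical page used only for demonstration
SAMPLE = """<html><head>
<link rel="stylesheet" href="/css/site.css">
<link rel="stylesheet" href="/css/print.css" media="print">
<script src="/js/app.js"></script>
<script src="/js/analytics.js" async></script>
<script src="/js/late.js" defer></script>
</head><body><script src="/js/footer.js"></script></body></html>"""

print(find_render_blocking(SAMPLE))  # ['/css/site.css', '/js/app.js']
```

Candidates flagged this way would still need checking in a browser's performance tooling before changing how they load.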

Launch of SigNoz Cloud, Improvements in Logs tab, and Metrics Query Builder - SigNal 29

Welcome to the 29th edition of our monthly product newsletter - SigNal 29! We are excited to share important updates from Team SigNoz. We are pleased to announce the public launch of SigNoz Cloud. We’ve also raised funding of $6.5M to fuel the next phase of building and growth. We also shipped many improvements to the product. Let’s dive in to see what humans at SigNoz were up to in the month of September 2023.

Presenting Generative AI Integration in the Apica Ascent Platform

If generative AI is innovative for enterprises in 2023, being cloud-based is ubiquitous. What that means is that the data today is extremely voluminous and complex. Not to mention that all that data needs proactive monitoring and analysis. Thus, data in observability and monitoring can often be complex and challenging to understand due to its sheer volume and diversity.

Maximize the campus experience with next generation observability

See how higher education institutions can leverage full-stack observability to provide the best possible application experiences for students, staff and faculty. Now more than ever, delivering a superior user experience is fundamental to digital transformation — and not just in the corporate world. Higher education has discovered the value of digital experiences for engaging and supporting students, keeping faculty productive and satisfied and creating efficiencies that save money.

Configuration Drift: Understanding, Avoiding, Managing and Resolving in Kubernetes

If you work with Kubernetes, you know that any number of issues can pose a serious threat to the stability and security of your deployments. One that's subtly damaging is configuration drift, which occurs when the actual state of how your system is set up — its configuration — strays from the way you defined it. Configuration drift in Kubernetes can happen when people make changes manually, systems aren't synchronized properly or monitoring falls short.
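
To make the definition concrete, here is a minimal Python sketch of drift detection as a comparison between a desired configuration and the observed one. It works on plain dictionaries and does not call the Kubernetes API. The function name, field names, and values are hypothetical and chosen only to resemble a Deployment manifest.

```python
def find_drift(desired, actual, path=""):
    """Return (path, desired_value, actual_value) tuples for every field in
    `desired` whose live value differs. Fields that exist only in `actual`
    (for example server-populated status) are ignored."""
    drift = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        have = actual.get(key) if isinstance(actual, dict) else None
        if isinstance(want, dict):
            # Recurse into nested sections; a missing section is treated as empty
            drift.extend(find_drift(want, have if isinstance(have, dict) else {}, here))
        elif want != have:
            drift.append((here, want, have))
    return drift


# Hypothetical desired state (as declared) and actual state (as observed)
desired = {"spec": {"replicas": 3, "template": {"image": "web:1.4.2"}}}
actual = {
    "spec": {"replicas": 5, "template": {"image": "web:1.4.2"}},
    "status": {"readyReplicas": 5},
}

print(find_drift(desired, actual))  # [('spec.replicas', 3, 5)]
```

Here someone scaled the workload by hand, so the live replica count no longer matches the declared one. That is the kind of manual change described above.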

Top 10 Tools to Monitor Core Web Vitals of Your Website

What guarantees the success of a website today isn’t just its content and design; delivering a seamless and efficient user experience (UX) is also extremely critical. This is where Core Web Vitals are important as they provide a collection of performance metrics to evaluate the quality of website user experience. Core Web Vitals are critical to attract visitors and retain them as they directly impact a site’s visibility on Google.

How We Did It: Data Ingest and Compression Gains in InfluxDB 3.0

A few weeks ago, we published some benchmarking that showed performance gains in InfluxDB 3.0 that are orders of magnitude better than previous versions of InfluxDB – and by extension, other databases as well. There are two key factors that influence these gains: 1. Data ingest, and 2. Data compression. This begs the question, just how did we achieve such drastic improvements in our core database? This post sets out to explain how we accomplished these improvements for anyone interested.

Simplifying Microsoft Teams Troubleshooting for IT Teams

Microsoft Teams has become the go-to platform for seamless collaboration and communication. However, like any technology, performance issues can arise, and these issues affect user experience and productivity. For IT teams tasked with Microsoft Teams troubleshooting, having access to comprehensive data is key. In this blog, we explore the challenges faced by IT teams and how harnessing more data can make the process significantly easier.

Harmonizing Digital Channels and Business Operations to Deliver a Good Customer Experience

In celebration of Customer Experience Day 2023, this post is part of a series on customer experience and the ways that Splunk strives to deliver superior customer experience at every level. Today, customers interact with brands through a variety of channels and platforms. In fact, 57% of customers prefer to engage with brands through digital channels first.

The Single Pane of Glass in Modern Observability

Recently I caught up with Jamie Allen on Episode 67 of the Slight Reliability podcast to discuss the idea of a single pane of glass (SPOG). Jamie had written an article titled The Single Pain of Glass which coincidentally was what I titled Slight Reliability Episode 10. I thought given our shared use of puns and this topic that it was worth a conversation! So, what is a single pane of glass? Is it an idea with practical application? How does it fit into the world of modern observability?

Configure a policy to detect and block attacks and exploits

With Cisco Secure Application, you can configure run-time policies to continuously monitor vulnerabilities and automatically find and block attacks. Your speed and uptime are maximized while the risk to your business is minimized. And your teams gain time to plan and remediate your environment.

The Link Between Early Detection and Internet Resilience: A Lesson from Salesforce's Outage

Almost every study examining the hourly cost of outages invariably leads to a clear and undeniable conclusion: outages are expensive. According to a 2016 study, the average cost of downtime was estimated at approximately $9,000 per minute. In a more recent study, 61% of respondents stated that outages cost them at least $100,000, with 32% indicating costs of at least $500,000 and 21% reporting expenses of at least $1 million per hour of downtime.
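
For a sense of scale, the per-minute figure quoted above can be converted into a per-incident cost. The Python sketch below assumes that cost grows linearly with outage duration, which is a simplification. The function name and the example durations are illustrative only.

```python
COST_PER_MINUTE_USD = 9_000  # the 2016 average estimate quoted above


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimated outage cost in USD, assuming cost scales linearly with time."""
    return minutes * cost_per_minute


print(downtime_cost(15))  # 135000
print(downtime_cost(60))  # 540000
```

At that rate a one-hour outage comes to $540,000. That is in the same range as the "at least $500,000" answers in the more recent study, although the text does not say whether those answers are per hour or per outage.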

An overview of Context Propagation in OpenTelemetry

In today's rapidly evolving landscape of software applications, where complexity often thrives, the need for observability and tracing has never been more pronounced. The ability to comprehend the inner workings of distributed systems and track the journey of requests as they traverse through various components is paramount for maintaining optimal performance and troubleshooting issues. This is where OpenTelemetry, a prominent observability framework, steps in.

OpenTelemetry Exporters - Types and Configuration Steps

In this post, we will talk about OpenTelemetry exporters. OpenTelemetry exporters help in exporting the telemetry data collected by OpenTelemetry. OpenTelemetry frees you from any kind of vendor lock-in by letting you export the collected telemetry data to any backend of your choice. In modern distributed systems, efficiently collecting, transmitting, and analyzing telemetry data from diverse sources poses a significant challenge.

Production vs Local in engineering: Piyush Verma - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

What is Zero Trust Reliability in engineering: Piyush Verma - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Demo of Internet Sonar: From Disruption to Instant Detection

Catchpoint's new Internet Sonar shows you global Internet status at a glance in an AI-powered, real-time, interactive dashboard and map. It answers the first question any IT team needs to ask when there's an outage: "Is it me, or is it something else?" Key product features: In this recorded live demo session, leaders from our Product team will walk you through how Internet Sonar works, how you can use it to lower MTTR, and how organizations are using it to save millions.

10 Essential Office Security Upgrades for the Digital Age

In today's modern office environment, businesses are continually evolving their security measures, recognizing the importance of safeguarding sensitive data and assets. With the advent of technology, traditional security methods like locked file cabinets are being complemented and, in some cases, replaced by innovative solutions such as the smart lock for a file cabinet. Here, we will explore ten crucial office security upgrades that are essential in the digital age, each contributing to the protection of the company's data.

Sponsored Post

EXperience Level Agreements (XLA), the Next Step with SLAs

The importance of the network to businesses has increased over the years (obviously). Nowadays, networks are the main way that work gets done - they're the main way anything gets done. Consequently, how organizations measure network performance needs to change as well. Rather than just focus on network availability or simple uptime, they need to dig deeper and monitor User Experience. EXperience Level Agreements (XLA), as opposed to traditional Service-Level Agreements, help them reach that goal.

Sponsored Post

meshIQ - Seeking Partners to Deliver Observability Across Integration MESH

In today's rapidly evolving technological and business landscapes, staying competitive requires more than just a great product or service. It demands a technological edge that can drive efficiency, innovation, and overall growth. This is where partnering comes into play - it's like turbocharging your business engine. Today, meshIQ is looking to turbocharge our sales teams, processes, and reach by adding power via partnerships.

Sponsored Post

Better CI/CD with GitHub Actions and deployment tracking

Understanding the impact of each of your deployments is crucial, especially as they become increasingly frequent. Chances are, your team is either aiming to increase shipping velocity or has already started deploying "continuously" (which is to say, multiple times a day). The biggest tech teams at the likes of Amazon and Google deploy thousands of times daily, and Atlassian has found that 75% of enterprise DevOps teams call deployment frequency their most important success criterion. And while CD comes with a host of well-established benefits, it also brings a heightened risk of introducing new errors and issues.

The Best Cloud Infrastructure Automation Tools

The past decade has seen drastic growth in the adoption of public cloud. One of the primary reasons for this is its cheaper infrastructure and ease of scale. With such rapid adoption of public cloud, the need for infrastructure automation also arises. This is because teams want to provision infrastructure quickly and automate tasks, cutting work that took weeks in traditional data centers down to minutes in the public cloud.

What are Prometheus Functions?

Prometheus is a platform for real-time systems and event monitoring and alerting. The Prometheus project is free, open-source, and available on GitHub. Originally developed at SoundCloud, Prometheus became a project of the Cloud Native Computing Foundation in 2016, alongside other popular frameworks such as Kubernetes. The core of the project is the Prometheus server, which acts as the system’s “brain” by collecting various metrics and storing them in a time-series database.

Configuring Python StatsD Client

Building and deploying highly scalable, distributed applications in the ever-changing landscape of software development is only half the journey. The other half is monitoring your application states and instances while recording accurate metrics. There are moments when you wish to check how many resources are being consumed, how many files a given process has open, etc. These metrics provide valuable insights into our tech stack execution and management.

Essential Metrics for Kafka Performance Monitoring

Apache Kafka is an open-source distributed streaming system that has grown in popularity and usage across the technology industry. Originating from LinkedIn and now part of the Apache Software Foundation, Kafka provides a robust and scalable platform. It’s uniquely designed with an architecture that includes both a storage layer and a compute layer.

Best One To One Plus Alternatives for K-12 in 2023

One to One Plus is a cloud-based, all-in-one software solution, tailored specifically to K-12 institutions. They provide a comprehensive suite of integrated IT asset management (ITAM), help desk software, and inventory management. As they describe on their website, they are “designed for K-12, built by K-12,” and appear to have been started by tech directors working in K-12.

Fostering a fearless engineering culture: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The mistake boot in engineering: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

What's missing in engineering today?: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Engineers should have a desire to find bugs: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The only industry not licensed to do their job - Engineering: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Software is ubiquitous and can change our mood: Piyush Verma - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

My job is an engineer = build ACs: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The job of a backend dev: Build good ACs - Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Virtana Named in Prestigious Industry Research by Gartner

Virtana’s AI-powered platform is at the forefront of IT infrastructure management, offering a comprehensive suite of tools and services that empower IT leaders to make informed decisions on how to forecast demand and streamline operations. The rapid evolution of technology has ushered in an era of complexity and dynamism that IT leaders must navigate effectively.

Tracing Your Steps Toward Full Kubernetes Observability

Kubernetes is one of the most important and influential technologies for building and operating software today because it’s so incredibly capable. It’s flexible, available, resilient, scalable, feature-rich and backed by a global community of innovators — that’s a pretty impressive list of intangibles to apply to any particular capability.

How to Build a ROI Plan for Cribl Stream

Getting your organization to invest in a new tool requires telling a story that helps decision-makers understand its benefits. In a recent webinar, our experts discussed how to define an ROI for Cribl Stream. They also shared a sample proposal you can use to craft the story you’ll tell to leadership, and gave some tips and tricks for justifying the purchase of these key tools for your business. Engineers and architects understand core technical problems better than anyone.

Cloud Migration Basics: A Beginner's Guide

What is a cloud migration? A cloud migration is the practice of moving IT workloads (data, applications, security, infrastructure, and other objects) to a cloud environment. Cloud migration can take many forms. There is also another type of cloud migration called a reverse cloud migration (also known as cloud repatriation or cloud exit) where existing applications are moved from a public cloud back to an on-premises data center.

Anomaly Detection in 2024: Opportunities & Challenges

Anomaly detection is the practice of identifying data points and patterns that may deviate significantly from an established hypothesis. As a concept, anomaly detection has been around forever. Today, detecting anomalies is a critical practice because anomalies can indicate important information. Let’s take a look at the wide world of anomaly detection.

Enterprises Realize Benefits from Migrating to Cloud with Splunk

Today, for a lot of organizations, moving to the cloud provides the best strategy to drive higher business efficiency and scale. But moving to the cloud can be challenging. IT leaders are continuously looking for ways to focus more on driving business value while moving to the cloud.

State of the Internet: Monitoring SaaS Application Performance

With the increasing reliance on SaaS applications in organizations and homes, monitoring connectivity and connection quality is crucial. In this post, learn how with Kentik’s State of the Internet, you can dive deep into the performance metrics of the most popular SaaS applications.

How continuous profiling improved code performance for a new Grafana Loki feature

Throughout the software development process, engineers can use a number of methods and tools to ensure their code is efficient. When using Go, for example, there are built-in tools, including those for benchmarking and CPU/memory profiling, to check how efficiently code will run. Engineers can also run unit tests to validate code quality.

Broadcom Delivers Advanced Network Management Capabilities to Optimize Network Operations, Accelerate Network Transformation and Enhance Connected Experiences

We are pleased to announce the availability of Network Management by Broadcom, which includes DX NetOps 23.3 and AppNeta. It assures end-to-end observability, minimizing the visibility gaps beyond the network borders to Cloud, SaaS, and Sites. The solution provides a unique and industry-leading unified Network Management approach, allowing organizations to optimize network operations, accelerate transformation and enhance connected experiences.

ING's bold leap into the future: Building a global, cloud-based financial messaging system with Elastic

ING Group is a Dutch-based multinational banking and financial services corporation serving more than 38 million customers globally. It’s one of the biggest banks in the world, consistently ranking among the top 30 largest banks globally. At ING, our 20-year-old COBOL-based financial messaging system — which provides electronic instructions to enable financial transactions between banks and customers — is slowly becoming obsolete and difficult to integrate.

The WeWork-ization of software: Piyush Verma - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

The most beautiful thing about Kubernetes: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Stop using debuggers, learn a mental model of a codebase: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Observability-OSS vs Paid vs Managed OSS with Hosted Graphite

Observability is a critical aspect of modern software development and infrastructure management. It involves the ability to gain insights into the internal workings of your systems, applications, and services through monitoring and collecting relevant data. With the increasing complexity of technology stacks and the need for real-time visibility, observability has become a fundamental requirement for businesses across various industries.

Planning and Baselining a Migration to Azure SQL

A migration from on-premises SQL Server to Azure SQL offers many customers a number of advantages. It can enable scalability, reduce costs, enhance security, ensure high availability, and simplify maintenance. Many organizations are looking to equivalent cloud services to move on-prem workloads such as SQL databases to the cloud, freeing themselves from the overheads of purchasing, configuring and maintaining physical hardware and infrastructure.

In engineering, DON'T BUILD FAST: Bill Kennedy - The Reliability Podcast

The Reliability podcast aims to speak with engineers who have worked on large, complex systems and glean through their learnings. What best practices should one imbibe? What are non-negotiable learnings to become better at a craft? What’s ‘engineering’ going to be like with the advent of AI? We answer these and more tracing personal journeys of engineers who have built stellar careers around decoding the innumerable intricacies of software engineering.

Utilizing Synthetic Testing in Networking (Synthetic Network Testing)

Network performance is the lifeblood of modern enterprises, underpinning everything from communication to data transfer, and even the smooth functioning of mission-critical applications. A single glitch can lead to downtime, productivity losses, and dissatisfied users. This is where synthetic testing steps into the spotlight as an invaluable tool for ensuring your network's reliability and resilience.

Improving Your Interaction to Next Paint (INP)

Interaction to Next Paint (INP) is a newer addition to the Core Web Vitals metrics intended to measure how real users perceive the responsiveness of modern web applications. Web Vitals measurements like INP are becoming increasingly important as web applications and SPAs run more JavaScript on the client side.
Sponsored Post

Cloud Provider Uptime Monitoring: September 2023 Insights

Check our September 2023 health report on the most popular cloud providers. We analyze the health of the cloud providers based on the number of outages and problems during the month. The source data is made available by the cloud providers themselves via their status pages. We normalize it and use it to generate the report.

Best practices for creating custom detection rules with Datadog Cloud SIEM

In Part 1 of this series, we talked about some challenges with building sufficient coverage for detecting security threats. We also discussed how telemetry sources like logs are invaluable for detecting potential threats to your environment because they provide crucial details about who is accessing service resources, why they are accessing them, and whether any changes have been made.

The Top 10 OpenSearch Plugins

OpenSearch is a powerful, open-source analytics and search engine that can be utilized to construct custom search solutions for a broad variety of applications, from websites to enterprise-level systems. It enables flexible search and indexing abilities, making it suitable for a range of uses. Scalability is a great example: OpenSearch is designed for horizontal scalability, enabling organizations to add nodes to their cluster as data volumes and query loads increase.

So We Shipped an AI Product. Did it Work?

Like many companies, earlier this year we saw an opportunity with LLMs and quickly (but thoughtfully) started building a capability. About a month later, we released Query Assistant to all customers as an experimental feature. We then iterated on it, using data from production to inform a multitude of additional enhancements, and ultimately took Query Assistant out of experimentation and turned it into a core product offering.

Reducing data center carbon emissions with Hardware Sentry, Grafana, and OpenTelemetry

With just 30 employees, Sentry Software might be considered a small company, but they’re prioritizing sustainability in a big way. As the makers of Hardware Sentry, an IT monitoring product, a large part of their business relies on maintaining optimal temperature conditions at their data centers, an operation that contributes to the company’s overall carbon footprint.

Triangulate: Add Logs to Your Monitoring Mix

For many IT organizations, triaging or troubleshooting starts with assessing symptoms. As practitioners investigate the causal factors by answering each of the “5 whys,” logs are often where the actual root cause answers lie. This is even more true for issues related to configuration changes, change management, and security. However, diving into log data can be overwhelming as a first step due to the high volume and velocity of logs and missing context.

How to deploy a Hello World web app with Elastic Observability on AWS App Runner

Elastic Observability is the premier tool to provide visibility into web apps running in your environment. AWS App Runner is the serverless platform of choice to run your web apps that need to scale up and down massively to meet demand or minimize costs. Elastic Observability combined with AWS App Runner is the perfect solution for developers to deploy web apps that are auto-scaled with fully observable operations, in a way that’s straightforward to implement and manage.

Query 3rd Party API Datasets in Real Time with Cribl Search

In today’s world of relentless data growth, security-relevant logs represent a small snapshot of an organization’s overall environment. Teams are beset with a variety of data types, including performance metrics and traces, asset configuration and state, audit logs, and much more. On top of that, teams are expected to scan all of this to compare against industry best practices and join this data with logs and metrics for added context.

When and How to Use Aggregators

There are lots of great reasons to run OTel agents as aggregator / gateway collectors. In this video we discuss 4 of the most common! About ObservIQ: observIQ brings clarity and control to our customers' existing observability chaos. How? Through an observability pipeline: a fast, powerful and intuitive orchestration engine built for the modern observability team. Our product is designed to help teams significantly reduce cost, simplify collection, and standardize their observability data.

Why Does Observability Need OTel?

To successfully observe modern digital platforms, a new data collection approach was needed. And OpenTelemetry (OTel) was the answer - an industry-agreed open standard - not a single vendor's approach - on how observability (O11y) data should be collected from a platform. This separates out data collection from the vendors’ platform of data processing and visualisation, making the data collecting approach vendor agnostic.

Announcing Splunk Federated Search for Amazon S3 Now Generally Available in Splunk Cloud Platform

Splunk is pleased to announce the general availability of Federated Search for Amazon S3, a new capability that allows customers to search data from their Amazon S3 buckets directly from Splunk Cloud Platform without the need to ingest it. Enterprises rely heavily on cloud object storage services as the de facto destination for their new data to leverage the cost, compliance, security, scalability and manageability benefits that cloud platforms can offer.

An Introduction to Cloud Unit Economics in FinOps

The cloud’s elasticity—the ability to scale resources up and down in response to changes in demand—as well as variable cost structures offer significant advantages, enabling enterprises to move from rigid capex models to elastic opex models where they pay for what they provision, with engineers in control and focused on innovation, becoming true business accelerators.

The Future of Open Source: SaaS, the Final Frontier

Open source dominates certain kinds of software: operating systems, programming languages, libraries, frameworks, and developer tools. A few open source applications such as Audacity and VLC have found a place on the desktop. But by and large, software has moved to the cloud … and open source is moving with it. Join us for a discussion with the CEOs of three SaaS companies that adopt an open source strategy for their core product.

Forwarding Windows Events to CLM

Looking at your IT environment, you probably have various machines and applications connected to your networks. From network devices to servers to laptops, you need to know what’s happening at all times. While your log data provides the monitoring information you need, your environment’s diversity makes aggregating and correlating this information challenging. If your company invested in Windows devices, then your struggle is even more real because Microsoft uses a proprietary format.

Our uptime check can now verify the absence of a string

The most popular check that Oh Dear offers is, without a doubt, our uptime check. It's enabled for almost every site we monitor. By default, this check will notify you when your site returns a non-2xx response, but you can greatly customize that behavior. You can check if the response has certain headers, if the response contains a particular string, and more! Some of our users requested a new behavior: checking the absence of a string in the response.
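
The check logic described in this excerpt can be illustrated with a small sketch. This is not Oh Dear's implementation or API: the function name `evaluate_check` and its parameters `must_contain` and `must_not_contain` are hypothetical, chosen only to show how a non-2xx status, a required string, and a forbidden string might each produce a failure.

```python
def evaluate_check(status_code, body, must_contain=None, must_not_contain=None):
    """Return a list of failure reasons; an empty list means the check passed.

    Hypothetical helper for illustration only, not part of any real
    monitoring product's API.
    """
    failures = []

    # Default behavior described in the post: any non-2xx response fails.
    if not 200 <= status_code < 300:
        failures.append(f"non-2xx status: {status_code}")

    # Optional: the response body must contain a particular string.
    if must_contain is not None and must_contain not in body:
        failures.append(f"expected string missing: {must_contain!r}")

    # Optional: the response body must NOT contain a particular string.
    if must_not_contain is not None and must_not_contain in body:
        failures.append(f"forbidden string present: {must_not_contain!r}")

    return failures


if __name__ == "__main__":
    # A page that returns 200 but renders an error message still fails
    # when the error text is listed as a forbidden string.
    print(evaluate_check(200, "Fatal error: out of memory",
                         must_not_contain="Fatal error"))
```

An absence check of this kind is useful for pages that respond with a 2xx status even when they render an error message in the body.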