
October 2021

Dynamically control your custom metrics volume with Metrics without Limits

Sending custom metrics to Datadog allows you to monitor important data specific to your business and applications, such as latency, dollars per customer, items bought, or trips taken. And tags are key to being able to slice and dice these custom metrics to quickly find the information you need. But collecting enough custom metrics to have complete visibility can be cost prohibitive. For example, you might run microservices instrumented across thousands of containers.
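To see why tags drive cost, here is a minimal Python sketch that counts distinct timeseries. It assumes that each unique combination of metric name and tag values counts as one custom metric, which is how Datadog describes custom metric billing. The metric and tag names are hypothetical, and the `allowed_tags` parameter only models the effect of keeping a subset of tags queryable; it is not Datadog's API.

```python
def count_timeseries(points, allowed_tags=None):
    """Count distinct (metric name, tag set) combinations.

    points: iterable of (metric_name, {tag_key: tag_value}) pairs
    allowed_tags: optional set of tag keys to keep queryable; None keeps all tags
    """
    seen = set()
    for metric, tags in points:
        # Drop any tag that is not on the allowlist before counting
        kept = {k: v for k, v in tags.items() if allowed_tags is None or k in allowed_tags}
        seen.add((metric, frozenset(kept.items())))
    return len(seen)


# Hypothetical latency metric emitted from 1,000 containers, 3 endpoints, 2 regions
points = [
    ("checkout.latency", {"container_id": f"c{c}", "endpoint": e, "region": r})
    for c in range(1000)
    for e in ("cart", "pay", "confirm")
    for r in ("us", "eu")
]

print(count_timeseries(points))                          # 6000
print(count_timeseries(points, {"endpoint", "region"}))  # 6
```

Dropping the per-container tag shrinks the count from 6,000 to 6 while keeping the endpoint and region breakdowns that most dashboards need.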

Developers Can Now Debug Running Nomad-Orchestrated Applications Using Lightrun

In basically every modern software organization, building software is not just a matter of writing code. It is also a matter of testing that code to ensure it works properly, packaging it into artifacts that end customers can use, and deploying those artifacts to a location customers can access so they can actually use them.

Top 7 SolarWinds Competitors and Alternatives to Know in 2021

SolarWinds Inc., based in the United States, is a software company that helps businesses manage their networks, systems, and IT infrastructure. Its headquarters are in Austin, Texas, and it has sales and product development offices around the United States and in a number of other countries. It has acquired a number of other businesses, including Pingdom, Papertrail, and Loggly, which it continues to operate under their original names.

Get control back into the Control Room

This article explains how SquaredUp for SCOM leverages the true power of the SCOM platform: the SCOM object model. I believe dashboarding needs simplicity and granularity all in one: simplicity for your Control Room, giving clear and quick insight, and granularity and detail for your system management engineers, so they can drill down and find the root cause quickly.

MANTL and LogDNA Roundtable

Hear from James Qualls, Director of Engineering at MANTL, on how LogDNA is empowering the developers on his team to own their monitoring. MANTL found that once developers could own their logging and monitoring, the infrastructure team and application architecture team were able to work better together. For MANTL, the ability to remove bottlenecks and scale using LogDNA meant they were able to respond to the needs of their customers quickly and enable more people to bank from the safety of their own homes.

Dashboard Fridays: Sample Azure DevOps Dashboard

Join SquaredUp's Adam Kinniburgh and Azure expert Shaswot Subedi as they showcase this example Azure DevOps Dashboard. This dashboard uses SquaredUp’s WebAPI tile and data from Azure DevOps to give our DevOps Team the performance overview of our build and release pipelines that they always wanted. Tune in to learn how it was made, the challenges it solves, and our experts' top tips for building it yourself.

Nastel Technologies Launches New and Improved Website

Nastel Technologies, a global provider of messaging middleware-centric performance and transaction management for mission-critical applications, has officially launched its new and improved website. Founded in 1994, Nastel has built a reputation for excellence and is used by some of the world’s top brands, including Dell Technologies, Citi, BlueCross Blue Shield, and more. With this launch, the company takes another step toward providing an outstanding customer experience.

Kubernetes Monitoring Resources

Heaven knows we all could use some luck these days, and observability may be just the thing we need. But observability isn’t luck, and it isn’t really new either. Few people realize that observability is a concept from control theory, which dates back to the 1800s! In this blog post, I’ll cover some of the history of observability vs.

Catchpoint Ushers In A New Era Of Visibility With The Addition Of 5G Mobile Edge Nodes

From its inception, Catchpoint has been a pioneer in terms of observability and its ability to deep scan infrastructures and protocols that bind the Internet. Our industry-leading observers gather in-depth data, providing the broadest coverage across wireless, cloud, backbone, and last mile networks. That data arms people across the enterprise with the information they need to provide a superior digital experience.

Will Serverless computing reshape big data and data science?

Serverless development has been turning heads in the market for quite some time now, but it has yet to be accepted by the majority of the development community. With AWS Lambda, Azure Functions, and IBM’s OpenWhisk, the market is poised to take a different route in this field. Most of these organizations are spending a lot of money to make the market accept this new serverless paradigm.

Splunking Netflow with Splunk Stream - Part 2: Basic Netflow Analytics

Hi there! If you are here, you've probably already read the first part of this series and want some help getting value from your NetFlow data quickly: building trend analysis and advanced analytics on long-term data (i.e., months), in addition to playing with real-time data. You can take advantage of Splunk’s super-flexible schema-on-read architecture to exploit your real-time data from the very first moment you ingest it.

LogDNA vs. Logz.io

Logz.io is a SaaS (software as a service) provider with an observability offering made up of various managed open source technologies. These technologies include the Elastic Stack for logging and SIEM (security information and event management), Prometheus for monitoring, and Jaeger for tracing. The company positions itself as an alternative to the Elastic Stack (or ELK Stack), which is made up of Elasticsearch, Logstash, Kibana, and Beats.

Keep Gamers Gaming - Application Monitoring for Unity

Given the millions of registered Unity developers worldwide, Unity is arguably the most popular engine used to develop games. But whether you’re building the latest FPS or a turn-based classic, you need visibility into how your game is performing on a gamer’s device. More than 800 game development and platform companies rely on Sentry, from Outfit7 to Riot, Epic Games, and Unity.

Introducing Cloud Native Observability

The term ‘cloud native’ has become a much-used buzz phrase in the software industry over the last decade. But what does cloud native mean? The Cloud Native Computing Foundation’s official definition is: From this definition, we can differentiate between cloud-native systems and monoliths, which run as a single service on a continuously available server. Large cloud providers, like Amazon’s AWS or Microsoft Azure, can run serverless and cloud-native systems.

Release Webinar: Connection Center for Webhooks Inbound

The release of our Connection Center for Webhooks Inbound means SCOM can now become a Webhook listener, enabling it to automatically receive data, and raise what it receives as SCOM alerts and events. This webinar takes you through all the new features of our latest integration for Inbound Webhooks and showcases how you can use it to make SCOM your central monitoring resource.

Rollbar Pro Tips: Manage Rollbar automatically through the Rollbar Terraform Provider

Terraform is a multi-cloud provisioning product used to create, manage, and update infrastructure resources. The Rollbar Terraform Provider automates the creation, modification, and removal of resources within your Rollbar account, such as projects, users, and teams. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Dash 2021 Keynote

The Datadog team delivers the annual Dash keynote. At Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights. And we introduced Datadog Observability Pipelines, which run on your infrastructure and put you in control of your observability data, from how it’s processed to where it’s sent.

Panel: Improving Monitoring & Reliability with Chaos Engineering - Dash 2021 (Datadog,Gremlin,Pismo)

Monitoring and observability are critical for knowing how your systems are behaving, but how do you create the feedback loops to shift from reactive monitoring for incidents to proactively preventing them? In this roundtable discussion, Mauricio Galdieri, Software Architect at Pismo.io, and Kolton Andrus, CEO and co-founder of Gremlin, join Tay Nishimura, Site Reliability Engineer on the Chaos Engineering team at Datadog, to chat about monitoring, Chaos Engineering, and using them together to build more reliable systems.

Scaling HashiCorp's Cloud Platform - Dash 2021 (HashiCorp)

Identifying bottlenecks during times of high load is critical to building a scalable software platform. Stress testing is one way to simulate high load on a system, and it allows you to proactively capture potential bottlenecks before they impact customers. Once a solution is implemented to address a bottleneck, you need a way to measure success and find the new limit. See how the HashiCorp Cloud Platform (HCP) team developed a stress testing framework that relies heavily on Datadog’s custom metric capabilities, combined with some out-of-the-box integrations, to give HCP engineers a comprehensive view of their platform, and how they used these insights to scale their concurrent data-plane provisioning by 300%.

Panel: Handling Incident Response - Dash 2021 (Datadog, PagerDuty)

When customer-impacting downtime happens, it’s crucial that responders are prepared and can resolve the issue as quickly as possible. Knowing the right tools to use, wherever you are working from, helps you put a well-defined strategy in place so the team can come together, work the problem, and reach a solution quickly. In this roundtable discussion, PagerDuty and Datadog engineers chat about incident response and how we use all the tools at our disposal to respond quickly and effectively.

Roundtable: The Complexities of Cloud Migration - Dash 2021 (Datadog, LaunchDarkly, StockX)

Completing a migration project often means your organisation straddles two systems at once. You’re fighting habits and changing attitudes while also attempting to complete a high-risk operation. Every software team will have to complete a migration at some stage. Whether it’s to improve scalability and performance or to move from an on-prem to a cloud solution, you’ll need a deep understanding of your current environment to create a strategy that minimises downtime for your team.

How to do serverless monitoring right #shorts

Monitoring CPU load and memory usage is common practice, but with serverless you don't have to act on those signals yourself. In this video, we quickly explain that if your Cloud Run instances start hitting high CPU load, Google Cloud automatically spins up new instances for you, and scales them back down when the load drops!

"Open source done right": Why Canonical adopted Grafana, Loki, and Grafana Agent for their new stack

Michele Mancioppi is a product manager at Canonical with responsibility for observability and Java. He is the architect of the new system of Charmed Operators for observability known as LMA2. Jon Seager is an engineering director at Canonical with responsibility for Juju, the Charmed Operator Framework, and a number of Charmed Operator development teams which operate across different software flavors including observability, data platform, MLOps, identity, and more.

How Pingdom's Real User Monitoring Can Help Optimize Your WordPress Website

Enterprise web applications or medium-to-large, consumer-facing websites are typically built by teams of engineers, administrators, web developers, and other professionals. However, once a site goes live, the operations team is responsible for keeping the site up and running at optimal performance. Online users aren’t forgiving, often abandoning a site as soon as they encounter an issue with functionality, complexity, or performance.

Working With the WordPress REST API

Logging is an important part of every software application. In addition to capturing user activity, well-structured logs can make it easier to debug problems should they occur. But if your application is split up across several different subsystems, collecting and analyzing disparate logs can be a real challenge. Picture this scenario: You work at a startup that uses a CMS managed by a few admins. You also have a standalone front-end application for users to communicate with your platform via an API.

Changes In Technology: How Catchpoint Monitors And Observes The Internet, From 2008 To Today

Last month, Catchpoint celebrated its 13th birthday, a milestone which has us feeling more than a little nostalgic. As we embark on our teenage years as a company, we have also been looking back and reflecting on all the changes the world of technology has seen since Catchpoint was “born” back in 2008. The world looks very different today than it did at our founding and, for that matter, so do we!

Capitalizing on Cloud to Drive Greater Value from Data and Analytics Insights: A New Accenture Splunk Business Group

We’ve had a busy week in the Splunk Partnerverse! In addition to the new Splunk Partnerverse Program announced last week, we also shared some significant partnership news with one of our longest-standing partners, Accenture.

Having Trouble Getting the Right Insights From Your Cloud Network Monitoring? Preview Splunk NPM Today!

With cloud computing, containers, virtualization, and the move to software-defined infrastructure, your cloud infrastructure monitoring is constantly evolving. New challenges can affect the reliability of business services, and network performance is a significant part of that reliability. Your network is filled with a multitude of hosts, distributed services, and containers. It’s hard to monitor the health of these ephemeral cloud infrastructure components. How is their behavior affecting your applications?

Is Anyone Reading Your Company Comms? How This IT Channel Received +2214% More Views Than Email

An A/B test reveals that Nexthink Engage gets 2,214% more views than email. When a company has big news, it wants to share that success with employees and make sure they understand what the news means for the business. Case in point: a Nexthink customer, a leading life sciences company, recently received FDA approval, a huge milestone that would surely amplify the company’s brand and, with it, bring new sales and marketing demands.

LMA 2: Reimagining observability with MicroK8s, Grafana, Prometheus, and Grafana Loki

Juju re-imagines the world of operating software securely, reliably, and at scale. Juju realizes the promise of model-driven operations. Excellent observability is undeniably a key ingredient for operating software well, which is why the Charmed Operator ecosystem has long provided operators the ability to run a variety of open source monitoring software. We collectively refer to these operators as the Logs, Metrics, and Alerts (LMA) stack.

Event Highlights: InfluxDays North America 2021

Roadmap revealed, insight gained, connections made, and InfluxDB Challenge swag won: that’s a wrap for InfluxDays North America 2021! The conference, which brought together the #InfluxDB community to exchange time series acumen and use cases, is always a reminder of why and how we’ve grown. It included courses led by subject-matter experts, on-demand videos, and live sessions.

What Is Continuous Security Monitoring?

Today, organizations rely on computers, the internet, and data to perform operations. What's more, due to the COVID-19 pandemic, many employees and businesses now operate remotely. The dependency on computer systems and internet technologies also means that the average company now relies on many contractors and vendors for IT services and software. Small, medium, and large enterprises depend on third parties to provide various services over the internet.

NiCE DB2 Management Pack 5.00 released

The NiCE DB2 Management Pack 5.0 is an enterprise-ready Microsoft SCOM add-on for advanced IBM Db2 monitoring. It supports Db2 system, application, and database administrators with centralized IBM Db2 health and performance monitoring to improve user experience and business results. The Management Pack provides clear and precise performance indicators and timely alerts enriched with information that pinpoints problems and aids troubleshooting.

Monitor NS1 with Datadog

NS1 is an intelligent DNS and traffic management platform that helps optimize the performance of your network infrastructure and speed application delivery to your end users. Since even a small increase in service latency can lead to churn and revenue loss, it’s critical to remove any inefficiencies embedded in basic network functions. NS1 helps ensure high performance for name resolution and routing through support for the edns0-client-subnet (ECS) DNS extension and for Filter Chain technology.
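For readers unfamiliar with the ECS extension, the sketch below shows what a client-subnet hint looks like on the wire, following the option layout in RFC 7871: option code 8, then the address family, source prefix length, scope prefix length (zero in queries), and only as many address bytes as the prefix needs. It is a minimal illustration of the protocol, not NS1's implementation, and `encode_ecs_option` is a hypothetical helper name.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS0 option code for Client Subnet (RFC 7871)


def encode_ecs_option(subnet: str) -> bytes:
    """Build the EDNS0 Client Subnet option bytes a resolver would attach to a query."""
    # strict=False zeroes any host bits beyond the prefix, as the RFC requires
    net = ipaddress.ip_network(subnet, strict=False)
    family = 1 if net.version == 4 else 2  # IANA address family numbers
    source_prefix = net.prefixlen
    scope_prefix = 0                       # must be 0 in queries
    # Only ceil(prefix / 8) address bytes are sent, so the full client IP never leaves
    addr_len = (source_prefix + 7) // 8
    address = net.network_address.packed[:addr_len]
    data = struct.pack("!HBB", family, source_prefix, scope_prefix) + address
    return struct.pack("!HH", ECS_OPTION_CODE, len(data)) + data


print(encode_ecs_option("198.51.100.23/24").hex())  # 0008000700011800c63364
```

The authoritative server sees only the client's /24, which is enough to pick a nearby answer without exposing the full client address.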

VoIP Service Provider and Titanium 3CX Partner, Amplisys, uses Obkio to Monitor End-User Network Performance

Learn how Obkio works with 3CX partner and VoIP Provider, Amplisys, to create a streamlined network monitoring and troubleshooting process to quickly and proactively support every one of Amplisys’ VoIP customers for impeccable customer satisfaction and support.

What is Core Web Vitals and How to Monitor It

It used to be simple to improve the performance of a website. However, client-side JavaScript has opened up a whole new world of ways for websites to be painfully slow, and measuring that slowness requires new metrics. Google calls these metrics the Core Web Vitals. Announced in May 2020, they are a set of three indicators that serve as the gold standard for evaluating a website's user experience.
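As a rough illustration of how the three indicators are scored, the sketch below rates a page using Google's published "good" and "poor" thresholds for Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS), evaluated at the 75th percentile of page loads. The nearest-rank percentile method and the sample values are assumptions for illustration. Note also that Google has since replaced FID with Interaction to Next Paint (INP).

```python
import math

# (good_max, needs_improvement_max) per metric; anything above the second value is "poor"
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # Largest Contentful Paint, seconds
    "FID": (100, 300),   # First Input Delay, milliseconds
    "CLS": (0.1, 0.25),  # Cumulative Layout Shift, unitless
}


def percentile_75(values):
    """Nearest-rank 75th percentile (an assumed method; Google assesses metrics at p75)."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric, samples):
    """Classify a page's samples for one metric as good / needs improvement / poor."""
    good, needs_improvement = THRESHOLDS[metric]
    p75 = percentile_75(samples)
    if p75 <= good:
        return "good"
    if p75 <= needs_improvement:
        return "needs improvement"
    return "poor"


print(rate("LCP", [1.8, 2.1, 2.4, 3.9]))     # good
print(rate("FID", [50, 120, 80, 200]))       # needs improvement
print(rate("CLS", [0.02, 0.3, 0.31, 0.05]))  # poor
```

Rating at p75 rather than the average means a page only passes when most visits, not just a typical one, meet the threshold.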

Streaming Auth0 Logs to Datadog | Sivamuthu Kumar (Computer Enterprises, Inc.)

Are you using Auth0 in your application for user logins? How will you monitor the Auth0 logs and detect user actions that could indicate security concerns? In this session, we will see how Datadog helps you extend security monitoring by analyzing Auth0 user activity in the logs. We will also see how to set up threat detection rules that trigger notifications automatically.
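To make the idea of a threshold-based detection rule concrete, here is a plain-Python sketch that flags a user when their failed logins within a sliding window reach a threshold. It is not Datadog's detection rule syntax. It assumes Auth0's `f`, `fp`, and `fu` log type codes for failed logins, that each event carries `type`, `user_name`, and `date` fields, and that `date` has already been parsed into a `datetime`; the threshold and window are arbitrary example values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAILED_LOGIN_TYPES = {"f", "fp", "fu"}  # Auth0 log types for failed logins (assumed subset)


def detect_bruteforce(events, threshold=5, window=timedelta(minutes=10)):
    """Yield (user, time) when a user's failed logins within `window` reach `threshold`."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    for event in sorted(events, key=lambda e: e["date"]):
        if event["type"] not in FAILED_LOGIN_TYPES:
            continue
        user = event.get("user_name", "unknown")
        times = recent[user]
        times.append(event["date"])
        # Forget failures that fell out of the sliding window
        while times and event["date"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield user, event["date"]
            times.clear()  # reset so one burst raises one alert


start = datetime(2021, 10, 28, 9, 0)
events = [{"type": "fp", "user_name": "alice", "date": start + timedelta(minutes=i)} for i in range(5)]
events.append({"type": "s", "user_name": "bob", "date": start})  # successful login, ignored
print(list(detect_bruteforce(events)))  # [('alice', datetime.datetime(2021, 10, 28, 9, 4))]
```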

Maintaining Operational Sanity Across 100+ AWS Accounts | Eric Mann / Ryan Tomac (Vacasa)

At Vacasa, AWS accounts represent the unit of isolation for distinct applications & services in our software ecosystem, providing security benefits and operational autonomy for our teams as we scale. Managing accounts at this scale requires strong DevOps practices to maintain security, operational sanity, and uniform observability across the system. In this talk, we’ll cover the benefits of such an approach, the practices that make it possible, and the important role Datadog plays.

Democratizing Delivery: Seamless Observability for Optimal Application Performance | Ekim Maurer (NS1)

When application delivery performance issues happen, observability is critical to diagnosing the problem at hand. The adage “it’s always DNS” means that observability must extend to the foundational layers of the application delivery and access networking stacks. Yet granting administrative access to core network services like DNS and DHCP may run contrary to an organization’s least-privilege access policies. In this session, attendees will learn how global internet companies and enterprises use NS1 and Datadog to provide democratized DNS observability and reach optimal application performance.

Observability for Service Organizations | Bart Scheltinga (RawWorks)

Observability is trending. Organizations that rely on cloud infrastructure and cloud applications prioritize observability initiatives to get control over their business’s applications. At the same time, we see the “gap” between cloud infrastructure and on-premises, “non-cloud” infrastructure becoming bigger. Examples of the latter are End User Computing (EUC) and global networks (SD-WAN).

Metrics for Apache Kafka with Datadog and Aiven | Ryan Martin (Aiven)

Using managed services is all very well, but how do you get the data you need from the different services into Datadog so you can see it all in one place? This session will walk through the configuration for bringing your Aiven-managed Apache Kafka service metrics into your Datadog explorer. You’ll see how to filter the metrics to focus on specific topics or consumer groups, and how to use the Aiven client to create a repeatable, scriptable setup. This session is recommended for anyone living in the as-a-Service world who cares about data and is interested in using metrics to optimize their Kafka clusters.

Monitoring Open Source Success in Arduino | Silvano Cerza (Arduino)

Arduino is an open-source hardware and software company, project, and user community that designs and manufactures single-board microcontrollers and microcontroller kits for building digital devices. In the course of developing software downloaded and used by millions around the world, we have found it vitally important to be aware of the quality and performance of our software.

How Changelog monitors and optimizes website performance with Grafana Cloud

Developers around the world get their news from Changelog, an indie media company on a mission to create inspiring content for software developers. Through their popular podcasts, including The Changelog, Go Time, JS Party, and Ship It!, the team at Changelog helps listeners stay up-to-date on the latest happenings, trends, and tools in a constantly evolving industry.

What Is an OTEL Collector?

The primary goal of OpenTelemetry (OTEL) is to offer vendor-neutral ways of instrumenting applications, so that customers can switch between telemetry backends. OpenTelemetry has three main components: the OpenTelemetry API, the OpenTelemetry SDK, and the OpenTelemetry Collector. In this blog, we will cover the architecture, deployment, and best practices for using an OTEL collector.
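The collector's architecture is a pipeline of receivers, processors, and exporters. The real collector is written in Go and configured in YAML; the toy Python model below only illustrates the data flow, and every name in it (`Pipeline`, `drop_health_checks`, the backend lists) is hypothetical. Because the application only talks to the pipeline's entry point, switching telemetry backends means changing the exporter list, not re-instrumenting the code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Span = Dict[str, object]


@dataclass
class Pipeline:
    """Toy model of a collector pipeline: receive -> processors -> exporters."""
    processors: List[Callable[[List[Span]], List[Span]]] = field(default_factory=list)
    exporters: List[Callable[[List[Span]], None]] = field(default_factory=list)

    def receive(self, batch: List[Span]) -> None:
        # Processors run in order, each returning a new batch
        for process in self.processors:
            batch = process(batch)
        # Fan out the processed batch to every configured backend
        for export in self.exporters:
            export(list(batch))  # each exporter gets a fresh list (shallow copy)


def drop_health_checks(batch):
    return [s for s in batch if s.get("name") != "GET /healthz"]


def add_environment(batch):
    return [{**s, "deployment.environment": "prod"} for s in batch]


backend_a, backend_b = [], []  # stand-ins for two telemetry backends
pipeline = Pipeline(
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.extend, backend_b.extend],
)
pipeline.receive([{"name": "GET /healthz"}, {"name": "POST /checkout"}])
print(backend_a)  # [{'name': 'POST /checkout', 'deployment.environment': 'prod'}]
```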

We're Making Observability Available in Splunk Enterprise!

One or more of these statements (or challenges) likely apply to you and the organization you work for. Which of these are you hearing or saying? Splunk can help with them in many ways. Today, I am highlighting one way to address many of these statements: the Content Pack for Splunk Observability Cloud.

Check System Health on the Go with Splunk Observability Cloud For Mobile

With the demand to meet service level agreements (SLAs), any on-call SRE can tell you that incidents always happen at the wrong time. Things break when you least expect them to (on a date, about to beat a new level in a video game, pizza delivery just arrived, asleep at 3am). During these inopportune moments, you want to make sure it's easy to get the data you need, no matter which device is nearby.

Start Real-Time Monitoring of Microsoft Outlook in 5 Minutes

Microsoft Outlook is the premier enterprise productivity application and has become synonymous with managing every aspect of an employee's workday. According to Wikipedia, Outlook has around 400 million users. You can install Outlook as a standalone desktop app that connects to Exchange Server, either Online or On-premises (still!), or use the full-featured Outlook Web Access (OWA).

ScienceLogic Announces Integration with ServiceNow Service Graph Connector Program

ScienceLogic announces it has joined the ServiceNow® Service Graph Connector Program by integrating its ScienceLogic SL1 connector with Service Graph, helping customers to quickly, easily and reliably load third-party data into the system, enabling data quality, timeliness and scalability.

Use funnel analysis to understand and optimize key user flows

Monitoring frontend performance and user behavior is essential to ensure that your application is functioning optimally. Datadog RUM enables you to collect key user data and correlate all of it with frontend performance metrics to track how your pages’ performance affects user behavior.
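Funnel analysis boils down to counting how many sessions reach each step of a key flow in order, then comparing adjacent steps. The Python sketch below is a simplified, strict-order model under stated assumptions: sessions are given as ordered page-view lists, a session leaves the funnel as soon as its next view is not the expected step, and the page names are hypothetical. It is not how Datadog RUM computes funnels.

```python
def funnel(sessions, steps):
    """Count sessions that reach each funnel step in order.

    sessions: mapping of session id -> ordered list of page views
    steps: ordered list of page names defining the funnel
    """
    reached = [0] * len(steps)
    for views in sessions.values():
        step = 0
        for page in views:
            if step < len(steps) and page == steps[step]:
                reached[step] += 1  # each session counts at most once per step
                step += 1
            elif step > 0:
                break  # strict order: any detour after entering the funnel ends it
    return reached


steps = ["/cart", "/shipping", "/payment", "/confirmation"]
sessions = {
    "s1": ["/home", "/cart", "/shipping", "/payment", "/confirmation"],
    "s2": ["/cart", "/shipping", "/home"],
    "s3": ["/cart", "/payment"],  # skipped shipping, so drops out after the cart
    "s4": ["/product", "/cart", "/shipping", "/payment"],
}
counts = funnel(sessions, steps)  # [4, 3, 2, 1]
for name, prev, cur in zip(steps[1:], counts, counts[1:]):
    print(f"{name}: {cur}/{prev} = {cur / prev:.0%}")
```

The step with the lowest conversion ratio is where optimization effort, and a closer look at frontend performance on that page, is most likely to pay off.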

Historical log analysis and investigation with Online Archives

To have full visibility into modern cloud environments, businesses need to collect an ever-growing avalanche of log data from a range of highly complex data sources. Indexing logs is key for real-time monitoring and troubleshooting, but it can quickly become expensive at high volumes, meaning that organizations often must choose which logs to index and which to archive.

Extend your Datadog functionality with Datadog Apps

Last year, we launched the Datadog Marketplace, which lets Datadog partners develop and trade applications that provide custom monitoring solutions for specific use cases. Now, we’re pleased to announce Datadog Apps, which introduces even further customizability to the Datadog platform. With Datadog Apps, you can now build and share your own Datadog UI features that seamlessly combine functionality from your third-party tools with the full range of Datadog’s monitoring capabilities.

Introducing Network Device Monitoring

For many organizations, the success of their business depends on their ability to maintain on-prem or hybrid infrastructure. For instance, some companies rely on data centers for security reasons or to support their large, static workloads, while others must execute their critical business processes as close to the edge as possible to ensure minimal latency.

Dash 2021: Guide to Datadog's newest announcements

Today at Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights.

Adaptive Alerts: Easy, actionable alerts for noisy systems

The Adaptive Alerts feature provides reliable, informative, and actionable notifications about unexpected issues in monitored applications and services. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.
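The source does not describe how Rollbar's Adaptive Alerts work internally, but the general idea of alerting against a learned baseline rather than a fixed threshold can be sketched. The Python below uses a common generic technique, mean plus k standard deviations of recent history, which is an assumption here and not Rollbar's algorithm; `k` and `min_stdev` are illustrative parameters.

```python
import statistics


def is_anomalous(history, current, k=3.0, min_stdev=1.0):
    """Flag `current` if it exceeds the recent mean by more than k standard deviations.

    history: recent per-interval error counts used as the baseline
    min_stdev: floor so a perfectly flat history doesn't alert on tiny blips
    """
    mean = statistics.fmean(history)
    stdev = max(statistics.pstdev(history), min_stdev)
    return current > mean + k * stdev


history = [40, 42, 38, 41, 39]     # a noisy but healthy service: about 40 errors per interval
print(is_anomalous(history, 44))   # False: within normal variation
print(is_anomalous(history, 60))   # True: well above the baseline
```

A static threshold of, say, 30 errors per interval would fire constantly on this service; a baseline-relative rule stays quiet until behavior actually changes.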

Rollbar Pro Tips: LaunchDarkly Feature Flag

Enabling the LaunchDarkly integration allows engineers to automate feature flag toggles based on errors captured in Rollbar. This means that if you ship a feature to users, only one user will see an error before Rollbar automatically toggles the feature flag off for all subsequent users. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.
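The "one user sees the error, then the flag flips" behavior described above can be modeled in a few lines. In this sketch, `FlagStore`, `on_error`, and the `feature_flag` field on the error are all hypothetical stand-ins; they are not the Rollbar or LaunchDarkly APIs, which the integration wires together for you.

```python
class FlagStore:
    """Hypothetical in-memory stand-in for a feature-flag service."""

    def __init__(self):
        self.enabled = {}

    def is_enabled(self, flag):
        return self.enabled.get(flag, False)


def on_error(error, flags):
    """Turn off the flag attached to an error so later users get the old code path."""
    flag = error.get("feature_flag")
    if flag and flags.is_enabled(flag):
        flags.enabled[flag] = False
        return flag
    return None


flags = FlagStore()
flags.enabled["new-checkout"] = True

served_new_path = []
for user in ["u1", "u2", "u3"]:
    if flags.is_enabled("new-checkout"):
        served_new_path.append(user)
        # Assume the new code path fails and the error is reported with its flag
        on_error({"feature_flag": "new-checkout", "user": user}, flags)

print(served_new_path)  # ['u1']
```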

The Nightmare Before Business: Stay Safe with Uptime.com Status Pages

We’re nearing Halloween and mischief night has stolen tricks from the holiday season. With online sales alone expected to creep up toward $3 billion before the next crescent moon, we’re offering you a solution to keep the angry mobs with pitchforks at bay by giving them a crystal ball into your real-time incident response with Uptime.com Status Pages.

VMworld 2021: "We're Proud To Announce..."

I've never seen so much news during a VMworld! It began to seem comical that every speaker at the opening "General Session" and subsequent keynotes used the line "We are proud to announce." By the way, it was one of the best General Sessions I've ever seen in terms of tempo, delivery, and rhetoric! From October 15, all content will be available on demand.

How We Use Sloth to do SLO Monitoring and Alerting with Prometheus

One of the most challenging tasks for Site Reliability Engineers is to align the reliability of the systems with the business goals. There is a constant battle between delivering more features—which increases the product’s value—and keeping the system reliable and maintainable. A significant ally to achieve both objectives is the Service Level Objective Framework.
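The arithmetic behind SLO alerting is small enough to sketch. Assuming a request-based SLO over a 30-day period, the error budget is 1 - SLO, and the burn rate is the observed error ratio divided by that budget, so a burn rate of 1.0 spends the budget exactly by the end of the period. The 14.4 threshold below is the fast-burn paging example from Google's SRE Workbook, which multiwindow burn-rate tooling such as Sloth builds on; the function names and numbers are illustrative, not Sloth's configuration.

```python
def error_budget(slo):
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo


def burn_rate(errors, total, slo):
    """How fast the budget is being spent relative to plan (1.0 = exactly on budget)."""
    return (errors / total) / error_budget(slo)


def budget_consumed(rate, window_hours, period_days=30):
    """Fraction of the whole period's budget used by burning at `rate` for `window_hours`."""
    return rate * window_hours / (period_days * 24)


# Last hour: 150 failed requests out of 10,000 against a 99.9% SLO
rate = burn_rate(150, 10_000, 0.999)
print(f"burn rate {rate:.1f}")                     # burn rate 15.0
print(f"{budget_consumed(14.4, 1):.0%} of budget") # 2% of budget
print("page" if rate > 14.4 else "ok")             # page
```

A sustained burn rate of 14.4 spends 2% of a 30-day budget in a single hour, which is why it is a common threshold for paging rather than ticketing.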

Install Netdata to get started monitoring Linux in minutes

Install Netdata to monitor your Linux servers using our one-line installer. Install on physical, virtual, container, and IoT nodes. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

How to monitor Docker container health and performance using Netdata

Learn how to connect and claim a Docker node to start monitoring with Netdata in minutes. See information like system CPU, available memory, disk usage, total network bandwidth, and much more. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

Learn how to build interactive dashboards with Netdata Cloud for troubleshooting systems

This video will show you how to build new dashboards with key metrics from any number of distributed systems in one place for a bird's eye view of your infrastructure. Create more meaningful visualizations for troubleshooting or keep a watchful eye on your infrastructure's most meaningful metrics without moving from node to node. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

How to use Netdata Cloud for infrastructure observability

In this video, we cover how Netdata Cloud provides scalable infrastructure monitoring for any number of distributed nodes running the Netdata Agent. Monitor any system in your infrastructure including physical or virtual machines (VM), containers, cloud deployments, or edge / IoT devices. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

7 JSON Logging Tips That You Can Implement

When teams begin to analyze their logs, they almost immediately run into problems and need some JSON logging tips to overcome them. Logs are naturally unstructured, which means that if you want to visualize or analyze them, you are forced to deal with many potential variations. You can eliminate this problem by logging valid JSON, which sets the foundation for log-driven observability across your applications.
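As a starting point, here is a minimal sketch of JSON logging with Python's standard `logging` module: a custom `Formatter` that renders every record as one JSON object per line. The field names (`timestamp`, `level`, `fields`, and so on) are choices made for this example, not a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}})
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # default=str keeps non-serializable values from breaking the log line
        return json.dumps(entry, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"fields": {"order_id": 1234, "amount_usd": 59.90}})
```

Because every line parses as JSON, fields like `order_id` become directly searchable and chartable instead of being buried in free text.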

Receive charts of Pandora FMS in Telegram

In this video, we will learn how to add charts to the alerts that Pandora FMS sends to Telegram. In the first tutorial, we saw how to configure these alerts to arrive as text; now we will make them arrive accompanied by a chart of the module that triggered them. You can learn more about the steps that make up alerts in our wiki.

Using Netdata's alerts smartboard for monitoring systems' health and performance

The alerts smartboard gives you high-level information for every node you are monitoring with Netdata Cloud. In this video, you will learn how to navigate through the alerts smartboard as well as what each alert means. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

What's new in Grafana Cloud for October 2021: Machine Learning, Grafana 8.2, new integrations, and more

Here at Grafana Labs, we’re constantly shipping new features to help our users get the most out of Grafana Cloud. To help our new and existing customers learn about the latest and greatest, here’s a roundup of all the new features and improvements you should know about to make the most of Grafana Cloud.

5 Weird Use Cases for Log Management

We’re all familiar with the typical use cases for log management, such as monitoring cloud infrastructures, development environments, and local IT infrastructure. So we thought it would be fun to cover some of the less common, wilder use cases for log management, just to show that log management tools are more versatile and more interesting than they may seem. If any of these use cases look too interesting to ignore, let us know and we can do a full article on them!

Announcing the Logz.io Search Bar

Engineering teams hoping to gain full-stack observability into their environment need access to the relevant logs, metrics, and traces generated by their cloud infrastructure and applications. Accessing the relevant data quickly is essential, not just because it is more convenient, but because engineering speed is business-critical for many organizations.

New Microsoft Teams Performance Monitoring Solutions Buyer's Guide Helps Businesses Choose the Right Solution

Many organizations have become reliant on Microsoft Teams as the central hub of the digital workplace, allowing teams to work together more efficiently, amalgamating chat, file sharing, email, calendar, meetings, and integrations with countless third-party solutions all in one place. With this reliance comes the need for 24/7 reliability, so that users can stay productive on Microsoft Teams.

Instrumentation for C# .NET Apps with OpenTelemetry

OpenTelemetry is the recommended path today for instrumenting applications with tracing in a standard, vendor-agnostic and future-proof way. In fact, OpenTelemetry (nicknamed OTEL) encompasses all three pillars of observability: tracing, metrics, and logs. The tracing element of the specification is now stable, with the rest following. This is innovative stuff! You can read more about OpenTelemetry and its current release state in this guide.

Untangling Business In The ISP Industry With Elliot Noss | Network AF Episode 4

On today's episode of the Network AF podcast, Avi welcomes Elliot Noss, President and CEO of Tucows. Elliot has a love and passion for the internet that started the moment he was introduced to it. This passion comes through as he discusses his goals in networking and the positive change he wants to make in solving cybercrime issues at the DNS level. Not only is Elliot an expert in networking, but he is also a great leader. He shares insight into the importance of providing exceptional customer support and how it starts with building a culture around passionate people at Tucows. Watch it now!

Network AF, Episode 4: Untangling business in the ISP industry with Elliot Noss

On episode 4 of the Network AF podcast, host Avi Freedman welcomes his longtime friend Elliot Noss. For 25 years, Elliot has been the CEO of Tucows, the internet services company with the second-largest domain registrar in the world. Elliot is considered an outlier in the ISP industry, largely due to his transparency and the stellar customer experiences he encourages through Tucows.

5 Best Network Traffic Monitoring Tools

Monitoring network traffic (which is defined as the data moving across your network at a given time) is important for any business looking to maintain a fast and efficient network. Automating network traffic monitoring and analysis with the support of a tool can help IT teams reduce downtime, identify the causes of bottlenecks, boost the efficiency of troubleshooting efforts, and more.

What's New: Extending our Datadog Capabilities With New PagerDuty Widgets

In the last two years, we have seen the rise of remote and hybrid work, and with that, a proliferation of tools and apps needed to support critical communication and collaboration. Finding that app-life balance has become increasingly complex, so simplifying “how” we work is key for every organization.

Launching Performance and Error Tracing & new Vercel integration.

Today, I'm super excited to announce our new Performance and Error tracing features for all Playwright-based browser checks. One of our key initiatives is supporting you with deeper insights into what is causing issues in the web apps you monitor with Checkly. With this new set of features, we give you actionable data for easy debugging, collected automatically with no extra code needed. Of course, fixing bugs in production is great, but catching bugs before they go live is even better!

Anodot Captures the 2021 Online Shopping Outlook from Retailers & Consumers

It’s the second holiday season since the pandemic broke out but, with many brick-and-mortar stores reopened, will it be more like 2020 or 2019? Will shoppers stick to online behemoths like Amazon or will they shop in-store? And how are retailers planning to offer a competitive experience amid ongoing supply chain issues? We partnered with Researchscape to survey thousands of eCommerce companies and consumers in the U.S. about their plans for this holiday season.

Migrating from Jaeger client to OpenTelemetry SDK

A couple of years ago, the OpenTelemetry project was founded by the merger of two similarly aimed projects: OpenTracing and OpenCensus. One of the goals of this new project was to create an initial version that would “just work” with existing applications instrumented using OpenTracing and OpenCensus.

Announcing the General Availability of Splunk Mobile RUM for Native Mobile Apps

The world increasingly works, buys, and communicates through native mobile apps: in 2020 there were 218 billion new app installs globally, 13.4 billion of them in the US alone. The challenge is that while iOS and Android applications make up significant portions of user traffic and business, engineering teams and monitoring tools are split between mobile app and backend developers. This creates siloed visibility into how changes to the app or backend components impact each other and the end-user experience.

Top six Amazon S3 metrics to monitor

When you’re planning an application performance monitoring (APM) strategy, collecting metrics from storage services like Amazon S3 may not seem like a priority. After all, part of the point of object storage is that applications can read and write from storage buckets seamlessly, with minimal configuration and overhead. Unlike databases or file systems, storage buckets don’t require complex configurations that could lead to performance issues.

Is 'Change Fatigue' Crippling Today's Digital Workforce?

When it comes to workplace innovation, many forward-thinking businesses follow a simple mantra: “Change before you have to.” They pursue organizational or technological changes in order to stay ahead, rather than catch up – and they rely on their employees to adapt quickly to the new standards and structures they put in place. The pandemic threw a wrench into that neat and tidy outlook on innovation. Businesses had to change.

By Developers, for Developers: New Offerings Announced at InfluxDays North America

By developers, for developers – this has always been InfluxData’s approach when building new tools for users, and it’s certainly the case as we roll out the newest round of features and capabilities at this year’s InfluxDays North America. We know that application building isn’t easy and that development cycles are precious. That’s why we focus everything we do around delivering developer happiness.

InfluxData Kicks Off InfluxDays North America, Releasing New Features to Expedite Application Building

InfluxDB enhancements enable developers to get started building real-time applications quickly and to scale.

SAN FRANCISCO, October 26, 2021 – InfluxData, creator of the leading time series platform InfluxDB, today announced new capabilities for developers to expedite application building as part of InfluxDays North America 2021 Virtual Experience, its annual event for customers, partners, and the developer community.

Web, Citrix, Windows, Mobile: 2 Steps, a Universal Automator

Did you know that 86% of customers are willing to pay more for an amazing user experience? But how do you ensure your applications are always performing at their best? Slow performance in your applications can significantly impact employee productivity and end-user experience. With 2 Steps, you can increase your observability across your applications and are not limited to monitoring web applications.

Introducing Atatus Synthetic Monitoring

A high-performance customer experience might spell the difference between business success and failure in today's always-on, competitive digital economy. However, in today's complex application environment, ensuring optimal performance across digital services and applications has become increasingly difficult. As a result, application performance management (APM) must adapt to provide real-time visibility into the application landscape and to reveal precisely what is affecting user experiences.

Silect and OpsLogix extend partnership in North America

Last year, we announced that OpsLogix had partnered with the Canadian company Silect, a Microsoft Gold Partner specializing in Microsoft's System Center. We are now extending this partnership with Silect, which has a strong presence in the North American market, to an additional product: our Oracle Management Pack. This means that Silect will exclusively provide customers in Canada and the US with the OpsLogix Oracle MP.

Netdata's Nodes view for troubleshooting system health and performance

This video introduces you to the Netdata Nodes view. Use this view to visualize and customize metrics from any number of Agent-monitored nodes and navigate to any specific nodes within the dashboard. View key monitoring metrics like CPU utilization, memory usage, disk usage, network traffic, and much more to get started troubleshooting performance issues or anomalies. Netdata’s free, open-source monitoring agent works with Netdata Cloud to help you monitor and troubleshoot every layer of your systems to find weaknesses before they turn into outages.

What is Synthetic Monitoring?

Synthetic monitoring is automated testing of critical business transactions and user experiences. It helps businesses find, fix, and prevent availability and performance issues, including those caused by third-party vendors, and gives insight into performance improvements you can make to your website and supply chain to improve conversions and user happiness. Synthetic monitoring is also sometimes called user journey monitoring.
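At its simplest, a synthetic check is a scripted request that runs on a schedule, is timed, and is graded against an expectation. The sketch below uses only Python's standard library; `run_check`, `CheckResult`, and the 2-second latency budget are hypothetical choices for illustration. A real user-journey check would script several steps (for example, with a headless browser) rather than a single GET.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    url: str
    ok: bool
    status: Optional[int]
    latency_ms: float
    error: Optional[str] = None


def run_check(url: str, max_latency_ms: float = 2000.0, timeout_s: float = 10.0) -> CheckResult:
    """Request `url` once, as a scripted synthetic user would, and grade the result."""
    start = time.perf_counter()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer time in the measurement
            status = resp.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses still carry a status code
        status, error = exc.code, f"HTTP {exc.code}"
    except OSError as exc:  # DNS failure, refused connection, timeout, reset, ...
        error = str(exc)
    latency_ms = (time.perf_counter() - start) * 1000
    # A response that arrives but blows the latency budget is still a failed check.
    if error is None and latency_ms > max_latency_ms:
        error = f"slow: {latency_ms:.0f} ms exceeds {max_latency_ms:.0f} ms budget"
    return CheckResult(url, error is None, status, latency_ms, error)
```

Running a check like this every few minutes from more than one location, and alerting when `ok` turns false, is one way to catch an outage before real users report it.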

Do You Know What's Keeping Your Cloud Team up at Night?

Cloud teams are busy. In fact, Virtana’s recently published State of Hybrid Cloud and FinOps survey found that 44% of respondents have deployed more than half of their workloads in a public cloud, and 88% have deployed more than one-quarter of their workloads in a public cloud. That’s a lot of migration, optimization, and management on the cloud team’s plate; and it’s not just for right now but for the foreseeable future.

The Benefits of Structuring Logs in a Standardized Format

As any developer or IT professional will tell you, when systems experience issues, logs are often invaluable. When logging is implemented and leveraged effectively, the data it produces can help DevOps teams identify problems within a system more quickly. It can also help incident responders isolate the root cause of a problem efficiently. With that being the case, maximizing the value of log data is vital.
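One way to make a standardized format stick is to normalize each source's events onto a shared schema as they are ingested. The sketch below assumes JSON input; the four required fields, the alias table, and the accepted level names are hypothetical conventions chosen for illustration, not an established standard.

```python
import json
from datetime import datetime

# Hypothetical team-wide schema: every event must carry these four fields.
REQUIRED = ("timestamp", "level", "service", "message")
LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

# Hypothetical per-source spellings of the same concepts.
ALIASES = {
    "timestamp": ("timestamp", "ts", "time", "@timestamp"),
    "level": ("level", "severity", "lvl"),
    "service": ("service", "app", "component"),
    "message": ("message", "msg", "text"),
}


def normalize(line: str) -> dict:
    """Map one JSON log line onto the shared schema; raise ValueError if it cannot conform."""
    raw = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(raw, dict):
        raise ValueError("log line must be a JSON object")
    event = {}
    for field, names in ALIASES.items():
        for name in names:
            if name in raw:
                event[field] = raw.pop(name)
                break
    missing = [f for f in REQUIRED if f not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    level = str(event["level"]).upper()
    level = {"WARN": "WARNING", "FATAL": "CRITICAL"}.get(level, level)
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    event["level"] = level
    # Parse ISO 8601 (treating a trailing "Z" as UTC) and re-emit one canonical form.
    ts = datetime.fromisoformat(str(event["timestamp"]).replace("Z", "+00:00"))
    event["timestamp"] = ts.isoformat()
    event["attributes"] = raw  # source-specific extras, kept but namespaced
    return event
```

Rejecting non-conforming lines early (or routing them to a quarantine stream) keeps the shared format trustworthy for every query built on top of it.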

Azure, Microsoft 365, and the Monitoring Story

Learn how to make the Azure Monitor a valuable tool to monitor your Microsoft 365 subscription, including services such as Exchange, SharePoint, Teams, OneDrive, and more. See how the Metric Explorer in Azure Monitor will work for you, including addressing custom metrics quickly. Understand how to display Log Analytics metric data in PowerBI Reports and Azure Monitor Workbooks. Join this educative and fun session about Azure and Microsoft 365, united into one magnificent monitoring solution.

Monitor your CircleCI environment with Datadog

Datadog CI Visibility provides a unified platform for monitoring your CI/CD pipelines. Now, we are partnering with CircleCI to extend that same critical visibility to your CircleCI environment. Datadog’s integration uses CircleCI webhooks to capture information about the status and performance of your workflows and associated jobs, such as a job’s duration and whether or not it failed or was canceled.

A Proactive Approach To Holiday Season Monitoring

Big sales make up a huge chunk of annual eCommerce business, with shoppers spending more than $10 billion during Black Friday 2020 alone. The right holiday can mean a big boost for your operations. However, those windfalls also bring traffic surges that can break your DevOps pipeline. In many ways, waves of traffic are what you’ve been building for, but sudden bursts are difficult to test for and anticipate, and conditions shift from one day to the next.

Ask the Citrix Expert How to Troubleshoot Citrix Issues for Remote Workers

In this “Ask the Expert” session we will be talking with Citrix CTP, George Spiers, who will answer your questions around: How can you prove root cause is due to a user's home WIFI, behavior, or an issue within the virtual infrastructure? How do you find root cause of slow logons or poor session performance? And how can you report on remote worker productivity?

Investigate an error with Rollbar

Investigate an error with the help of Rollbar, leveraging item details, traceback, suspect deploy, people tracking, and the rich contextual data available in the UI. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Intro to Netdata Overview for monitoring and troubleshooting your IT infrastructure

Learn how to get started monitoring and troubleshooting your entire Cloud infrastructure with Netdata. In this video, we'll show you how to utilize the Netdata Overview dashboard to gain visibility into the performance, availability, and health of your infrastructure from a single pane of glass. Navigate from a real-time, unified display of all your systems and applications to discover trends and gain better observability, then drill down by grouping metrics by node for root cause analysis.

Logging Ain't Easy ...But It Could Be

Real talk – logging tools aren’t exactly known for being easy to use, manage, or onboard. With the technology landscape continuously evolving, enterprises are often left scrambling to rapidly adapt and conform to new requirements of operating across hybrid infrastructures. It is in these ecosystems that logs are now more crucial than ever in the day-to-day operations of Ops teams.

Log Management 101: Log Sources to Monitor

Log management software provides the primary diagnostic data for your applications’ development, deployment, and maintenance. However, choosing which log sources to collect and monitor can be a daunting task. The main concern with monitoring all sources is the high price tag of many SIEM tools on the market, which charge based on the number of users and sources ingesting logs. At observIQ, we offer unlimited users and sources.

What Is Synthetic Monitoring?

Fully functioning sites, stores, and backend systems are essential for attracting customers, delivering services, and managing operations internally. Every organization’s systems and processes have a direct impact on its business, revenues, and reputation. IT performance monitoring should strive to ensure that the systems and processes of any organization operate correctly, round the clock.

What is RMM (Remote Monitoring Management)?

We’re gradually shifting toward remote work, and the number of companies embracing remote and hybrid workforces in the wake of the pandemic is substantial proof. Nevertheless, many organizations face challenges in adopting the IT solutions that support this shift, because switching to them can seem tedious.

The Urgency Driving AIOps into Your Enterprise

AIOps was once considered just a back-office fundamental, a solid suite of tools simplifying routine security and network monitoring tasks that primarily served the IT shop. The accelerated pace of digital transformation is changing that. Now, IT service and operations teams are in the spotlight and tasked with enabling business performance that helps their companies provide seamless digital experiences and evolve in a fast-changing economic environment.

The 15 Best Container Monitoring Tools For Kubernetes And Docker

There are many tangible benefits to using containers for your computing needs. Containers help break large applications into smaller packages that are more agile, scalable on-demand, resilient, cost-effective, and less resource-hungry than monolithic apps or workloads running on traditional virtual machines (VMs) or bare metal servers. They also enable developers to develop applications in one environment, deploy them in another, and run them anywhere.

How to deploy a Node.js application to AWS Lambda using Serverless Framework

Being a developer is awesome. Writing code, solving problems, and thinking of ingenious solutions for complicated algorithms is what we live for. But, the grass is not always so green on this side of the fence. Sooner or later, you need to get your hands dirty and deploy the app you worked so hard on. Deployments are not always easy. To be blunt, they can be challenging and time-consuming. That’s what we’ll solve in this tutorial.

Top DevSecOps Tools For 2022

DevSecOps combines the responsibilities of development, security and operations in order to make everyone accountable for security in line with the ongoing activities conducted by development and operations teams. DevSecOps tools serve to assist the user in minimising risk as part of the development process and also support security teams by allowing them to observe the security implications of code in production.

InfluxDB Cloud and Telegraf for the Home Lab

Home labs are popular among technology enthusiasts. They often go unmonitored, yet even the smallest home lab can benefit from monitoring. This post will show how getting started with an InfluxDB Cloud account and Telegraf can make this super easy! InfluxDB is an open source time series database. As such, InfluxDB is well-suited for operations monitoring, application metrics, IoT sensor data, and real-time analytics.

Tales From The Good/Bad Old Days Of Freelance Gigs

This week The Founders take a trip down freelancer memory lane and talk about the hot apps they built and which of them are still alive. They also cover NFTs, pivoting to private equity, and candy bar servers. Also, is "spider season" an official season in the Pacific Northwest?!?!? Click to listen now on the interwebs.

Explore Azure App Service with the Datadog Serverless view

Azure App Service is a platform-as-a-service (PaaS) offering for deploying applications to the cloud without worrying about infrastructure. App Service’s “serverless” approach removes the need to provision or manage the servers that run your applications, which provides flexibility, scalability, and ease of use. However, App Service also introduces infrastructure-like considerations that can impact performance and costs.

Best AWS monitoring management tool

Amazon Web Services (AWS) has an immense cloud ecosystem, now counting over 200 services and products. It started with first-generation services like EC2, S3, and RDS; now you can run satellite control centers, build virtual reality, and compose music using artificial intelligence. As the variety of services expanded, monitoring all of them efficiently became critical. Running a service without a proper monitoring system is no different from running blindfolded.

Introduction to Custom Metrics in Java with Logz.io RemoteWrite SDK

We just announced the creation of a new RemoteWrite SDK to support custom metrics from applications using several different languages. This tutorial will give a quick rundown of how to use the Java SDK. This SDK – like the others – is completely free and open source, and is meant to apply to any output destination, not just Logz.io.

We Just Gave $154,999.89 to Open Source Maintainers

Sentry is an open source company. We started out in 2008 as a small open source side project, and we grew within the community for years before commercializing in 2012. We’ve worked hard to keep our full product as open source as possible, while scaling as a business. Considering our commitment to open source, we are grateful to be able to give back to the community (and what better time than during Hacktoberfest, amirite?). (P.S.

A look inside how the Prometheus Conformance Program works and why it's important

Prometheus is the industry standard in cloud native metric monitoring with hundreds of thousands of installations, millions of users, and billions in market value. Speaking as members of the Prometheus team, we have seen the project become a victim of its own success. While most people may be using Prometheus, not everybody is following the same operating standards.
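To illustrate what following the same standard means at the wire level (this is not how the conformance tests themselves are implemented), the sketch below renders counter and gauge samples in the Prometheus text exposition format: it validates metric and label names, rejects reserved `__` label names, and escapes label values as that format requires. `render_metric` is a hypothetical helper; histograms and summaries, which need extra `_bucket`, `_sum`, and `_count` series, are deliberately left out.

```python
import math
import re
from typing import Dict, Iterable, Tuple

METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
SIMPLE_TYPES = {"counter", "gauge", "untyped"}  # one sample line per series


def _escape_label_value(value: str) -> str:
    # Label values escape backslash, double quote, and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_help(text: str) -> str:
    # HELP text escapes only backslash and line feed.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _format_value(value: float) -> str:
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Inf" if value > 0 else "-Inf"
    return repr(float(value))


def render_metric(name: str, metric_type: str, help_text: str,
                  samples: Iterable[Tuple[Dict[str, str], float]]) -> str:
    """Return HELP/TYPE lines plus one sample line per (labels, value) pair."""
    if not METRIC_NAME.fullmatch(name):
        raise ValueError(f"invalid metric name: {name!r}")
    if metric_type not in SIMPLE_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        for key in labels:
            if not LABEL_NAME.fullmatch(key) or key.startswith("__"):
                raise ValueError(f"invalid label name: {key!r}")
        # Sorting labels keeps output stable across runs.
        pairs = ",".join(f'{k}="{_escape_label_value(str(v))}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{pairs}}}" if pairs else name
        lines.append(f"{series} {_format_value(value)}")
    return "\n".join(lines) + "\n"
```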

Configuring Kibana for OAuth

Kibana is the most popular open-source analytics and visualization platform designed to offer faster and better insights into your data. It is a visual interface tool that allows you to explore, visualize, and build a dashboard over the log data stored in Elasticsearch clusters. An Elasticsearch cluster contains many moving parts. These clusters need modern authentication mechanisms and they require security controls to be configured to prevent unauthorized access.

Modernizing Your IT Operations with a Secure Foundation

This is the first of a four-part security blog series covering why ScienceLogic is listed in the DoDIN APL catalog, what this means for monitoring critical IT infrastructure, and why APL certification is relevant for all organizations. Part one is all about trust and transparency—foundations for a secure platform.

The Best Tools for Monitoring Your Docker Container

It can be difficult to comprehend and successfully scale your services as modern orchestrated environments grow larger and more sophisticated. Container monitoring allows you to see the health and performance of your dynamic container infrastructure in real-time. Container monitoring is the practice of collecting and analyzing performance metrics to track the performance of containerized applications built on cloud-based microservices.

The Future of Observability with CEO Clint Sharp

Digital transformations, cloud migrations, and persistent security threats turned observability from a niche concern to an essential capability in today’s organizations. We’re still in the early days of observability maturity, but early stumbles point to where observability must go in the future. This talk discusses where observability is today and the three critical areas necessary for observability to deliver on its promises throughout the enterprise.

NPM, encryption, and the challenges ahead: Part 2

In part 1 of this series, I talked a bit about how encryption is shaping network performance monitoring (NPM). Let’s dive in deeper now… Most NetOps and DevOps professionals today hear complaints about network performance when employees work from home. Unless the complaint is coming from all remote users of an application, individuals suffering from slowness are on their own to figure out how to optimize connection speeds.

How to Make SQL Server Faster on Azure VMs

Many organizations have migrated their environments from on-premises to the cloud, and one of the clouds of choice is Microsoft Azure. For SQL Server workloads, organizations can use platform as a service solutions with features like Azure SQL Database and Managed Instance or infrastructure as a service solutions with Azure SQL virtual machines. Azure SQL virtual machines are an easy target for organizations migrating to Azure due to the simplicity of the migration.

How it all began... - StackState's origin story

It’s 2014. A major Dutch bank is struggling with performance problems in highly visible customer-facing applications. These performance problems are proving to be incredibly difficult to resolve. It’s not that there’s no monitoring data that could potentially help. In fact, there’s tons of it, all nicely displayed in pretty dashboard after pretty dashboard.

VMworld 2021: Automation, Elastic Edge, And The Increasing Importance Of User Experience

During this year's VMworld, we announced that our solution Catchpoint Digital Experience Monitoring is now also available for purchase on VMware Marketplace. It is easier than ever for our customers to access, deploy, and start using Catchpoint solutions to realize and achieve their business goals.

Announcing the Preview of Splunk APM's AlwaysOn Profiling

For application developers and service owners who build and troubleshoot modern enterprise software, resolving production issues requires identifying poor performance across multiple networks, operating systems, servers, configs, and third party dependencies. When the problem is the code itself, code profiling helps identify service bottlenecks by periodically taking CPU snapshots, or call stacks, from a runtime environment.
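The sampling idea can be sketched in Python: a background thread wakes on a fixed interval, reads another thread's current call stack, and counts how often each stack appears; functions that show up most often as the innermost frame are the likely bottlenecks. This is a wall-clock sampler (it cannot tell on-CPU time from waiting), it relies on CPython's implementation-specific `sys._current_frames()`, and `SamplingProfiler` and the demo functions are hypothetical. It is not how Splunk's AlwaysOn Profiling is implemented.

```python
import collections
import sys
import threading
import time


class SamplingProfiler:
    """Every `interval_s`, record the target thread's call stack as 'outer;...;inner'."""

    def __init__(self, target_thread_id: int, interval_s: float = 0.005):
        self.target = target_thread_id
        self.interval = interval_s
        self.stacks: collections.Counter = collections.Counter()
        self._stop = threading.Event()
        self._sampler = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            frame = sys._current_frames().get(self.target)
            names = []
            while frame is not None:  # walk from innermost frame outward
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            if names:
                # Folded "outer;...;inner" strings are the input format flame graphs use.
                self.stacks[";".join(reversed(names))] += 1
            self._stop.wait(self.interval)

    def leaf_counts(self) -> collections.Counter:
        """Samples per innermost function: a rough 'self time' ranking."""
        leaves: collections.Counter = collections.Counter()
        for stack, n in self.stacks.items():
            leaves[stack.rsplit(";", 1)[-1]] += n
        return leaves

    def __enter__(self) -> "SamplingProfiler":
        self._sampler.start()
        return self

    def __exit__(self, *exc) -> bool:
        self._stop.set()
        self._sampler.join()
        return False


def fast_part() -> None:
    end = time.perf_counter() + 0.03
    while time.perf_counter() < end:
        pass  # busy loop standing in for cheap work


def slow_part() -> None:
    end = time.perf_counter() + 0.3
    while time.perf_counter() < end:
        pass  # busy loop standing in for a bottleneck


def handle_request() -> None:
    fast_part()
    slow_part()


if __name__ == "__main__":
    with SamplingProfiler(threading.get_ident()) as prof:
        handle_request()
    print(prof.leaf_counts().most_common(3))
```

Sampling keeps overhead roughly proportional to the sampling rate rather than to the number of function calls, which is what makes an always-on approach plausible in production.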

Getting Started with Telegraf

Telegraf is a plugin-driven agent for collecting, processing, aggregating and writing metrics and events. Telegraf ships as a single binary with no external dependencies that runs with a minimal footprint and a plugin system that supports many popular services. Telegraf is used to collect metrics from the system it runs on, applications, remote IoT devices and many other inputs. Telegraf can also capture data from event-driven operations.
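Telegraf writes to InfluxDB using line protocol: a measurement name, optional tags, one or more fields, and an optional timestamp. The sketch below builds such lines in Python, following the escaping rules as documented for InfluxDB (commas and spaces in measurement names; commas, equals signs, and spaces in tag keys, tag values, and field keys; double-quoted string fields; an `i` suffix on integers). `to_line` is a hypothetical helper, not part of Telegraf or any InfluxDB client library.

```python
import math
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    # Prefix each special character with a backslash.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            raise ValueError("line protocol cannot represent NaN or Inf")
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, FieldValue],
            timestamp_ns: Optional[int] = None) -> str:
    """Build: measurement[,tag=value...] field=value[,field=value...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    # InfluxDB's docs recommend sorting tags by key; empty tag values are skipped.
    for key, value in sorted(tags.items()):
        if value != "":
            head += f",{_escape(key, ',= ')}={_escape(str(value), ',= ')}"
    body = ",".join(f"{_escape(k, ',= ')}={_field_value(v)}" for k, v in sorted(fields.items()))
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, `to_line("cpu", {"host": "lab-1"}, {"usage_idle": 97.5})` produces `cpu,host=lab-1 usage_idle=97.5`. A script that prints lines like these could feed Telegraf's `exec` input plugin configured with `data_format = "influx"`; check the plugin documentation for the exact options.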

Optimize your end user experience with a synthetic monitoring strategy

The digitalization of the end user experience is leading to increasingly rapid changes for both customers and business owners alike. Customers demand and depend on fast, reliable web services whether for informational purposes or to conduct e-commerce transactions. That’s why regardless of the device that is used, slow digital experiences aren’t easily forgiven. If they happen often enough, they will reflect poorly on your brand and will eventually impact your bottom line.

Nightmare on AWS street: mistakes we made and overcoming challenges of starting out with serverless

Tobi started building on AWS in 2018 for a big migration project for a very well-known German car manufacturer. Here's what he learned from the mistakes he made when first starting out, which of his assumptions about building on serverless held true, and what came as a complete surprise. Learn more about building on serverless on our blog.

How to monitor a Ceph cluster using Grafana Cloud

Here at Grafana Labs, when we’re building integrations for Grafana Cloud, we’re often thinking about how to help users get started on their observability journeys. We like to focus some of our attention on the different technologies you might come across along the way. That way, we can share our tips on the best ways to interact with them while you’re using Grafana products.

Auto-Instrumenting Python Apps with OpenTelemetry

In this tutorial, we will go through a working example of a Python application auto-instrumented with OpenTelemetry. To keep things simple, we will create a basic “Hello World” application using Flask, instrument it with OpenTelemetry’s Python client library to generate trace data and send it to an OpenTelemetry Collector. The Collector will then export the trace data to an external distributed tracing analytics tool of our choice.

Who Moved My Code?

We’re excited to launch our new integration with GitHub that supports GitHub Enterprise Server customers. This allows companies using GitHub Enterprise on their own domains to access key features in Rollbar that help developers fix errors faster. GitHub Enterprise offers a fully integrated development platform for organizations to accelerate software innovation and secure delivery.

How to use metrics scopes in Cloud Monitoring

You've got Cloud Monitoring all set up in your project - but what do you do if you need to manage multiple projects and unify monitoring across them? In this episode of Engineering for Reliability, we look at Cloud Monitoring metrics scopes and show you how to use them to monitor multiple Cloud projects. Watch to learn how to use the Cloud Console to manage Metrics Scopes, view metrics from resources in multiple projects, and automate configurations using the API!

Installing Additional Modules in the Icinga Web 2 Docker Container

The Docker images we provide for both Icinga 2 and Icinga Web 2 already contain quite a number of modules. For example, the Icinga Web 2 image contains all the Web modules developed by us. But one of the main benefits of Icinga is extensibility, so you might want to use more than what is already included. This might be some third-party module or a custom in-house module.

How Honeycomb Is Using $50M in New Funding to Bring Observability to All

Today, we announced that Honeycomb has raised $50M in Series C funding, in a round led by Insight Partners and joined by all existing investors from our Series B. We’re using this investment to support the growth of our customers and community, ensure the benefits of observability can be realized by all engineering teams, and expand the ways we can better serve you.

The Chip Supply Conundrum Continues with No End in Sight

As the semiconductor chip shortage drags on, allegations and concerns continue to surface about chip stockpiling. The Biden administration continues to face resistance from lawmakers and executives in Taiwan and South Korea, further complicating efforts to resolve bottlenecks in the global chip supply that continues to plague industries from automobiles to consumer electronics.

Delivering Experience-First VDI - Q&A w/ Teodor Olteanu (Flutter Entertainment)

Desktop Virtualization solutions aren’t new to today’s organizations; countless IT teams have implemented some form of virtualization to provide better and more consistent experiences to employees. But during the rise of remote and hybrid working, virtualization has become even more ubiquitous. We recently sat down with Teodor Olteanu, Senior Delivery Manager, Workplace Technology Engineering at Flutter Entertainment, a leading global sports betting, gaming and entertainment organization.

The easiest ways to increase page speed on your website

Page speed is one of the most important factors in judging the performance of an individual webpage. This is a performance evaluation that is made not just by the user visiting your website, but by search engines as well. In recent years, consumers have come to expect virtually instantaneous results from their devices and from the websites they visit.

Top 13 Site Reliability Engineer (SRE) Tools

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization. For the most part, a site reliability engineer works on multiple tasks and projects at once, so for most SREs, the tools they use reflect their ever-evolving responsibilities. A typical SRE is busy automating, cleaning up code, upgrading servers, and continually monitoring dashboards for performance, so their toolbelt keeps growing.

Announcing LogDNA Agent 3.3 GA: Improved Performance for Linux Support

We’re excited to announce the general availability of the LogDNA Agent 3.3, which introduces Linux and ARM64 support to our Rust Agent. This new support provides improved performance and enables a few features previously available only to our Kubernetes customers, such as various configurations within the Agent and the ability to run as a non-root user. Additionally, we have added Prometheus metrics that provide insights into your Agent.

Microsoft Co-Sell Ready | M365 Monitor for Azure

NiCE has achieved Microsoft Co-Sell Ready status with its Microsoft 365 Service monitoring solutions for Teams, SharePoint, OneDrive, and Exchange. Microsoft’s global network of internal and partner executives uses the Co-Sell Ready status to identify Microsoft-accredited solutions. Co-Sell Ready partners are a select group of software vendors that meet Microsoft's technical and business standards and deliver optimized, secure solutions.

SolarWinds Gives Data Pros the Tools for DataOps Success With Database Mapper and Task Factory

Company enables data pros to accelerate data delivery and cloud migrations more efficiently; showcases database management solutions and educates on database strategies at upcoming industry events.
Sponsored Post

Australia (Oceania): Have You Checked Your DNS Performance Lately?

Any organization with an online presence understands the importance of its website. That’s why companies invest thousands of dollars in design, user interface (UI) development, and site optimization. But that’s only part of the puzzle. Your website or online application can have all the bells and whistles, but if you aren’t fully optimizing your DNS, you’re losing money—plain and simple.

Oracle Management Pack Update Release (21.9.2568.0)

We’re happy to announce a new update release of our Oracle Management Pack for SCOM. The 21.9.2568.0 release contains new features, fixes, and changes. On top of this, we have also released a new datasheet and a white paper, which are described further down in this article.

Rollbar Pro Tips: Versions

Rollbar allows you to see which versions of your code are throwing exceptions. This is particularly helpful if you are continuously deploying your apps/services because you can see if recent occurrences are coming from the latest deployed version. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Rollbar Pro Tips: Deploy tracking

Notify Rollbar every time you deploy or release your app to unlock several features that will help your debugging process.

Discovering the Differences Between Log Observability and Monitoring

Log observability and monitoring are terms often used interchangeably, but they actually describe two different approaches, aimed at solving and understanding different things. Observability refers to the ability to understand the state of a complex system (or series of systems) without needing to make any changes or deploy new code.

Why your log management software may not give you the real Dashboard experience

Visualizing log data is one of the biggest perks of using good log management software. Data is many businesses’ most critical asset. But without proper use, a business's data becomes just an artifact and no longer an asset. Visualization and analysis are the end goals of collating log data from its sources. The need for visualization arises from the fact that we intuitively process visual information faster than a random jumble of numbers and letters.

Master Data Management and Data Governance: How to Build an Effective Strategy

As organizations marshal their data to drive business value throughout the enterprise, it’s becoming more important than ever to ensure the quality of this data. Building in data quality and data validation testing, and aligning the data with your organization’s master data (such as products, customers, assets, and locations), is critically important to ensure your reporting is consistent and accurate.

Web Performance of the World's Top 50 Blogs

Am I the only one who thinks blogs were in their prime in the mid-2010s, before the internet began consuming content at wicked fast rates (thanks, Instagram and TikTok)? When someone says the word blog, I think of the chokehold 2015 southern American female fashion bloggers had on the internet. Pure pumpkin spice latte fashion. And though those blogs have, probably for the best, slipped away from the limelight, blogs are still alive and well.

Vendor Switching With OpenTelemetry (OTel)

You might already know that OpenTelemetry is the future of instrumentation. It’s an open-source and vendor-neutral instrumentation framework that frees you from the trap of using proprietary libraries simply to understand how your code is behaving. Best of all, you can instrument your applications just once and then take that instrumentation to any other backend system of your choice. This blog shows you exactly how to use OpenTelemetry to ✨break the vendor lock-in cycle.✨

OpsRamp, TietoEVRY Partner to Speed AIOps Adoption in Nordics

Managed service providers have to manage complex and diverse customer environments that typically include hybrid infrastructure and multiple monitoring feeds. They need to be able to discover and monitor these environments, correlate alerts from multiple systems into single events, and map business services to the infrastructure and application services that support them, building customer-centric dashboards for real-time service and application health in the process.

The Future of Sumo Logic Observability

I have always found data collection to be a fascinating area of work at Sumo Logic. Collecting data is a critical first step for all the solutions we develop for our customers. After all, to observe the health and performance of your applications, you must first collect all relevant data. It's also an area that has seen significant activity from the open-source community over the years, which is completely changing the landscape of observability as we know it.

5 times domains have been hijacked

It’s a common belief that once we purchase a domain, it’ll be ours for as long as we like. Big mistake: there are genuine online threats to your domain that mostly go unconsidered. For example, hackers can gain access to your system and hold your domain for ransom or cause malicious damage to you and your business. Surprised? Well, we have 5 examples of exactly when this has happened, and how hackers managed to gain access to domains and cause mass disruption.

All of your questions answered about server monitoring

Here at StatusCake, we get asked a lot of questions about server monitoring, and more specifically, how we know when your server has exceeded your set limits. Understanding how we know shows why server monitoring is so important to your website, and that’s why we’ve collated the most frequently asked questions so you can see for yourself!

Building Automated Monitoring with Icinga and SIGNL4

How many servers can be managed by one system administrator? This question is not an easy one, since it depends on the tasks that need to be performed. However, it's quite clear that the number of servers one engineer can manage has increased enormously over time, and is still growing. Public and private clouds, combined with automation tools, enable us to automate many daily tasks. In a modern IT infrastructure, almost everything can, and should, be automated.

IoT Data With LogDNA

Consider the following question: Why do most teams face pressure to rethink traditional logging and observability approaches? Asking this question to most engineers would likely result in answers centered on the challenges posed by microservices apps. Because microservices are more complex than monoliths and involve more moving parts, they require more sophisticated, granular log collection, correlation, and analysis.

11 Ways to Fix the ERR_CONNECTION_TIMED_OUT Error

A connection timeout that stops you from reaching a specific webpage is one of the most annoying errors. The message "The webpage not available" ERR_CONNECTION_TIMED_OUT will appear on the screen. This error usually occurs when there is an issue with the internet connection and the website does not load. The name of this error can be seen in the notification on your screen.

Announcing Logz.io's New Data Parsing and Log Transformation Tool

We all know the importance of cataloging, organizing, and breaking down the data in your logs. That process, parsing, makes information easier to find and simplifies subsequent analysis. Now, with Logz.io’s upgraded self-parsing tool, custom parsing rules and log organization are easier than ever. What’s important is parsing that data out correctly: the better the data is parsed, the easier it is to query.

Grafana EMEA meetup recap: accessibility, k6 testing, and multi-DC observability stacks

On Oct. 5, we hosted the first Grafana Virtual Meetup for an EMEA-based audience. Each Grafana meetup features “bite-sized” presentations from our user community and members of the Grafana Labs team. We want to provide opportunities (even virtually!) for members of our community to connect with one another and share what they’re working on or have learned.

Google Cloud Monitoring 101: Understanding metric types

Whether you are moving your applications to the cloud or modernizing them using Kubernetes, observing cloud-based workloads is more challenging than observing traditional deployments. When monitoring on-prem monoliths, operations teams had full visibility over the entire stack and full control over how and what telemetry data is collected (from infrastructure to platform to application data).

Going Beyond Observability with Rollbar and Datadog

In this webinar, we explore some of the common objectives shared by users of both Datadog and Rollbar and how best to accomplish those goals. Datadog provides comprehensive observability covering a large swath of services and components, while Rollbar’s advanced intelligence and code improvement features help to make code insights more actionable and easily fixable.

Best Practices for Proactive Monitoring is Self-Service

Large financial organizations depend on monitoring teams to create the monitoring scenarios necessary for the organization’s business-critical apps. This can be a challenge, with hundreds or even thousands of apps to monitor with limited resources. This challenge is often compounded by the constant updating and changing of these apps by larger application teams.

Migrating from Epsagon to Dashbird

AWS offers over 200 products. When you design a solution with a number of these services at its core, such as a microservices-based system, monitoring them becomes rather challenging, and troubleshooting and resolving a problem in the least amount of time once it occurs becomes a daunting task. Building a monitorable system requires a deep understanding of the failure domain of the critical components, which is a tall order for a fairly complex system.

Compare monitoring tools: Epsagon vs Dashbird

As the world becomes increasingly reliant on technology, software developers, cloud architects, and DevOps practitioners bear a responsibility similar to that of the health industry or airplane pilots. In this reality, cloud monitoring isn’t optional; it’s a matter of being professional. What is optional, however, is which monitoring solution to go for. Obviously, the one that best fits your specific needs, but which one is that?

5 Ways Cloud Costs Can Spiral Out of Control

As CFO of Virtana, I face many of the same challenges as every CFO of a SaaS or enterprise software company today: cost containment, surprises, and an ever-escalating AWS bill. We all need help keeping these things in check. These challenges become even more difficult when your organization goes hybrid cloud. Fortunately, there are tools out there to help our teams help us manage these costs.

How companies can prevent employee turnover in the 'great resignation' era

What do you like about the place you work? Now there’s a question that many people have very different answers for than they did two years ago. After all, many of the perks of working in an office are no longer being enjoyed by employees. The person who loved walking into their sparkling office building with the stocked kitchen and comfy chairs? They’ve been working from their bedroom for over a year now, left to stock their own kitchen.

Energy Regulators Driving Cloud-First Strategies in Race to Net Zero Carbon

Every government in the world is evaluating the steps necessary to radically reduce carbon emissions. The UK Government has been especially proactive, not just assessing these steps, but rolling out aggressive carbon-control strategies and legislation. Originally, the UK Government’s Climate Change Act 2008 set a goal of an 80 percent reduction in the country’s carbon emissions by 2050.

Tucker Callaway on the State of the Observability Market

Tucker Callaway is the CEO of LogDNA. He has more than 20 years of experience in enterprise software with an emphasis on developer and DevOps tools. Tucker drives innovation, experimentation, and a culture of collaboration at LogDNA, three ingredients that are essential for the type of growth that we've experienced over the last few years.

5 Examples of Metrics or Log Data That Drives Observability

Which data sources do DevOps teams need in order to achieve observability? At a high level, that’s an easy question to answer. Concepts like the “three pillars of observability”—logs, metrics, and traces—may come to mind. Or, you may think in terms of techniques like the RED Method or Google’s Golden Signals, which are other popular frameworks for defining which types of data teams should collect for monitoring and observability purposes.

ManageEngine captures top honors at GovTech Innovation Awards 2021

ManageEngine captured the Network Management & Monitoring Vendor of the Year honor at the GovTech Innovation Awards 2021 conducted by Tahawul Tech. Elitser Technologies, a regional distributor for ManageEngine, accepted the award on behalf of ManageEngine on September 27 in Dubai, UAE.

Website Monitoring for Holiday Shopping Seasons

The events of 2020 accelerated ecommerce sales. According to Adobe Analytics (analyzing website transactions from 80 of the top 100 U.S. online retailers), shoppers shelled out $10.8 billion online during Cyber Monday 2020 — a single day of shopping — for a 15.1% year-over-year increase. 2020 was just the precursor to 2021, which may actually warrant use of the word “epic”, making online shopping more appealing than ever before.

Three reasons to upgrade to SquaredUp SCOM Edition

If your organization uses SCOM, you are sitting on a treasure trove of juicy data. Wouldn’t it be a dream to be able to effortlessly leverage all that data via a native integration? You can easily do that with the SCOM Edition of our SquaredUp dashboarding suite. If you are currently using our free Community Edition, here are three reasons to upgrade to SCOM Edition if you are dashboarding SCOM. Let’s break it down.

NUGGET 2021: Introducing Netreo Path Insight!

Path Insight simplifies how IT organizations deliver secure, optimal performance to all users by providing a visual, hop-by-hop analysis along critical network paths. Join Andy Markowitz, technical product manager at Netreo, as he gives an early walkthrough of this soon-to-be-released Netreo feature, which can help your engineering and operations teams deliver a great customer experience by pinpointing problems quickly and fixing them even faster.

NUGGET 2021: Business Transformation by Revolutionizing Your Monitoring

In this NUGGET Session, Matt Neifer, Director, Network Management Systems & Automation at Transaction Network Services (TNS) will share his firsthand experience in achieving business transformation through revolutionized monitoring practices. Matt will share examples of continuous improvement efforts in the formalization of the TNS product/service catalog, standardizing service implementation and delivery, and normalizing configuration management data structures within our CCMDB in order to automate the management of monitoring ecosystems.

Network AF, Episode 3: Uniting networking pros with Salesforce's Janine Malcolm

If you’ve ever thought networking is bewildering as a newcomer, you’re not alone. In episode 3 of Network AF, meet Janine Malcolm, the director of network engineering at Salesforce. She joins podcast host Avi Freedman to chat about some of the experiences she’s had throughout her career and how to make network engineering a more accessible profession. At Salesforce, Janine is currently focused on uniting groups of people as one overall network engineering team.

What You Should Know About Cloud Solutions For Real-Time Analytics

Experts from all industries acknowledge the need to use and analyze data, especially data generated in real time. Business decision-makers therefore need to know how these processes work, who controls them, and how to optimize them. But what technologies can help companies better cope with masses of data?

GSLB on NetScaler

This blog post is the first in a series highlighting actual questions asked of eG Enterprise during customer support calls, along with our answers. At first, it appeared that this customer had a simple active/passive web server setup to provide failover resilience. However, it transpired that they were using the GSLB (Global Server Load Balancing) features of Citrix ADC (formerly NetScaler).

11 Key Stats to Know if You Work in HR & Tech Support

If you are reading this, chances are you aren’t at your office. And if you are, that office is very different than how it once was. Heck—we’re all different now, we think about work differently, we interact with colleagues almost exclusively through our screens, not face-to-face. So if offices aren’t the same and we’re not the same, then the teams that support us shouldn’t be the same either, right?

Windows 11 is here. Are you ready to deliver a new experience?

After much anticipation, Windows 11 has finally arrived, taking over as Microsoft’s primary Operating System. With a modern, sleek aesthetic and rearranged start menu and taskbar navigation, the new OS has caught the eyes of the members of the global workforce who will be using it daily very soon (or, at least, before Windows 10’s 2025 end-of-life date). But it’s also caught the eye of a group of people who are a bit more anxious about this change: IT professionals.

The Power Couple: How HR & IT Can Prevent Employee Turnover & Burnout

Forget the vanity metrics and the steady but painfully slow progress plans. HR professionals want to meet the demands of modern employees and rise to the challenge of a hybrid workforce by making a lasting change right now. That’s the dream, right? A happy employee stays your employee. A happy employee gets work done. A happy employee makes your company profitable. If we all had happy employees, we’d be happy.

Hook Relay Launched! Was it Fireworks or Crickets?

This week the Founders recap the initial Hook Relay launch and cover things they learned along the way. They also discuss whether developers will struggle to find purpose if products like Hook Relay make their lives too easy. Lastly, do you remember the days of converting PSDs to HTML? Tune in and prepare for launch!

Rollbar Pro Tips: Service links

Rollbar is not the “last stop” for monitoring and debugging; often you might also be interested in what your other monitoring and performance tools are telling you. Those tools used to be a few clicks and copy-pastes away; now, with Service Links, they are just one click away.

Rollbar Pro Tips: Code context

By enabling Code context, Rollbar can show you additional lines of context for each entry in a traceback, saving you the trouble of jumping to your source code to figure out exactly where an exception occurred.

Add More Metadata to Your Front-end JavaScript Errors in AppSignal

Our front-end JavaScript library has been updated with an easier way to add more metadata to front-end errors using the sendError and wrap helpers. Previously, the sendError and wrap helpers only supported customizing tags and the namespace for the error. More information could be set on error spans if they were created manually, but now that type of information can be added to errors sent using these helpers.

A Primer on Prometheus Metrics

Keeping an eye on metrics is one of the essential things that every developer should do for their applications. Prometheus is a tool that has gained a lot of popularity in the field of metrics. When you start with Prometheus, you may not recognize its potential at first because it takes time to explore the entire tool. But once you become acquainted with the features, you will find it highly useful.

Achieving 86% Productivity Gains Through ITSM Automation

Manual incident management is an enormous challenge facing today’s enterprises. It wastes time and money, and often results in unhappy customers who have to deal with unreliable services because of persistent, unresolved issues. Manual ticket generation can take 20 to 30 minutes, and routing can take another 90, assuming the ticket is delivered to the right team.

WordPress Error Logs and Activity Logs

Logging is a fundamental part of software development. While an app is being developed, we rely on logging to confirm our inputs and outputs match our expectations. In production, logging can be an invaluable resource for tracking down bugs or measuring how users interact with the app. We can also consider logs as a sort of time-series value, where a timestamp is associated with a user’s specific action. These logs can be structured, gathered, and analyzed to provide teams with more information.

The Magic of Metrics-and How It Can Burn You

As product developers, our responsibility continues beyond shipping code. To keep our software running, we need to notice whether it’s working in production. To make our product smoother and more reliable, we need to understand how it’s working in production. We can do this by making the software tell us what we need to know. How can we notice when the software is running smoothly? Make it tell us!

Asia Pacific Firms Need Analytics to Survive the Cloud Era

Many companies in Asia Pacific (APAC) were caught in a digital tailspin when Covid-19 hit, sacrificing security practices in their hurry to adjust to the new reality of remote work. Two years on, hybrid work is still the norm as the pandemic continues and seems to be a new way of life moving forward. Catalyzed by the coronavirus, firms big and small are now adopting cloud technologies as we tread deeper into a new data age.

Network monitoring tools: The good, the bad, and the ugly

For organizations that rely on their network to support their daily operations, how well the network operates can make or break their business. As a network admin, your core objective is to ensure that day-to-day business operations are carried out successfully. This involves optimizing the network for maximum performance and minimizing service disruptions.

Introducing Grafana Machine Learning for Grafana Cloud, with metrics forecasting

At GrafanaCONline in June, we talked about the future of machine learning at Grafana Labs. Four months later, we are excited to introduce Grafana Machine Learning for Grafana Cloud, with our metrics forecasting capability. It’s available now to all customers on Pro or Advanced plans. If you’re not already using Grafana Cloud, you can sign up for a free 14-day trial of Grafana Cloud Pro here.

CDN Logs and Why You Need Them

A Content Delivery Network (CDN) is a distributed set of servers that are designed to get your web-based content into the hands of your users as fast as possible. CDNs produce CDN logs that can be analyzed, and this information is invaluable. Why? CDNs host servers all over the world and are designed to help you scale your traffic without maxing out your load balancers. A CDN also gives you added protection against many of the most common cyber attacks. This activity needs to be closely monitored.

How we're building a production readiness review process at Grafana Labs

Production readiness review (PRR) is a process that originated at Google, described as the first step of site reliability engineering engagement in the company’s famous SRE book. The idea of thoroughly reviewing a product before handing over the pager is a really good one, but except for Google-scale companies, there aren’t that many organizations that can afford dedicated SRE teams.

Webinar: How Medtronic Monitors a Billion Lambda Requests a Month

Jeff Barr (AWS Chief Evangelist), Daniel Modlinger (Medtronic Director of Data & AI), and Erez Berkner (Lumigo CEO) discuss serverless observability at scale and review how Medtronic's Artificial Pancreas Project runs a billion AWS Lambda requests a month while maintaining high availability. Watch and learn how to scale your serverless workloads.

NPM, encryption, and the challenges ahead: Part 1 of 2

It’s interesting to observe how encryption and network performance monitoring (NPM) have evolved over time. When I first entered the networking industry right out of college, many applications sent passwords over the network in clear text, unencrypted. Since just about everyone’s PC was wired back to a repeater (i.e., not a switch), we could observe each other’s traffic with free packet analyzers and laugh.

How to Prepare for a SQL Server DBA Interview and Questions

So, you have an interview lined up for a sweet new gig as a SQL Server database administrator (DBA). What interview questions will you be asked? How can you make sure you ace the interview? What will make you stand out from the other candidates? There are no concrete answers, because… “it depends.” However, you can count on at least two major components of your interview – a technical component and a non-technical component, often focusing on soft skills.

Polishing the Icinga DB Web User Interface

When redesigning the Icinga DB Web interface elements, we had already started establishing consistent design elements. Developing the Icinga PHP Library (IPL) from the ground up supports this even further, since IPL makes developing reusable widgets a lot easier for developers. For the release of Icinga DB Web RC2, we’re going the extra mile to polish many of our user interface elements.

What is Splunk? - A Summary for UK Public Sector

To quote the UK National Data Strategy: Splunk is an advanced data platform that delivers right-time analytics from diverse data sets and that enables organisations to ask questions of all their data. It can be used to mitigate cyber security risk, improve performance, increase reliability and observe what is happening in the cloud.

Smooth Transitions - Moving the Right Employees from On-Premises to Cloud Desktops

As the Senior Delivery Manager for Flutter UK&I, I’m tasked with overseeing the digital experience for roughly 7,000 employees across Europe. Ultimately, my team and I are responsible for ensuring things like server hosting, Office 365, SSO, Azure and other digital work components function in a frictionless manner for our employees—both in office and remotely. That’s the underlying philosophy, at least.

TL;DR InfluxDB Tech Tips - Creating a Telegraf Configuration with the InfluxDB UI

The InfluxDB UI offers a wide variety of features for time series analysis, data lifecycle management, and time series visualization. The InfluxDB UI also shines when it comes to onboarding new users, whether they’re an InfluxDB OSS or free tier InfluxDB Cloud user. The InfluxDB UI allows you to easily leverage Telegraf, a plugin-driven collection agent for collecting, processing, and writing metrics and events.

Why everyone should be using a website monitoring tool right now

It’s easy for us to cruise the net, not really thinking of anything other than exactly what we’re hoping to achieve whether it’s making a payment via online banking or buying a new laptop on Black Friday. But we forget that there’s a whole system working in the background of each and every website, keeping it online and making sure it works to the best of its ability.

Announcing the Control API Suite

As LogDNA has grown, many of our customers have too, meaning that they are bringing in more ingestion data sources and expanding their use cases for their logs. To help with managing more data, we’re excited to introduce the Control API suite. We’ve built 4 individual APIs that will help companies programmatically configure their data and how they want to ingest logs. Below, we’ll cover each new API in detail as well as why they are massively impactful for our customers.

Should you care about AIOps? Obviously.

There's a lot of hype in the marketplace about AIOps right now, and a lot of people have interesting ideas about what it should be. The most common idea I hear is that it's essentially a layer of AI magic that sits across everything in your IT tooling today, makes sense of all of it for you, and then decreases the number of incidents you have and reduces your MTTR...

Event and Log Management for Optimized Security and Performance

The full stack isn't just cloud-based, microservices apps, but includes on-premises and hybrid private cloud infrastructure and packaged applications. The challenges associated with aggregating, analyzing, reporting, and alerting intelligently on logs have become more complex than ever due to the acceleration of packaged and customized application deployment in support of business transformation, alongside the growing requirements needed to ensure security and compliance. This webinar will explore multiple methods to ensure compliance, identify threats, and optimize MTTR by monitoring, analyzing, and managing logs across all types of application and infrastructure architectures.

5 Business Insights to Gain From APM

With soaring demand on digital infrastructures, increasing deployment of B2B and B2C custom applications, and ever-tightening budget constraints, it’s more important than ever to ensure service delivery in a cost-effective manner. Delays experienced by your business partners and customers mean higher costs, lost revenue, and lost market share at a time when you can least afford it. In effect, slow is the new down.

New in Grafana 8.2: Test contact points for alerts before they fire

Grafana 8.2 was released last week, and we’re excited to announce one of its new features: contact point testing. Users of Grafana 8 alerting can now test their contact points right from the contact points page. This makes Grafana 8 alerting easier to configure and gives you confidence that your contact points work as expected before an alert actually fires. Here are the basics.

Global Manager: Operate Complex IT Environments Across SL1 Stacks In a Single View

Customer experience is at the heart of ScienceLogic product design, especially making the power of the SL1 platform easy to use. The latest release brings a slew of user interface (UI) and workflow improvements, as well as significant enhancements to the SL1 Global Manager. In this video, you’ll learn how Global Manager allows teams to consolidate operations across multiple environments into a single dynamic view. You’ll also discover how ScienceLogic SL1 makes it easier than ever to support distributed operations, regional data homing for regulatory requirements like GDPR, multitenancy, and specialist teams.

Resolve Customer Cases Faster with Automated Case Management

When problems occur, IT Operations must often address a recurring challenge first: wasted time. Before they can resolve the actual issue, they must manually collect critical diagnostic data from a variety of tools, open tickets, and finally route fully populated incident details to the correct team for resolution. These delays extend service outages, increase operational costs, and, worst of all, frustrate customers. In this video, you’ll learn how ScienceLogic SL1 shrinks MTTR by automatically opening tickets, populating critical troubleshooting details, and routing them to the right person at the right time.

Troubleshooting Pod issues in Kubernetes with Live Tail

With the advent of IaaS (Infrastructure as a Service) and IaC (Infrastructure as Code), it is now possible to manage versioning, code reviews, and CI/CD pipelines at the infrastructure level through resource provisioning and on-demand service routing. Kubernetes has become the de facto choice for container orchestration.

Welcoming Newcomers to the Networking Industry with Janine Malcolm | Network AF Podcast Ep. 3

Today's conversation is with Janine Malcolm, Director of Network Engineering at Salesforce. Janine takes us through her journey to where she is today and how she became interested in networking in the first place. The conversation gets into the nitty-gritty of networking, but also covers what you can do to get into the industry and why having a college degree isn't always necessary. Janine shares how we can make this space more welcoming to newcomers, along with advice on how you can start learning more and get your career going.

Future Trends & Technology for Your Integration Infrastructure

Integration is now the #1 IT expense category at many enterprises, and new complexities increase the burden on Service Delivery, CI/CD, IBM MQ administration, and other “integration professionals” every day. Your enterprise has microservices, mobile, mainframes, cloud, and more applications and application updates than you can count, and making it all work together requires routing transactions, messages, and more through a rapidly growing integration infrastructure layer.

How to Quickly Identify Performance Issues in Azure SQL Database

Moving your database to the cloud using a PaaS option such as Azure SQL Database or Azure SQL Managed Instance reduces the maintenance overhead required from the database administrator (DBA). The DBA no longer has to worry about managing backups or configuring high availability, for example. That said, the DBA still needs to tune the database’s performance and monitor SQL workloads. The Azure Portal offers several services for quickly identifying performance issues in Azure SQL Database.

Don't let Prometheus Steal your Fire

Prometheus is an open-source, metrics-based event monitoring and alerting solution for cloud applications. It is used by nearly 800 cloud-native organizations including Uber, Slack, Robinhood, and more. By scraping real-time metrics from various endpoints, Prometheus allows easy observation of a system’s state in addition to observation of hardware and software metrics such as memory usage, network usage and software-specific defined metrics (ex.
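Prometheus scrapes its targets over HTTP, and each target exposes its current metric values as plain text. As a rough illustration of what a scraped endpoint returns, the sketch below renders metrics in the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` sample per line. The `render_metrics` helper and the sample metrics are hypothetical; in practice an official client library such as `prometheus_client` generates this output for you.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(metrics):
    """Render (name, type, help, samples) tuples; samples are (labels dict, value) pairs.

    HELP text is assumed to contain no backslashes or newlines (not escaped here).
    """
    lines = []
    for name, metric_type, help_text, samples in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in samples:
            if labels:
                pairs = ",".join(
                    f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
                )
                lines.append(f"{name}{{{pairs}}} {value}")
            else:
                lines.append(f"{name} {value}")
    # The format is line-oriented, and the last line must end with a line feed.
    return "\n".join(lines) + "\n"


# Hypothetical metrics an application might expose on its /metrics endpoint.
metrics = [
    ("http_requests_total", "counter", "Total HTTP requests handled.",
     [({"method": "GET", "path": "/"}, 1027), ({"method": "POST", "path": "/cart"}, 3)]),
    ("process_resident_memory_bytes", "gauge", "Resident memory size in bytes.",
     [({}, 52428800)]),
]
print(render_metrics(metrics), end="")
```

Each scrape simply reads the latest values; Prometheus attaches the timestamp and stores the samples as time series on its side.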

Monitoring as a service, here we come!

Continuing to refine its services, Pandora FMS is launching one of the most advanced and complete offerings in its history as monitoring software: Monitoring as a Service (MaaS). As most readers will know, Pandora FMS is network monitoring software that, among many other capabilities, lets you visually monitor the status and performance of a wide range of parameters across different systems (servers, applications, hardware systems, firewalls, proxies, databases, web servers, routers…).

How Time Series Databases Work, and Where They Don't

In my previous post, we explored why Honeycomb is implemented as a distributed column store. Just as interesting to consider, though, is why Honeycomb is not implemented in other ways. So in this post, we’re going to dive into the topic of time series databases (TSDBs) and why Honeycomb couldn’t be limited to a TSDB implementation. If you’ve used a traditional metrics dashboard, you’ve used a time series database.

AIOps Summit Recap: Why Observability is Key, and How to Get There

“It’s not rocket science.” In the past, we’ve all heard that statement made. Quite often, it’s applicable. It’s true we can overthink or unnecessarily overcomplicate matters. Don’t tell that to someone who’s responsible for network performance and continuity today, however.

6 All Too Common Network Security Hacks Your Team Should Know About

As an IT pro, you’re probably used to doing the heavy lifting when it comes to network security. Your team might even be responsible for educating the rest of your company about common network security hacks and how to prevent them. Today, we’re here to lighten that load a little.

Illuminate 2021 - Embracing open standards for big picture observability

We just wrapped up a fantastic 5th Illuminate, Sumo Logic’s user conference, filled with amazing customer speakers, partners, and Sumo Logic experts all sharing their insights and expertise. The engagement during presentations, workshops, and executive meetings showed a high level of interest in OpenTelemetry, unified analytics, and full-stack observability to solve the challenges inherent in application modernization and cloud migration.

How to Pivot Your Data in Flux: Working with Columnar Data

Relational databases are by far the most common type of database. As software developers, it’s safe to say they are the kind of database most of us got started on, and probably still use on a regular basis. One thing they all have in common is how they structure data. InfluxDB, however, structures data a little differently.
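In a relational table, one row usually holds several columns for the same moment. Flux query results from InfluxDB instead hold one field value per row, in `_time`, `_field`, and `_value` columns, and Flux's `pivot()` function (for example `pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")`) reshapes them into the familiar row-per-timestamp layout. The Python sketch below imitates that reshaping on plain dictionaries to show what the operation does; it ignores Flux group keys and table streams, so treat it as an illustration rather than a model of Flux's exact semantics.

```python
from collections import defaultdict


def pivot(records, row_key="_time", column_key="_field", value_column="_value"):
    """Turn one-value-per-row records into one row per distinct row_key value."""
    rows = defaultdict(dict)
    for rec in records:
        # Each field name becomes a column on the row for its timestamp.
        rows[rec[row_key]][rec[column_key]] = rec[value_column]
    # Emit rows ordered by the row key, keeping the key itself as a column.
    return [{row_key: key, **columns} for key, columns in sorted(rows.items())]


# Hypothetical sensor data in the "one field per row" shape.
records = [
    {"_time": "2021-10-01T00:00:00Z", "_field": "temp", "_value": 21.5},
    {"_time": "2021-10-01T00:00:00Z", "_field": "humidity", "_value": 40},
    {"_time": "2021-10-01T00:01:00Z", "_field": "temp", "_value": 22.0},
]
for row in pivot(records):
    print(row)
```

Note that the second timestamp has no `humidity` value, so its pivoted row simply lacks that column, much as Flux leaves the cell null.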

Webinar - Beyond Selenium, Synthetic Monitoring for the 2020s

Monitoring the Impossible: what if a solution could look for user interface elements based on their appearance and interact with them via mouse and keyboard events, just like a person? In this webinar, learn about the challenges and limitations of Selenium and how you can overcome them with 2 Steps technology. This first instalment in the Monitoring the Impossible webinar series was brought to you by 2 Steps.

Network monitor 101: All about network monitors

As more organizations run sizable network infrastructure, the need for network monitoring grows substantially. With well-established network monitoring processes in place, organizations can work more efficiently, gain more visibility into their network devices, and avoid unexpected downtime and network issues.

Introduction to Go Custom Metrics with the Logz.io RemoteWrite SDK

We recently announced the release of our RemoteWrite SDKs for sending custom metrics from applications written in several languages. The first SDKs support shipping metrics from Golang (Go), Python, Java, Node.js, and .NET. This tutorial covers the Golang SDK. The SDKs aren't limited to Logz.io: any platform that supports the Prometheus remote write endpoint can use them.

What is Magento Performance Monitoring?

When it comes to creating an online business, Magento is one of the best options. According to published usage data, Magento is used by 31.4% of the top 100,000 eCommerce stores. 15% use Magento's community edition, and 5% use Magento's enterprise edition, now known as Adobe Commerce. Magento offers a wealth of features, out-of-the-box functionality, and customization options to help businesses attract and convert more leads.

Understanding the Three Pillars of Observability

Observability and its implementation may look different to different people. But underneath the varying definitions lies a single, clear context: most software run today uses microservices or a loosely coupled, distributed architecture. While this design makes scaling and managing your system more straightforward, it can make troubleshooting issues more difficult. The three pillars of observability (logs, metrics, and traces) are complementary ways to track software systems, especially microservices.

How To Monitor Network Traffic

In today’s digital world, almost every business, big or small, is driven by technology. As technology advances, threats to the security of IT infrastructure grow along with it. Businesses therefore need network traffic analysis tools to keep an eye on their networks. Monitoring network traffic continuously gives a business the insight to optimize and manage performance, minimize its attack surface, strengthen security, and better manage its resources.

Network AF, Episode 2: Backbone engineering and interconnection with Nina Bargisen

In episode 2 of Network AF, meet Nina Bargisen. Nina has spent over two decades in network engineering and talks to podcast host Avi Freedman about her history in interconnection and peering. She’s worked for companies like Netflix and TDC (formerly Tele Danmark Communications), and is now Kentik’s director of GTM strategy, focused on supporting service providers. Among the topics Nina and Avi discuss on the Network AF podcast is Nina’s non-standard entry into networking.

The benefits of hybrid cloud for financial institutions

Financial institutions and their service providers have traditionally owned and run their technology infrastructure on their own premises or in their own data centres. However, that’s changing with the advent of powerful public cloud services, such as Amazon Web Services, Google Cloud and Microsoft Azure.

How to check NFT supply with AWS Lambda?

Non-Fungible Tokens, or NFTs for short, are all the rage right now. Everyone and their pets are starting an NFT project. Some people got rich from NFTs; others did not. Some say they are the savior that will rip power away from big corporations and give it back to creators; others say they are just a giant pyramid scheme.

Why Hybrid, Multi-Cloud Visibility Is Everybody's Problem

When you have workloads running in a hybrid, multi-cloud environment, it’s hard to get a unified view of your entire infrastructure. In fact, Virtana’s recently published State of Hybrid Cloud and FinOps survey reveals that only 36% of respondents said they have comprehensive, unified visibility and management capabilities across all their public clouds, leaving the remaining 64% with less-than-ideal conditions for managing their multi-cloud infrastructure.

Resolve AWS Lambda function failures faster by monitoring invocation payloads

In a serverless application, AWS Lambda functions are typically invoked by JSON-formatted events from other AWS services—like API Gateway, S3, and DynamoDB—and respond with JSON-formatted payloads. Having visibility into these function request and response payloads can provide context around your function invocations and help you uncover the root causes of Lambda function failures.
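One generic way to capture invocation payloads yourself, independent of any monitoring vendor, is to log the incoming event and the returned response as JSON. Lambda sends a function's standard output to Amazon CloudWatch Logs, so `print` is enough for a sketch. The `log_payloads` decorator and the example handler below are hypothetical and are not how Datadog collects payloads; since real payloads can contain personal or secret data, you would normally redact fields before logging them.

```python
import functools
import json


def log_payloads(handler):
    """Wrap a Lambda handler so its request and response payloads are logged as JSON."""

    @functools.wraps(handler)
    def wrapper(event, context):
        # default=str keeps logging from failing on values json can't serialize.
        print(json.dumps({"kind": "request", "payload": event}, default=str))
        try:
            response = handler(event, context)
        except Exception as exc:
            # Log the failure next to the request that caused it, then re-raise
            # so Lambda still records the invocation as an error.
            print(json.dumps({"kind": "error", "error": repr(exc)}))
            raise
        print(json.dumps({"kind": "response", "payload": response}, default=str))
        return response

    return wrapper


@log_payloads
def handler(event, context):
    # Hypothetical API Gateway-style handler that echoes a path parameter.
    name = event["pathParameters"]["name"]
    return {"statusCode": 200, "body": json.dumps({"hello": name})}


if __name__ == "__main__":
    handler({"pathParameters": {"name": "world"}}, None)
```

Because the error line is logged right after the request line, a failed invocation's log shows exactly which input triggered the exception.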

Making data accessible with sound, a Grafana Labs Hackathon project by Kostas Pelelis

We learned from a visually impaired astronomer that it was possible to use sonification to understand astronomical spectra. So during a hackathon at Grafana Labs we decided to turn time series into audio, and add sound to our alerting systems too. Kostas Pelelis is a Software Engineer at Grafana Labs living in Greece.
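The post summary doesn't describe the project's implementation, so the following is only a minimal sketch of the general idea using Python's standard library: each data point in a series becomes a short sine-wave note whose pitch rises linearly with the value, and the notes are written to an in-memory WAV file. The pitch range (220 to 880 Hz), sample rate, and note length are arbitrary choices for illustration, and `values_to_frequencies` and `sonify` are hypothetical names.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second; a low rate keeps the example small


def values_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Map each value linearly onto [low_hz, high_hz]; higher value, higher pitch."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat series
    return [low_hz + (v - lo) / span * (high_hz - low_hz) for v in values]


def sonify(values, note_seconds=0.2):
    """Render one sine-wave note per data point and return 16-bit mono WAV bytes."""
    frames = bytearray()
    samples_per_note = int(SAMPLE_RATE * note_seconds)
    for freq in values_to_frequencies(values):
        for n in range(samples_per_note):
            sample = 0.5 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            frames += struct.pack("<h", int(sample * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


# Hypothetical CPU usage series (percent), one note per sample.
cpu = [12, 15, 40, 85, 90, 30]
audio = sonify(cpu)
```

Saving `audio` to a `.wav` file lets any audio player play the series back: the tune climbs as CPU usage spikes and drops when it recovers.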

Bootstrapping a multi DC cloud native observability stack by Bram Vogelaar

An introduction to observability and how to set up a highly available monitoring platform across multiple data centers. During this talk, we investigate how to configure a monitoring setup across two data centers using Prometheus, Loki, Tempo, Alertmanager, and Grafana. Bram Vogelaar spent the first part of his career as a molecular biologist; he then moved on to supporting his peers by building tools and platforms for them with a lot of open source technologies. He now works as a DevOps Cloud Engineer at The Factory.

Tales of A11y In Grafana OS: Introducing Pa11y CI into our pipeline by Alexa Vargas

We want to make Grafana accessible to everyone! In this talk, Alexa will share how Grafana recently introduced Pa11y CI into the Grafana continuous integration pipeline. The library helps our developers and contributors spot a11y issues. More importantly, it acts as a gatekeeper, stopping new a11y issues from making it into the project. You will also hear about the alternatives that were considered and their challenges. This talk will have everything!

Using Thanos to gain a unified way to query over multiple clusters by Wiard van Rij

Running Thanos on top of Prometheus gives us a unified way to query all our data across multiple clusters, servers, and Prometheis from a single data source. Wiard van Rij is an Engineer at Fullstaq, helping people, teams, and organizations with various cloud-native challenges, with a strong focus on Kubernetes and observability. Wiard is a Thanos team member and open source enthusiast who has extra fun with security and hacking.

10 Best Linux Monitoring Tools and Software to Improve Server Performance [2021...

Linux is one of the most popular operating systems today, powering a large portion of the Internet. According to W3Techs, almost half of today’s top-ranked 1 million websites currently run on Linux systems. So, if you want your site—and the application(s) running on it—to be high-performing with lots of uptime, you need to ensure the availability and reliability of your Linux-based servers.

How to Monitor Google Cloud Interconnect & Network Performance | Obkio

Google Cloud Interconnect promises data transfers with low latency and high availability, but how can you make sure that it’s actually performing as promised? Monitoring Google Cloud performance is the key to identifying slowdowns, high levels of packet loss, and other problems affecting Google Cloud. Measuring and monitoring is the first step to troubleshooting network problems.

Rollbar Pro Tips: Item searching and filtering

On the Items view, you can filter your Items by many different properties. Some properties are direct properties of the items themselves, while others are evaluated against the occurrences of the item. Many more search options are available via the text box. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

What Is Kubernetes Pod Disruption?

Kubernetes pods are the smallest deployable units in the Kubernetes platform. Each pod represents a single instance of a running application and runs on a node, a worker machine that may be virtual or physical. Occasionally, pods are disrupted, for either voluntary or involuntary reasons.

Monitoring and Securing Office 365 | Government and Education Webinar

Learn how SolarWinds solutions can help you monitor Microsoft Office 365® (O365) performance and security. As organizations adopt O365, their IT teams remain responsible for ensuring effective operations so users can continue to collaborate. Moving to the cloud still requires management, and IT pros need to maintain visibility to understand and resolve problems. Access rights and permissions are also important in securing your O365 implementation.

The Role of Fintech In The Financial Industry

With the rise and advancement of technology and artificial intelligence, new rationalists are undoubtedly joining the field to put their ideas into practice. Most rationalists and highly successful people hold nothing back, perhaps because the technology and financial industries embrace those who disrupt the norm. One rationalist concept adopted by the technology and financial sectors is fintech.

Daily SQL Server Performance Checklist for DBAs

A main focus for database administrators (DBAs) is to ensure server environments are optimized and performance is at its peak. Whether you’re a DBA starting in a new role and are evaluating an existing environment for the first time, or you’re a senior database professional with the ongoing task of maintaining optimal performance, following key SQL Server best practices for installing, configuring, and ensuring new instances of SQL Server are consistently deployed is all in a day’s work.

Incident Review - An Account Of The Telia Outage And Its Ripple Effect

Another major outage on the Internet has taken place today. Telia, a major backbone carrier in Europe, suffered from a network routing issue between 16:00 and 17:05 UTC. This had a huge ripple effect, causing issues for multiple key companies providing critical cloud and infrastructure services. Companies affected include:
- Google Cloud
- Equinix Metal
- Cloudflare
- Fastly
- NS1
It’s always arresting to see the secondary and tertiary effects that a major outage can have.

Top 5 APM Tools to Keep Your Application Healthy

Developing modern applications is harder than ever, with microservices and cloud deployment models making it more difficult to get things working. However, anyone who’s deployed an application knows that getting it running is just the beginning of the work. The biggest part comes later: ensuring it works correctly, with maximum efficiency and great performance.

6 Website Monitoring Metrics Every Business Should Track

Your website is critical to your business. It is your digital shopfront where you engage with the world and turn prospects into customers. To convert the maximum number of people, you need to monitor your website to ensure an optimal experience. This guide will discuss six monitoring metrics that every business should track. By monitoring these metrics, you can ensure that you're providing the best experience to site visitors. Let's jump into this guide.

How to choose the best IT infrastructure monitoring tool for your business

Business growth can be measured by the performance of IT infrastructure, which is an integral part of any organization. According to research by Riverbed, 73% of C-suite executives say that granular visibility into the network and other components opens the door to business innovation. Simply put, healthy business growth depends on enhanced visibility and well-organized management of IT infrastructure.

Filter dashboards faster with template variable available values

Datadog’s template variables help you quickly scope your dashboards to specific contexts using tags, so you can visualize data from only the hosts, containers, services, or any other tagged objects you care about. This helps you build more flexible dashboards so you can access the insights you’re looking for as quickly as possible. We’re proud to announce new features for the template variable workflow that enable you to make highly dynamic, shareable dashboards more efficiently.

You can now customise how we handle redirects

Many sites redirect visitors to a more specific, relevant page. Think, for example, of a site that redirects you from / to a page in the right language, such as /en or /nl. A single redirect is often not a problem, but a chain of multiple redirects can hurt your site's user experience. Modern browsers also limit the number of redirects they will follow.
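To see why chains matter, it helps to count hops. The sketch below follows a hypothetical in-memory map of redirects (each path mapped to its `Location` target) and returns the full chain, stopping on loops or after 20 hops, the redirect limit in the WHATWG Fetch standard that browsers implement. It makes no network requests; a real checker would read each response's status code and `Location` header instead, and `redirect_chain` is an illustrative name, not part of any tool.

```python
def redirect_chain(start, redirects, max_redirects=20):
    """Follow redirects from `start` and return every path visited, in order.

    `redirects` maps a path to the path it redirects to (a stand-in for the
    Location header). Raises RuntimeError on loops or overly long chains.
    """
    chain = [start]
    current = start
    while current in redirects:
        # len(chain) - 1 is the number of redirects followed so far.
        if len(chain) - 1 >= max_redirects:
            raise RuntimeError(f"more than {max_redirects} redirects from {start}")
        current = redirects[current]
        if current in chain:
            raise RuntimeError(f"redirect loop back to {current}")
        chain.append(current)
    return chain


# Hypothetical site: "/" picks a language, then "/en" adds a trailing slash.
site = {"/": "/en", "/en": "/en/"}
chain = redirect_chain("/", site)
print(" -> ".join(chain), f"({len(chain) - 1} redirects)")
```

A chain like `/ -> /en -> /en/` still works, but every hop costs the visitor an extra round trip, so pointing `/` straight at `/en/` removes one for free.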

Comparing Rails Application Performance Monitoring Tools

Monitoring an application’s performance is a basic part of building a successful software product. Given the popularity Rails has long enjoyed in the start-up world, it makes all the more sense to look for tools that help you keep your Ruby on Rails application in shape. In this guide, we will look at some of the top APM tools for Rails applications and compare them against some standard benchmarks to help you decide which tool fits your use case best.

Trigger a Kubernetes HPA with Sysdig metrics

In this article, you’ll learn, through an example, how to configure Keda to deploy a Kubernetes Horizontal Pod Autoscaler (HPA) that uses Sysdig Monitor metrics. Keda is an open source project that can scale Kubernetes pods based on external metrics, including the results of Prometheus queries. In Trigger a Kubernetes HPA with Prometheus metrics, you learned how to install and configure Keda to create a Kubernetes HPA triggered by a standard Prometheus query.

Trigger a Kubernetes HPA with Prometheus metrics

In this article, you’ll learn how to configure Keda to deploy a Kubernetes HPA that uses Prometheus metrics. The Kubernetes Horizontal Pod Autoscaler can scale pods based on the usage of resources, such as CPU and memory. This is useful in many scenarios, but there are other use cases where more advanced metrics are needed – like the waiting connections in a web server or the latency in an API.
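Whatever the metric source, the HPA controller decides how many replicas to run using the ratio rule documented for Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when the ratio is within a tolerance (0.1 by default) of 1.0. The Python sketch below applies that rule to a hypothetical "waiting connections per pod" metric. It deliberately leaves out the min/max replica bounds, stabilization windows, and handling of not-ready pods that the real controller also applies, and `desired_replicas` is an illustrative name.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Apply the HPA ratio rule to an average per-pod metric value."""
    ratio = current_value / target_value
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally, rounding up to a whole number of pods.
    return math.ceil(current_replicas * ratio)


# Hypothetical: 4 web server pods average 150 waiting connections; the target is 100.
print(desired_replicas(4, current_value=150, target_value=100))  # 6
# 105 vs. 100 is within the 10% tolerance, so nothing changes.
print(desired_replicas(4, current_value=105, target_value=100))  # 4
```

The tolerance band exists to stop the autoscaler from flapping between replica counts when a metric hovers near its target.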

What's New With Pingdom?

SolarWinds Pingdom is focused on making web application monitoring simple and easy to use, yet still powerful and affordable. In the past six months, we’ve made several updates we’re excited to share with you. These new features come at no additional cost, and all customers, regardless of tier, have access to them. The updates were launched with two main benefits in mind: ease of use and quick setup. We’ll walk through our latest updates from 2020, including the following.

Modernizing Network Operations: How to Check All the Boxes

Stakeholders expect their network to be fast, prioritized for business, secure, compliant, and cost-efficient. With growing traffic, more applications and services, and increasingly complex networking technologies supporting business needs, companies need a strategy to efficiently monitor and manage the network so they can respond quickly to business requirements.

Understand the End-User Experience With Holistic Monitoring

You’re expected to monitor the entire end-user experience, so you need to be able to do it all. SolarWinds can help. Together, SolarWinds Web Performance Monitor (WPM) and SolarWinds Pingdom can give you the holistic monitoring you need, especially with the recent integration we’ve announced. While WPM monitors the experience of the end users behind your firewall, Pingdom monitors the customers and prospects on the outside of it. With WPM and Pingdom, you can have a holistic view of the performance and availability of your various applications. We’ll show you the benefits of adding SolarWinds Pingdom, including the following.

Grafana 8.2 released: Dynamic plugin catalog, new fine-grained access control permissions, and more

Grafana 8.2 is here! This release marks the start of our work focused on measurable improvements to Grafana’s accessibility — part of our continuing mission to democratize metrics for everyone. The initial changes to Grafana in 8.2 are focused on navigation, with more to come. We’ll be sharing more about our accessibility roadmap in an upcoming blog post.

Automate, Group, and Get Alerted: A Best Practices Guide to Monitoring your Code - Part 2

Missed part one? Check out the full guide here. As companies grow, so do their products, teams, and the number of external tools. For engineers, that can mean code sprawl, data silos, notification fatigue, and some “what the…?” moments along the way as they try to make sense of it all.

PagerDuty Integration Spotlight: Honeycomb

Honeycomb delivers observability for modern engineering and DevOps teams to observe, debug, and improve production systems efficiently. The PagerDuty + Honeycomb integration uses Honeycomb Triggers to notify on-call responders based on alerts sent from Honeycomb. This integration is maintained and supported by Honeycomb. Liz Fong-Jones from Honeycomb joined us live on Twitch to share more about how Honeycomb and PagerDuty can be used together to help your teams and to do some live investigation into Honeycomb’s own performance data.

Logs for Ops

The evolution of machine data and logging in general has shifted multiple times over the last couple of decades. The log began with Unix and was rooted in command line actions like tail or grep. It evolved from system-based logs to application-based logs and eventually became more UI-friendly and readable. Not only has the log itself evolved, but the purpose of the log and audience for the log has morphed over time as well.

Dashboard Fridays

We are excited to announce a new community initiative: Dashboard Fridays. Dashboard Fridays is a bite-sized video series where we share and discuss a range of dashboards created for the community, by the community. Each video is no longer than 20 minutes, so grab a coffee and let’s talk dashboards! In each episode, we will zoom in on one stellar dashboard put together by a member of the community.

Facebook Outage Underscores Need for Real-Time Monitoring

On October 4, Facebook and its family of apps, including Instagram and WhatsApp, suffered a global outage that lasted approximately six hours. The massive outage has been blamed on configuration errors in the backbone routers that coordinate network traffic between the company’s data centers. Facebook apologized to its 3.5 billion users, who were unable to access any of the company’s services during the downtime.

Get started and keep using AWS for free

Getting started with AWS and adding your credit card to your own account feels scary, but there are ways to get free credits so you can sleep better in the beginning. In this article, we’ll cover some tips and tricks to get started and keep using AWS for free. Stepping into new terrain is hard, even when it’s only about learning something new.

Driving Data Innovation With MLTK v5.3

Many of you may have seen our State of Data Innovation report that we released recently; what better way to bring data and innovation closer together than through Machine Learning (ML)? In fact, according to this report, Artificial Intelligence (AI)/ML was the second most important tool for fueling innovation. So, naturally we have paired this report with a new release of the Machine Learning Toolkit (MLTK)!

Monitor the impossible with 2 Steps - Testimonial with Citrix

Having trouble monitoring your Citrix applications? Are you limited by Selenium testing? With 2 Steps Synthetic Monitoring, you can automate and measure workflows and user journeys, and find all your performance data within Splunk.
“2 Steps is a Citrix Ready Partner providing valuable synthetic monitoring capabilities for applications being accessed via Citrix.” (John Panagulias, Director, Citrix Ready, Citrix)

Application Logging in 2021

Have you ever written a Hello, World! application? In most of these tutorials, the first step is to log words to the console. It's an easy way to understand what is going on with your application, and it's readily available in every programming language. Console output is incredibly powerful, and it has become easier than ever to capture that output as logs. As your application grows and evolves, you need a structured approach to application logging.
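A minimal sketch of structured logging with Python's standard `logging` module: a custom formatter emits each record as one JSON object, so fields such as an order ID can be searched and filtered instead of being buried in message text. The `JsonFormatter` class, the `fields` attribute convention, and the `checkout` logger name are assumptions for illustration; libraries such as structlog offer more complete versions of the same idea.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"fields": {"order_id": 42, "total": 19.99}})
```

Because every line is valid JSON with consistent keys, a log pipeline can index `order_id` as its own field rather than parsing it out of free text.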

Facebook outage highlights need for DNS monitoring

If you were one of the billions of frustrated Facebook users who weren’t able to access their accounts on Monday, rest assured that the downtime is behind us and the social media giant is back online. End users can now relax knowing that the brush fire has been extinguished. Remarkably, the nearly seven-hour outage could not be attributed to the recent deluge of high-profile attacks on government, enterprise, and educational servers throughout the world.

Rollbar Pro Tips: Merging Items

Item merging allows you to combine multiple items into one 'group' for easier management and more accurate metrics. All past and future occurrences of any merged items will automatically be combined.

5 things to look for in a third-party monitoring tool

A key finding from Redgate’s recent State of Database Monitoring Survey of over 2,500 IT professionals was that 79% of respondents reported using either a third-party or in-house monitoring tool. It’s notable because it’s an increase of 10 percentage points from the same survey last year – and the satisfaction rate with third-party monitoring tools also saw an increase of 18 percentage points to 86%.

Observing container environments with Cloud Operations

Did you know GKE isn’t the only place you can run containers in Google Cloud? In this episode of Engineering for Reliability, we show three options for running containers, as well as how to instrument each one for observability with Cloud Operations. Watch to learn how Cloud Operations can help visualize metrics and analyze logs emitted by container workloads running on GKE, on Cloud Run, and on an Anthos cluster!

Facebook Outage: The Case for Configuration & Change Management

In the age of cloud, digital transformation, application modernization, and the mobile economy, the network is the lifeblood behind enabling excellent customer experiences. Network Operations (NetOps) and IT Operations (ITOps) teams are constantly aware that a disruption in core network systems performance can have a massive impact on their business.

Is service catalog the modern CMDB?

SquaredUp recently launched a PowerShell tile that lets you visualize data returned from a PowerShell script. This has opened virtually infinite doors to the sources you can get data from. PowerShell can work with crazy text formats, obscure databases, and endpoints that are open on the internet. If you can access it, PowerShell can work with it. And SquaredUp lets you leverage that power so you can get the information you need and visualize it in a format that makes sense.

What is Digital Experience Monitoring?

Businesses globally have been steadily shifting to digital since as early as a decade ago. With the coronavirus pandemic, that digital transformation has shifted into fifth gear, and digital experience is now key to business success. As of 2020, there were almost 30 billion endpoints connected to the internet. Digital revenue has increased dramatically, and digital channels will continue to drive retail sales up.

10 SQL Server Performance Tuning Best Practices

There are a large number of best practices around SQL Server performance tuning – I could easily write a whole book on the topic, especially when you consider the number of different database settings, SQL Server settings, coding practices, SQL wait types, and so on that can affect performance.

A snapshot of my daily work

Today I show you a snapshot of my daily work. It is especially interesting this time, because it’s a not-so-simple problem to solve. It’s not difficult per se, but it requires a good understanding of the Icinga Web 2 framework and how it communicates with the web server. Disclaimer: what I’m going to show is not a feature preview. It’s a proof of concept, and it may stay one and never be developed further.

Honeycomb Differentiators Series: SLOs That Tell the Whole Story

In the recent past, most engineering teams had a vague notion of what Service Level Agreements (SLAs) and Service Level Objectives (SLOs) were: mainly things that their more business-focused colleagues talked about at length during contract negotiations. The success or failure of SLAs was tallied via magic calculations (what is “available” anyway?!) at the end of the month or quarter, and adjustments were made in the form of credits or celebrations in the break room.

Mastering AWS identity and access management

From basic to advanced concepts of AWS’s own identity and access management service: users, groups, permissions for resources, and much more. If you work seriously with AWS, there’s no way around its Identity and Access Management (IAM) service. Skipping its core principles will bite you again and again in the future. Take the time to do a deep dive, so you won’t be frustrated later.

What is a Site Reliability Engineer (SRE)?

A site reliability engineer, or SRE, is a role that encompasses aspects of both software engineering and operations/infrastructure. It also encompasses a strategy and set of practices and principles across service offerings and is closely tied to DevOps and operations. The term site reliability engineering first came into existence at Google in 2003 when a site reliability team was created. At that time, the team was made up of software engineers.

Stop coding, start clicking with 2 Steps - Customer Testimonial Edenor

Spending too much time coding or embedding agents for your monitoring? With agentless synthetic monitoring from 2 Steps, anyone can build tests. Realise the value of 2 Steps quickly with no agents, no code, no worries. “We were amazed by getting value from 2 Steps so rapidly. The implementation was fast and it was easy to use without any previous expertise in building user transactions,” says Alejandro Marzullo, Systems Analyst at Edenor.
Sponsored Post

Build successful React Native apps with Raygun

React Native has come a long way from an internal prototype at Facebook to where it is now. The cross-platform framework is now a go-to tool for businesses to develop natively rendered mobile apps for iOS and Android. Thanks to everything from reduced development time with hot reloading to its focus on runtime performance with natively rendered views, React Native has gained traction with large-scale companies like Shopify and Tesla. Web developers have made the leap from the browser to mobile, bringing with them all of their tools and expertise, and it is consistently the most popular cross-platform framework.

Improved Error Tracking for Node.js in AppSignal

Good news for Node.js developers using AppSignal: a new version of our Node.js library is available on npm with improved error tracking. We’ve added two new helpers to make your life easier as a Node.js developer. One helper allows you to track errors whenever you need to, no matter how many nested spans you have in your current context. The other lets you send an isolated error with no spans or context involved (for more information about spans, check out our docs).

What is Observability?

Observability is a term that is becoming commonplace in both startups and enterprises. Log observability is different from monitoring, as it provides visualized metrics from a variety of systems in a single-pane-of-glass view. This is invaluable for organizations trying to understand the interdependencies and links between external events and internal performance.

Don't Let Third-Party Providers Bring Your Uptime Down

False positives are sometimes real alerts in disguise, and they can contribute to major downtime if you don’t resolve them quickly. Figuring out why they’re happening can also put quite a strain on your resources, and if you work with third-party providers, errors may be even harder to locate.

Monitoring Kubernetes with Prometheus

Kubernetes is among the open-source products growing fastest in the market. It is a portable, extensible, open-source platform for managing containerized workloads and services, and companies are widely adopting it to develop their major products. Docker is commonly used to run Kubernetes clusters on local systems for testing purposes. As adoption grows, it becomes essential for companies to monitor their Kubernetes containers.

Rollbar Pro Tips: Custom Data Fields

Rollbar will accept and report any additional data fields that you choose to configure, allowing you to send extra data with each occurrence payload. This extra data can be used for things like analytics, custom grouping, and alert routing.

Better Kubernetes application monitoring with GKE workload metrics

The newly released 2021 Accelerate State of DevOps Report found that teams who excel at modern operational practices are 1.4 times more likely to report greater software delivery and operational performance and 1.8 times more likely to report better business outcomes. A foundational element of modern operational practices is having monitoring tooling in place to track, analyze, and alert on important metrics.

AIOps and Performance Monitoring: A One-Two Punch for IT Operations

Sugar Ray Robinson and Jake LaMotta. Marvelous Marvin Hagler and Tommy Hearns. Muhammad Ali and Joe Frazier. All were among history’s greatest boxers, but when they met in the ring, each brought out the best in the other. It’s the same in IT management. There are tools and platforms that on their own are essential to IT operations; but when paired as an infrastructure management tandem, each complements the other, ensuring maximal efficacy of both systems.

PagerDuty Integration Spotlight: InfluxData

InfluxData is an open source platform built for metrics and events, purpose-built for time series data: the essential time series toolkit, with dashboards, queries, tasks, and agents all in one place. InfluxDB is even more programmable and performant with a common API across OSS, cloud, and enterprise editions. Send events to PagerDuty to keep your teams informed. Check out InfluxData’s integration.

Facebook, Instagram, and Whatsapp's Outage - Understanding MTTR

Yesterday, the most-used social media platforms in the world were inaccessible for 6 hours straight. Later, in a press release, Facebook revealed that the outage was due to configuration changes in their routers. There is no doubt that Facebook has a rigorous incident response plan, yet a small blind spot resulted in a significant business interruption. So how do we avoid this? The truth is, outages and performance issues are bound to happen in any network.

The Future of AIOps Includes an ITOps Strategy

One of the questions I get asked a lot by customers, prospects, and partners is whether AIOps will make them irrelevant. To them, AIOps is often equivalent to automated remediation: an AIOps system automatically detects an incident and kicks off a remediation process in response, knowing exactly what process will solve the problem. IT is out of the loop, data centers and NOCs just keep humming along unattended, and end users are none the wiser.

Facebook, Instagram, and WhatsApp Down for Over Five Hours

Did you unconsciously open Instagram, Facebook, or WhatsApp several times throughout the day on Monday, only to get a “Couldn’t refresh feed” message? Did you try again every 20 or so minutes? Did you maybe even restart your phone, not once thinking websites as large as Facebook could possibly go down and believing it must be your own technology? Rejoice: it’s not you, it’s them.

Reliable Alerting with Icinga and SIGNL4

You’ve probably been in this situation before – you’re using Icinga to monitor your infrastructure and Icinga detects a critical issue but nobody notices it. It might be an urgent maintenance request, an unexpected breakdown, or a service quality issue. But your technicians or service engineers are neither in the control room nor in front of the dashboard to see the issue and its urgency.

Announcing General Availability of the Honeycomb Query Data API

The Query Data API is a Honeycomb Enterprise feature. With a Honeycomb Enterprise account, you can use this API today. Head over to our API docs to learn how to get access to your data. If you aren’t yet a Honeycomb Enterprise user, try it out by requesting an Enterprise Trial. Starting today, Honeycomb Enterprise customers can use the Honeycomb Query Data API to programmatically run queries and retrieve their results, and pull query results into any data visualization tool of their choice.

Elastic Observability: Driving mean time to resolution to zero

At ElasticON Global 2021, Tanya Bragin, VP Product, Observability, and the Elastic Observability team showed how ongoing innovations continue to deliver actionable insights and faster root cause detection, reducing mean time to resolution (MTTR). The adoption of cloud, microservices, and ephemeral infrastructure is driving increased complexity, requiring an observability solution to provide end-to-end visibility.

ITOps Needs Observability Like Batman Needs Lucius Fox

Some things just go better together. Like barbeque and blues, sunsets and beaches, cheese and fine wine — hey, even software and superheroes go better together! That’s why in this blog we are going to look at why IT Operations and Observability just go better together, through a superhero analogy. Enter the Dark Knight himself — Batman! He will represent observability. IT Operations will be represented by Lucius Fox.

20 KPIs Your MSP Should be Tracking (Webinar)

As an MSP owner or manager, you want your teams to be continually improving and increasing efficiency. To make that happen, you know you need to be tracking MSP KPIs—after all, what gets measured gets improved. But which numbers do you track, how many do you track, and how do you actually use them to get better?

Ready. Set. WAIT: New Report Shows Why Your Device Is Slow to Start

Every workday you open your laptop or start your desktop, and you wait. For some, that wait is a mere blip in the day, a few seconds; for others, it can seem interminable. A few months ago, our engineering team set out to explore which variables really drive slow device performance. During their research, the team uncovered answers to very specific questions like: What is the average startup time for a work device? (Hint: it’s less than five minutes.)

Announcing Early Access to Variable Retention on LogDNA

The massive proliferation of log data forces teams to manage the costs to process, route, and store it. Teams need access to this data to gain critical insights into their services, but for many organizations this presents a challenge for their budget. Logging can get expensive, fast, which often results in teams making difficult tradeoffs between aggregating enough logging information to be useful and controlling the cost of storing all those logs.

Debug iOS crashes efficiently with Datadog RUM

Unsurprisingly, application crashes due to fatal errors can be a major pain point for iOS users. Recent research shows that roughly 20 percent of mobile application uninstalls were due to crashes or other code errors. As a developer, it’s paramount to manage this potential churn by capturing comprehensive crash data in order to track, triage, and debug recurring issues in your iOS apps.

Sponsored Post

Troubleshooting Office 365 Issues Made Simple

Do you often ask yourself: is there an Office 365 problem today? While you try to find the answer, your customers (end users) complain because they can't access their business applications. On top of that, your boss needs an immediate status update. Trust me, it doesn't feel great to be in that situation. And we know it. Despite Microsoft's 99.9% SLA, issues will occur with Office 365 applications such as Teams, Outlook, OneDrive, Exchange Online, SharePoint, Yammer, etc. Often, the issues aren't even Microsoft's problem but an ISP's or an internal network change. There can be a lot of reasons (network, OS, browser, personal device, upgrade errors, internet, and much more), but which one is it?

Metrics Dashboard, Scale testing up to 500K events/sec - Signal 05

A month and thousands of lines of code later, we're here with our monthly product update: Signal #05. We squashed bugs and shipped a custom metrics dashboard along with improvements to our frontend. We also got featured by one of the top online analytics magazines as one of the leading Data Observability platforms. 🥳 Let's dive in to see what the humans at SigNoz have been up to!

How Do You Monitor Cassandra Performance: Key Metrics to Measure

Apache Cassandra is a distributed database known for its high availability, fault tolerance, and near-linear scaling. It was initially developed at Facebook and is now a widely used open-source system at some of the largest tech companies in the world. There are numerous reasons behind its popularity, including no single point of failure, exceptional horizontal scaling, and a data layout that is a natural fit for time-series data.

7 Best Log and Syslog Viewers

Many devices, such as switches, routers, firewalls, servers, and printers, support the syslog protocol. This standard for sending log messages within a network offers critical information about your system. Consequently, monitoring your network and its syslog messages should be a top priority. Many IT professionals use log and syslog monitors or viewers to gather logs and syslog messages from across their network in a centralized location.

Incident Review For the Facebook Outage: When Social Networks Go Anti-social

The following is an analysis of the Facebook incident on 10/4/2021. In a highly unusual turn of events, Facebook, Instagram, WhatsApp, Messenger, and Oculus VR were down simultaneously around the world for an extended period of time on Monday. The social network and some of its key apps started to display error messages before 16:00 UTC. They were down until 21:05 UTC, when things began to gradually return to normal.

The Blog Is Dead; Long Live the Blog

Ever since the very beginning, Honeycomb has poured a lot of heart and soul into our blog. We take pride in knowing it isn’t just your typical stream of feature updates and marketing promotions, but rather real, meaty pieces of technical depth, practical how-to guides, highly detailed retrospectives, and techno-philosophical pieces. One of my favorite things is when people who aren’t customers tell me how much they love our blog.

Leverage Correlation Analysis to Address the Challenges of Digital Payments

In the first four parts of our series on correlation analysis, we discussed the importance of this capability for root cause analysis in a number of business use cases, and then specifically in the context of promotional marketing, telco, and algorithmic trading. In this blog, we walk through how to leverage correlation analysis to address the challenges of ensuring a seamless online payment experience for the end user.

Deploy the RESTMon Microservice in Minutes

Within any enterprise, IT operations teams use a variety of solutions to monitor their technology ecosystem. These products are often business critical and cannot easily be replaced or migrated. Ultimately, it’s important that teams can analyze and correlate data from these different tools so they can produce the insights they need to improve decision making. To help address these requirements, Broadcom offers RESTMon.

Empowering Digital Employee Experience with Browser-based Real User Monitoring

Service Watch Browser provides Real User Monitoring (RUM) for SaaS and web applications you don't control. The browser-based plugin can be securely installed as an Internet Explorer or Chrome extension to get the end-user perspective of mission-critical apps such as Microsoft 365, Salesforce, Workday, AWS, etc. Service Watch Browser enables IT admins or app owners to proactively detect and troubleshoot Wi-Fi, ISP, and local network connectivity issues causing the slowness.

Latest top 21 APM tools [open-source included]

Application Performance Monitoring (APM) tools are a critical component of distributed applications now. But choosing the right APM tool can be tricky. In this article, we go through a list of the top 21 APM tools including open-source APM tools which can help monitor and improve your application performance.

9 Stackify Competitors to Know in 2021

Stackify is a software company based in Leawood, Kansas, United States. Matt Watson, an American entrepreneur, founded it in January 2012. With a suite of tools like Prefix and Retrace, Stackify aids software developers in troubleshooting and provides support. According to Stackify, standard APM software is insufficient for managing application code.

Dashboard Fridays: Sample VMware Status Dashboard

Join SquaredUp's Adam Kinniburgh and fellow virtualisation expert Shawn Williams as they showcase the VMware Status Dashboard. Built in SquaredUp with the easy-to-use PowerShell tile, this dashboard surfaces data from vCenter for hosts and VMs to give a virtualisation administrator the information they need at a glance. Tune in to learn how it was made, the challenges it solves, and Shawn's top tips for building it yourself.

Rollbar Pro Tips: People tracking

Leverage Rollbar's People tracking feature to get additional visibility into which of your users are affected by each error, the history of errors experienced by a particular person, and the list of all people who have ever experienced an error.

The Importance of Sharing "Pretty" Things | An IT Journey to Monitoring Glory: Session 3

During this THWACK® Livecast series, we're focused on you, the IT professional. Whether you're an accidental admin and just getting started, welcoming scope creep and getting noticed at your company, or the monitoring engineer who's ready to shine, these sessions are for you. Attendees will learn how to leverage SolarWinds tools to communicate clearly and concisely to management and become heroes in the Monday morning postmortem meetings. Join the sessions as they happen to ask questions directly to the presenters and get your answers live. Monitoring is a journey, not a destination—we may start at SNMP and WMI, but we'll help you end up victorious.

Splunk Performance Improvements Using Cribl LogStream

LogStream is a data pipeline solution that can help you transform your unstructured data to be more structured before it persists to disk. This improves not only sending to Splunk, but also sending to other observability solutions like Datadog, Wavefront, the Elastic Stack, or Sumo Logic, as well as writing to an S3-compatible API, Google Cloud Storage, or Azure Blob Storage.

Incident Review - Slack Outage Impacts A Subset Of Users Worldwide Due To DNS Issue

DNS observability is an essential part of any Ops team’s strategy. Looking for proof? It’s happening right now. It has been a busy week for Ops teams across the globe. Many were forced to urgently rotate SSL certificates after one of Let’s Encrypt’s root certificates expired. Collaboration plays a critical role in such situations, where members of a team or multiple teams must communicate and work with each other to rapidly and efficiently complete a collective task.

Lessons From An Internet Outage - Issues Caused By Let's Encrypt DST Root CA X3 Expiration

As a monitoring and observability company, we have a lot of monitoring built into our systems, as well. We have the standard monitoring to make sure that systems are performing properly, data is flowing through our infrastructure, etc. At the same time, we have monitoring for any sudden changes to tests that our customers are running. On September 29, 2021, 19:21:40 UTC, we started to see a tsunami of alerts at Catchpoint.

"Experience is truth": ABN AMRO's Real-World XLAs

“Experience is truth.” That was one of the slogans my colleagues and I came up with in our first meeting as the newly formed Digital Employee Experience team at ABN AMRO, one of the largest banks in the Netherlands. The subtext was that digital employee experience had to be top of mind for every IT project, even if that meant some unconventional thinking. But we were ready for unconventional.

Everyone Says It's a Bad Idea; Should You Do It Anyway?

It's a special edition episode this week as Ben chats with Felix Livni of Schedulista to talk startups. There are plenty of hot takes to go around, such as ignoring good advice when starting a business, how bootstrappers should do the exact opposite of what a venture-funded company does, and why you may consider direct mail for a SaaS business. Grab your pitchforks and tune in!