Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Showcase dashboards securely and effortlessly with Skykit's offering in the Datadog Marketplace

For many organizations, making the most of the visibility Datadog offers into the health and performance of their infrastructure means displaying dashboards to stakeholders in various settings continuously and in real time. But the standard solutions for sharing dashboards to large-format displays can be onerous, involving sundry software and hardware and restrictive manual setups. These solutions can also pose significant security risks, since they tend to involve sharing passwords or devices.

IBM MQ vs Apache Kafka: How Do They Differ?

Asynchronous communication between various CX applications has long been made possible by enterprise messaging solutions like IBM MQ and – more recently – Apache Kafka. Developers might assume that these two technologies are interchangeable. However, once they scrape the surface, critical differences between IBM MQ and Apache Kafka come to light.

Introducing Honeycomb Service Map: A Dynamic, Interactive, and Actionable View of Your Entire Environment

Today, we're announcing the launch of Honeycomb Service Map. This isn't your grandparent's version of a service map. This feature reimagines what it is that you want to know or investigate when looking at visualizations of how your services communicate with one another.

HAProxy Logging Configuration Explained: How to Enable and View Log Files

HAProxy is generally the frontend layer of your application, which means it plays a critical role since all traffic first lands on this layer. Because of this, you need to make sure everything is working at this layer all the time, as any issue can directly impact your business. Therefore, having visibility on this layer is crucial. Visibility can come from two aspects: the metrics HAProxy emits and the logs it generates while handling requests.

The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. It seemed like the issue was a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween day. While the incident itself was an IT nightmare, Roblox’s detailed technical post-mortem several months later was an excellent way to bounce back.

Monitoring your Network SNMP devices using Hosted Graphite

When you design architecture to monitor your digital assets - either software applications or hardware devices, you need to use different strategies depending on your monitoring target. The factors you want to consider can vary including methods of retrieving monitoring data, frequency of data collection, and how you want to surface metrics and insight you find to stakeholders. In this article, we will mainly discuss how we can monitor your network SNMP devices using Hosted Graphite.

How to monitor server load

We often hear the term "load" used to describe the state of a server or a device. But what does it really mean? System load is a measure of the amount of computational work that a system performs. An overloaded system, by definition, isn't able to complete all its tasks per schedule - this affects the performance and productivity of the system. And while "load" often gets conflated with CPU usage there's a lot more to it.