The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

A Snapshot of our IT Ops Predictions for 2023

Today, executives and customers expect IT and digital services to be available and performant at all times; compromised availability or performance is no longer tolerable. Think about it: when was the last time a digital service was unavailable and it didn’t make the news or social media? When was the last time you visited a website that was unavailable and waited for the outage to end, rather than finding an alternative in the moment?

Profiling 101: Why profiling?

This is part 2 of a 3-part series on profiling. If you’re not yet familiar with what profiling is, check out the first part in our series. By this point, you’re probably already convinced that good performance is important for your app’s success. There are many tools available for measuring performance, but profiling in production with a modern profiling tool is one of the easiest and most effective ways to get a full understanding of your app’s performance.

Running API and Browser Checks Using Terraform, AWS, and Checkly Private Locations

When you add new Checks in Checkly, you can run them against your endpoints from a number of locations around the world. For most use cases, this is more than enough to ensure your resources are online. However, these locations are outside of your network and cannot check on resources deployed more securely inside your private network.

Release 1.38.0 - DBENGINE v2, Functions, Events, Notifications, Role Based Access, and much more!

The Netdata team is very excited to introduce you to all the new features and improvements in the new version.

HIGHLIGHTS:

DBENGINE v2: The new open-source database engine for Netdata Agents, offering huge performance, scalability, and stability improvements with a fraction of the memory footprint!

FUNCTION: Processes. Netdata beyond metrics! We added the ability for runtime functions, which can be implemented by any data collection plugin, to offer unlimited visibility into anything, even non-metrics, that can be valuable while troubleshooting.

Get to know TraceQL: A powerful new query language for distributed tracing

At Grafana Labs, we love tracing, which is why we’ve been hard at work on Grafana Tempo, an open source, highly scalable distributed tracing backend. Tempo just had its 2.0 release. In conjunction with that release, we are excited to show off TraceQL — a powerful new query language designed for distributed tracing. In this blog, we’ll provide an overview of why we created TraceQL, how it works, how you can put it to use today, and what we have planned for future iterations.

AppSignal for Elixir Now Supports Oban

If you're using Oban for managing background jobs in your Elixir application and want to gain a deeper data-driven understanding of how they perform, you've come to the right place. AppSignal for Elixir now automatically instruments Oban, meaning you can monitor the performance of your background jobs through an AppSignal Magic Dashboard, which gives you detailed information on queue times and processing times and notifies you of any exceptions.

Root cause log analysis with Elastic Observability and machine learning

With more and more applications moving to the cloud, an increasing amount of telemetry data (logs, metrics, traces) is being collected, which can help improve application performance, operational efficiencies, and business KPIs. However, analyzing this data is extremely tedious and time-consuming given the tremendous amounts of data being generated. Traditional methods of alerting and simple pattern matching (visual inspection, simple searching, etc.) are not sufficient for IT Operations teams and SREs.