Operations | Monitoring | ITSM | DevOps | Cloud

Grafana Alerting: How to monitor alerts for better alert management

With the release of Grafana 10.2, we made a number of enhancements to Grafana Alerting. These updates included the rollout of Insights, a new section of the Grafana Alerting home page. Available now to all Grafana Cloud users, Insights offers valuable information, such as statistics on alert rules and notifications, to help you monitor alerting data and quickly analyze alert performance.

Top Data Center Management Trends to Watch in 2024

In the blink of an eye, 2023 has come to an end, a year in which the data center industry saw lots of movement toward sustainability, AI, and operational efficiency. Data center management is ever-changing, and it’s important to stay on top of the latest trends to guide you to success in the new year. With 2024 just days away, here are the top 10 emerging data center management trends that you should watch out for.

The Advent of Monitoring Day 1: What Are Synthetics and Why They Are Needed

This is the first part of our 12-day Advent of Monitoring series. In this series, Checkly's engineers will share practical monitoring tips from their own experience. Hey there! Here is my take on what synthetic monitoring means and why it’s awesome! I think it’s a very complicated word for a very straightforward concept. In fact, I am convinced that once you've used it, you will never want to live without it.

Performance optimization techniques in time series databases: sync.Pool for CPU-bound operations

Internally, VictoriaMetrics makes heavy use of sync.Pool, a data structure built into Go’s standard library. sync.Pool is intended to store temporary, fungible objects for reuse to relieve pressure on the garbage collector. If you are familiar with free lists, you can think of sync.Pool as a data structure that allows you to implement them in a thread-safe way.
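To make the free-list idea concrete, here is a minimal sketch of the usual sync.Pool pattern: take an object from the pool, reset it, use it, and put it back. It is not taken from the VictoriaMetrics code base; the names bufPool and formatSample are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool of reusable buffers.
// New is called only when Get finds the pool empty.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// formatSample is a hypothetical helper that renders "name=value"
// using a pooled buffer instead of allocating a new one per call.
func formatSample(name string, value float64) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled object may still hold data from an earlier use
	defer bufPool.Put(buf) // hand the buffer back for reuse when we are done

	buf.WriteString(name)
	buf.WriteByte('=')
	fmt.Fprintf(buf, "%g", value)

	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(formatSample("cpu_usage", 0.75))
}
```

Because Get and Put are safe for concurrent use, formatSample can be called from many goroutines without extra locking. The pool may drop idle objects at any time, so it should only hold values that are cheap to recreate.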

IT Automation Powers SRE Practices as System Complexity, Consumer Demands Grow

Site Reliability Engineers (SREs) use automation and orchestration capabilities to scale security and performance, ensuring sites are reliable and efficient. Site Reliability Engineering (SRE) can be applied to a wide range of use cases and industries, where software systems and services are critical to business operations.

Monitor your chaos engineering experiments with Steadybit's offering in the Datadog Marketplace

Steadybit is a software reliability platform that uses chaos engineering and fault injection to help organizations improve the stability and performance of their applications. By allowing you to simulate turbulent scenarios in a controlled environment, Steadybit helps you identify and mitigate potential system issues to reduce downtime and improve resilience.

Now in beta: alerting for modern DevOps teams

Although FireHydrant has spent five years focused on what happens after your team (erg, I mean service 🙄) gets paged, the topic of alerting often comes up in discussions with our community. People are tired of paying big bucks for software that’s expensive, bloated, and hasn’t seen much innovation. Clearly, there’s a problem here – and we’re tackling it head-on.

Correlate AWS and Prometheus with SquaredUp's data mesh

I recently delved into the idea of using labels within Prometheus to craft objects and hierarchies where none initially existed. Check out that piece here. The essence was harnessing the prowess of OTEL to achieve more, faster. The ambition? Transform these abstract virtual objects and integrate them into SquaredUp's knowledge graph, thereby unlocking the potential of data mesh and correlation.

How-to surface your multi-cloud costs with SquaredUp

Working in the cloud is certainly convenient, but the convenience comes at a price. With more and more organizations transitioning to the cloud, and a growing preference for cloud-native applications, hosting most, if not all, of the components of your business in the cloud is becoming increasingly common.