
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Creating alerts with Grafana | Grafana for Beginners Ep 11

When observing your data with Grafana, you don't need to be glued to your dashboard 24/7. Join Senior Developer Advocate Lisa Jung to learn how to set up Grafana to keep an eye on your data and alert you when something needs your attention. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more. We also have plans for every use case.

Datadog on Site Reliability Engineering #shorts #datadog #observability

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.

The Data Lake Dilemma: Why Businesses Need a New Approach

In today’s data-driven landscape, every organization knows the immense value their data holds, but with the explosion of data from diverse sources, traditional data storage and management solutions are proving inadequate. Organizations are urgently seeking new ways to handle their data effectively.

How the Prometheus community is investing in OpenTelemetry

Goutham Veeramachaneni, a product manager at Grafana Labs, and Carrie Edwards, a senior software engineer at Grafana Labs, are both contributors to the Prometheus open source project. This post, which they wrote together, was originally published on the Prometheus.io blog in March 2024. The OpenTelemetry project is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

Which is Better for Monitoring: Datadog or AWS CloudWatch?

Observability is the process of understanding complex systems by analyzing their outcomes and enhancing those outcomes by monitoring events within the system. Today, observability is essential for IT services to achieve a better user experience and optimize software performance. With cloud platforms dominating the IT services landscape, organizations are inclined to deploy their software and hardware systems in the cloud to reduce operational costs and enhance flexibility.

Our Check Overview Page Has a Fresh New Look

We are very excited to announce that we redesigned our monitoring results chart to make it easier for you to understand check performance over time and investigate past anomalies. The redesign is a result of our UX research, which showed that the old check overview chart made it challenging for users to find check results from the past. While redesigning our monitoring results charts, we set out to achieve two things, and we got there in three attempts. Let’s dive in.

How to use AIOps to Modernize Without Compromise

While the Biden administration aggressively pushes federal agencies to modernize their IT infrastructures, ITOps managers are left wondering how to do so without making network management more complex than it already is. Modernization necessitates the addition of more tools, which can easily lead to tool sprawl and increase technical debt. Managers are already using multitudes of vendor-specific tools to monitor different devices and applications. The last thing they want is to add more.

What If You Could Pull Metrics Out of Your Events?

As data keeps growing at incredible rates, it’s becoming increasingly difficult to store and monitor it at a reasonable cost, leaving you to cherry-pick which data to store. Because developers are accustomed to embedding metrics within their logs and spans, this can result in poor monitoring and analysis, alert fatigue, and longer MTTR. Teams are left having to dig out the most relevant data, which results in missed trends and incomplete analysis.

How an APM Alternative Helps You Do Observability Right

Every software-driven business strives for optimum performance and user experience. Observability, which allows engineering and IT Ops teams to understand the internal state of their cloud applications and infrastructure based on available telemetry data, has emerged as a crucial practice in support of that goal. For years, application performance monitoring (APM) was the de facto practice and tooling that organizations used to keep tabs on their critical systems.