How to use AIOps to Modernize Without Compromise

While the Biden administration aggressively pushes federal agencies to modernize their IT infrastructures, ITOps managers are left wondering how to do so without making network management more complex than it already is. Modernization necessitates the addition of more tools, which can easily lead to tool sprawl and increase technical debt. Managers are already using multitudes of vendor-specific tools to monitor different devices and applications. The last thing they want is to add more.

Our Check Overview Page Has a Fresh New Look

We are very excited to announce that we redesigned our monitoring results chart to make it easier for you to understand check performance over time and investigate any past anomaly. The redesign is a result of our UX research, which showed that the old check overview chart made it challenging for users to find check results from the past. While redesigning our monitoring results charts, we wanted to achieve two things, and we achieved them in three attempts. Let’s dive in.

The AIOps and Automation Handshake: Managing the Modern IT Stack

To increase business agility, IT organizations are deploying dynamic, modern architectures enabled by virtualization technologies. That includes containers, elastic clouds, microservices, and virtual machines. If you are rethinking your IT stack, you must also reconsider its management. IT operational silos limit business velocity.

Which is Better for Monitoring: Datadog or AWS CloudWatch?

Observability is the process of understanding complex systems by analyzing their outcomes and enhancing those outcomes by monitoring events within the system. Today, observability is essential for IT services to achieve a better user experience and optimize software performance. With cloud platforms dominating the IT services landscape, organizations are inclined to deploy their software and hardware systems in the cloud to reduce operational costs and enhance flexibility.

How the Prometheus community is investing in OpenTelemetry

Goutham Veeramachaneni, a product manager at Grafana Labs, and Carrie Edwards, a senior software engineer at Grafana Labs, are both contributors to the Prometheus open source project. This post, which they wrote together, was originally published on the Prometheus.io blog in March 2024. The OpenTelemetry project is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

Beyond Microservices: Miniservices, Macroservices, and the in between

Containerized microservices have been the gold standard for cloud computing since they replaced the monolith architecture over a decade ago. The flexibility, scalability, and velocity they enable for teams make them an obvious choice. Yet, a strict interpretation of one service for one function doesn’t quite serve everyone, especially when architectures get large. We’ll discuss how flexibility in service architecture might be the way to go.

The Data Lake Dilemma: Why Businesses Need a New Approach

In today’s data-driven landscape, every organization knows the immense value their data holds, but with the explosion of data from diverse sources, traditional data storage and management solutions are proving inadequate. Organizations are urgently seeking new ways to handle their data effectively.

Datadog on Site Reliability Engineering

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.