
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Announcing Service Performance Monitoring in Early Access

Today, we’re thrilled to announce the early access of our Service Performance Monitoring capability. As today’s DevOps teams know all too well, monitoring application requests in modern microservices architectures is extremely difficult. Requests typically travel across a vast ecosystem of microservices and, as a result, it is often a significant challenge to pinpoint a specific failure in one of these underlying services.

AMA Responses: Icinga for Windows

00:10 Will there be any further Windows development in Icinga 2 except for the Windows agent part?

01:10 Are the Windows plugins considered to be deprecated?

02:12 Is it possible to only have the Icinga agent and the plugins without having the whole Icinga for Windows framework?

03:38 Are there plans to provide the PowerShell plugins as standalone, so one can use the plugins without the framework stuff?

Introducing the AWS CloudWatch integration, Grafana Cloud's first fully managed integration

At Grafana Labs, we are continuing to build integrations that make it easier than ever to observe your systems, no matter which tools or software you choose. Today, we’re excited to talk about the latest integration available in Grafana Cloud: the AWS CloudWatch metrics integration. It is the first of our fully managed integrations, and it makes it simple to connect and visualize your data in Grafana.

Best Practices for Cloud Monitoring

In our last episode, we covered best practices for deploying and using Cloud Operations in an enterprise environment. But we still left some questions unanswered. How should you monitor your services? How should you deal with alerts? And what about managing cost? In this episode of Engineering for Reliability, Yuri discusses best practices for setting up and using Cloud Monitoring and optimizing monitoring costs.

Logz.io Anomaly Detection: Shedding Light on "Unknown Unknowns"

Moving beyond traditional monitoring to embrace full stack observability offers a wide range of benefits. Beyond unifying logs, metrics, and traces in a single platform, the opportunity to apply advanced analytics and adopt a more predictive approach represents another huge step forward.

5 lessons from the October 2021 Facebook outage

On October 4, 2021, Facebook services went off the grid gradually, and then suddenly at 15:39 UTC. It took nearly six hours to restore service to normal. With over 3.5 billion users of one or more products from Facebook, Inc. (now known as Meta Platforms, Inc.) facing lengthy downtime, conversations flooded the internet about what caused the outage at the American social networking service.