
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Seven Critical Capabilities to Look for in an AIOps Tool

In 2017, McAfee found that an average enterprise uses 464 custom applications. A large enterprise (a company with over 50,000 employees) uses 788 custom apps! The more applications you have, the more complex your application environment is, and the more susceptible you are to outages. As a result, the tolerance for downtime is extremely low: mission-critical applications must be available at all times.

Observability in Practice

After years of helping developers monitor and debug their production systems, we couldn’t help but notice a pattern across many of them: they roughly know that metrics and traces should help them get the answers they need, but they are unfamiliar with how metrics and traces work, and how they fit into the bigger observability world. This post is an introduction to how we see observability in practice, and a loose roadmap for exploring observability concepts in the posts to come.

AWS Fargate Monitoring

How do you perform AWS Fargate monitoring? Today, we’ll discuss the background of AWS Fargate and using Retrace to monitor your code. As companies evolve from a monolithic architecture to microservice architectures, some common challenges surface that they must address along the way. In this post, we’ll discuss one of these challenges: observability and how to do it in AWS Fargate.

Updated ELK Stack Guide For 2022 (Installation, Tutorials & More)

The ELK Stack has millions of users globally due to its effectiveness for log management, SIEM, alerting, data analytics, e-commerce site search and data visualisation. In this extensive guide (updated for 2021) we cover the essential basics you need to get started: installing ELK, its most popular use cases, and the leading integrations from which you’ll want to start ingesting your logs and metrics data.

The eCommerce Holiday Calendar for DevOps

Seasonal spikes in consumer activity are expected, if not depended on, by online retailers throughout the calendar year. However, as shoppers rush to compete over door-buster deals and order holiday must-haves, web traffic escalates to levels standard resource allocation cannot easily sustain. This spike in traffic can lead to unresponsive checkouts, lost or abandoned carts, and slow-loading pages, ultimately resulting in thousands of dollars in lost revenue.

Innovations in cloud network security

Learn about innovations in cloud network security over a global network. This includes Google Cloud innovations released this year in DDoS protection and Web Application Firewall (WAF), Google Cloud Armor, Google Cloud firewalls, and Google Cloud IDS, the newest network-based intrusion detection solution.

3 Ways Ops Teams Benefit From LM Logs

Sifting through logs in real time or post-mortem to pinpoint the problem can take hours – and is often like trying to find a needle in the alert/log haystack. Further, keeping the troubleshooting process efficient can be a challenge because of context switching and reliance on manual interpretation of events and technology-specific knowledge.

AMA Responses: Icinga Web and Modules

00:10 Why are there some issues and PRs that have not been looked at for some time?

01:34 Are there plans to increase the number of people working on the Director?

01:51 Why is there such a discrepancy between the HA functionality in Icinga 2 versus Icinga Web 2 and its modules? And will this improve in the future?

03:17 Will it be possible to tunnel module traffic with the Icinga traffic? Is something planned for managing, for example, x509 in a distributed setup?