
Monitoring

The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

A fresh perspective: Rethinking TCO and expanding visibility to fuel growth

Over the past nine years, Freshworks has experienced record growth using Sumo Logic to ensure a reliable experience for its SaaS customers. Through continuous optimizations, the SRE team has improved their Total Cost of Ownership (TCO) and gained better visibility and insights into the logging patterns of their applications. Developers investigate issues more quickly using features like Live Tail, Anomaly Detection and Time Comparison, maximizing service reliability. Learn how Freshworks is doubling down on Sumo Logic to achieve full-stack visibility.

Make Your MSP a Recession-Proof Business

Many believe we’re either in a recession or on the brink of one. It’s a familiar cycle: high inflation, international strife, supply chain challenges, and tightening monetary policies are all driving fears of a downturn. While the IT industry has exceeded expectations for the past few years, it isn’t immune to a recession. As a result, many IT MSPs are thinking about what they can do to ensure a recession-proof business over the next year and beyond.

Part 4: Causal Observability - Level 3

It’s not surprising that most failures are caused by a change somewhere in a system, such as a new code deployment, configuration change, auto-scaling activity or auto-healing event. As you investigate the root cause of an incident, the best place to start is to find what changed. To understand what change caused a problem and what effects propagated across your stack, you need to be able to see how the relationships between stack components have changed over time.
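
As a rough illustration of "find what changed", the sketch below filters a list of change events down to those that happened shortly before an incident. The `ChangeEvent` record, its field names, and the one-hour default window are assumptions made for this example; they are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical record of a change; field names are illustrative only.
    component: str
    kind: str  # e.g. "deployment", "config", "auto-scaling", "auto-healing"
    at: datetime


def changes_before(events, incident_at, window=timedelta(hours=1)):
    """Return changes within `window` before the incident, newest first."""
    start = incident_at - window
    recent = [e for e in events if start <= e.at <= incident_at]
    return sorted(recent, key=lambda e: e.at, reverse=True)
```

Sorting newest first puts the change closest to the incident at the top of the list, which is usually the first candidate to check. Tracing how an effect propagated across components would additionally require the relationships between them over time, which this sketch does not model.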

Quick Bytes - Getting started with ECS monitoring

Lumigo provides visibility into your ECS clusters and the underlying services and tasks in real time by leveraging out-of-the-box dashboards and turn-key integrations with AWS. All the key metrics you need to monitor your clusters, services and tasks are displayed with easy access to corresponding traces. With one-click distributed tracing, Lumigo lets developers effortlessly find and fix issues in serverless and containerized environments.

Grafana alerts as code: Get started with Terraform and Grafana Alerting

Alerting infrastructure is often complex, with many pieces of the pipeline living in different places. Scaling this across many teams and organizations is an especially challenging task. As an organization grows, its observability component tends to grow along with it. For example, you may have many components, each of which needs a different set of alerts. You may have several teams, each with a different channel where notifications should be delivered.

Harness Continuous Observability to Continuously Predict Deployment Risk

In my previous blog post, I discussed how continuous observability can be used to deliver continuous reliability. I also discussed the problem of high change failure rates in most enterprises, and how teams fail to proactively address failure risk before changes go into production. This is because manual assessment of change risk is both labor-intensive and time-consuming, and often contributes to deployment and release delays.

Relational Databases vs Time Series Databases

Databases are often the biggest bottleneck when it comes to application performance. Over the years, a number of new database designs have emerged to help not only with basic scalability and performance but also to improve developer productivity and make certain types of applications easier to build. That isn’t to say these new databases are magical: there are always trade-offs, and certain things are sacrificed for gains in other areas.