How shuffle sharding in Cortex leads to better scalability and more isolation for Prometheus

For many years, it has been possible to scale Cortex clusters to hundreds of replicas. Cortex uses a relatively simple Dynamo-style replication scheme that relies on quorum consistency for reads and writes. As a result, the failure of more than one replica can lead to an outage for all tenants. Shuffle sharding solves that issue by automatically picking a random “replica set” for each tenant, allowing you to isolate tenants and reduce the chance of an outage.
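
The idea of a per-tenant random replica set can be illustrated with a minimal Python sketch. This is not Cortex's actual implementation: the function name `shuffle_shard`, the `ingester-N` replica names, and the shard size of 3 are all assumptions made for illustration. The sketch only shows the general technique of seeding a random choice from the tenant ID, so that each tenant is always mapped to the same small subset of replicas.

```python
import hashlib
import random


def shuffle_shard(tenant_id: str, replicas: list[str], shard_size: int) -> list[str]:
    """Pick a stable pseudo-random subset of replicas for one tenant.

    Hypothetical helper for illustration; Cortex's real selection logic differs.
    """
    if shard_size >= len(replicas):
        # The shard cannot be larger than the cluster: use every replica.
        return sorted(replicas)
    # Derive a seed from the tenant ID so the same tenant always gets the same shard.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input order of the replica list.
    return sorted(rng.sample(sorted(replicas), shard_size))


# Assumed cluster of 12 replicas, with 3 replicas per tenant.
replicas = [f"ingester-{i}" for i in range(12)]
shard_a = shuffle_shard("tenant-a", replicas, 3)
shard_b = shuffle_shard("tenant-b", replicas, 3)

# Replicas shared by both tenants; a failure outside this set affects at most one of them.
overlap = set(shard_a) & set(shard_b)
```

Because each tenant touches only its own small subset, losing a couple of replicas affects only the tenants whose shards contain them, rather than every tenant in the cluster.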

Observability: It's the User Experience, Stupid!

Observability, a concept that originated in control theory, is a measure of how well you can understand a system’s internal states from its external outputs. Observability uses instrumentation to provide insights that aid monitoring. In DevOps, observability is achieved through a set of monitoring solutions. The shift to using one vendor platform to do so, rather than multiple solutions, makes sense as...

Dashboard Server: Working with the Elasticsearch Tile

I’ll come clean and admit it – this part of the series will be a bit interesting given the fact that I know very little about Elasticsearch. So really, this is an honest test of the question – “can I still build something good with Dashboard Server even if I only have nominal knowledge of the tool where the data is sourced from?”

Replay Single Transactions for Root Cause Analysis

Speedscale was built primarily to provide engineering teams with better insight into their applications over time. Replaying single transactions for root cause analysis gives developers and SREs confidence that tomorrow’s application code will work just as well in production as it did yesterday.

6 Things You Can Do Right Now to Make Your Remote Team More Productive

Remote work has existed for decades as an alternative to the traditional in-office work environment. When it first became a viable option—people have been working remotely since the dawn of email in 1971 and before via telephone—distributed work was often met with trepidation by companies and employees alike. In recent years, however, working remotely, “working from home,” and other similar terms have become commonplace.

In-Depth Guide to Digital Experience Monitoring

How a software product feels is easy to overlook, yet it matters just as much as how the product works, if not more. Results from digital experience monitoring point to how apps feel as the key determinant of their success. “That’s how it is with people. Nobody cares how it works as long as it works.” This famous line from The Matrix Reloaded (2003) resonates with the way many developers approach maintaining apps. Someone has to keep watch.

How to Become Data Centric

According to Dr. Stephen Hawking and the conservation of quantum information theory, information can neither be created nor destroyed… unless you work in IT. OK, he didn’t really say the part about IT; I did. In the physical world, information is constantly generated, curated, and consumed—from emails to cat videos to this blog post. Not to mention error messages, system logs, and alert emails you never read.

Identifying Bottlenecks in DigitalOcean Before Your Customers Do

Hosting your application on DigitalOcean is an easy way for teams to deploy and scale applications without worrying about the details of the infrastructure. But what happens when your application starts causing bottlenecks and you need to track down the root cause? In this article, we’ll look at how SolarWinds® AppOptics™ works together with DigitalOcean to help you identify and fix performance issues with your application.

What's New with JFrog Xray and DevSecOps

As we look to improve the quality and capabilities of the JFrog DevOps Platform, especially in the world of DevSecOps, we have added powerful new features to further enhance the award-winning JFrog Xray. The capabilities detailed below cement Xray’s position as a universal software composition analysis (SCA) solution trusted by developers and DevSecOps teams globally to quickly and continuously identify open source software vulnerabilities and license compliance violations.