The latest News and Information on DevOps, CI/CD, Automation and related technologies.
As systems become increasingly complex, we’ve seen a growth in engineering tools that abstract away and manage that complexity. But our tools are often “opinionated”, and their default actions or settings may not match how our systems are intended to work, or how we think they work. Chaos Engineering is a good way to test not only your applications but also the tools you use to build them.
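To make the idea of testing a tool’s defaults concrete, here is a minimal sketch of a fault-injection experiment, assuming a hypothetical client helper (`call_with_retries`) whose opinionated default is three attempts. None of these names come from a real library. The experiment injects `ConnectionError`s at a chosen rate and measures how often the call still succeeds, so you can compare that against the steady state you expect (for example, 99% success).

```python
import random
from typing import Callable


def inject_faults(fn: Callable[[], str], failure_rate: float,
                  rng: random.Random) -> Callable[[], str]:
    """Wrap fn so that each call raises ConnectionError with probability failure_rate."""
    def wrapper() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return fn()
    return wrapper


def call_with_retries(fn: Callable[[], str], max_attempts: int = 3) -> str:
    """Hypothetical 'tool' under test: retries on ConnectionError, 3 attempts by default."""
    last_exc: Exception = ConnectionError("no attempts made")
    for _ in range(max_attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_exc = exc
    raise last_exc


def run_experiment(failure_rate: float, max_attempts: int = 3,
                   trials: int = 1000, seed: int = 42) -> float:
    """Return the fraction of trials that succeeded despite injected faults."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    flaky = inject_faults(lambda: "ok", failure_rate, rng)
    successes = 0
    for _ in range(trials):
        try:
            call_with_retries(flaky, max_attempts)
            successes += 1
        except ConnectionError:
            pass
    return successes / trials


if __name__ == "__main__":
    # With a 50% fault rate and the default of 3 attempts, success should be about 1 - 0.5**3.
    print(run_experiment(failure_rate=0.5, trials=10_000))
```

If the measured success rate falls short of the steady state you assumed, the experiment has shown that the tool’s default does not match how you expected the system to behave.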
Log messages help us understand how data flows through applications, and spot when and where errors occur. There are plenty of resources on how to store and view logs for applications running on traditional servers, but Kubernetes breaks that model by running many applications per server and abstracting away most of the maintenance for your applications. In this blog post, we focus on log management for applications running in Kubernetes by reviewing the following topics.
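One common pattern in Kubernetes (offered here as an illustration, not necessarily one of the topics the post covers) is to have each container write structured logs to stdout. The container runtime captures stdout and stderr, `kubectl logs` can read them, and node-level agents can collect them without any per-application configuration. The sketch below uses only Python’s standard `logging` and `json` modules; the `JsonFormatter` class, `make_logger` helper, and field names are illustrative choices, not a required schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line, which log collectors can parse easily."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout (the stream the container runtime captures)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers if called twice
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")  # "checkout" is a hypothetical service name
    log.info("order placed")
```

Writing one JSON object per line keeps each event on a single line, so a collector tailing the container’s output does not have to reassemble multi-line messages.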
Prometheus is a CNCF graduated project for monitoring and alerting, and one of the most widely used monitoring and alerting tools in the Kubernetes ecosystem. Rancher users can adopt Prometheus quickly through the built-in monitoring stack. Prometheus stores its metrics in a time series database on local disk, so its local storage is bounded by the size of that disk, which in turn caps how much metric data it can retain.
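The Prometheus storage documentation gives a rough capacity-planning formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that formula in both directions. The function names and the 2-byte default are assumptions chosen for illustration, and the result is only an estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_disk_bytes(retention_days: float, samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough disk needed to keep `retention_days` of data at a given ingestion rate."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample


def max_retention_days(disk_bytes: float, samples_per_second: float,
                       bytes_per_sample: float = 2.0) -> float:
    """Rough number of days of data that fit on a disk of `disk_bytes`."""
    seconds = disk_bytes / (samples_per_second * bytes_per_sample)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical workload: 10,000 samples/s kept for 15 days (the Prometheus default retention).
    needed = estimate_disk_bytes(retention_days=15, samples_per_second=10_000)
    print(f"{needed / 1e9:.2f} GB")
```

For that hypothetical workload the estimate is about 25.9 GB. Working the formula the other way shows how a fixed disk size limits how far back your metrics history can go.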
In 2019, the Netdata team already knew that a Netdata Cloud solution in the form of an online platform would greatly complement Netdata’s distributed monitoring by making it much easier to organize large infrastructures and by enabling new ways for teams to collaborate. The old node registry available at the time wasn’t enough for Netdata’s users. Building an online platform, even one that does not directly process users’ metrics, is challenging.
I am going through a digital transformation of my own in this new work-from-home (WFH) era brought on by COVID-19 quarantines. Many of you are going through the same, and distractions abound when sharing a workspace with housemates, children, and pets. Moreover, we have to contend with increased cybersecurity risk, given recent attacks on work-related software such as Slack and Zoom.
Think of orgs with lots of data and it’s impossible not to think of Netflix. In a new Netflix Technology Blog post, titled “Byte Down: Making Netflix’s Data Infrastructure Cost-Effective”, its Platform Data Science & Engineering team describes its data infrastructure, “which is composed of dozens of data platforms, hundreds of data producers and consumers, and petabytes of data.” At this scale, cost-effectiveness can decide success or failure.