
Kublr

Running Spark with Jupyter Notebook & HDFS on Kubernetes

Kublr and Kubernetes can help make your favorite data science tools easier to deploy and manage. Hadoop Distributed File System (HDFS) carries the burden of storing big data, Spark provides many powerful tools to process that data, and Jupyter Notebook is the de facto standard UI for dynamically managing queries and visualizing results.

Kubernetes and the Data Layer

Once you get your head around the concept of containers, and subsequently the need for management and orchestration with tools like Kubernetes, what started off as a weekend project suddenly raises more questions than answers. Kubernetes removes much of the complexity of managing the interaction between applications and the underlying infrastructure. It is designed to let developers focus on their applications and solutions rather than on the complexity of the hosting platform.

Kubernetes, Data Science, and Machine Learning

Enabling support for data processing, data analytics, and machine learning workloads in Kubernetes has been one of the goals of the open source community. During this meetup we’ll discuss the growing use of Kubernetes for data science and machine learning workloads. We’ll examine how new Kubernetes extensibility features such as custom resources and custom controllers are used to integrate applications and frameworks. Apache Spark 2.3’s native Kubernetes support is the latest indication of this growing trend. We’ll demo a few examples of data science workloads running on Kubernetes clusters set up by our Kublr platform.