Demystifying Spark Jobs to Optimize for Cost and Performance

Time to read: less than 1 minute


Tue, 16/04/2019 - 10:00
Cloudera

Apache Spark is one of the most popular engines for distributed data processing on Big Data clusters. Spark jobs come in all shapes, sizes, and cluster form factors: from tens to thousands of nodes and executors, from seconds to hours or even days of job duration, from megabytes to petabytes of data, and from simple data scans to complicated analytical workloads. Add a growing number of streaming workloads to the huge body of batch and machine learning jobs, and the infrastructure expenditure on running Spark jobs becomes significant.
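
As a rough illustration (not taken from the linked post), much of that expenditure is driven by how many executors a job requests and how large they are. The minimal Scala sketch below shows the kind of settings involved; the executor counts, cores, and memory values are placeholders, not recommendations, and the job itself is just a trivial scan standing in for a real workload.

```scala
import org.apache.spark.sql.SparkSession

object SparkJobSizingSketch {
  def main(args: Array[String]): Unit = {
    // Executor count, cores, and memory largely determine both how long a
    // job runs and how much cluster capacity it ties up. The values below
    // are hypothetical; the master URL is expected from spark-submit.
    val spark = SparkSession.builder()
      .appName("cost-performance-sketch")
      .config("spark.executor.instances", "10") // placeholder fleet size
      .config("spark.executor.cores", "4")      // placeholder cores per executor
      .config("spark.executor.memory", "8g")    // placeholder memory per executor
      .getOrCreate()

    // A trivial aggregation over a generated range stands in for the
    // "simple data scans to complicated analytical workloads" spectrum.
    spark.range(1000000L).selectExpr("sum(id)").show()

    spark.stop()
  }
}
```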

Read Full Story