Beyond Shuffling - Tips & Tricks for scaling your Apache Spark programs. This talk walks through ...
Beyond Shuffling - Tips & Tricks for scaling your Apache Spark programs. This talk walks through a number of common mistakes which can keep our Spark programs from scaling and examines the solutions, as well as general techniques useful for moving from beyond a prof of concept to production. It covers topics like effective RDD re-use, considerations for working with key/value data, and finishes up with a preview of some of the work being done to add code generation to Spark ML.