Why MapReduce does not always work
• Write a lot of stuff to disk for failover
• Inefficient for small to medium problems
• [(k, v)] → mapper → [(k, v)] → reducer → [(k, v)]
• Data and model params as (k, v) pairs?
• Complex to leverage for Iterative Algorithms

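The [(k, v)] → mapper → [(k, v)] → reducer → [(k, v)] flow on this slide can be illustrated with a minimal in-memory sketch. This is not Hadoop or any real MapReduce framework: `run_mapreduce`, `mapper`, `reducer`, and the sample documents are hypothetical names chosen for illustration, and the shuffle step is just a dictionary in one process, so none of the disk writes or failover machinery mentioned above are modeled.

```python
from collections import defaultdict


def mapper(key, value):
    """Emit one (word, 1) pair per word in the document text."""
    for word in value.split():
        yield word.lower(), 1


def reducer(key, values):
    """Sum all the counts that were grouped under the same word."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    # Map phase: every input (k, v) pair may emit any number of (k, v) pairs.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: each key is reduced independently of the others.
    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


# Sample input, invented for this sketch.
documents = [("doc1", "the cat sat"), ("doc2", "the dog sat")]
counts = run_mapreduce(documents, mapper, reducer)
print(counts)
```

An iterative algorithm would have to call `run_mapreduce` once per iteration and re-encode its model parameters as (k, v) pairs each time, which is the awkwardness the last two bullets point at.
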
IPython.parallel
• Parallel Processing Library
• Interactive Exploratory Shell
Multi Core & Distributed

The AllReduce Pattern
• Compute an aggregate (average) of active node data
• Do not clog a single node with incoming data transfer
• Traditionally implemented in MPI systems

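One way to see how an aggregate can be computed without funneling all data into a single node is the recursive-doubling exchange sketched below. It is a pure-Python simulation, not MPI code, and it is only one of several possible AllReduce schedules. The function name `allreduce_average`, the power-of-two restriction, and the sample values are assumptions made for this illustration.

```python
def allreduce_average(values):
    """Simulate a recursive-doubling AllReduce over one value per node.

    Assumption for this sketch: the number of nodes is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of nodes")

    sums = list(values)
    step = 1
    while step < n:
        # In each round, node i exchanges its partial sum with node i XOR step.
        # Every node receives exactly one message per round, so no single
        # node is flooded with incoming transfers.
        sums = [sums[i] + sums[i ^ step] for i in range(n)]
        step *= 2

    # After log2(n) rounds every node holds the global sum.
    return [total / n for total in sums]


# Sample per-node values, invented for this sketch.
node_values = [1.0, 2.0, 3.0, 6.0]
averages = allreduce_average(node_values)
print(averages)
```

With four nodes the exchange finishes in two rounds, and every node ends up holding the same average, so the next iteration can start without a separate broadcast step.
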
Working in the Cloud
• Launch a cluster of machines in one cmd:
    starcluster start mycluster -b 0.07
    starcluster sshmaster mycluster
• Supports spot instances!
• Ships blas, atlas, numpy, scipy!
• IPython plugin!