You are not a bad person. But your Apache Spark job is failing. It is running out of memory. It is stalling. It is complaining that no executors have registered, spitting out "Filesystem closed" exceptions with lines upon lines of $anon$1's, or being consumed by a swarm of locusts the likes of which have not been seen since Moses crossed the Red Sea. Or it completes, but 20 times slower than it reasonably should. Why? In this talk, you'll learn the internals of Spark jobs, the root causes of these ailments, and tuning strategies for avoiding them.
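As a taste of the kind of tuning the talk covers, a sketch of a spark-submit invocation that adjusts the resource settings most often implicated in executor out-of-memory failures (the specific values here are illustrative assumptions, not recommendations from the talk):

```shell
# Illustrative resource tuning for a Spark job hitting executor OOMs.
# Values are placeholders to be sized against the cluster's actual capacity.
spark-submit \
  --master yarn \
  --num-executors 10 \          # how many executor JVMs to request
  --executor-cores 4 \          # concurrent tasks per executor
  --executor-memory 8g \        # heap per executor; too low -> OOM, too high -> GC pauses
  --driver-memory 2g \          # memory for the driver, e.g. for collect() results
  --class com.example.MyJob \   # hypothetical application entry point
  my-job.jar
```

Fewer cores per executor or more memory per executor both reduce per-task memory pressure; the right balance depends on the job's shuffle and caching behavior, which is exactly the internals the talk walks through.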
Sandy Ryza is a data scientist at Cloudera, an Apache Hadoop committer, and a Spark contributor. Sandy is also the co-author of Advanced Analytics with Spark.