As Spark jobs are used for more mission critical tasks, beyond exploration, it is important to ha...
As Spark jobs are used for more mission critical tasks, beyond exploration, it is important to have effective tools for testing. This talk expands on “Effective Testing For Spark Programs” (not required to have been seen) to discuss how to create large scale test jobs without depending on collect & parallelize which limit the sizes of datasets we can work with. Testing Spark Streaming jobs can be especially challenging, as the normal techniques for loading test data don’t work and additional work must be done to collect the results and stop streaming. We will explore the difficulties with testing Streaming Programs, options for setting up integration testing, beyond just local mode, with Spark, and also examine best practices for acceptance tests.