
Commit 63c0ff9

srowen authored and JoshRosen committed
SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions
This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.

Author: Sean Owen <sowen@cloudera.com>

Closes apache#3690 from srowen/SPARK-785 and squashes the following commits:

8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey

(cherry picked from commit 2a28bc6)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
1 parent b060014 commit 63c0ff9
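
For context, here is a minimal, self-contained sketch (the data, object name, and app name are illustrative, not from this commit) of a `combineByKey` call whose three function arguments are now routed through the ClosureCleaner. Because `reduceByKey`, `foldByKey`, and `aggregateByKey` ultimately delegate to `combineByKey`, they inherit the same cleaning:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark < 1.3)

object CombineByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("clean-demo").setMaster("local[*]"))
    val scores = sc.parallelize(Seq(("alice", 3), ("bob", 5), ("alice", 4)))

    // Compute (sum, count) per key, then the average. Each of the three
    // lambdas below is what combineByKey now passes through
    // self.context.clean(...) before constructing its Aggregator.
    val averages = scores
      .combineByKey(
        (v: Int) => (v, 1),                                           // createCombiner
        (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),        // mergeValue
        (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)) // mergeCombiners
      .mapValues { case (sum, count) => sum.toDouble / count }

    averages.collect().foreach(println)
    sc.stop()
  }
}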

File tree

1 file changed: +4 -1 lines changed


core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

Lines changed: 4 additions & 1 deletion
@@ -87,7 +87,10 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)])
         throw new SparkException("Default partitioner cannot partition array keys.")
       }
     }
-    val aggregator = new Aggregator[K, V, C](createCombiner, mergeValue, mergeCombiners)
+    val aggregator = new Aggregator[K, V, C](
+      self.context.clean(createCombiner),
+      self.context.clean(mergeValue),
+      self.context.clean(mergeCombiners))
     if (self.partitioner == Some(partitioner)) {
       self.mapPartitionsWithContext((context, iter) => {
         new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
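
What does `self.context.clean` buy here? It delegates to `ClosureCleaner.clean`, which nulls out references to enclosing scopes the function never uses and then checks that the result is serializable, so a bad closure fails fast on the driver instead of deep inside task serialization. A rough, simplified sketch of just the serializability check (the helper name is assumed for illustration; Spark's real check goes through its configured closure serializer):

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializabilityCheckSketch {
  // Simplified stand-in for the post-cleaning check: serialize the closure
  // to a throwaway buffer and surface NotSerializableException eagerly.
  def ensureSerializable[F <: AnyRef](f: F): F =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      f
    } catch {
      case e: NotSerializableException =>
        throw new IllegalArgumentException(s"Task not serializable: $f", e)
    }
}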
