Commit b84a92c

[SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
## What changes were proposed in this pull request?

StandaloneSchedulerBackend.executorRemoved is currently a blocking call. Because it is called inside StandaloneAppClient.ClientEndpoint, it may cause a deadlock.

This PR changes CoarseGrainedSchedulerBackend.removeExecutor to be non-blocking. This is safe because its only two callers (StandaloneSchedulerBackend and YarnSchedulerEndpoint) don't need the return value.

## How was this patch tested?

Jenkins unit tests.

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#14882 from zsxwing/SPARK-17316.
1 parent 412b0e8 commit b84a92c

1 file changed: +9 -8 lines changed

core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

Lines changed: 9 additions & 8 deletions
@@ -364,14 +364,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
   }
 
-  // Called by subclasses when notified of a lost worker
-  def removeExecutor(executorId: String, reason: ExecutorLossReason) {
-    try {
-      driverEndpoint.askWithRetry[Boolean](RemoveExecutor(executorId, reason))
-    } catch {
-      case e: Exception =>
-        throw new SparkException("Error notifying standalone scheduler's driver endpoint", e)
-    }
+  /**
+   * Called by subclasses when notified of a lost worker. It just fires the message and returns
+   * at once.
+   */
+  protected def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
+    // Only log the failure since we don't care about the result.
+    driverEndpoint.ask(RemoveExecutor(executorId, reason)).onFailure { case t =>
+      logError(t.getMessage, t)
+    }(ThreadUtils.sameThread)
   }
 
   def sufficientResourcesRegistered(): Boolean = true
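
The fire-and-forget pattern in the new `removeExecutor` can be sketched outside Spark using only `scala.concurrent`. This is a minimal illustration, not the Spark implementation: `ask`, `errors`, and `sameThread` are hypothetical stand-ins for `driverEndpoint.ask`, `logError`, and `ThreadUtils.sameThread`, and the executor ids are made up. It uses `onComplete` rather than `onFailure`, since `onFailure` is deprecated from Scala 2.12 onward.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executor}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}

object FireAndForgetSketch {
  // Hypothetical stand-in for ThreadUtils.sameThread: runs each callback
  // directly on whichever thread submits it.
  val sameThread: ExecutionContext = ExecutionContext.fromExecutor(new Executor {
    override def execute(r: Runnable): Unit = r.run()
  })

  // Hypothetical stand-in for logError: collects messages for inspection.
  val errors = new ConcurrentLinkedQueue[String]()

  // Hypothetical stand-in for driverEndpoint.ask: returns the reply as a Future.
  def ask(executorId: String): Future[Boolean] = executorId match {
    case ""        => Future.failed(new IllegalArgumentException("empty executor id"))
    case "pending" => Promise[Boolean]().future // the reply never arrives
    case _         => Future.successful(true)
  }

  // Sends the request and returns without waiting for the reply.
  // A failed reply is only recorded; a successful one is ignored.
  def removeExecutor(executorId: String): Unit = {
    ask(executorId).onComplete {
      case Failure(t) => errors.add(t.getMessage)
      case Success(_) => ()
    }(sameThread)
  }

  def main(args: Array[String]): Unit = {
    removeExecutor("exec-1")  // succeeds, nothing recorded
    removeExecutor("pending") // returns immediately although no reply exists yet
    removeExecutor("")        // fails, message is recorded
    println(errors)
  }
}
```

Because the caller never waits on the `Future`, a reply that is slow or never arrives cannot stall the calling thread. That is the property the commit message relies on to avoid the deadlock.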
