
Commit 4d90d7b

Merge pull request #84152 from dagiro/ts_spark14
ts_spark14
2 parents ac4756b + 515564d, commit 4d90d7b

4 files changed: +128 -168 lines changed

articles/hdinsight/spark/TOC.yml

+1-7
@@ -53,13 +53,7 @@
 - name: Troubleshoot
   items:
   - name: OutOfMemoryError exception
-    items:
-    - name: Insufficient heap memory
-      href: ./apache-spark-troubleshoot-outofmemory-heap.md
-    - name: Java heap space
-      href: ./apache-spark-troubleshoot-outofmemory-heap-space.md
-    - name: Livy Server fails to start
-      href: ./apache-spark-troubleshoot-outofmemory-native-thread.md
+    href: ./apache-spark-troubleshoot-outofmemory.md
   - name: Apache Spark job fails - NoClassDefFoundError
     href: ./apache-spark-troubleshoot-job-fails-noclassdeffounderror.md
   - name: Apache Spark job fails - InvalidClassException

articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory-heap-space.md

-59
This file was deleted.

articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory-heap.md

-95
This file was deleted.

articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory-native-thread.md renamed to articles/hdinsight/spark/apache-spark-troubleshoot-outofmemory.md

+127-7
@@ -1,18 +1,132 @@
 ---
-title: Livy Server fails to start on Apache Spark cluster in Azure HDInsight
-description: Livy Server fails to start on Apache Spark cluster in Azure HDInsight
+title: OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
+description: Various OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
 ms.service: hdinsight
 ms.topic: troubleshooting
 author: hrasheed-msft
 ms.author: hrasheed
-ms.date: 07/29/2019
+ms.date: 08/02/2019
 ---
 
-# Scenario: Livy Server fails to start on Apache Spark cluster in Azure HDInsight
+# OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
 
 This article describes troubleshooting steps and possible resolutions for issues when using Apache Spark components in Azure HDInsight clusters.
 
-## Issue
+## Scenario: OutOfMemoryError exception for Apache Spark
+
+### Issue
+
+Your Apache Spark application failed with an unhandled OutOfMemoryError exception. You may receive an error message similar to:
+
+```error
+ERROR Executor: Exception in task 7.0 in stage 6.0 (TID 439)
+
+java.lang.OutOfMemoryError
+at java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
+at java.io.ByteArrayOutputStream.grow(Unknown Source)
+at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
+at java.io.ByteArrayOutputStream.write(Unknown Source)
+at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
+at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(Unknown Source)
+at java.io.ObjectOutputStream.writeObject0(Unknown Source)
+at java.io.ObjectOutputStream.writeObject(Unknown Source)
+at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
+at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
+at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
+at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
+at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
+at java.lang.Thread.run(Unknown Source)
+```
+
+```error
+ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
+
+java.lang.OutOfMemoryError
+at java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
+...
+```
+### Cause
+
+The most likely cause of this exception is insufficient heap memory. Your Spark application requires enough Java Virtual Machine (JVM) heap memory when running as executors or drivers.
+
+### Resolution
+
+1. Determine the maximum size of the data the Spark application will handle. Estimate this from the largest of the input data, the intermediate data produced by transforming the input data, and the output data produced by further transforming the intermediate data. If the initial estimate is not sufficient, increase the size slightly, and iterate until the memory errors subside.
+
+1. Make sure that the HDInsight cluster to be used has enough resources, in terms of both memory and cores, to accommodate the Spark application. You can determine this by viewing the Cluster Metrics section of the cluster's YARN UI and comparing Memory Used vs. Memory Total and VCores Used vs. VCores Total.
+
+![yarn core memory view](./media/apache-spark-ts-outofmemory/yarn-core-memory-view.png)
+
+1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values should not exceed 90% of the memory and cores available as viewed by YARN, and should also meet the minimum memory requirement of the Spark application (a submit-command sketch using these example values follows the memory formulas below):
+
+```
+spark.executor.instances (Example: 8 for 8 executor count)
+spark.executor.memory (Example: 4g for 4 GB)
+spark.yarn.executor.memoryOverhead (Example: 384m for 384 MB)
+spark.executor.cores (Example: 2 for 2 cores per executor)
+spark.driver.memory (Example: 8g for 8 GB)
+spark.driver.cores (Example: 4 for 4 cores)
+spark.yarn.driver.memoryOverhead (Example: 384m for 384 MB)
+```
+
+Total memory used by all executors =
+
+```
+spark.executor.instances * (spark.executor.memory + spark.yarn.executor.memoryOverhead)
+```
+
+Total memory used by driver =
+
+```
+spark.driver.memory + spark.yarn.driver.memoryOverhead
+```
+
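With the example values above, the executors use roughly 8 × (4 GB + 384 MB) ≈ 35 GB in total, and the driver about 8 GB + 384 MB ≈ 8.4 GB. A minimal sketch of a submit command that passes these values is shown below; the `--master`/`--deploy-mode` flags, class name, and JAR path are illustrative placeholders rather than values taken from the change above.

```bash
# Sketch only: example values from the step above; adjust to your cluster and workload.
# Total executor memory ≈ 8 * (4g + 384m) ≈ 35 GB; driver memory ≈ 8g + 384m ≈ 8.4 GB.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.instances=8 \
  --conf spark.executor.memory=4g \
  --conf spark.yarn.executor.memoryOverhead=384m \
  --conf spark.executor.cores=2 \
  --conf spark.driver.memory=8g \
  --conf spark.driver.cores=4 \
  --conf spark.yarn.driver.memoryOverhead=384m \
  --class com.example.MyApp \
  my-app.jar
```

Keeping these totals within about 90% of what YARN reports as available leaves headroom for other services on the cluster.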
+---
+
+## Scenario: Java heap space error when trying to open Apache Spark history server
+
+### Issue
+
+You receive the following error when opening events in Spark History server:
+
+```
+scala.MatchError: java.lang.OutOfMemoryError: Java heap space (of class java.lang.OutOfMemoryError)
+```
+
+### Cause
+
+This issue is often caused by a lack of resources when opening large Spark event files. The Spark heap size is set to 1 GB by default, but large Spark event files may require more than this.
+
+To verify the size of the files that you are trying to load, you can run the following commands:
+
+```bash
+hadoop fs -du -s -h wasb:///hdp/spark2-events/application_1503957839788_0274_1/
+576.5 M  wasb:///hdp/spark2-events/application_1503957839788_0274_1
+
+hadoop fs -du -s -h wasb:///hdp/spark2-events/application_1503957839788_0264_1/
+2.1 G  wasb:///hdp/spark2-events/application_1503957839788_0264_1
+```
+
+### Resolution
+
+You can increase the Spark History Server memory by editing the `SPARK_DAEMON_MEMORY` property in the Spark configuration and restarting all the services.
+
+You can do this from within the Ambari browser UI by selecting the Spark2/Config/Advanced spark2-env section.
+
+![Advanced spark2-env section](./media/apache-spark-ts-outofmemory-heap-space/image01.png)
+
+Add the following property to change the Spark History Server memory from 1g to 4g: `SPARK_DAEMON_MEMORY=4g`.
+
+![Spark property](./media/apache-spark-ts-outofmemory-heap-space/image02.png)
+
+Make sure to restart all affected services from Ambari.
+
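For reference, the setting appears in the Advanced spark2-env content as a single exported variable. The snippet below is a sketch of just that line, with 4g as the example value from above; the rest of your cluster's spark2-env template is unchanged.

```bash
# Illustrative excerpt of the Advanced spark2-env "content" field in Ambari.
# Only this line needs to change; 4g is the example value described above.
export SPARK_DAEMON_MEMORY=4g
```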
+---
+
+## Scenario: Livy Server fails to start on Apache Spark cluster
+
+### Issue
 
 Livy Server cannot be started on an Apache Spark cluster (Spark 2.1 on Linux (HDI 3.6)). Attempting to restart results in the following error stack, from the Livy logs:
 
@@ -72,15 +186,15 @@ Exception in thread "main" java.lang.OutOfMemoryError: unable to create new nati
 ## using "vmstat" found we had enough free memory
 ```
 
-## Cause
+### Cause
 
 `java.lang.OutOfMemoryError: unable to create new native thread` highlights that the OS cannot assign more native threads to JVMs. It was confirmed that this exception is caused by a violation of the per-process thread count limit.
 
 When Livy Server terminates unexpectedly, all the connections to Spark clusters are also terminated, which means that all the jobs and related data are lost. A session recovery mechanism was introduced in HDP 2.6: Livy stores the session details in Zookeeper so that sessions can be recovered after the Livy Server is back.
 
 When a large number of jobs are submitted via Livy, the Livy Server (as part of its High Availability design on HDInsight clusters) stores these session states in ZK and recovers those sessions when the Livy service is restarted. On restart after an unexpected termination, Livy creates one thread per session to be recovered, and the backlog of to-be-recovered sessions causes too many threads to be created.
 
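To confirm that the per-process thread limit is the constraint, a couple of standard Linux checks can be run on the headnode hosting Livy. This is a sketch, and the `pgrep` pattern is an assumption about how the Livy Server process is named on the node.

```bash
# Sketch: compare the per-user limit with the Livy Server's current thread count.
ulimit -u                               # maximum user processes (native threads count against this)
LIVY_PID=$(pgrep -f livy | head -n 1)   # assumes a single process matching "livy"
grep Threads /proc/"$LIVY_PID"/status   # number of threads currently in the Livy JVM
```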
-## Resolution
+### Resolution
 
 Delete all entries using the steps detailed below.
 
@@ -121,10 +235,16 @@ Delete all entries using steps detailed below.
 > [!NOTE]
 > `DELETE` the Livy session once it has completed its execution. Livy batch sessions will not be deleted automatically as soon as the Spark app completes, which is by design. A Livy session is an entity created by a POST request against the Livy REST server. A `DELETE` call is needed to delete that entity, or we can wait for the GC to kick in.
 
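For reference, such a `DELETE` call might look like the sketch below; the cluster name and batch ID are placeholders, and the `/livy/batches` path assumes the cluster's Livy endpoint.

```bash
# Sketch: delete a completed Livy batch session by ID through the cluster endpoint.
# CLUSTERNAME and the batch ID (123) are placeholders; you are prompted for the admin password.
curl -u admin -X DELETE "https://CLUSTERNAME.azurehdinsight.net/livy/batches/123"
```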
+---
+
 ## Next steps
 
 If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
 
+* [Spark memory management overview](https://spark.apache.org/docs/latest/tuning.html#memory-management-overview).
+
+* [Debugging Spark application on HDInsight clusters](https://blogs.msdn.microsoft.com/azuredatalake/2016/12/19/spark-debugging-101/).
+
 * Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
 
 * Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.
