---
title: OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
description: Various OutOfMemoryError exceptions for Apache Spark in Azure HDInsight
ms.service: hdinsight
ms.topic: troubleshooting
author: hrasheed-msft
ms.author: hrasheed
ms.date: 08/02/2019
---

# OutOfMemoryError exceptions for Apache Spark in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when using Apache Spark components in Azure HDInsight clusters.

## Scenario: OutOfMemoryError exception for Apache Spark

### Issue

Your Apache Spark application failed with an unhandled OutOfMemoryError exception. You may receive an error message similar to:

```error
ERROR Executor: Exception in task 7.0 in stage 6.0 (TID 439)

java.lang.OutOfMemoryError
    at java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
    at java.io.ByteArrayOutputStream.grow(Unknown Source)
    at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Unknown Source)
    at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(Unknown Source)
    at java.io.ObjectOutputStream.writeObject0(Unknown Source)
    at java.io.ObjectOutputStream.writeObject(Unknown Source)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
```

```error
ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]

java.lang.OutOfMemoryError
    at java.io.ByteArrayOutputStream.hugeCapacity(Unknown Source)
    ...
```

### Cause

The most likely cause of this exception is not enough heap memory. Your Spark application requires enough Java Virtual Machine (JVM) heap memory when running as executors or drivers.

### Resolution

1. Determine the maximum size of the data the Spark application will handle. Estimate this size from the maximum of the size of the input data, the intermediate data produced by transforming the input data, and the output data produced by further transforming the intermediate data. If the initial estimate is not sufficient, increase it slightly and iterate until the memory errors subside.

1. Make sure that the HDInsight cluster to be used has enough resources, in terms of memory and cores, to accommodate the Spark application. You can determine this by viewing the Cluster Metrics section of the YARN UI of the cluster for the values of Memory Used vs. Memory Total and VCores Used vs. VCores Total.

1. Set the following Spark configurations to appropriate values. Balance the application requirements with the available resources in the cluster. These values should not exceed 90% of the available memory and cores as viewed by YARN, and should also meet the minimum memory requirement of the Spark application (a worked example follows this list):

    ```
    spark.executor.instances (Example: 8 for 8 executor count)
    spark.executor.memory (Example: 4g for 4 GB)
    spark.yarn.executor.memoryOverhead (Example: 384m for 384 MB)
    spark.executor.cores (Example: 2 for 2 cores per executor)
    spark.driver.memory (Example: 8g for 8 GB)
    spark.driver.cores (Example: 4 for 4 cores)
    spark.yarn.driver.memoryOverhead (Example: 384m for 384 MB)
    ```

    Total memory used by all executors =

    ```
    spark.executor.instances * (spark.executor.memory + spark.yarn.executor.memoryOverhead)
    ```

    Total memory used by driver =

    ```
    spark.driver.memory + spark.yarn.driver.memoryOverhead
    ```

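As a worked example using the sample values above, the executors would use about 8 * (4 GB + 384 MB) ≈ 35 GB in total, and the driver about 8 GB + 384 MB ≈ 8.4 GB; both totals must fit within the YARN memory and core totals checked in step 2. The sketch below shows one way such values might be passed on the command line. The figures and the application file name `my-app.py` are illustrative assumptions, not recommendations for your workload.

```bash
# Illustrative only: substitute values derived from your own data size and cluster capacity.
spark-submit \
  --conf spark.executor.instances=8 \
  --conf spark.executor.memory=4g \
  --conf spark.yarn.executor.memoryOverhead=384m \
  --conf spark.executor.cores=2 \
  --conf spark.driver.memory=8g \
  --conf spark.driver.cores=4 \
  --conf spark.yarn.driver.memoryOverhead=384m \
  my-app.py
```

The same properties can also be set cluster-wide through the Spark configuration pages in Ambari rather than per job.
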
---

## Scenario: Java heap space error when trying to open Apache Spark history server

### Issue

You receive the following error when opening events in Spark History Server:

```
scala.MatchError: java.lang.OutOfMemoryError: Java heap space (of class java.lang.OutOfMemoryError)
```

### Cause

This issue is often caused by a lack of resources when opening large spark-event files. The Spark heap size is set to 1 GB by default, but large Spark event files may require more than this.

If you would like to verify the size of the files that you are trying to load, you can run the following commands:

```bash
hadoop fs -du -s -h wasb:///hdp/spark2-events/application_1503957839788_0274_1/
576.5 M  wasb:///hdp/spark2-events/application_1503957839788_0274_1

hadoop fs -du -s -h wasb:///hdp/spark2-events/application_1503957839788_0264_1/
2.1 G  wasb:///hdp/spark2-events/application_1503957839788_0264_1
```

### Resolution

You can increase the Spark History Server memory by editing the `SPARK_DAEMON_MEMORY` property in the Spark configuration and restarting all the services.

You can do this from within the Ambari browser UI by selecting the Spark2/Config/Advanced spark2-env section.

Add the following property to change the Spark History Server memory from 1g to 4g: `SPARK_DAEMON_MEMORY=4g`.

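For illustration, the entry added to the Advanced spark2-env section would then look like the following, using the 4 GB value from this example:

```bash
# Line added to the Advanced spark2-env configuration in Ambari (4g is the example value above).
SPARK_DAEMON_MEMORY=4g
```
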
Make sure to restart all affected services from Ambari.

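If you prefer to script the restart rather than use the Ambari UI, a stop/start cycle through the Ambari REST API can be sketched as follows. The cluster name, the admin account, and the password are placeholders, and `SPARK2` is assumed to be the Ambari service name for Spark on this cluster type.

```bash
# Stop the Spark2 service (state INSTALLED = stopped in Ambari), then start it again.
# Replace CLUSTERNAME and PASSWORD with your HDInsight cluster name and Ambari admin password.
curl -u admin:PASSWORD -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop SPARK2"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/services/SPARK2"

curl -u admin:PASSWORD -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start SPARK2"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/services/SPARK2"
```
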
---

## Scenario: Livy Server fails to start on Apache Spark cluster

### Issue

Livy Server cannot be started on an Apache Spark cluster (Spark 2.1 on Linux (HDI 3.6)). Attempting to restart it results in the following error stack, from the Livy logs:

```error
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
...
## using "vmstat" found we had enough free memory
```

### Cause

`java.lang.OutOfMemoryError: unable to create new native thread` indicates that the OS cannot assign more native threads to JVMs. This exception was confirmed to be caused by a violation of the per-process thread count limit.

When Livy Server terminates unexpectedly, all the connections to Spark clusters are also terminated, which means that all the jobs and related data will be lost. In HDP 2.6, a session recovery mechanism was introduced: Livy stores the session details in Zookeeper so that sessions can be recovered after the Livy Server is back.

When a large number of jobs are submitted via Livy, as part of High Availability the Livy Server stores these session states in Zookeeper (on HDInsight clusters) and recovers those sessions when the Livy service is restarted. On restart after unexpected termination, Livy creates one thread per to-be-recovered session, and the accumulation of these sessions causes too many threads to be created.

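To check whether the limit is actually being hit, you can compare the Livy Server's thread count against the process and thread limits on the headnode. The following commands are a diagnostic sketch; the way the Livy process is located (`pgrep -f livy`) is an assumption and may need adjusting for your cluster.

```bash
# Per-user limit on processes (threads count toward this limit) for the current shell
ulimit -u

# System-wide ceiling on the number of threads
cat /proc/sys/kernel/threads-max

# Approximate thread count of the Livy Server JVM (process lookup is an assumption; adjust as needed)
ps -o nlwp= -p "$(pgrep -f livy | head -n 1)"
```
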
### Resolution

Delete all entries by using the steps detailed below.

> [!NOTE]
> `DELETE` the Livy session once it has completed its execution. Livy batch sessions will not be deleted automatically as soon as the Spark app completes; this is by design. A Livy session is an entity created by a POST request against the Livy REST server. A `DELETE` call is needed to delete that entity. Otherwise, you must wait for the GC to kick in.

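For example, a completed batch session can be removed with a `DELETE` call such as the following sketch, where the batch ID `123` is hypothetical and CLUSTERNAME/PASSWORD are placeholders for your cluster name and Ambari admin credentials:

```bash
# Delete Livy batch session 123 (hypothetical ID) so its state is not kept around for recovery.
curl -u admin:PASSWORD -X DELETE "https://CLUSTERNAME.azurehdinsight.net/livy/batches/123"
```
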
---

## Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

* [Spark memory management overview](https://spark.apache.org/docs/latest/tuning.html#memory-management-overview).

* [Debugging Spark application on HDInsight clusters](https://blogs.msdn.microsoft.com/azuredatalake/2016/12/19/spark-debugging-101/).

* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).

* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.