|
| 1 | +--- |
| 2 | +title: Apache HBase Master fails to start in Azure HDInsight |
| 3 | +description: Apache HBase Master (HMaster) fails to start in Azure HDInsight |
| 4 | +ms.service: hdinsight |
| 5 | +ms.topic: troubleshooting |
| 6 | +author: hrasheed-msft |
| 7 | +ms.author: hrasheed |
| 8 | +ms.date: 08/06/2019 |
| 9 | +--- |
| 10 | + |
| 11 | +# Apache HBase Master (HMaster) fails to start in Azure HDInsight |
| 12 | + |
| 13 | +This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters. |
| 14 | + |
| 15 | +## Scenario: Atomic renaming failure |
| 16 | + |
| 17 | +### Issue |
| 18 | + |
| 19 | +HMaster identifies unexpected files during the startup process and fails to start. |
| 20 | + |
| 21 | +### Cause |
| 22 | + |
| 23 | +During startup, HMaster performs many initialization steps, including moving data from the scratch (`.tmp`) folder to the data folder. HMaster also checks the WALs (write-ahead logs) folder for dead region servers. At each of these steps, it runs a basic `list` command on the folder. If it finds an unexpected file in any of these folders, it throws an exception and fails to start. |
| 24 | + |
| 25 | +### Resolution |
| 26 | + |
| 27 | +Check the call stack to determine which folder is causing the problem (for instance, the WALs folder or the `.tmp` folder). Then use Cloud Explorer or HDFS commands to locate the problem file. The problem file is usually a `*-renamePending.json` file, a journal file that the WASB driver uses to implement the atomic rename operation. Because of bugs in this implementation, such files can be left behind when a process crashes. Force delete this file through Cloud Explorer. There might also be a temporary file named `$$$.$$$` in this location. This file is not visible in Cloud Explorer; it appears only in the output of the HDFS `ls` command. To delete it, use the HDFS command `hdfs dfs -rm //\$\$\$.\$\$\$`. |
| 28 | + |
| 29 | +Once the problem file has been removed, HMaster should start up immediately. |
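
The search step above can be sketched as a small shell helper that reduces a recursive HDFS listing to leftover rename journal files. The function name `filter_rename_pending` is hypothetical, and the `/hbase/WALs` and `/hbase/.tmp` paths in the usage comment are assumptions; use the folder that the call stack points to.

```shell
# Hypothetical helper: reads `hdfs dfs -ls -R` output on stdin and prints
# only the paths that end in -renamePending.json.
# Assumes the path is the last whitespace-separated field, so paths that
# contain spaces are not handled.
filter_rename_pending() {
  awk '{print $NF}' | grep -E -- '-renamePending\.json$'
}

# Usage on a cluster head node (paths are assumptions, adjust as needed):
#   hdfs dfs -ls -R /hbase/WALs /hbase/.tmp | filter_rename_pending
```

The helper only lists candidates. Deleting them remains a manual step, so each file can be reviewed before it is removed.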
| 30 | + |
| 31 | +--- |
| 32 | + |
| 33 | +## Scenario: No server address listed |
| 34 | + |
| 35 | +### Issue |
| 36 | + |
| 37 | +The HMaster log shows an error message similar to "No server address listed in hbase:meta for region xxx." |
| 38 | + |
| 39 | +### Cause |
| 40 | + |
| 41 | +HMaster could not initialize after restarting HBase. |
| 42 | + |
| 43 | +### Resolution |
| 44 | + |
| 45 | +1. Run the following commands in the HBase shell (replace the placeholder values as applicable): |
| 46 | + |
| 47 | + ``` |
| 48 | + scan 'hbase:meta' |
| 49 | + delete 'hbase:meta','hbase:backup <region name>','<column name>' |
| 50 | + ``` |
| 51 | +
|
| 52 | +1. Delete the `hbase:namespace` entry, because the same error may be reported when scanning the `hbase:namespace` table.
| 53 | +
|
| 54 | +1. Restart the active HMaster from the Ambari UI to bring HBase back to a running state.
| 55 | +
|
| 56 | +1. Run the following command from an SSH session on the cluster (`hbase hbck` is a command-line tool, not an HBase shell command) to bring up all offline tables:
| 57 | +
|
| 58 | + ``` |
| 59 | + hbase hbck -ignorePreCheckPermission -fixAssignments |
| 60 | + ``` |
| 61 | +
|
| 62 | +--- |
| 63 | +
|
| 64 | +## Scenario: java.io.IOException: Timedout |
| 65 | +
|
| 66 | +### Issue |
| 67 | +
|
| 68 | +HMaster times out with a fatal exception similar to `java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned`.
| 69 | +
|
| 70 | +### Cause |
| 71 | +
|
| 72 | +The time-out is a known defect with HMaster. General cluster startup tasks can take a long time, and HMaster shuts down if the namespace table isn't assigned before the timeout expires. Startup takes this long when a large amount of unflushed data exists, in which case the five-minute timeout is not sufficient.
| 73 | +
|
| 74 | +### Resolution |
| 75 | +
|
| 76 | +1. In the Ambari UI, go to HBase -> Configs, and in the custom `hbase-site.xml` section add the following setting:
| 77 | +
|
| 78 | + ``` |
| 79 | + Key: hbase.master.namespace.init.timeout Value: 2400000 |
| 80 | + ``` |
| 81 | +
|
| 82 | +1. Restart the required services (mainly HMaster and possibly other HBase services).
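
Because the timeout is expressed in milliseconds, a quick conversion helps confirm the size of the change. The helper below is a minimal sketch; the function name `ms_to_minutes` is hypothetical.

```shell
# Hypothetical helper: convert a millisecond timeout to whole minutes
# using integer division.
ms_to_minutes() {
  echo $(( $1 / 60000 ))
}

ms_to_minutes 300000    # the five-minute limit from the error message
ms_to_minutes 2400000   # the value suggested above: 40 minutes
```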
| 83 | +
|
| 84 | +--- |
| 85 | +
|
| 86 | +## Scenario: Frequent regionserver restarts |
| 87 | +
|
| 88 | +### Issue |
| 89 | +
|
| 90 | +Nodes reboot periodically. From the regionserver logs you may see entries similar to: |
| 91 | +
|
| 92 | +``` |
| 93 | +2017-05-09 17:45:07,683 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 31000ms |
| 94 | +2017-05-09 17:45:07,683 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 31000ms |
| 95 | +2017-05-09 17:45:07,683 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 31000ms |
| 96 | +``` |
| 97 | +
|
| 98 | +### Cause |
| 99 | +
|
| 100 | +A long regionserver JVM GC pause. The pause makes the regionserver unresponsive, so it cannot send a heartbeat to HMaster within the ZooKeeper session timeout of 40 seconds. HMaster then considers the regionserver dead, aborts it, and restarts it.
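
To see how long the logged pauses are, the durations can be pulled out of the regionserver log. The sketch below assumes the log lines follow the format shown in the Issue section; the function names and the `regionserver.log` file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: reads regionserver log lines on stdin and prints the
# pause length in milliseconds for each JvmPauseMonitor warning.
extract_gc_pauses() {
  sed -n 's/.*JvmPauseMonitor.*pause of approximately \([0-9][0-9]*\)ms.*/\1/p'
}

# Prints only the pauses that last at least the given number of milliseconds.
pauses_at_least() {
  extract_gc_pauses | awk -v limit="$1" '$1 + 0 >= limit + 0'
}

# Usage (the log file name is an assumption):
#   pauses_at_least 30000 < regionserver.log
```

Comparing the reported pauses with the configured session timeout shows how much headroom a new timeout value leaves.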
| 101 | +
|
| 102 | +### Resolution |
| 103 | +
|
| 104 | +Change the ZooKeeper session timeout. Both the hbase-site setting `zookeeper.session.timeout` and the ZooKeeper zoo.cfg setting `maxSessionTimeout` need to be changed.
| 105 | +
|
| 106 | +1. In the Ambari UI, go to **HBase -> Configs -> Settings**, and in the Timeouts section, change the value of Zookeeper Session Timeout.
| 107 | +
|
| 108 | +1. In the Ambari UI, go to **Zookeeper -> Configs -> Custom** zoo.cfg, and add or change the following setting. Make sure the value is the same as the HBase `zookeeper.session.timeout` value.
| 109 | +
|
| 110 | + ``` |
| 111 | + Key: maxSessionTimeout Value: 120000 |
| 112 | + ``` |
| 113 | +
|
| 114 | +1. Restart the required services.
| 115 | +
|
| 116 | +--- |
| 117 | +
|
| 118 | +## Scenario: Log splitting failure |
| 119 | +
|
| 120 | +### Issue |
| 121 | +
|
| 122 | +HMasters fail to come up on an HBase cluster.
| 123 | +
|
| 124 | +### Cause |
| 125 | +
|
| 126 | +Misconfigured HDFS and HBase settings for a secondary storage account. |
| 127 | +
|
| 128 | +### Resolution |
| 129 | +
|
| 130 | +Set `hbase.rootdir` to `wasb://@.blob.core.windows.net/hbase` and restart the services in Ambari.
| 131 | +
|
| 132 | +--- |
| 133 | +
|
| 134 | +## Next steps |
| 135 | +
|
| 136 | +If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support: |
| 137 | +
|
| 138 | +* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/). |
| 139 | +
|
| 140 | +* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts. |
| 141 | +
|
| 142 | +* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/). |