
Commit 99dc133

woj-i authored and Marcelo Vanzin committed
[SPARK-11821] Propagate Kerberos keytab for all environments
andrewor14 the same PR as in branch 1.5
harishreedharan

Author: woj-i <wojciechindyk@gmail.com>

Closes apache#9859 from woj-i/master.

(cherry picked from commit 6a8cf80)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
1 parent 843a31a commit 99dc133

File tree

3 files changed: +10 -5 lines changed


core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

Lines changed: 4 additions & 0 deletions

@@ -545,6 +545,10 @@ object SparkSubmit {
       if (args.isPython) {
         sysProps.put("spark.yarn.isPython", "true")
       }
+    }
+
+    // assure a keytab is available from any place in a JVM
+    if (clusterManager == YARN || clusterManager == LOCAL) {
       if (args.principal != null) {
         require(args.keytab != null, "Keytab must be specified when principal is specified")
         if (!new File(args.keytab).exists()) {
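The added branch extends the principal/keytab handling beyond YARN so that a local master gets it too. Below is a minimal sketch of the idea, not the commit's exact code; the helper name and the mutable sysProps map are illustrative:

    import java.io.File
    import scala.collection.mutable
    import org.apache.hadoop.security.UserGroupInformation

    // Hypothetical helper: publish the keytab/principal as properties so any
    // code loaded later in the same JVM can read them back, then log in
    // through Hadoop's UserGroupInformation.
    def propagateKeytab(principal: String, keytab: String,
                        sysProps: mutable.Map[String, String]): Unit = {
      require(keytab != null, "Keytab must be specified when principal is specified")
      require(new File(keytab).exists(), s"Keytab file $keytab does not exist")
      sysProps.put("spark.yarn.keytab", keytab)
      sysProps.put("spark.yarn.principal", principal)
      // Eager login so the current JVM holds valid Kerberos credentials.
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
    }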

docs/running-on-yarn.md

Lines changed: 2 additions & 2 deletions

@@ -358,14 +358,14 @@ If you need a reference to the proper location to put log files in the YARN so t
   <td>
     The full path to the file that contains the keytab for the principal specified above.
     This keytab will be copied to the node running the YARN Application Master via the Secure Distributed Cache,
-    for renewing the login tickets and the delegation tokens periodically.
+    for renewing the login tickets and the delegation tokens periodically. (Works also with the "local" master)
   </td>
 </tr>
 <tr>
   <td><code>spark.yarn.principal</code></td>
   <td>(none)</td>
   <td>
-    Principal to be used to login to KDC, while running on secure HDFS.
+    Principal to be used to login to KDC, while running on secure HDFS. (Works also with the "local" master)
   </td>
 </tr>
 <tr>
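Per the updated wording, `spark.yarn.keytab` and `spark.yarn.principal` now apply with a "local" master as well. A hedged sketch of what that looks like from application code (the principal and keytab path are placeholders; in practice these keys are normally fed by spark-submit's `--principal` and `--keytab` flags rather than set directly):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: the documented config keys, set for a local master.
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("kerberos-local-sketch")
      .set("spark.yarn.principal", "alice@EXAMPLE.COM")              // placeholder
      .set("spark.yarn.keytab", "/etc/security/keytabs/alice.keytab") // placeholder

    val sc = new SparkContext(conf)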

docs/sql-programming-guide.md

Lines changed: 4 additions & 3 deletions

@@ -1614,7 +1614,8 @@ This command builds a new assembly jar that includes Hive. Note that this Hive a
 on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries
 (SerDes) in order to access data stored in Hive.
 
-Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`. Please note when running
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` (for security configuration),
+`hdfs-site.xml` (for HDFS configuration) file in `conf/`. Please note when running
 the query on a YARN cluster (`cluster` mode), the `datanucleus` jars under the `lib_managed/jars` directory
 and `hive-site.xml` under `conf/` directory need to be available on the driver and all executors launched by the
 YARN cluster. The convenient way to do this is adding them through the `--jars` option and `--file` option of the
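A minimal sketch of what this configuration enables, assuming the Spark 1.x `HiveContext` API that this guide documents (the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-conf-sketch"))
    // With hive-site.xml, core-site.xml and hdfs-site.xml in conf/, the
    // HiveContext talks to the configured (possibly kerberized) metastore
    // and HDFS without further code.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SHOW TABLES").collect().foreach(println)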
@@ -2028,7 +2029,7 @@ Beeline will ask you for a username and password. In non-secure mode, simply ent
 your machine and a blank password. For secure mode, please follow the instructions given in the
 [beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
 
-Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
 
 You may also use the beeline script that comes with Hive.
@@ -2053,7 +2054,7 @@ To start the Spark SQL CLI, run the following in the Spark directory:
 
     ./bin/spark-sql
 
-Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
+Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` and `hdfs-site.xml` files in `conf/`.
 You may run `./bin/spark-sql --help` for a complete list of all available
 options.
