Microsoft Hive ODBC Driver Install Guide
Microsoft Hive ODBC Driver Install Guide
Microsoft Hive ODBC Driver Install Guide
Trademarks
Simba, the Simba logo, SimbaEngine, SimbaEngine C/S, SimbaExpress and SimbaLib
are registered trademarks of Simba Technologies Inc. All other trademarks and/or
servicemarks are the property of their respective owners.
Cyrus SASL
Copyright (c) 1998-2003 Carnegie Mellon University. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
3. The name "Carnegie Mellon University" must not be used to endorse or promote
products derived from this software without prior written permission. For
permission or any other legal details, please contact
Office of Technology Transfer
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3890
(412) 268-4387, fax: (412) 268-7395
tech-transfer@andrew.cmu.edu
4. Redistributions of any form whatsoever must retain the following
acknowledgment:
"This product includes software developed by Computing Services at Carnegie
Mellon University (http://www.cmu.edu/computing/)."
CARNEGIE MELLON UNIVERSITY DISCLAIMS ALL WARRANTIES WITH REGARD
TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL CARNEGIE MELLON
UNIVERSITY BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
OR PERFORMANCE OF THIS SOFTWARE.
OpenSSL
Copyright (c) 1998-2008 The OpenSSL Project. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
3. All advertising materials mentioning features or use of this software must display
the following acknowledgment:
"This product includes software developed by the OpenSSL Project for use in the
OpenSSL Toolkit. (http://www.openssl.org/)"
4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
endorse or promote products derived from this software without prior written
permission. For written permission, please contact openssl-core@openssl.org.
5. Products derived from this software may not be called "OpenSSL" nor may
"OpenSSL" appear in their names without prior written permission of the
OpenSSL Project.
www.simba.com
ii
Apache Hive
Copyright 2008-2011 The Apache Software Foundation.
Apache Thrift
Copyright 2006-2010 The Apache Software Foundation.
Expat
Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the ""Software""), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NOINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
Printed in Canada
www.simba.com
iii
Table of Contents
Introduction ..................................................................................................................... 1
Windows Driver ............................................................................................................... 1
System Requirements ................................................................................................. 1
Installing the Driver ...................................................................................................... 1
Configuring ODBC Connections .................................................................................. 2
Configuring Authentication .......................................................................................... 5
Using No Authentication .......................................................................................... 5
Using Kerberos ........................................................................................................ 5
Using User Name .................................................................................................... 6
Using User Name and Password ............................................................................. 6
Using User Name and Password (SSL) ................................................................... 6
Using Windows Azure HDInsight Emulator .............................................................. 7
Using Windows Azure HDInsight Service ................................................................ 7
Configuring DSN-Less Authentication ......................................................................... 7
Features .......................................................................................................................... 8
SQL Query versus HiveQL Query ............................................................................... 8
SQL Connector............................................................................................................ 8
Data Types .................................................................................................................. 8
Catalog and Schema Support...................................................................................... 9
Hive System Table ...................................................................................................... 9
Server-side Properties ................................................................................................. 9
Contact Us .................................................................................................................... 10
Appendix A: Authentication Options .............................................................................. 11
Using No Authentication ............................................................................................ 11
Using User Name ...................................................................................................... 12
Using Kerberos.......................................................................................................... 12
Appendix B: Configuring Kerberos Authentication for Windows ..................................... 13
Download and install MIT Kerberos for Windows 4.0.1 .............................................. 13
Set up the Kerberos configuration file in the default location...................................... 13
Set up the Kerberos configuration file in another location .......................................... 13
Set up the Kerberos credential cache file .................................................................. 14
Obtain a ticket for a Kerberos principal using password ............................................ 14
Obtain a ticket for a Kerberos principal using a keytab file ......................................... 15
Obtain a ticket for a Kerberos principal using the default keytab file .......................... 15
www.simba.com
iv
www.simba.com
Introduction
Welcome to the Microsoft Hive ODBC Driver. ODBC is one the most established and
widely supported APIs for connecting to and working with databases. At the heart of the
technology is the ODBC driver, which connects an application to the database.
Microsoft Hive ODBC Driver is used for direct SQL and HiveQL access to Apache
Hadoop / Hive distributions, enabling Business Intelligence (BI), analytics and reporting
on Hadoop / Hive-based data. The driver efficiently transforms an applications SQL
query into the equivalent form in HiveQL. Hive Query Language is a subset of SQL-92. If
an application is Hive-aware, then the driver is configurable to pass the query through.
The driver interrogates Hive to obtain schema information to present to a SQL-based
application. Queries, including joins, are translated from SQL to HiveQL. For more
information about the differences between HiveQL and SQL, refer to the section
Features on page 8.
Microsoft Hive ODBC Driver complies with the ODBC 3.52 data standard and adds
important functionality such as Unicode and 32- and 64-bit support for high-performance
computing environments.
This guide is suitable for users who are looking to access data residing within Hive from
their desktop environment. Application developers may also find the information helpful.
Please refer to your application for details on connecting via ODBC.
Windows Driver
System Requirements
You install Microsoft Hive ODBC Driver on client computers accessing data in a Hadoop
cluster with the Hive service installed and running. Each computer where you install the
driver must meet the following minimum system requirements:
One of the following operating systems (32- and 64-bit editions are supported):
o
Windows Vista
Windows 7 Professional
Important: To install the driver, you need Administrator privileges on the computer.
The driver is suitable for use with all versions of Apache Hive.
www.simba.com
You can install both versions of the driver on the same computer.
Note: For an explanation of how to use ODBC on 64-bit editions of Windows, see
http://www.simba.com/wp-content/uploads/2010/10/HOW-TO-32-bit-vs-64-bit-ODBCData-Source-Administrator.pdf
To install Microsoft Hive ODBC Driver:
1. Depending on the bitness of your client application, double-click to run
HiveODBC32.msi or HiveODBC64.msi
2. Click Next
3. Select the check box to accept the terms of the License Agreement if you agree,
and then click Next
4. To change the installation location, click the Change button, then browse to the
desired folder, and then click OK. To accept the installation location, click Next
5. Click Install
6. When the installation completes, click Finish
7. If you are installing a driver with an evaluation license and you have purchased a
perpetual license, then copy the License.lic file you received via e-mail into the
\lib subfolder in the installation folder you selected in step 4.
www.simba.com
8. Select Microsoft Hive ODBC Driver and then click Finish. The Microsoft Hive
ODBC Driver DSN Setup window opens.
9. In the Data Source Name text box, type a name for your DSN.
10. Optionally, In the Description text box, type relevant details related to the DSN.
11. In the Host text box, type the IP address or hostname of the Hive server.
12. In the Port text box, type the listening port for the service.
13. In the Database text box, type the name of the database schema to use when a
schema is not explicitly specified in a query.
Note: Queries on other schemas can still be issued by explicitly specifying the
schema in the query. To determine the appropriate database schema to use,
type the show databases command at the Hive command prompt to inspect
your databases.
14. For the Hive Server Type, select either Hive Server 1 or Hive Server 2.
15. In the Authentication area, configure authentication as needed. For detailed
instructions, refer to the section "Configuring Authentication" on page 5.
16. Optionally, click Advanced Options. In the Advanced Options window:
a) Select the Use Native Query checkbox to disable the SQL Connector
feature.
Note: The SQL Connector feature has been added to the driver to apply
transformations to the queries emitted by an application to convert them
into an equivalent form in HiveQL. If the application is Hive aware and
already emits HiveQL then turning off the SQL Connector feature avoids
the extra overhead of query transformation.
b) Select the Fast SQLPrepare checkbox to defer query execution to
SQLExecute.
Note: When using Native Query mode, the driver will execute the HiveQL
query to retrieve the result set metadata for SQLPrepare. As a result,
SQLPrepare might be slow. If the result set metadata is not required after
calling SQLPrepare, then enable this option.
c) Select the Get Tables With Query checkbox to retrieve the names of
tables in a particular database using the GET TABLES query.
Note: This setting is only applicable when connecting to Hive Server 2.
d) In the Rows Fetched Per Block field, type the number of rows to be
fetched per block.
Note: Any positive 32-bit integer is a valid value but testing has shown
that performance gains are marginal beyond the default value of 10000
rows.
e) In the Default String Column Length field, type the maximum data
length for string columns.
www.simba.com
Note: Hive does not provide the maximum data length for String columns
in the columns metadata. This option allows you to tune the maximum
data length for String columns.
f)
In the Binary column length field, type the maximum data length for
binary columns.
Note: Hive does not provide the maximum data length for Binary columns
in the columns metadata. The option allows you to tune the maximum
data length for Binary columns.
g) In the Decimal Column Scale field, type the maximum number of digits
to the right of the decimal point for numeric data types.
h) In the Async Exec Poll Interval field enter the time in millisecond
between each poll of the query execution status.
Note: This is only applicable to HDInsight clusters and asynchronous
execution here doesnt mean ODBC asynchronous operations are
supported; it only means the RPC call to Hive to execute a query is
asynchronous.
i)
j)
Enter the path of the file containing the trusted certificates (e.g. certificate
from the Hive Server) in the Trusted Certificates edit box to configure
the driver to load the certificates from the specified file to authenticate the
Hive server when using SSL.
Note: This is only applicable to User Name and Password (SSL) and
Windows Azure HDInsight Service authentication mechanisms, and will
be ignored by other authentication mechanisms.
Note: SSL certificates in the trusted certificates file has to be in the PEM
format.
Note: If this setting is not set the driver will default to using the trusted CA
certificates PEM file installed by the driver.
www.simba.com
If you selected Hive Server 2 as the Hive server type, then select or clear
the Apply Server Side Properties with Queries check box as needed.
Note: If you selected Hive Server 2, then the Apply Server Side
Properties with Queries check box is selected by default. Selecting the
check box configures the driver to apply each server-side property you
set by executing a query when opening a session to the Hive server.
Clearing the check box configures the driver to use a more efficient
method to apply server-side properties that does not involve additional
network round tripping. Some Hive Server 2 builds are not compatible
with the more efficient method. If the server-side properties you set do not
take effect when the check box is clear, then select the check box. If you
selected Hive Server 1 as the Hive server type, then the Apply Server
Side Properties with Queries check box is selected and unavailable.
m) Click OK
17. Click Test to test the connection and then click OK
Configuring Authentication
For details on selecting the appropriate authentication for a DSN using Microsoft Hive
ODBC Driver, see Appendix A: Authentication Options on page 11.
Using No Authentication
To connect to a Hive server without authenticating the connection:
Click the drop-down arrow next to the Mechanism field, and then select No
Authentication
Note: When connecting to a Hive server of type Hive Server 1 on a non-HDInsight
distribution, use No Authentication.
Using Kerberos
To use Kerberos authentication, Kerberos must be configured prior to use. See
Appendix B: Configuring Kerberos Authentication for Windows on page 13 for details.
After Kerberos has been installed and configured, then set the following options in the
Authentication area in the Microsoft Hive ODBC Driver DSN Setup dialog:
1. Click the drop-down arrow next to the Mechanism field, and then select
Kerberos
2. If no default realm is configured for your Kerberos implementation or the realm of
your Hive Server 2 is not the default, then type the value for the Kerberos realm
www.simba.com
of the Hive Server 2 host in the Realm field. To use the default realm, leave the
Realm field empty.
3. In the Host FQDN field, type the value for the fully qualified domain name of the
Hive Server 2 host.
4. In the Service Name field, type the value for the service name of the Hive Server
www.simba.com
www.simba.com
Features
SQL Query versus HiveQL Query
The native query language supported by Hive is HiveQL. For simple queries, HiveQL is
a subset of SQL-92. However, for most applications, the syntax is different enough that
most applications do not work with native HiveQL.
SQL Connector
To bridge the difference between SQL and HiveQL, the SQL Connector feature
translates standard SQL-92 queries into equivalent HiveQL queries. The SQL Connector
performs syntactical translations and structural transformations. For example:
JOIN, INNER JOIN and CROSS JOINSQL INNER JOIN and CROSS JOIN
syntax is translated to HiveQL JOIN syntax.
Data Types
The following data types are supported:
www.simba.com
TINYINT
SMALLINT
INT
BIGINT
FLOAT
DOUBLE
DECIMAL
BOOLEAN
STRING
TIMESTAMP
Note: The aggregate types (ARRAY, MAP and STRUCT) are not yet supported.
Columns of aggregate types are treated as STRING columns.
The example query returns all of the Hive system environment entries whose key
contains the word hive. A special query, set v, is executed to fetch system
environment information and is not supported by all Hive versions. For versions of Hive
that do not support querying system environment information, the driver returns an
empty result set.
Server-side Properties
The Microsoft Hive ODBC Driver allows you to set server-side properties via a DSN.
Server-side properties specified in a DSN affect only the connection established using
the DSN.
For details on setting server-side properties for a DSN, see Configuring ODBC
Connections on page 2.
www.simba.com
Contact Us
If you have difficulty using the driver, please contact our Technical Support staff. We
welcome your questions, comments and feature requests.
Important: To help us assist you, prior to contacting Technical Support please prepare a
detailed summary of the client and server environment including operating system
version, patch level and configuration.
www.simba.com
10
No Authentication
No Authentication
Kerberos
User Name
Note: Kerberos and User Name authentication mechanisms are available only for Hive
Server 2 on non-HDInsight distributions.
To determine the authentication mechanism configured for your Hive Server 2 on nonHDInsight distributions, examine your hive-site.xml file. Examine the following
properties to determine which authentication mechanism your server is set to use:
hive.server2.authentication
hive.server2.enable.doAs
hive.server2.authentication
hive.server2.enable.doAs
NOSASL
False
No Authentication
KERBEROS
True or False
Kerberos
NONE
True or False
User Name
Using No Authentication
When hive.server2.authentication is set to NOSASL, you must configure your
connection to use No Authentication.
www.simba.com
11
If the User Name in the DSN or driver configuration is not supplied, then the driver
defaults to using anonymous as the user name.
Note: If you deploy Hadoop using Apache Ambari, then by default the authentication
method is User Name.
Using Kerberos
When connecting to a Hive server of type Hive Server 2 on a non-HDInsight distribution
and hive.server2.authentication is set to KERBEROS, then you must configure your
connection to use Kerberos.
www.simba.com
12
www.simba.com
13
www.simba.com
14
www.simba.com
15
Default Value
Description
Driver
HOST
PORT
443 for
Windows Azure
HDInsight
Service, 10001
for Windows
Azure
HDInsight
Emulator, and
10000 for nonHDInsight
clusters.
Schema
default
www.simba.com
16
Key
Default Value
Description
DefaultStringColumnLength
65536
BinaryColumnLength
32767
UseNativeQuery
www.simba.com
17
Key
Default Value
Description
FastSQLPrepare
GetTablesWithQuery
RowsFetchedPerBlock
10000
DecimalColumnScale
10
SSP_
www.simba.com
18
Key
Default Value
Description
SSP_mapred.queue.names=myQ
ueue
After the driver applies the serverside property, the SSP_ prefix is
removed from the DSN entry
leaving an entry of
SSPKey=SSPValue
Note: The SSP_ prefix must be
upper case.
(Optional)
ApplySSPWithQueries
HiveServerType
AuthMech
www.simba.com
19
Key
Default Value
Description
Service.
(Optional)
KrbHostFQDN
KrbServiceName
KrbRealm
Depends on
Kerberos
configuration.
HTTPPath
UID
PWD
www.simba.com
20
Key
Default Value
Description
authentication mechanisms.
(Required if AuthMech is Windows
Azure HDInsight Service, User
Name and Password, or User
Name and Password (SSL))
CAIssuedCertNamesMismatch
TrustedCerts
For 32 bit
driver:
C:\Program
Files
(x86)\Microsoft
Hive ODBC
Driver\lib\cacert
s.pem
For 64 bit
driver:
C:\Program
Files\Microsoft
Hive ODBC
Driver\lib\
cacerts.pem
www.simba.com
21
Key
AsyncExecPollInterval
Default Value
100
Description
The time in millisecond between
each poll for the status of an
asynchronous query execution.
Note: This configuration is only
applicable to HDInsight clusters
and will be ignored when
connecting to non-HDInsight
clusters.
Note: Asynchronous execution
here doesnt mean ODBC
asynchronous operations are
supported; it only means the RPC
call used to execute a query
against Hive is asynchronous.
(Optional)
ForceSynchronousExec
www.simba.com
22