0% found this document useful (0 votes)
65 views

Hive Code

The document provides an overview of common Hive commands used for interacting with Hive databases, tables, and data. Some key points covered include: - How to start the Hive shell, check HDFS contents, create databases and tables, load and query data, and drop databases and tables. - How to create external tables in Hive, load data into different file formats like ORC and Parquet, and perform operations on tables like renaming, adding/dropping columns. - How to query data using conditions, sorting, aggregation, joins, and views.

Uploaded by

Abdul Khaliq
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
65 views

Hive Code

The document provides an overview of common Hive commands used for interacting with Hive databases, tables, and data. Some key points covered include: - How to start the Hive shell, check HDFS contents, create databases and tables, load and query data, and drop databases and tables. - How to create external tables in Hive, load data into different file formats like ORC and Parquet, and perform operations on tables like renaming, adding/dropping columns. - How to query data using conditions, sorting, aggregation, joins, and views.

Uploaded by

Abdul Khaliq
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 6

Hive Illustration : Basics

 To get started with the hive shell,

hive

 To check what is present in the HDFS,

hadoop fs – ls

 To create a directory in the current path (let’s say the name is ‘foo’),

hadoop fs - mkdir foo

 To create a database, in the hive shell (let’s say the name is ‘vin_emp’),

create database vin_emp;

 To see existing databases,

show databases;

 To start using the database,

use vin_emp;

 To check the tables present in the database,

show tables;

 To come out of the hive shell,

quit;

 to list contents of the current working directory,

ls

 To create a directory,

mkdir myhivedata

 To navigate into that data,

cd myhivedata

 To check the present working directory

pwd

 To check the contents of a file (name of the file is employees.txt),

cat employees.txt

 To create a table within the hive shell,

create table emp_global(id int,name string,city string,continent string)


row format delimited
fields terminated by ‘ , ‘
stored as textfile ;
 To check the tables,
show tables ;

 To query the table,

select * from emp_global ;

 To load data into the table,

load local inpath ‘employees.txt’ into table emp_global;

 To drop all tables inside a database,

drop database vin_emp cascade,

 To know the schema of the table,

describe emp_global ;

 To drop a table,

drop table emp_global ;

Hive Illustration : External tables in hive


 To create a database in a certain desired location,

create database vin_emp_loc location ‘/user/cloudera/myhivedata’ ;

 To copy a file from local file system to hdfs location,

hadoop fs – put empglobal.csv empdata

 To see the contents of the file,

hadoop df – cat empdata/empglobal.csv

Hive Illustration : Loading different file formats

 How to know what the table type is, whether internal or external,

describe extended emp_global ;

 To load the data into orc table,

insert into table emp_global_orc select * from emp_global ;

 Create a table whose schema is exactly like an existing table,


create table emp_global_seq LIKE emp_global_orc stored as sequencefile ;

Hive Illustration : Loading data into Hive tables


 Create table only if another table of the same name doesn’t exist and an input multiple
values in a single column using an array,

create table if not exists sibling_data (


name string, age int, country string, siblings array<string> )
row format delimited
fields terminated by ‘ , ‘
collection items terminated by ‘#’
lines terminated by ‘\n’
sorted as textfile ;

 To create table with multiple inputs of different data type in a single column,

create table auto_details(company string, model string, fuel string,


basic_specs struct<vehicle_type : string, doors : int, gears : int>,
engine_specs struct<cc : int, bhp : double>)
row format delimited
fields terminated by ‘ , ‘
collection items terminated by ‘#’ ;

Hive Illustration : Simple Operations on Hive tables


 To rename an existing table,
alter table auto_details rename to auto_table ;
 To change the name of any column,
alter table auto_details change fuel fuel_type string ;
 To add a new column to an existing table,
alter table auto_details add columns (milage double) ;
 To drop columns,(mention columns which need to remain inside the brackets after “replace”
keyword)
alter table auto_details replace (company string, model string, fuel_type string) ;

Hive Illustration : Query Operations on Hive tables


 To create a table inside a desired pre-existing database, without navigating into the
database first
create table if not exists company.empdata (
empid int,
empname string,
salary double,
designation string,
department string,,
salary double,
designation string,
department string,
age int)
row format delimited
fields terminated by ‘ , ‘
lines terminated by ‘\n’
tblproperties(‘skip.header.line.count’ = ‘1’) ;

 To select all columns and only those rows which satisfy a certain condition,
select * from empdata where department = “HR” ;
 To select all columns and only those rows which satisfy more than one condition,
select * from empdata where department =”HR” and salary > 25000 ;
 To select only desired columns and only those rows which satisfy more than one condition,
select empname, age from empdata where department =”HR” and salary > 25000 ;
 To select all columns and sort the rows based on a desired column,
select * from empdata order by salary ;
 To select all columns and sort the rows based on a desired column in descending order,

select * from empdata order by salary desc ;


 To count the total number of rows in the dataset,
select count(*) from empdata ;
 To use ‘groupby’ to count number of rows based in each category of a certain column,
select department, count(*) from empdata group by department ;
 To select all column but only those rows which do not have null value in a desired column,
select * from empdata where salary is not null ;
 To select rows by matching a substring with a desired column value,
select * from empdata where designation rlike “Manager” or rlike “manager” or “Lead” ;
 To find the average of a desired numerical column, grouped by a categorical column,
select department, avg(salary) from empdata group by department;

Hive Illustration : Querying complex structures


 To enable join operations in the hive shell,
SET hive.auto.conveert.join = False;
 To perform a join operation,
select emp.empname, emp.salary from emp_epf pf join empdata emp on (pf.empid = emp.empid) ;
 To perform a left outer join operation,
select emp.empname, emp.salary from emp_epf pf left outer join empdata emp on (pf.empid =
emp.empid) ;
 To perform a right outer join operation,
select emp.empname, emp.salary from emp_epf pf right outer join empdata emp on (pf.empid =
emp.empid) ;
 To perform a full outer join operation,
select emp.empname, emp.salary from emp_epf pf full outer join empdata emp on (pf.empid =
emp.empid) ;

Hive Illustration : Views


 To create view,
create view if not exists high_sal as select * from empdata where salary > 50000 ;
 To query data from view,
select * from high_sal ;
 To see if view is created,
show tables ;
 To see the table type, (virtual or Managed),
describe formatted high_sal ;
 To create a table, partitioned by a desired column,
create table emp_global_part(id int, name string, city string, country string)
portioned by (continent string)
row format delimited
fields terminated by ‘ , ‘
stored as textfile ;

You might also like