Sqoop Interview Questions

Sqoop allows importing and exporting data between Hadoop and relational databases. It can import data directly into Hive, HBase, or HDFS. Sqoop uses mappers to import data in parallel and the default number of mappers is 4. Incremental imports identify new or modified data since the last import using columns like timestamps.

What will happen if the target directory already exists during a Sqoop import?
Ans: Sqoop import runs a map-only job; if the target directory already exists, the job fails with an exception (unless the --delete-target-dir option is specified).

What is the use of the warehouse directory in a Sqoop import?


Ans: --warehouse-dir specifies the HDFS parent directory for the table destination. If we specify
--target-dir, all our files are stored in exactly that location. But with --warehouse-dir, a child
directory named after the table is created inside it, and all the files are stored inside that child directory.
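A minimal sketch of the difference (the connection string, credentials, and paths are placeholders):

```shell
# Files land directly in /data/out
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --target-dir /data/out

# Files land in /data/warehouse/orders (child directory named after the table)
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --warehouse-dir /data/warehouse
```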

What is the default number of mappers in a Sqoop job?


Ans: 4

How to bring data directly into Hive using Sqoop?


Ans: To bring data directly into Hive using Sqoop, use the --hive-import option with the import command.
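A hedged sketch of a Hive import (connection details and table names are placeholders):

```shell
# Imports the RDBMS table into a Hive table of the same name,
# creating the Hive table if it does not exist yet
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --hive-import --create-hive-table
```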

We wish to bring data in CSV format into HDFS from an RDBMS source. A column in the
RDBMS table contains ','. How can we import the data unambiguously in this case?
Ans: You can use the --optionally-enclosed-by option to wrap field values in a quoting character.
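A sketch of such an import (connection details are placeholders):

```shell
# Values containing the field delimiter (,) are wrapped in double quotes,
# so embedded commas no longer break the CSV structure
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table customers --target-dir /data/customers \
  --fields-terminated-by ',' --optionally-enclosed-by '"'
```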

How to import data directly to HBase using Sqoop?


Ans: You need to use --hbase-table to import data into HBase using Sqoop. Sqoop will import data into the
table specified as the argument to --hbase-table. Each row of the input table is transformed into an
HBase put operation on a row of the output table.
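A sketch of an HBase import (table, column family, and key names are placeholders):

```shell
# Each imported row becomes a put into the HBase table 'orders',
# using the 'id' column as the row key
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --hbase-table orders \
  --column-family cf --hbase-row-key id
```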

What is incremental load in Sqoop?


Ans: An incremental load imports only records that are new or modified since the last import. For this, you
specify --incremental along with --check-column and --last-value, so that the Sqoop job imports only rows
whose check-column value comes after the specified value.

What is the benefit of using a Sqoop job?


In a scenario where you must perform incremental imports repeatedly, you can create a Sqoop job
for the incremental import and run that job. Whenever you run the Sqoop job, it automatically remembers the
last imported value, and the import starts after that value.
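A sketch of a saved incremental job (job name, connection, and column names are placeholders):

```shell
# Save the incremental import as a named job; Sqoop remembers
# --last-value between runs
sqoop job --create daily_orders -- import \
  --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --incremental append --check-column id --last-value 0

# Run it whenever new data should be pulled
sqoop job --exec daily_orders
```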

Q What is the process to perform an incremental data load in Sqoop?


Answer: An incremental data load in Sqoop synchronizes the modified or
newly added data (often referred to as delta data) from the RDBMS to Hadoop. The delta data is handled
through the incremental options of the Sqoop import command.

An incremental load can be performed with the Sqoop import command, or by loading the data into Hive
without overwriting it. The attributes that need to be specified during an incremental load in
Sqoop are:

1) Mode (--incremental): defines how Sqoop determines which rows are new. The mode
can have the value append or lastmodified.

2) Column (--check-column): specifies the column that should be examined to find the rows
to be imported.

3) Value (--last-value): denotes the maximum value of the check column from the previous import
operation.
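A sketch of both incremental modes (connection, column names, and the timestamp are placeholders):

```shell
# append mode: import only rows whose id is greater than the last value
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --incremental append --check-column id --last-value 1000

# lastmodified mode: import rows updated since the given timestamp
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --incremental lastmodified \
  --check-column updated_at --last-value "2020-01-01 00:00:00"
```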

Q How Sqoop can be used in a Java program?


Answer: The Sqoop jar must be included on the classpath of the Java code. After this, the
Sqoop.runTool() method must be invoked, passing the necessary parameters to Sqoop
programmatically just as on the command line.

Q What is the significance of using the --compression-codec parameter?


Answer: To get the output files of a Sqoop import in a compressed format other than the default .gz
(for example .bz2), we use the --compression-codec parameter together with --compress.
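A sketch of a compressed import (connection and paths are placeholders):

```shell
# Produce .bz2 output files instead of the default gzip
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --target-dir /data/orders --compress \
  --compression-codec org.apache.hadoop.io.compress.BZip2Codec
```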

Q How are large objects handled in Sqoop?


Answer: Sqoop provides the capability to store large-sized data in a single field based on the type of
data. Sqoop supports two large-object types:

1) CLOBs: Character Large Objects

2) BLOBs: Binary Large Objects

Large objects in Sqoop are handled by importing them into a file referred to as a "LobFile", i.e. a
Large Object File. The LobFile can store records of huge size; each record in the LobFile
is a large object.
Q What is a disadvantage of using the --direct parameter for faster data loads in Sqoop?
Answer: The native utilities used by databases to support faster loads do not work for binary data
formats like SequenceFile.

Q How can you list all the tables present in a single database using Sqoop?
Answer: The command to list all tables in a single database using Sqoop is as
follows:

sqoop list-tables --connect jdbc:mysql://localhost/user

Q How can you control the number of mappers used by the sqoop command?
Answer: The --num-mappers parameter (or its short form -m) controls the number of mappers executed by a Sqoop
command. We should start with a small number of map tasks and then gradually scale up, since
choosing a high number of mappers initially may slow down performance on the database side.
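A sketch of controlling parallelism (connection and split column are placeholders):

```shell
# Run the import with 8 parallel mappers; --split-by names the column
# Sqoop uses to partition the work among them
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --num-mappers 8 --split-by id
```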

Q What is the standard location or path for Hadoop Sqoop scripts?


Answer: /usr/bin/sqoop for package installations, or $SQOOP_HOME/bin for a tarball installation.

Q How can we import a subset of rows from a table without using the where clause?
Answer: We can run a filtering query on the database and save the result to a temporary table in the
database.

Then use the Sqoop import command on that table without the --where clause.

Q When the source data keeps getting updated frequently, what is the approach to keep it
in sync with the data in HDFS imported by sqoop?
Answer: Sqoop offers two approaches:

a) Use the --incremental parameter with the append option, where a check column is examined
and rows with values beyond the last imported value are imported as new rows.

b) Use the --incremental parameter with the lastmodified option, where a date column in the source is
checked for records that have been updated after the last import.

Q What is a sqoop metastore?


Answer: It is a tool with which Sqoop hosts a shared metadata repository. Multiple local and/or remote
users can define and execute saved jobs (created with sqoop job) in this metastore.
Clients must be configured to connect to the metastore, either in sqoop-site.xml or with the --meta-connect
argument.

Q Can free form SQL queries be used with Sqoop import command? If yes, then how can
they be used?
Answer: Sqoop allows us to use free-form SQL queries with the import command. The import command
should be used with the -e or --query option to execute a free-form SQL query. When using -e
or --query with the import command, the --target-dir value must be specified, and the query must
contain the $CONDITIONS placeholder so Sqoop can split the work among mappers.
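A sketch of a free-form query import (connection, query, and paths are placeholders):

```shell
# $CONDITIONS is replaced by each mapper's split predicate; single quotes
# keep it literal so the shell does not expand it
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --query 'SELECT o.id, o.total FROM orders o WHERE $CONDITIONS' \
  --split-by o.id --target-dir /data/order_totals
```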

Q Name a few import control arguments:


Answer: --append

--columns

--where

These arguments are most frequently used when importing RDBMS data.

Q How can you see the list of stored jobs in the Sqoop metastore?
Answer: sqoop job --list

Q What type of databases Sqoop can support?


Answer: MySQL, Oracle, PostgreSQL, IBM DB2, Netezza, and Teradata. Each database connects through its JDBC
driver.

Q What is the purpose of sqoop-merge?


Answer: The merge tool combines two datasets, where entries in the newer dataset overwrite entries of
the older dataset, preserving only the newest version of each record between the two datasets.
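A sketch of the merge tool (paths and the merge key are placeholders; the jar and class come from a prior sqoop codegen run):

```shell
# Combine a new incremental snapshot with an older import,
# keeping the newest row for each value of the merge key
sqoop merge --new-data /data/orders_new --onto /data/orders_old \
  --target-dir /data/orders_merged --merge-key id \
  --jar-file orders.jar --class-name orders
```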
Q How does Sqoop handle large objects?
Answer: BLOB and CLOB columns are the common large object types. If an object is smaller than 16 MB, it is stored
inline with the rest of the data. Larger objects are stored temporarily in a _lobs subdirectory and
processed in a streaming fashion, rather than being fully materialized in memory. If you set the inline LOB limit to 0,
all large objects are placed in external storage.

Q What is the importance of eval tool?


Answer: The eval tool lets the user run sample SQL queries against the database and preview the results on the
console. This helps verify whether the expected data will be imported before running a full import.
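A sketch of using eval (connection and query are placeholders):

```shell
# Preview a few rows before committing to a full import
sqoop eval --connect jdbc:mysql://dbhost/shop --username user -P \
  --query "SELECT * FROM orders LIMIT 5"
```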

Q What is the default extension of the files produced by a Sqoop import using the --compress parameter?
Answer: .gz

Q Can we import the data with “Where” condition?


Answer: Yes, Sqoop provides the --where option to import or export a particular subset of the data.

Q What are the limitations of importing RDBMS tables into HCatalog directly?
Answer: There is an option to import RDBMS tables into HCatalog directly by using the --hcatalog-database
option with --hcatalog-table, but the limitation is that several arguments, such as
--as-avrodatafile, --direct, --as-sequencefile, --target-dir, and --export-dir, are not supported.

Q what are the majorly used commands in sqoop?


Answer: In Sqoop, the import and export commands are used most. The following commands are also useful:
codegen, eval, import-all-tables, job, list-databases, list-tables, merge, metastore.

Q What is the usefulness of the options file in Sqoop?


Answer: An options file lets you store command-line values in a file and reuse them in
Sqoop commands.

For example, the --connect parameter's value and the --username value can be stored in a file and used
again and again with different Sqoop commands.
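A sketch of an options file and its use (the file name and contents are placeholders; each option and its value go on separate lines, and # starts a comment):

```shell
# conn.txt
import
--connect
jdbc:mysql://dbhost/shop
--username
user
# Usage: remaining arguments are appended after the options file
# sqoop --options-file conn.txt --table orders --target-dir /data/orders
```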

Q What are the common delimiters and escape characters in Sqoop?


Answer: The default delimiters are a comma (,) for fields and a newline (\n) for records.
Supported escape characters include \b, \n, \r, \t, \", \', \\, and \0.

Q What are the two file formats supported by sqoop for import?
Answer: Delimited text and SequenceFiles.

Q While loading a table from MySQL into HDFS, if we need to copy the table with the maximum
possible speed, what can we do?
Answer: We need to use the --direct argument in the import command to use the direct-import fast path;
--direct can currently be used only with MySQL and PostgreSQL.

Q How can you sync an exported table with HDFS data in which some rows have been deleted?
Answer: Truncate the target table and load it again.

Q Differentiate between Sqoop and distCP.


Answer: The DistCP utility transfers data between Hadoop clusters, whereas Sqoop transfers
data between Hadoop and an RDBMS.

Q How can you import only a subset of rows from a table?


Answer: By using the --where clause in the Sqoop import statement, we can import only a subset of rows.
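A sketch of a filtered import (connection, predicate, and paths are placeholders):

```shell
# Only rows matching the predicate are imported
sqoop import --connect jdbc:mysql://dbhost/shop --username user -P \
  --table orders --where "order_date >= '2020-01-01'" \
  --target-dir /data/recent_orders
```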

Q How do you clear the data in a staging table before loading it with Sqoop?
Answer: By specifying the --clear-staging-table option, the staging table is cleared before the export loads it,
so repeated export attempts always start from an empty staging table.
