Big Data Cheat Sheet

Big Data
Comprises large datasets that cannot be processed using traditional computing techniques, characterized by huge volume, high velocity, and a wide variety of data.

Hadoop
An Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.

Hadoop Common
The Java libraries and utilities required by other Hadoop modules, including the scripts and files needed to start Hadoop.

Hadoop YARN
A framework for job scheduling and cluster resource management.

Hadoop Distributed File System (HDFS)
A Java-based file system that provides scalable, reliable data storage and high-throughput access to application data.

Hadoop MapReduce
A software framework for writing applications that process large amounts of data in parallel on large clusters.

Apache Hive
A data warehousing infrastructure for Hadoop.

Apache Oozie
A Java application responsible for scheduling Hadoop jobs.

Apache Pig
A data flow platform responsible for executing MapReduce jobs.

Apache Spark
An open source framework for cluster computing.

Flume
An open source aggregation service responsible for collecting data and transporting it from source to destination.

HBase
A column-oriented database for Hadoop that stores big data in a scalable way.

Sqoop
A command-driven interface application for transferring data between Hadoop and relational databases.

Architecture layers (top to bottom)
Top Level Interfaces:          ETL, BI Reporting, RDBMS
Top Level Abstractions:        Pig, Hive, Sqoop
Distributed Data Processing:   Map-Reduce; HBase (database with real-time access)
Base:                          Hadoop Distributed File System, a self-healing clustered storage system

Hadoop Administration Commands

Command             Task
balancer            Runs the cluster balancing utility
daemonlog           Gets or sets the log level of each daemon
dfsadmin            Runs HDFS administrative operations
datanode            Runs the HDFS datanode service
mradmin             Runs MapReduce administrative operations
jobtracker          Runs the MapReduce job tracker
namenode            Runs the name node
tasktracker         Runs a MapReduce task tracker node
secondarynamenode   Runs the secondary namenode

Hadoop File Automation Commands

Command   Task / Syntax
cat       Copies the source paths to standard output
          hdfs dfs -cat URI [URI ...]
chgrp     Changes the group of the files
          hdfs dfs -chgrp [-R] GROUP URI [URI ...]
chmod     Changes the permissions of the files
          hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]
chown     Changes the owner of the files
          hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]
count     Counts the directories, files, and bytes under the given paths
          hdfs dfs -count [-q] <paths>
cp        Copies one or more files from the source to the destination path
          hdfs dfs -cp URI [URI ...] <dest>
du        Displays the size of directories or files
          hdfs dfs -du [-s] [-h] URI [URI ...]
get       Copies files to the local file system
          hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>
ls        Lists files and directories along with their statistics
          hdfs dfs -ls <args>
mkdir     Creates one or more directories
          hdfs dfs -mkdir <path>
mv        Moves one or more files from one location to another
          hdfs dfs -mv URI [URI ...] <dest>
put       Copies files from the local file system to the destination file system
          hdfs dfs -put <localsrc> ... <dest>
rm        Deletes one or more files (the older recursive form -rmr is deprecated in favor of -rm -r)
          hdfs dfs -rm [-r] [-skipTrash] URI [URI ...]
stat      Displays information about a specific path
          hdfs dfs -stat URI [URI ...]
help      Displays usage information for a command
          hdfs dfs -help <cmd-name>
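
As a rough illustration of how the `hdfs dfs` file commands in this sheet chain together, the sketch below assembles the argument lists for a small staging workflow (mkdir, put, chgrp, chmod, ls, du) and prints them instead of running them. The helper names `dfs_command`, `staging_workflow`, and `render`, along with the example paths and the group name, are hypothetical and not part of Hadoop. The sketch also assumes the `hdfs` launcher is on the PATH if the printed commands are later executed.

```python
import shlex
from typing import List, Sequence

# Assumption: the `hdfs` launcher script is available on PATH.
HDFS_DFS = ["hdfs", "dfs"]


def dfs_command(op: str, *args: str, flags: Sequence[str] = ()) -> List[str]:
    """Build the argv for `hdfs dfs -<op> [flags] args...` without running it."""
    if not op or op.startswith("-"):
        raise ValueError("pass the operation without a leading dash, e.g. 'ls'")
    return [*HDFS_DFS, f"-{op}", *flags, *args]


def staging_workflow(local_file: str, hdfs_dir: str, group: str) -> List[List[str]]:
    """Hypothetical workflow: create a directory, upload a file,
    set its group and permissions, then list and size it."""
    name = local_file.rsplit("/", 1)[-1]
    target = f"{hdfs_dir.rstrip('/')}/{name}"
    return [
        dfs_command("mkdir", hdfs_dir),
        dfs_command("put", local_file, hdfs_dir),
        dfs_command("chgrp", group, target, flags=["-R"]),
        dfs_command("chmod", "640", target),
        dfs_command("ls", hdfs_dir),
        dfs_command("du", target, flags=["-s", "-h"]),
    ]


def render(commands: Sequence[Sequence[str]]) -> str:
    """Render each argv as a shell-quoted line, suitable for review or logging."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Dry run only: prints the commands rather than executing them.
    print(render(staging_workflow("/tmp/events.csv", "/data/staging", "analysts")))
```

Building argument lists rather than shell strings keeps paths with spaces intact. To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)` on a machine with a configured Hadoop client.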