HDFS Blocks
HDFS blocks are large compared to disk blocks in order to minimize the cost of
seeks. If a file were split into many small blocks, a large share of the read
time would be spent seeking (locating the start of each block) rather than
transferring data. Many small blocks are also a burden on the NameNode
(master): it stores the metadata for every block, so the more blocks there
are, the more block information it has to keep.
If the block is large enough, the time it takes to transfer the data from the
disk can be significantly longer than the time to seek to the start of the block.
Thus, transferring a large file made of multiple blocks operates at the disk
transfer rate.
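The seek-versus-transfer argument above can be checked with a little arithmetic. The sketch below is only an illustration: the 10 ms seek time and 100 MB/s transfer rate are assumed round numbers, not measurements, and `seek_overhead` is a hypothetical helper name.

```python
def seek_overhead(block_size_mb, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Return the fraction of a block's read time that is spent seeking.

    seek_time_ms and transfer_rate_mb_s are assumed, illustrative values.
    """
    transfer_time_ms = block_size_mb / transfer_rate_mb_s * 1000.0
    return seek_time_ms / (seek_time_ms + transfer_time_ms)


# Compare a disk-block-sized unit (about 4 KB) with typical HDFS block sizes.
for size_mb in (0.004, 1, 64, 128):
    share = seek_overhead(size_mb)
    print(f"{size_mb:>8} MB block: {share:.1%} of read time is seeking")
```

Under these assumptions a 1 MB block spends half of its read time seeking, while a 128 MB block spends under 1%, which is why reading a file made of large blocks runs at close to the disk transfer rate.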
By default, each block is processed by its own Mapper (one map task per input
split, and one split per block). With small blocks there would therefore be a
large number of Mappers, each processing only a little data, which isn't
efficient.
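To make the effect of block size concrete, the sketch below counts the blocks of a single file at different block sizes. It assumes the default case of one map task per block; the 1 GB file size and the `block_count` helper are hypothetical, chosen only for illustration.

```python
import math


def block_count(file_size_mb, block_size_mb):
    """Number of blocks needed to store a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


FILE_SIZE_MB = 1024  # hypothetical 1 GB input file

for block_size_mb in (1, 64, 128):
    blocks = block_count(FILE_SIZE_MB, block_size_mb)
    # Assuming one input split per block, this is also the number of map
    # tasks, and the number of block entries the NameNode tracks for the file.
    print(f"{block_size_mb:>4} MB blocks -> {blocks} blocks / map tasks")
```

With 128 MB blocks the file needs 8 blocks, and with 1 MB blocks it needs 1024, so both the number of Mappers and the NameNode's bookkeeping grow in proportion as the block size shrinks.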
DataFlair Team
1) NAS stands for Network Attached Storage, which is a
file-level computer data storage server connected to a
computer network, providing network access to a
heterogeneous group of clients.
HDFS stands for Hadoop Distributed File System, which is
a Java-based file system that provides scalable and reliable
data storage and is designed to span large clusters of
commodity hardware.
2) In HDFS, data blocks are distributed across the local
drives of all machines in a cluster, whereas in NAS, data is
stored on a dedicated server.
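The difference in point 2 can be pictured with a toy model. The sketch below is not HDFS's actual placement policy (which is rack-aware); it simply spreads block replicas round-robin across nodes to contrast with a single dedicated server. The node names and the `place_blocks` function are hypothetical; the replication factor of 3 matches the HDFS default.

```python
from collections import defaultdict


def place_blocks(num_blocks, nodes, replication=3):
    """Toy round-robin placement: map each node to the block IDs it stores.

    Assumes replication <= len(nodes) so that replicas of one block land on
    distinct nodes. Real HDFS placement is rack-aware and more involved.
    """
    placement = defaultdict(list)
    for block_id in range(num_blocks):
        for replica in range(replication):
            node = nodes[(block_id + replica) % len(nodes)]
            placement[node].append(block_id)
    return dict(placement)


# HDFS-like layout: 8 blocks, 3 replicas each, spread over 4 machines.
hdfs_like = place_blocks(8, ["dn1", "dn2", "dn3", "dn4"], replication=3)

# NAS-like layout: every block sits on the one dedicated server.
nas_like = place_blocks(8, ["nas-server"], replication=1)

for node, blocks in {**hdfs_like, **nas_like}.items():
    print(node, blocks)
```

In the HDFS-like layout every machine holds part of the file and can process its blocks locally, whereas in the NAS-like layout all reads go through the single server.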