Understanding UNIX / Linux File System: What Is A File?


Understanding UNIX / Linux File System

A conceptual understanding of the file system, especially its data structures and related terms, will help
you become a successful system administrator. I have seen many new Linux system
administrators without any clue about the file system. This conceptual knowledge can be applied to
restore a file system in an emergency.

What is a File?
A file is a collection of data items stored on disk. It can hold any kind of information:
data, music (mp3 files), pictures, movies, sound, books, etc. In fact, whatever you store on a
computer is stored in the form of a file. Files are always associated with devices such as hard
disks, floppy disks, etc. A file is the last object (a leaf) in your file system tree. See Linux/UNIX - rules for
naming file and directory names.

What is a directory?
A directory is a group of files. Directories are divided into two types:
Root directory - Strictly speaking, there is only one root directory in your system, which
is denoted by / (forward slash). It is the root of your entire file system and cannot be
renamed or deleted.
Sub directory - A directory under the root (/) directory is a subdirectory, which can be created
and renamed by the user.

Directories are used to organize your data files and programs more efficiently.

Linux supports numerous file system types

Ext2: A file system modeled on the traditional UNIX file system. It has the concepts of blocks, inodes and directories.
Ext3: The ext2 file system enhanced with journalling capabilities. Journalling allows fast
file system recovery. Ext3 supports POSIX ACLs (Access Control Lists).
Isofs (iso9660): The file system used on CD-ROMs.
Sysfs: A RAM-based file system, initially based on ramfs. It is used to export kernel
objects to user space so that end users can easily inspect them.
Procfs: The proc file system acts as an interface to internal data structures in the kernel. It
can be used to obtain information about the system and to change certain kernel
parameters at runtime using the sysctl command. For example, you can view CPU information with
the following command:
# cat /proc/cpuinfo
Or you can check, enable (1) or disable (0) routing/forwarding of IP packets between interfaces with
the following commands:
# cat /proc/sys/net/ipv4/ip_forward
# echo "1" > /proc/sys/net/ipv4/ip_forward
# echo "0" > /proc/sys/net/ipv4/ip_forward
NFS: The Network File System allows many users or systems to share the same files using
a client/server model. NFS can share directories stored on the disk-based file systems above.
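The procfs commands above are just reads and writes of small text files. A minimal Python sketch of the same idea follows; it assumes the usual sysctl convention that a dotted name such as net.ipv4.ip_forward maps to /proc/sys/net/ipv4/ip_forward. The helper names sysctl_path, read_sysctl and write_sysctl are hypothetical, and writing under /proc requires root.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name such as net.ipv4.ip_forward to its file under root."""
    return root.joinpath(*name.split("."))


def read_sysctl(name: str, root: Path = PROC_SYS) -> str:
    """Like: cat /proc/sys/net/ipv4/ip_forward"""
    return sysctl_path(name, root).read_text().strip()


def write_sysctl(name: str, value: str, root: Path = PROC_SYS) -> None:
    """Like: echo "1" > /proc/sys/net/ipv4/ip_forward (writing to /proc needs root)."""
    sysctl_path(name, root).write_text(value + "\n")


if __name__ == "__main__":
    key = "net.ipv4.ip_forward"
    if sysctl_path(key).exists():
        print(key, "=", read_sysctl(key))  # 0 = forwarding off, 1 = on
```

The root parameter exists only so the helpers can be pointed at a scratch directory instead of the live /proc/sys.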

Linux also supports Microsoft NTFS, vfat, and many other file systems. See the Documentation/filesystems
directory in the Linux kernel source tree for a list of all supported file systems.
You can find out which file systems are currently mounted, and their types, with the mount command:
$ mount
OR
$ cat /proc/mounts
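Each line of /proc/mounts lists the device, mount point, file system type, mount options and two numeric fields. Here is a minimal Python sketch that prints roughly what mount shows; it assumes the kernel's convention of escaping spaces and similar characters in paths as octal sequences such as \040. The Mount and parse_mounts names are illustrative.

```python
import re
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: str


def _unescape(field: str) -> str:
    """Decode the octal escapes (e.g. \\040 for a space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts text: device, mount point, type, options, dump, pass."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append(Mount(_unescape(device), _unescape(mount_point), fs_type, options))
    return mounts


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except FileNotFoundError:
        text = ""  # not Linux, or /proc is not mounted
    for m in parse_mounts(text):
        print(f"{m.device} on {m.mount_point} type {m.fs_type} ({m.options})")
```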

What is a UNIX/Linux File system?


A UNIX file system is a collection of files and directories. Each file system is stored in
a separate disk partition. The following are a few common file systems:
/ - The root file system, a special file system that incorporates the files under several directories including /dev,
/sbin, /tmp, etc.
/usr - Stores application programs
/var - Stores log files, mails and other data
/tmp - Stores temporary files
See The importance of Linux partitions for more information.

But what is in a File system?


Again, the contents of a file system are divided into two categories:
User data - stores actual data contained in files
Metadata - stores file system structural information such as superblock, inodes,
directories

Understanding UNIX / Linux filesystem Superblock


This is the second part of "Understanding UNIX/Linux file system". Let us take the
example of a 20 GB hard disk. The entire disk space is subdivided into multiple file system blocks.
And what are the blocks used for?

Unix / Linux filesystem blocks


The blocks are used for two different purposes:
Most blocks store user data, i.e. the contents of files.
Some blocks in every file system store the file system's metadata. So what the hell is
metadata?
In simple words, metadata describes the structure of the file system. The most common metadata
structures are the superblock, inodes and directories. The following paragraphs describe each of
them.

Superblock

Each file system is different: it has a type such as ext2 or ext3, a size such as 5 GB or 10 GB,
and a status such as its mount status. In short, each file system has a
superblock, which contains information about the file system such as:
File system type
Size
Status
Information about other metadata structures

If this information is lost, you are in trouble (data loss), so Linux maintains multiple
redundant copies of the superblock in every file system. This is very important in many
emergency situations; for example, you can use backup copies to restore a damaged primary
superblock. The following command displays the primary and backup superblock locations on
/dev/sda3:
# dumpe2fs /dev/sda3 | grep -i superblock
Output:
Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
Backup superblock at 163840, Group descriptors at 163841-163841
Backup superblock at 229376, Group descriptors at 229377-229377
Backup superblock at 294912, Group descriptors at 294913-294913
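If you want to feed those locations into another tool, a minimal Python sketch can pull the block numbers out of the dumpe2fs text. It assumes the "Primary/Backup superblock at N" wording shown in the sample output above; other e2fsprogs versions may phrase it differently, and superblock_locations is a hypothetical name.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "Backup superblock at 32768, Group descriptors at ..."
_SB = re.compile(r"(Primary|Backup) superblock at (\d+)")


def superblock_locations(dumpe2fs_output: str) -> Tuple[Optional[int], List[int]]:
    """Return (primary, [backups]) block numbers found in dumpe2fs output."""
    primary, backups = None, []
    for kind, block in _SB.findall(dumpe2fs_output):
        if kind == "Primary":
            primary = int(block)
        else:
            backups.append(int(block))
    return primary, backups


SAMPLE = """\
Primary superblock at 0, Group descriptors at 1-1
Backup superblock at 32768, Group descriptors at 32769-32769
Backup superblock at 98304, Group descriptors at 98305-98305
"""

if __name__ == "__main__":
    print(superblock_locations(SAMPLE))  # (0, [32768, 98304])
```

On a real system you could pass it the stdout of subprocess.run(["dumpe2fs", "/dev/sda3"], capture_output=True, text=True), run as root.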

Surviving Linux Filesystem Failures


When we use the term filesystem failure, we mean corrupted filesystem data structures (or objects
such as inodes, directories, the superblock, etc.). This can be caused by any one of the following
reasons:
* Mistakes by the Linux/UNIX sysadmin
* Buggy device drivers or utilities (especially third-party utilities)
* Power outage due to UPS failure (very rare on production systems)
* Kernel bugs (that is why you should not run the latest kernel on a production Linux/UNIX system; most
of the time you should use a stable kernel release)
Due to a filesystem failure:
The file system may refuse to mount
The entire system may hang
Even if the mount operation succeeds, users may notice strange behavior
once it is mounted, such as system reboots, gibberish characters in directory listings, etc.
So how the hell are you going to survive a filesystem failure? Most of the time fsck (a front
end for file system checkers such as e2fsck) can fix the problem. To check a Linux
ext2/ext3 file system, simply run e2fsck (assuming the /home filesystem on the /dev/sda3 partition for demo
purposes). First unmount /dev/sda3, then type the following command:
# e2fsck -f /dev/sda3
Where,
-f : Force checking even if the file system seems clean.

Please note that if the superblock is not found, e2fsck will terminate with a fatal error.
However, Linux maintains multiple redundant copies of the superblock in every file
system, so you can use the -b {alternative-superblock} option to work around this problem. The
location of the backup superblock depends on the filesystem's block size:

For filesystems with 1k blocksizes, a backup superblock can be found at block 8193
For filesystems with 2k blocksizes, at block 16384
For 4k blocksizes, at block 32768.
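These numbers follow from the default ext2/ext3 layout: a block group holds 8 × blocksize blocks (one bit per block in a one-block bitmap), file systems with 1k blocks start at block 1 rather than block 0, and with the sparse_super feature backups are kept only in group 1 and in groups that are powers of 3, 5 and 7. The sketch below assumes exactly those defaults (no custom blocks-per-group, no sparse_super2 or other layout options), so treat dumpe2fs as the authoritative source; the function names are hypothetical.

```python
from typing import List, Optional


def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0 (so 1 counts)."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def has_backup(group: int) -> bool:
    """sparse_super: backups only in group 1 and groups that are powers of 3, 5 or 7."""
    return group >= 1 and any(is_power_of(group, b) for b in (3, 5, 7))


def backup_superblocks(block_size: int, group_count: int,
                       blocks_per_group: Optional[int] = None) -> List[int]:
    """Block numbers of backup superblocks, assuming default ext2/ext3 geometry."""
    if blocks_per_group is None:
        blocks_per_group = 8 * block_size  # one bitmap block = 8 bits per byte
    # With 1k blocks the superblock sits at byte offset 1024, i.e. block 1.
    first_data_block = 1 if block_size == 1024 else 0
    return [first_data_block + g * blocks_per_group
            for g in range(1, group_count) if has_backup(g)]


if __name__ == "__main__":
    for size in (1024, 2048, 4096):
        print(size, backup_superblocks(size, 2))  # first backup only
    print(backup_superblocks(4096, 10))  # hypothetical 10-group file system
```

For a 4k-block file system with 10 block groups this yields 32768, 98304, 163840, 229376 and 294912, the same locations as the dumpe2fs sample output earlier in this article.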
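Tip: you can also use either of the following commands to determine alternative superblock locations
(mke2fs -n only displays what mke2fs would do, without creating a file system; its answer is accurate only
if you pass the same options that were used when the file system was originally created):
# mke2fs -n /dev/sda3
OR
# dumpe2fs /dev/sda3 | grep -i superblock
To repair the file system using an alternative superblock, use a command such as the following
(8193 is the first backup for a 1k block size; use 32768 for a 4k block size):
# e2fsck -f -b 8193 /dev/sda3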
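Because the partition must be unmounted before e2fsck runs, a cautious script might check /proc/mounts first. A minimal Python sketch, assuming the device is listed in /proc/mounts under the same name you pass in (it will not recognise aliases such as the symlinks under /dev/disk/by-uuid); is_mounted, build_e2fsck_cmd and check_filesystem are hypothetical helpers, and only the -f and -b options described above are used.

```python
import subprocess
from typing import List, Optional


def is_mounted(device: str, mounts_text: str) -> bool:
    """True if device appears as the first field of any /proc/mounts line."""
    return any(line.split()[:1] == [device] for line in mounts_text.splitlines())


def build_e2fsck_cmd(device: str, backup_superblock: Optional[int] = None) -> List[str]:
    """Build: e2fsck -f [-b N] DEVICE"""
    cmd = ["e2fsck", "-f"]  # -f: force a check even if the fs looks clean
    if backup_superblock is not None:
        cmd += ["-b", str(backup_superblock)]  # use an alternative superblock
    return cmd + [device]


def check_filesystem(device: str, backup_superblock: Optional[int] = None) -> int:
    """Refuse to run on a mounted device; otherwise run e2fsck and return its exit code."""
    with open("/proc/mounts") as f:
        if is_mounted(device, f.read()):
            raise RuntimeError(f"{device} is mounted; unmount it before running e2fsck")
    # Must be run as root; e2fsck may ask interactive questions on the terminal.
    return subprocess.run(build_e2fsck_cmd(device, backup_superblock)).returncode
```

For example, check_filesystem("/dev/sda3", 32768) would run e2fsck -f -b 32768 /dev/sda3 after confirming the partition is not mounted.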
It is highly recommended that you make a backup before you run fsck on the
system. Use the dd command to create an image of the unmounted partition (provided that you have
enough spare space under /disk2):
# dd if=/dev/sda3 of=/disk2/backup-sda3.img
If you are using Sun Solaris UNIX, see howto: Restoring a Bad Superblock.
Please note that things get more complicated if the hard disk participates in a software RAID
array. Take a look at Software-RAID HOWTO - Error Recovery.

Understanding UNIX / Linux filesystem Inodes


The inode (index node) is a fundamental concept in the Linux and UNIX filesystem. Each
object in the filesystem is represented by an inode. But what are the objects? Let us try to
understand it in simple words. Each and every file under Linux (and UNIX) has the following
attributes:
=> File type (regular file, directory, block special, etc.)
=> Permissions (read, write, etc.)
=> Owner
=> Group
=> File size
=> File access, change and modification times (remember that traditional UNIX file systems such as
ext2/ext3 do not store a file's creation time; ctime is the inode change time. This is a favorite
question in UNIX/Linux sysadmin job interviews)
=> File deletion time
=> Number of (hard) links
=> Extended attributes such as append-only, or immutability (no one, including the root user,
can delete the file)
=> Access Control Lists (ACLs)
All the above information is stored in an inode. In short, the inode identifies the file and its
attributes (as above). Each inode is identified by a unique inode number within the file system.
The inode number is also known as the index number.

inode definition

An inode is a data structure on a traditional Unix-style file system such as UFS or ext3. An
inode stores basic information about a regular file, directory, or other file system object.

How do I see file inode number?

You can use the ls -i command to see the inode number of a file:


$ ls -i /etc/passwd
Sample Output
32820 /etc/passwd
You can also use the stat command to find out the inode number and other attributes:
$ stat /etc/passwd
Output:
  File: `/etc/passwd'
  Size: 1988        Blocks: 8          IO Block: 4096   regular file
Device: 341h/833d   Inode: 32820       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2005-11-10 01:26:01.000000000 +0530
Modify: 2005-10-27 13:26:56.000000000 +0530
Change: 2005-10-27 13:26:56.000000000 +0530
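Python exposes the same inode fields through os.stat() and os.lstat(). The sketch below collects the attributes listed earlier (inode number, type, permissions, link count, owner, group, size and the three timestamps); it is only an illustration, and inode_info is a hypothetical name. The ext2/ext3 deletion time is kept in the on-disk inode but is not part of the stat() result.

```python
import os
import stat
import tempfile
import time

# Map the file-type bits of st_mode to readable names.
_TYPES = [
    (stat.S_ISREG, "regular file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISBLK, "block special file"),
    (stat.S_ISCHR, "character special file"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISSOCK, "socket"),
]


def inode_info(path: str) -> dict:
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    kind = next((name for test, name in _TYPES if test(st.st_mode)), "unknown")
    return {
        "inode": st.st_ino,
        "type": kind,
        "mode": stat.filemode(st.st_mode),  # e.g. -rw-r--r--
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "access": time.ctime(st.st_atime),
        "modify": time.ctime(st.st_mtime),
        "change": time.ctime(st.st_ctime),  # inode change time, not creation time
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        for key, value in inode_info(f.name).items():
            print(f"{key:>7}: {value}")
```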

Inode application

Many commands used by system administrators in UNIX / Linux operating systems give
inode numbers to designate a file. Let us see a practical application of the inode number. Type the
following commands:
$ cd /tmp
$ touch \"la*
$ ls -l
Now try to remove the file "la* by typing its name. A plain rm "la* will not work as expected, because
the shell treats the quote and the * specially. For files created with control characters, characters
that cannot be typed on a keyboard, or special characters such as ?, * and ^, the reliable way is to
remove the file by its inode number.
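From the shell, one common way to do this is GNU find with -inum, e.g. find . -maxdepth 1 -inum N -delete, where N is the number printed by ls -i. The Python sketch below does the same for a single directory: it looks up each name's inode with os.lstat() and removes the matching entry, so the awkward name never has to be typed. remove_by_inode is a hypothetical helper; double-check the inode number first, because removal is immediate.

```python
import os
import stat
from typing import List


def remove_by_inode(directory: str, inode: int) -> List[str]:
    """Delete every non-directory entry in `directory` whose inode number matches."""
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        st = os.lstat(path)  # lstat: do not follow symlinks
        # Hard links share an inode, so every matching name here is removed.
        if st.st_ino == inode and not stat.S_ISDIR(st.st_mode):
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        awkward = os.path.join(d, '"la*')  # the file from the example above
        open(awkward, "w").close()
        print(remove_by_inode(d, os.lstat(awkward).st_ino))  # ['"la*']
```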

Understanding UNIX / Linux filesystem directories


You use DNS (domain name system) to translate between domain names and IP addresses.
Similarly, files are referred to by file name, not by inode number. So what is the purpose of a
directory? You can group files according to your usage. For example, all configuration files
are stored under the /etc directory. The purpose of a directory is to make the connection between
file names and their associated inode numbers. Inside every directory you will find two
entries: . (the current directory) and .. (a pointer to the parent directory, i.e. the directory
immediately above the one you are in now). In the root directory, .. points back to the root
directory itself.
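Since a directory is essentially a table mapping names to inode numbers, you can rebuild that view with a few lines of Python. This sketch reads the names with os.listdir() (which omits . and ..), adds those two entries explicitly and looks up each inode with os.lstat(); directory_table is a hypothetical name.

```python
import os
from typing import Dict


def directory_table(path: str) -> Dict[str, int]:
    """Return {name: inode number} for a directory, including . and .."""
    names = [".", ".."] + sorted(os.listdir(path))
    return {name: os.lstat(os.path.join(path, name)).st_ino for name in names}


if __name__ == "__main__":
    for name, ino in directory_table(".").items():
        print(f"{ino:>10}  {name}")  # similar to: ls -ai
```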

Directory
A directory contained inside another directory is called a subdirectory. Together, the directories
form a tree structure. Use the tree command to see the directory tree structure:
$ tree /etc | less
Again, a directory has an inode just like a file. It is a specially formatted file containing records
that associate each name with an inode number. Please note the following limitations of
directories under the ext2/3 file system:
There is an upper limit of 32000 subdirectories in a single directory.
There is a "soft" upper limit of about 10-15k files in a single directory.
However, the official ext2/3 documentation points out that "using a
hashed directory index (under development) allows 100k-1M+ files in a single
directory without performance problems". Here are my two favorite alias commands
related to directories:
$ alias ..='cd ..'
$ alias d='ls -l | grep -E "^d"'


Understanding UNIX / Linux symbolic (soft) and hard links

Ordinarily an inode is associated with exactly one directory entry (file name). However, with hard links it is
possible to associate multiple directory entries with a single inode. To create a hard link, use the ln
command as follows:
# ln /root/file1 /root/file2
# ls -l
The above commands create a hard link named file2 that refers to the same inode as file1. Symbolic links refer to:
A symbolic path indicating the abstract location of another file.
Hard links refer to:
The specific location of physical data.
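The hard link example above can be reproduced from Python with os.link(). In this sketch the files live in a temporary directory instead of /root; it shows that both names share one inode, that the link count rises to 2, and that the data survives removing the original name. hard_link_demo is a hypothetical name.

```python
import os
import tempfile
from typing import Tuple


def hard_link_demo() -> Tuple[bool, int, str]:
    with tempfile.TemporaryDirectory() as d:
        file1 = os.path.join(d, "file1")
        file2 = os.path.join(d, "file2")
        with open(file1, "w") as f:
            f.write("some data\n")

        os.link(file1, file2)  # same as: ln file1 file2

        st1, st2 = os.stat(file1), os.stat(file2)
        same_inode = st1.st_ino == st2.st_ino  # two names, one inode
        links = st1.st_nlink                   # link count is now 2

        os.remove(file1)                       # drop the original name...
        with open(file2) as f:
            data = f.read()                    # ...the data is still reachable
        return same_inode, links, data


if __name__ == "__main__":
    print(hard_link_demo())  # (True, 2, 'some data\n')
```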

Hard link vs. Soft link in Linux or UNIX


Hard links cannot link directories.
Hard links cannot cross file system boundaries.
Soft or symbolic links are like hard links in that they allow you to associate multiple file names with a
single file. However, symbolic links can:
Link to directories.
Cross file system boundaries.
These links behave differently when the source of the link is moved or removed:
Symbolic links are not updated (they become broken links).
Hard links always refer to the source data, even if the original name is moved or removed.

How do I create symbolic link?


You can create a symbolic link with the ln command:
$ ln -s /path/to/file1.txt /path/to/file2.txt
$ ls -ali
The above command creates a symbolic link named file2.txt that points to file1.txt.
Task: Symbolic link creation and deletion
Let us create a directory called foo, enter:
$ mkdir foo
$ cd foo
Copy /etc/resolv.conf file, enter:
$ cp /etc/resolv.conf .
View inode number, enter:
$ ls -ali
Sample output:
total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:19 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf
Now create soft link to resolv.conf, enter:
$ ln -s resolv.conf alink.conf
$ ls -ali
Sample output:
total 152
1048600 drwxr-xr-x   2 vivek vivek   4096 2008-12-09 20:24 .
1015809 drwxrwxrwt 220 root  root  143360 2008-12-09 20:19 ..
1048602 lrwxrwxrwx   1 vivek vivek     11 2008-12-09 20:24 alink.conf -> resolv.conf
1048601 -rwxr-xr-x   1 vivek vivek    129 2008-12-09 20:19 resolv.conf

The link count of the directory has not changed (it is still 2), and the total is still 152. Our symbolic (soft) link
is stored in a different inode (1048602) than the text file (1048601). The information stored in
resolv.conf is accessible through the alink.conf file. If we delete the text file resolv.conf,
alink.conf becomes a broken link and our data is lost:
$ rm resolv.conf
$ ls -ali
If alink.conf were a hard link, our data would still be accessible through alink.conf. Also,
if you delete the soft link itself, the data would still be there. Read the man page of ln for
more information.
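The same experiment can be scripted with os.symlink() and os.link(). The sketch below mirrors the task's names (resolv.conf, alink.conf) plus a hypothetical hard link hlink.conf, with placeholder file contents, and checks that the symbolic link gets its own inode, dangles once the target is removed, and that the hard link keeps the data.

```python
import os
import tempfile


def symlink_vs_hardlink() -> dict:
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "resolv.conf")
        soft = os.path.join(d, "alink.conf")
        hard = os.path.join(d, "hlink.conf")
        with open(target, "w") as f:
            f.write("nameserver 127.0.0.1\n")  # placeholder contents

        os.symlink("resolv.conf", soft)  # like: ln -s resolv.conf alink.conf
        os.link(target, hard)            # like: ln resolv.conf hlink.conf

        result = {
            # lstat looks at the link itself; stat would follow it to the target
            "soft_has_own_inode": os.lstat(soft).st_ino != os.stat(target).st_ino,
            "soft_points_to": os.readlink(soft),
            "hard_same_inode": os.stat(hard).st_ino == os.stat(target).st_ino,
        }

        os.remove(target)  # like: rm resolv.conf
        # A dangling symlink still exists as a name (lexists) but has no target (exists).
        result["soft_is_dangling"] = os.path.lexists(soft) and not os.path.exists(soft)
        with open(hard) as f:
            result["hard_still_reads"] = f.read()
        return result


if __name__ == "__main__":
    for key, value in symlink_vs_hardlink().items():
        print(f"{key}: {value!r}")
```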

Why isn't it possible to create hard links across file system boundaries?
An inode number identifies a file only within its own file system, and all hard links are based on inode
numbers.
So linking across file systems would lead to confusing references for UNIX or Linux. For
example, consider the following scenario:
* File system: /home
* Directory: /home/vivek
* Hard link: /home/vivek/file2
* Original file: /home/vivek/file1
Now you create a hard link as follows:
$ touch file1
$ ln file1 file2
$ ls -l
Output:
-rw-r--r-- 2 vivek vivek 0 2006-01-30 13:28 file1
-rw-r--r-- 2 vivek vivek 0 2006-01-30 13:28 file2
Now look at the inodes of both file1 and file2:
$ ls -i file1
782263 file1
$ ls -i file2
782263 file2
As you can see, the inode number is the same for the hard link file2, because both names refer to the
same entry in the inode table of the /home file system. Now, if you could create a hard link to this file
from the /tmp file system, it would lead to confusing references for UNIX or Linux: is link no. 782263
in the /home or the /tmp file system? To avoid this problem, UNIX and Linux do not allow creating hard
links across file system boundaries.
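Because inode numbers are unique only within one file system, a program that needs to know whether two paths are the same file compares the (device, inode) pair; Python's os.path.samefile() does exactly that. The sketch below spells it out and shows how a cross-file-system hard link attempt surfaces: os.link() fails with errno EXDEV ("Invalid cross-device link"). The helper names are illustrative, and /home and /tmp are only different file systems if they are separate mounts on your machine.

```python
import errno
import os
from typing import Tuple


def file_identity(path: str) -> Tuple[int, int]:
    """(device, inode): unique across mounted file systems, unlike st_ino alone."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


def try_hard_link(src: str, dst: str) -> bool:
    """Create a hard link; return False instead of raising if src and dst are on different file systems."""
    try:
        os.link(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.EXDEV:  # "Invalid cross-device link"
            return False
        raise
```

On a machine where /tmp is a separate mount, try_hard_link("/home/vivek/file1", "/tmp/file1") would return False, while a link inside /home/vivek succeeds and both names report the same file_identity().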
