File Systems
A file is a collection of related information recorded on secondary, non-volatile storage such as
magnetic disks, optical disks, and tapes. Programs use files as the medium for reading input and
writing output.
In general, a file is a sequence of bits, bytes, or records whose meaning is defined by the file's creator and
user. Every file has a logical location at which it is stored and from which it is retrieved.
File Type
File type refers to the ability of the operating system to distinguish between kinds of files, such as text,
binary, and source files. Operating systems such as MS-DOS and UNIX have the following types of files:
Ordinary files
Directory Files
A directory contains files and related information about those files. It is basically a folder used to
hold and organize multiple files.
Special Files
These files are also called device files. They represent physical devices such as printers, disks, networks,
and flash drives.
File Attributes
A file has a name and data. It also stores meta information such as its creation date and time,
current size, and last modified date. All this information is called the attributes of the file.
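As a small illustration, the sketch below reads a few of these attributes using Python's standard `os.stat` call. The helper name `file_attributes` and the temporary demo file are assumptions made for this example; creation time is left out because it is not reported consistently across operating systems.

```python
import os
import tempfile
import time

def file_attributes(path):
    """Return a few common attributes of the file at `path` (hypothetical helper)."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,                    # current size
        "last_modified": time.ctime(st.st_mtime),    # last modified date
        "is_directory": os.path.isdir(path),
    }

# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
    demo_path = tmp.name

attrs = file_attributes(demo_path)
os.remove(demo_path)
print(attrs)
```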
File Operations
A directory can have several logical structures; these are described below.
Single-level directory –
The single-level directory is the simplest directory structure. All files are contained in the same directory,
which makes it easy to support and understand.
A single-level directory has a significant limitation, however, when the number of files increases or when the
system has more than one user. Since all the files are in the same directory, each must have a unique name. If two
users both call their data file test, the unique-name rule is violated.
Advantages:
Because there is only one directory, it is very easy to implement.
If the directory holds only a few files, searching is fast.
Operations such as file creation, searching, deletion, and updating are very easy in such a directory structure.
Disadvantages:
Name collisions are likely, because no two files may have the same name.
Searching becomes time-consuming if the directory is large.
Files of the same type cannot be grouped together.
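The name-collision problem can be sketched with a toy model in which the whole directory is one flat table. The class `SingleLevelDirectory` and its methods are hypothetical names used only for this illustration, not part of any real file system.

```python
class SingleLevelDirectory:
    """Toy model: one flat table of name -> contents shared by every user."""

    def __init__(self):
        self.entries = {}

    def create(self, name, contents=""):
        # Every name must be unique across the whole directory.
        if name in self.entries:
            raise FileExistsError(f"name already in use: {name}")
        self.entries[name] = contents

    def delete(self, name):
        del self.entries[name]

root = SingleLevelDirectory()
root.create("test", "data written by user A")
try:
    # A second user picks the same name and is rejected.
    root.create("test", "data written by user B")
    collision = False
except FileExistsError:
    collision = True
print("collision:", collision)
```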
Two-level directory –
As we have seen, a single-level directory often leads to confusion of file names among different users. The
solution to this problem is to create a separate directory for each user.
In the two-level directory structure, each user has their own user file directory (UFD). The UFDs have similar
structures, but each lists only the files of a single user. The system's master file directory (MFD) is searched
whenever a user logs in. The MFD is indexed by user name or account number, and each entry points
to the UFD for that user.
Advantages:
We can give a full path such as /User-name/directory-name/.
Different users can have directories and files with the same name.
Searching for files becomes easier because of path names and per-user grouping.
Disadvantages:
A user is not allowed to share files with other users.
It is still not very scalable: within one user's directory, files of the same type cannot be grouped together.
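A minimal sketch of the MFD and UFD relationship, assuming the MFD is a table keyed by user name and each UFD is a table keyed by file name. The class and method names are hypothetical, and paths are assumed to have the form /user-name/file-name.

```python
class TwoLevelDirectory:
    """Toy model: the MFD maps each user name to that user's UFD."""

    def __init__(self):
        self.mfd = {}  # master file directory: user name -> UFD (file name -> contents)

    def add_user(self, user):
        self.mfd.setdefault(user, {})

    def create(self, user, name, contents=""):
        ufd = self.mfd[user]
        # Names only need to be unique inside one user's UFD.
        if name in ufd:
            raise FileExistsError(f"{user} already has a file named {name}")
        ufd[name] = contents

    def lookup(self, path):
        """Resolve a path of the assumed form /user-name/file-name."""
        user, name = path.strip("/").split("/")
        return self.mfd[user][name]

fs = TwoLevelDirectory()
fs.add_user("alice")
fs.add_user("bob")
fs.create("alice", "test", "alice's data")
fs.create("bob", "test", "bob's data")   # same file name, different user: no collision
print(fs.lookup("/alice/test"))
print(fs.lookup("/bob/test"))
```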
Tree-structured directory –
Once we view a two-level directory as a tree of height 2, the natural generalization is to extend the directory
structure to a tree of arbitrary height.
This generalization allows users to create their own subdirectories and to organize their files accordingly.
A tree structure is the most common directory structure. The tree has a root directory, and every file in the system
has a unique path.
Advantages:
Very general, since a full path name can be given.
Very scalable; the probability of name collision is low.
Searching becomes very easy; both absolute and relative paths can be used.
Disadvantages:
Not every file fits the hierarchical model; a file may need to be saved in multiple directories.
Files cannot be shared.
It can be inefficient, because accessing a file may require traversing multiple directories.
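Absolute and relative path lookup in a tree can be sketched as follows. The tree is modelled as nested dictionaries, and the function names `split_path` and `resolve` as well as the sample file names are assumptions made for this example.

```python
def split_path(path):
    """Split a path into its non-empty components."""
    return [part for part in path.split("/") if part]

def resolve(root, path, cwd="/"):
    """Walk the tree from the root (absolute path) or from cwd (relative path)."""
    if path.startswith("/"):
        components = split_path(path)
    else:
        components = split_path(cwd) + split_path(path)

    # Normalise "." and ".." before walking the tree.
    normalized = []
    for part in components:
        if part == "..":
            if normalized:
                normalized.pop()
        elif part != ".":
            normalized.append(part)

    node = root
    for part in normalized:
        node = node[part]   # one directory lookup per path component
    return node

tree = {
    "home": {"alice": {"notes.txt": "alice's notes", "src": {"main.c": "int main(void) { return 0; }"}}},
    "etc": {"hosts": "127.0.0.1 localhost"},
}

print(resolve(tree, "/home/alice/notes.txt"))
print(resolve(tree, "src/main.c", cwd="/home/alice"))
print(resolve(tree, "../../etc/hosts", cwd="/home/alice"))
```

Each path component costs one directory lookup, which is the traversal overhead mentioned in the disadvantages.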
Acyclic graph directory –
An acyclic graph is a graph with no cycles; as a directory structure it allows subdirectories and files to be
shared. The same file or subdirectory may appear in two different directories. It is a natural generalization of
the tree-structured directory.
It is useful, for example, when two programmers are working on a joint project and both need access to its files.
The associated files are stored in a subdirectory, separating them from other projects and from the files of other
programmers. Since both programmers work on the project, each wants the subdirectory to appear in their own
directory, so the common subdirectory should be shared. This is where acyclic-graph directories are used.
Note that a shared file is not the same as a copy of the file. If either programmer changes something
in the shared subdirectory, the change is visible from both directories.
Advantages:
We can share files.
Searching is easy because a file can be reached by several different paths.
Disadvantages:
Files are shared through links, which can cause problems when a file is deleted.
If the link is a soft link, deleting the file leaves a dangling pointer.
In the case of hard links, the file is actually deleted only after all references to it have been deleted.
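The difference between the two kinds of link on deletion can be sketched with an in-memory model that keeps a reference count per file. This is a simplified illustration, not how any particular operating system implements links; the class `LinkedFileSystem`, its methods, and the paths are hypothetical.

```python
class LinkedFileSystem:
    """Toy model of hard links (reference counted) and soft links (stored paths)."""

    def __init__(self):
        self.inodes = {}       # inode number -> {"data": ..., "links": count}
        self.names = {}        # path -> ("hard", inode number) or ("soft", target path)
        self._next_inode = 1

    def create(self, path, data):
        ino = self._next_inode
        self._next_inode += 1
        self.inodes[ino] = {"data": data, "links": 1}
        self.names[path] = ("hard", ino)

    def hard_link(self, existing, new):
        kind, ino = self.names[existing]
        if kind != "hard":
            raise ValueError("this sketch only hard-links to regular names")
        self.inodes[ino]["links"] += 1
        self.names[new] = ("hard", ino)

    def soft_link(self, target, new):
        # A soft link stores only the target path, not a reference to the data.
        self.names[new] = ("soft", target)

    def unlink(self, path):
        kind, ref = self.names.pop(path)
        if kind == "hard":
            self.inodes[ref]["links"] -= 1
            if self.inodes[ref]["links"] == 0:
                del self.inodes[ref]   # data is freed only when the last reference goes

    def read(self, path):
        if path not in self.names:
            raise FileNotFoundError(path)
        kind, ref = self.names[path]
        if kind == "soft":
            return self.read(ref)      # fails if the target name no longer exists
        return self.inodes[ref]["data"]

fs = LinkedFileSystem()
fs.create("/alice/report", "draft 1")
fs.hard_link("/alice/report", "/bob/report")
fs.soft_link("/alice/report", "/carol/shortcut")
fs.unlink("/alice/report")

still_readable = fs.read("/bob/report")   # the hard link keeps the data alive
try:
    fs.read("/carol/shortcut")
    dangling = False
except FileNotFoundError:
    dangling = True                        # the soft link now points at nothing
print(still_readable, dangling)
```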
File-System Implementation
Overview
Physical disks are commonly divided into smaller units called partitions. They
can also be combined into larger units, but that is most commonly done for
RAID installations and is left for later chapters.
Partitions can either be used as raw devices ( with no structure imposed upon
them ), or they can be formatted to hold a filesystem ( i.e. populated with
FCBs and initial directory structures as appropriate. ) Raw partitions are
generally used for swap space, and may also be used for certain programs
such as databases that choose to manage their own disk storage system.
Partitions containing filesystems can generally only be accessed using the file
system structure by ordinary users, but can often be accessed as a raw device
also by root.
The boot block is accessed as part of a raw partition, by the boot program
prior to any operating system being loaded. Modern boot programs
understand multiple OSes and filesystem formats, and can give the user a
choice of which of several available systems to boot.
The root partition contains the OS kernel and at least the key portions of the
OS needed to complete the boot process. At boot time the root partition is
mounted, and control is transferred from the boot program to the kernel
found there. ( Older systems required that the root partition lie completely
within the first 1024 cylinders of the disk, because that was as far as the boot
program could reach. Once the kernel had control, then it could access
partitions beyond the 1024 cylinder boundary. )
Continuing with the boot process, additional filesystems get mounted, adding
their information into the appropriate mount table structure. As a part of the
mounting process the file systems may be checked for errors or
inconsistencies, either because they are flagged as not having been closed
properly the last time they were used, or just on general principle.
Filesystems may be mounted either automatically or manually. In UNIX a
mount point is indicated by setting a flag in the in-memory copy of the inode,
so all future references to that inode get re-directed to the root directory of
the mounted filesystem.
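The effect of mounting on path lookup can be sketched as a longest-prefix search over a mount table. This is a simplification for illustration: it models only the redirection to the mounted filesystem, not the inode flag that UNIX uses, and the function name `find_mount` and the filesystem labels are hypothetical.

```python
def find_mount(mount_table, path):
    """Return (mount point, filesystem, path inside that filesystem)."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the deepest mount point that covers the path.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise LookupError(f"no filesystem mounted for {path}")
    inner = path[len(best.rstrip("/")):] or "/"
    return best, mount_table[best], inner

# Hypothetical mount table: mount point -> filesystem label.
mount_table = {"/": "rootfs", "/home": "homefs", "/home/shared": "sharedfs"}

print(find_mount(mount_table, "/etc/passwd"))
print(find_mount(mount_table, "/home/alice/a.txt"))
print(find_mount(mount_table, "/home/shared/doc"))
```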
Directory Implementation
Directories can be implemented using a number of algorithms, and the choice of
algorithm can significantly affect the performance of the system.
Directory implementation algorithms are classified according to the data structure
they use. Two algorithms are mainly in use today.
1. Linear List
In this algorithm, all the files in a directory are maintained as a singly linked list. Each entry
contains pointers to the data blocks assigned to the file and a pointer to the next file in the
directory.
Characteristics
1. When a new file is created, the entire list is checked to see whether the new file
name matches an existing file name. If it does not, the file can be created at the
beginning or at the end of the list. Searching for a unique name is therefore a big
concern, because traversing the whole list takes time.
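The cost of the uniqueness check can be seen in a small sketch of a linear-list directory. The classes `Entry` and `LinearListDirectory`, the `comparisons` counter, and the block numbers are hypothetical and exist only to make the traversal visible.

```python
class Entry:
    """One directory entry: file name, its data blocks, and the next entry."""

    def __init__(self, name, blocks, next_entry=None):
        self.name = name
        self.blocks = blocks
        self.next = next_entry

class LinearListDirectory:
    def __init__(self):
        self.head = None
        self.comparisons = 0   # counts name comparisons, for illustration only

    def find(self, name):
        node = self.head
        while node is not None:
            self.comparisons += 1
            if node.name == name:
                return node
            node = node.next
        return None

    def create(self, name, blocks):
        # The whole list is scanned to make sure the name is not already used.
        if self.find(name) is not None:
            raise FileExistsError(name)
        self.head = Entry(name, blocks, self.head)   # insert at the beginning

directory = LinearListDirectory()
directory.create("a.txt", [10, 11])
directory.create("b.txt", [12])
directory.create("c.txt", [13, 14, 15])
found = directory.find("a.txt")   # oldest entry, so every entry is visited
print(found.blocks, directory.comparisons)
```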
2. Hash Table
With a hash table, searching becomes efficient because the entire list is no longer
searched on every operation. Only the hash table entry for the key is checked, and if an
entry is found, the corresponding file is fetched using its value.
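A minimal sketch of a hashed directory, assuming the file name is the key and the file's block list is the value. Collisions are handled here by chaining within a bucket, which is one common choice and not necessarily the scheme a given file system uses; the class name and bucket count are hypothetical.

```python
class HashedDirectory:
    """Toy directory: file names are hashed into a fixed number of buckets."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, name):
        # Only the bucket selected by the hash of the name is ever examined.
        return self.buckets[hash(name) % len(self.buckets)]

    def create(self, name, blocks):
        bucket = self._bucket(name)
        for existing, _ in bucket:
            if existing == name:
                raise FileExistsError(name)
        bucket.append((name, blocks))

    def find(self, name):
        for existing, blocks in self._bucket(name):
            if existing == name:
                return blocks
        return None

hashed = HashedDirectory()
hashed.create("a.txt", [10, 11])
hashed.create("b.txt", [12])
print(hashed.find("a.txt"))
print(hashed.find("missing.txt"))
```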
Process Management
Memory Management
File and Disk Management
I/O System Management
Most computer systems employ secondary storage devices (magnetic disks, tape, optical media, flash
drives, etc.), which provide low-cost, non-volatile storage for programs and data.
Programs and the user data they use are kept in separate storage units called files. The
operating system is responsible for allocating space for files on secondary storage media as
needed.
There is no guarantee that files, especially large ones, will be stored in contiguous locations on
physical disk drives. It depends greatly on the amount of space available. When the disk is nearly
full, new files are more likely to be recorded in multiple locations. However, as far as the user is
concerned, the file abstraction provided by the operating system hides the fact that the file is
fragmented into multiple parts.
The operating system needs to track the disk location of every part of every file on the
disk. In some cases, this means tracking hundreds of thousands of files and file fragments on a
single physical disk. Additionally, the operating system must be able to locate each file and
perform read and write operations on it whenever it needs to. Therefore, the operating system is
responsible for configuring the file system, ensuring the safety and reliability of read and
write operations to secondary storage, and managing access times (the time required to write data
to or read data from secondary storage).
Low-level (physical) formatting divides the disk into sectors before any data is stored, so that the
disk controller can read and write them.
Each sector can hold:
Header information, data, and an error-correcting code (ECC). The data area is typically 512 bytes.
To use a disk to hold files, the operating system still needs to record its own data structures on the
disk:
1. Partition the disk into one or more groups of cylinders. Each partition is treated as a logical disk.
2. Logical formatting, or “creating a file system”. The OS stores the initial file-system data
structures on the disk. These include maps of free and allocated space.
For efficiency, most file systems group blocks into clusters. Disk I/O is done in blocks; file I/O
is done in clusters.
Boot block:
When the computer is turned on or restarted, the initial bootstrap program stored in
ROM finds the OS kernel on the disk, loads the kernel into memory, and
starts the OS.
Changing bootstrap code held in ROM requires changing the ROM hardware chip, so only a
small bootstrap loader program is stored in ROM.
The full bootstrap code is stored in the “boot block” of the disk.
A disk with a boot partition is called a boot disk or system disk.
Bad Blocks:
Block allocation
File systems have to keep track of which blocks belong to each file; they also have to keep track
of which blocks are available for use. When a new file is created, the file system finds an
available block and allocates it. When a file is deleted, the file system makes its blocks available
for re-allocation.
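The bookkeeping described above can be sketched with a toy allocator that records which blocks belong to each file and which are free. The class `BlockAllocator`, its "lowest-numbered free block first" policy, and the 8-block disk are assumptions made for this example.

```python
class BlockAllocator:
    """Toy model: tracks free blocks and the blocks owned by each file."""

    def __init__(self, total_blocks):
        self.free_blocks = set(range(total_blocks))
        self.file_blocks = {}   # file name -> list of block numbers

    def create_file(self, name, blocks_needed=1):
        if name in self.file_blocks:
            raise FileExistsError(name)
        if blocks_needed > len(self.free_blocks):
            raise OSError("not enough free blocks")
        # Assumed policy: take the lowest-numbered free blocks.
        chosen = sorted(self.free_blocks)[:blocks_needed]
        self.free_blocks.difference_update(chosen)
        self.file_blocks[name] = chosen
        return chosen

    def delete_file(self, name):
        # The deleted file's blocks become available for re-allocation.
        self.free_blocks.update(self.file_blocks.pop(name))

disk = BlockAllocator(total_blocks=8)
blocks_a = disk.create_file("a", 3)
blocks_b = disk.create_file("b", 2)
disk.delete_file("a")
blocks_c = disk.create_file("c", 4)   # reuses a's blocks plus one block beyond b's
print(blocks_a, blocks_b, blocks_c)
```

In this run the blocks of file c are not contiguous, because file b sits between the reused blocks and the remaining free space.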
It is hard to design a file system that achieves all of these goals, especially since file system
performance depends on “workload characteristics” like file sizes, access patterns, etc. A file
system that is well tuned for one workload might not perform as well for another.
Free Space Management
A file system is responsible for allocating free blocks to files, so it has to keep
track of all the free blocks on the disk. There are two main approaches to
managing the free blocks.
1. Bit Vector
In this approach, the free space list is implemented as a bit map (bit vector). It contains
one bit for each block on the disk.
If the block is free, the bit is 1; otherwise it is 0. Initially all the blocks are free,
so every bit in the bit vector is 1.
As space allocation proceeds, the file system allocates blocks to files and
sets the corresponding bits to 0.
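A minimal sketch of the bit vector, using the convention from the text that 1 means free and 0 means allocated. A Python list of integers stands in for the packed bits a real file system would use, and the class name and first-fit search are assumptions.

```python
class BitVectorFreeList:
    """Toy free-space map: bits[i] == 1 means block i is free."""

    def __init__(self, total_blocks):
        self.bits = [1] * total_blocks   # initially every block is free

    def allocate(self):
        # Scan for the first free block and mark it allocated.
        for block, bit in enumerate(self.bits):
            if bit == 1:
                self.bits[block] = 0
                return block
        raise OSError("no free blocks")

    def free(self, block):
        self.bits[block] = 1

bitmap = BitVectorFreeList(8)
first = bitmap.allocate()
second = bitmap.allocate()
third = bitmap.allocate()
bitmap.free(second)          # block 1 becomes free again
print(bitmap.bits)
```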
2. Linked List
This is another approach to free space management. It links
together all the free blocks and keeps a pointer in the cache that points to the first
free block.
All the free blocks on the disk are therefore linked together by pointers.
Whenever a block is allocated, the free block before it is linked to the free block
after it.
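A sketch of the linked free list, assuming blocks are always taken from the head of the list so that only the head pointer has to change on allocation. A dictionary stands in for the "next free block" pointer stored inside each free block; the class and attribute names are hypothetical.

```python
class LinkedFreeList:
    """Toy free list: each free block records the number of the next free block."""

    def __init__(self, total_blocks):
        self.next_free = {}
        for block in range(total_blocks):
            self.next_free[block] = block + 1 if block + 1 < total_blocks else None
        self.head = 0 if total_blocks else None   # pointer to the first free block

    def allocate(self):
        if self.head is None:
            raise OSError("no free blocks")
        block = self.head
        # Unlink the block: the head now points at the block that followed it.
        self.head = self.next_free.pop(block)
        return block

    def free(self, block):
        # A freed block is pushed onto the front of the list.
        self.next_free[block] = self.head
        self.head = block

free_list = LinkedFreeList(4)
b0 = free_list.allocate()
b1 = free_list.allocate()
free_list.free(b0)
print(b0, b1, free_list.head)
```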
2. Logical files: Logical files do not contain data. They contain a description of records that are
found in one or more physical files. A logical file is a view or representation of one or more
physical files. Logical files that contain more than one format are referred to as multi-format
logical files. If your program processes a logical file that contains more than one record format,
you can use the _Rformat() function to set the format you wish to use. Some operations cannot
be performed on logical files. If you open a logical file for stream file processing with open
modes W, W+, WB, or WB+, the file is opened but not cleared. If you open a logical file for
record file processing with open modes WR or WR+, the file is opened but not cleared. Records
in iSeries database files can be described using either a field-level description or a record-level
description. The field-level description of the record includes a description of all fields and their
arrangement in this record. Since the description of the fields and their arrangement is kept
within a database file and not in your ILE C/C++ program, database files created with a field-
level description are referred to as externally described files.
Physical versus Logical Files :
Physical File: A collection of bytes stored on a disk or tape.
Logical File: A “channel” (like a telephone line) that hides the details of the file’s location
and physical format from the program.
When a program wants to use a particular file, “data”, the operating system must find the
physical file called “data” and make a logical name by assigning a logical file to it. This logical
file has a logical name which is what is used inside the program.
Physical File | Logical File
It occupies a portion of memory. It contains the original data. | It does not occupy memory space. It does not contain data.
A physical file contains one record format. | It can contain up to 32 record formats.
It can exist without a logical file. | It cannot exist without a physical file.
If there is a logical file for a physical file, the physical file can’t be deleted until and unless we delete the logical file. | If there is a logical file for a physical file, the logical file can be deleted without deleting the physical file.
The CRTPF command is used to make such an object. | The CRTLF command is used to make such an object.
Physical files represent the real data saved on an iSeries system and describe how the data is to be displayed to or retrieved from a program. | The logical file represents one or multiple physical files. It also has a description of the records found in one or multiple physical files.