File Systems


What is a File System?

A file is a collection of related information recorded on secondary, non-volatile storage such as
magnetic disks, optical disks, and tapes. Programs use files as a medium for reading input and
writing output.

In general, a file is a sequence of bits, bytes, or records whose meaning is defined by the file's creator and
user. Every file has a logical location at which it is stored and from which it is retrieved.

File Type
It refers to the ability of the operating system to differentiate between various types of files, such as text files,
binary files, and source files. Operating systems like MS-DOS and UNIX have the following types of files:

Character Special File


It is a hardware file that reads or writes data character by character, for devices such as a mouse or a printer.

Ordinary files

 These files store user information.
 They may contain text, executable programs, or databases.
 The user can perform operations on them such as add, delete, and modify.

Directory Files

 A directory contains files and related information about those files. It is basically a folder that
holds and organizes multiple files.

Special Files

 These files are also called device files. They represent physical devices such as printers, disks, networks,
and flash drives.

File Attributes
A file has a name and data. It also stores meta-information such as the creation date and time,
current size, and last-modified date. All of this information is called the attributes of the file.

Here are some important file attributes used in an OS:

 Name: It is the only information stored in a human-readable form.
 Identifier: Every file is identified within a file system by a unique tag number known as an
identifier.
 Location: Points to the file's location on the device.
 Type: This attribute is required for systems that support various types of files.
 Size: The current size of the file.
 Protection: This attribute assigns and controls the access rights for reading, writing, and executing
the file.
 Time, date and security: Used for protection, security, and usage monitoring.
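
As a rough illustration of the attributes listed above, the following sketch reads them for a temporary file through Python's os.stat( ). The helper name describe and the temporary file are assumptions made for this example; the identifier shown is the inode number, which is how UNIX-like systems expose it.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a dict of common file attributes as reported by os.stat()."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": st.st_ino,                   # inode number on UNIX-like systems
        "type": "regular file" if stat.S_ISREG(st.st_mode) else "other",
        "size": st.st_size,                        # current size in bytes
        "protection": stat.filemode(st.st_mode),   # permission string
        "modified": time.ctime(st.st_mtime),       # last-modified time
    }

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello")
    demo_path = f.name

attrs = describe(demo_path)
for key, value in attrs.items():
    print(f"{key:>10}: {value}")
os.remove(demo_path)
```

The location attribute is not shown because ordinary user programs usually cannot see where on the device a file's blocks live.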

File Operations

 A file is an abstract data type. To define a file properly, we need to consider
the operations that can be performed on files.
 Six basic file operations. The OS can provide system calls to create, write, read,
reposition, delete, and truncate files.
o Creating a file. Two steps are necessary to create a file.
1. Space in the file system must be found for the file.
2. An entry for the new file must be made in the directory.
o Writing a file. To write a file, we make a system call specifying both the
name of the file and the information to be written to the file. The
system must keep a write pointer to the location in the file where the
next write is to take place. The write pointer must be updated
whenever a write occurs.
o Reading a file. To read from a file, we use a system call that specifies
the name of the file and where (in memory) the next block of the file
should be put. The system needs to keep a read pointer to the location
in the file where the next read is to take place.

 Because a process is usually either reading from or writing to a
file, the current operation location can be kept as a per-process
current-file-position pointer.
 Both the read and write operations use this same pointer, saving
space and reducing system complexity.
o Repositioning within a file. The directory is searched for the appropriate
entry, and the current-file-position pointer is repositioned to a given
value. Repositioning within a file need not involve any actual I/O. This
file operation is also known as a file seek.
o Deleting a file. To delete a file, we search the directory for the named file.
Having found the associated directory entry, we release all file space, so
that it can be reused by other files, and erase the directory entry.
o Truncating a file. The user may want to erase the contents of a file but
keep its attributes. Rather than forcing the user to delete the file and
then recreate it, this function allows all attributes to remain unchanged
(except for file length) but lets the file be reset to length zero and its file
space released.
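
The six operations above can be tried out with Python's thin wrappers around the POSIX-style system calls. This is a minimal sketch, not a definitive mapping: the file name demo.txt is hypothetical, and unlike the description above, these calls pass a file descriptor obtained from os.open( ) rather than the file name on every read and write.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Create: the OS finds space and adds a directory entry.
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)

# Write: advances the current-file-position pointer by 11 bytes.
os.write(fd, b"hello world")

# Reposition (seek): moves the pointer without any data transfer.
os.lseek(fd, 6, os.SEEK_SET)

# Read: starts at the current position and advances it again.
tail = os.read(fd, 5)
print(tail)

# Truncate: the length resets to zero, the other attributes stay.
os.ftruncate(fd, 0)
size_after_truncate = os.fstat(fd).st_size

os.close(fd)

# Delete: releases the space and removes the directory entry.
os.remove(path)
exists_after_delete = os.path.exists(path)
```

Reads and writes share the one position pointer here, which is why the seek is needed before reading back what was just written.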

Structures of Directory in Operating System


 Last Updated : 10 Dec, 2021
A directory is a container that is used to contain folders and files. It organizes files and folders in a hierarchical
manner. 

There are several logical structures of a directory; these are given below.
 Single-level directory – 
The single-level directory is the simplest directory structure. In it, all files are contained in the same directory
which makes it easy to support and understand. 
A single-level directory has a significant limitation, however, when the number of files increases or when the
system has more than one user. Since all the files are in the same directory, they must have unique names. If two
users both call their dataset test, the unique-name rule is violated.

Advantages: 
 Since it is a single directory, its implementation is very easy.
 If the directory contains few files, searching is fast.
 Operations such as file creation, searching, deletion, and updating are very easy in such a directory structure.
Disadvantages:
 Name collisions are likely, because no two files can have the same name.
 Searching becomes time-consuming if the directory is large.
 Files of the same type cannot be grouped together.
 Two-level directory – 
As we have seen, a single-level directory often leads to confusion of file names among different users. The
solution to this problem is to create a separate directory for each user.
In the two-level directory structure, each user has their own user file directory (UFD). The UFDs have similar
structures, but each lists only the files of a single user. The system’s master file directory (MFD) is searched
whenever a user logs in. The MFD is indexed by user name or account number, and each entry points
to the UFD for that user.
Advantages: 
 We can give a full path like /User-name/directory-name/.
 Different users can have directories and files with the same name.
 Searching for files becomes easier because of path names and user grouping.
Disadvantages:
 A user is not allowed to share files with other users.
 It is still not very scalable; two files of the same type cannot be grouped together within the same user's directory.
 Tree-structured directory – 
Once we have seen a two-level directory as a tree of height 2, the natural generalization is to extend the directory
structure to a tree of arbitrary height. 
This generalization allows the user to create their own subdirectories and to organize their files accordingly.

A tree structure is the most common directory structure. The tree has a root directory, and every file in the system
has a unique path. 
Advantages: 
 Very general, since a full pathname can be given.
 Very scalable; the probability of name collision is low.
 Searching becomes very easy; we can use both absolute and relative paths.
Disadvantages: 
 Not every file fits into the hierarchical model; some files may belong in multiple directories.
 We cannot share files.
 It is inefficient, because accessing a file may require traversing multiple directories.
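
To make absolute and relative path names concrete, here is a small sketch of a tree-structured directory held in memory. The Directory class, the resolve helper, and the user names are all hypothetical; a real file system stores these structures on disk.

```python
class Directory:
    """A directory node: maps names to files (bytes) or subdirectories."""
    def __init__(self):
        self.entries = {}

def resolve(root, cwd, path):
    """Resolve an absolute or relative path; cwd is a list of names from root."""
    parts = [p for p in path.split("/") if p]
    names = parts if path.startswith("/") else cwd + parts
    node = root
    for name in names:
        if not isinstance(node, Directory):
            raise NotADirectoryError(name)
        if name not in node.entries:
            raise FileNotFoundError("/" + "/".join(names))
        node = node.entries[name]
    return node

root = Directory()
home = root.entries["home"] = Directory()
alice = home.entries["alice"] = Directory()
bob = home.entries["bob"] = Directory()
alice.entries["test"] = b"alice's data"
bob.entries["test"] = b"bob's data"   # same name, different path: no collision

print(resolve(root, [], "/home/alice/test"))    # absolute path
print(resolve(root, ["home", "bob"], "test"))   # relative to /home/bob
```

Each lookup walks one directory per path component, which is the traversal cost mentioned in the disadvantages.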
 Acyclic graph directory – 
An acyclic graph is a graph with no cycle and allows us to share subdirectories and files. The same file or
subdirectories may be in two different directories. It is a natural generalization of the tree-structured directory. 
It is used in situations such as two programmers working on a joint project who both need access to its files.
The associated files are stored in a subdirectory, separating them from other projects and from the files of other
programmers. Because both programmers work on the project, each wants the subdirectory to appear in their own
directory, so the common subdirectory should be shared. An acyclic-graph directory makes this possible.
Note that a shared file is not the same as a copy of the file. If either programmer changes something
in the shared subdirectory, the change is visible through both directories.

Advantages: 
 We can share files.
 Searching is easy because a file can be reached through several different paths.
Disadvantages:
 Files are shared via links, so deleting a file can cause problems.
 If the link is a soft link, deleting the file leaves a dangling pointer.
 In the case of a hard link, deleting a file requires deleting all the references associated with it.
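
The difference between the two kinds of link can be observed directly on a POSIX-style system. The sketch below assumes a file system that supports both hard and symbolic links (on Windows, os.symlink may need extra privileges); the file names are hypothetical.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "report.txt")   # hypothetical names
hard = os.path.join(workdir, "hard_link.txt")
soft = os.path.join(workdir, "soft_link.txt")

with open(original, "w") as f:
    f.write("shared data")

os.link(original, hard)      # second directory entry for the same file
os.symlink(original, soft)   # separate file that stores the target path

links_before = os.stat(original).st_nlink   # reference count kept by the file system

os.remove(original)          # deletes one name only

with open(hard) as f:
    hard_contents = f.read()                # data survives through the hard link

# The soft link still exists as a link, but its target is gone.
soft_is_dangling = os.path.islink(soft) and not os.path.exists(soft)
print(links_before, hard_contents, soft_is_dangling)
```

The data is released only once every hard link has been removed, while the soft link is left pointing at a name that no longer exists.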

File-System Implementation
Overview

 File systems store several important data structures on the disk:


o A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or
the partition boot sector in Windows contains information about how
to boot the system off of this disk. This will generally be the first sector
of the volume if there is a bootable system loaded on that volume, or
the block will be left vacant otherwise.
o A volume control block, ( per volume ) a.k.a. the superblock in
UNIX or the master file table in NTFS, which contains information such as
the partition table, number of blocks on each filesystem, and pointers
to free blocks and free FCB blocks.
o A directory structure ( per file system ), containing file names and
pointers to corresponding FCBs. UNIX uses inode numbers, and NTFS
uses a master file table.
o The File Control Block, FCB, ( per file ) containing details about
ownership, size, permissions, dates, etc. UNIX stores this information in
inodes, and NTFS in the master file table as a relational database
structure.

Figure 12.2 - A typical file-control block.

 There are also several key data structures stored in memory:


o An in-memory mount table.
o An in-memory directory cache of recently accessed directory
information.
o A system-wide open file table, containing a copy of the FCB for every
currently open file in the system, as well as some other related
information.
o A per-process open file table, containing a pointer to the system open
file table as well as some other information. ( For example the current
file position pointer may be either here or in the system file table,
depending on the implementation and whether the file is being shared
or not. )
 Figure 12.3 illustrates some of the interactions of file system components
when files are created and/or used:
o When a new file is created, a new FCB is allocated and filled out with
important information regarding the new file. The appropriate directory
is modified with the new file name and FCB information.
o When a file is accessed during a program, the open( ) system call reads
in the FCB information from disk, and stores it in the system-wide open
file table. An entry is added to the per-process open file table
referencing the system-wide table, and an index into the per-process
table is returned by the open( ) system call. UNIX refers to this index as
a file descriptor, and Windows refers to it as a file handle.
o If another process already has a file open when a new request comes in
for the same file, and it is sharable, then a counter in the system-wide
table is incremented and the per-process table is adjusted to point to
the existing entry in the system-wide table.
o When a file is closed, the per-process table entry is freed, and the
counter in the system-wide table is decremented. If that counter
reaches zero, then the system-wide table entry is also freed. Any data
currently stored in the memory cache for this file is written out to disk if
necessary.
Figure 12.3 - In-memory file-system structures. (a) File open. (b) File read.
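
The bookkeeping described above for open( ) and close( ) can be modelled in a few lines. This is a toy sketch under stated assumptions: the class OpenFileTables, its field names, and the dictionary standing in for on-disk FCBs are invented for illustration and do not correspond to any real kernel's data structures.

```python
class OpenFileTables:
    """Toy model of the system-wide and per-process open-file tables."""
    def __init__(self, disk_fcbs):
        self.disk_fcbs = disk_fcbs   # name -> FCB dict, standing in for the disk
        self.system = {}             # name -> {"fcb": ..., "count": n}
        self.per_process = {}        # pid -> list of entries (index = descriptor)

    def open(self, pid, name):
        entry = self.system.get(name)
        if entry is None:            # first opener: copy the FCB into memory
            entry = {"fcb": dict(self.disk_fcbs[name]), "count": 0}
            self.system[name] = entry
        entry["count"] += 1
        table = self.per_process.setdefault(pid, [])
        table.append({"name": name, "position": 0})
        return len(table) - 1        # index into the per-process table

    def close(self, pid, fd):
        name = self.per_process[pid][fd]["name"]
        self.per_process[pid][fd] = None     # free the per-process slot
        entry = self.system[name]
        entry["count"] -= 1
        if entry["count"] == 0:              # last closer: drop the system entry
            del self.system[name]

tables = OpenFileTables({"notes.txt": {"size": 120, "owner": "alice"}})
fd_a = tables.open(pid=1, name="notes.txt")
fd_b = tables.open(pid=2, name="notes.txt")
count_while_shared = tables.system["notes.txt"]["count"]
tables.close(1, fd_a)
tables.close(2, fd_b)
print(fd_a, fd_b, count_while_shared, "notes.txt" in tables.system)
```

Both processes receive descriptor 0 because descriptors are indexes into each process's own table, while the shared system-wide entry carries the open count.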

12.2.2 Partitions and Mounting

 Physical disks are commonly divided into smaller units called partitions. They
can also be combined into larger units, but that is most commonly done for
RAID installations and is left for later chapters.
 Partitions can either be used as raw devices ( with no structure imposed upon
them ), or they can be formatted to hold a filesystem ( i.e. populated with
FCBs and initial directory structures as appropriate. ) Raw partitions are
generally used for swap space, and may also be used for certain programs
such as databases that choose to manage their own disk storage system.
Partitions containing filesystems can generally only be accessed using the file
system structure by ordinary users, but can often be accessed as a raw device
also by root.
 The boot block is accessed as part of a raw partition, by the boot program
prior to any operating system being loaded. Modern boot programs
understand multiple OSes and filesystem formats, and can give the user a
choice of which of several available systems to boot.
 The root partition contains the OS kernel and at least the key portions of the
OS needed to complete the boot process. At boot time the root partition is
mounted, and control is transferred from the boot program to the kernel
found there. ( Older systems required that the root partition lie completely
within the first 1024 cylinders of the disk, because that was as far as the boot
program could reach. Once the kernel had control, then it could access
partitions beyond the 1024 cylinder boundary. )
 Continuing with the boot process, additional filesystems get mounted, adding
their information into the appropriate mount table structure. As a part of the
mounting process the file systems may be checked for errors or
inconsistencies, either because they are flagged as not having been closed
properly the last time they were used, or just on general principle.
Filesystems may be mounted either automatically or manually. In UNIX a
mount point is indicated by setting a flag in the in-memory copy of the inode,
so all future references to that inode get re-directed to the root directory of
the mounted filesystem.

12.2.3 Virtual File Systems

 Virtual File Systems, VFS, provide a common interface to multiple different
filesystem types. In addition, it provides for a unique identifier ( vnode ) for
files across the entire space, including across all filesystems of different types.
( UNIX inodes are unique only across a single filesystem, and certainly do not
carry across networked file systems. )
 The VFS in Linux is based upon four key object types:
o The inode object, representing an individual file
o The file object, representing an open file.
o The superblock object, representing a filesystem.
o The dentry object, representing a directory entry.
 Linux VFS provides a set of common functionalities for each filesystem, using
function pointers accessed through a table. The same functionality is accessed
through the same table position for all filesystem types, though the actual
functions pointed to by the pointers may be filesystem-specific. See
/usr/include/linux/fs.h for full details. Common operations provided include
open( ), read( ), write( ), and mmap( ).
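
The idea of reaching filesystem-specific code through a common table can be sketched as follows. Python has no function pointers as such, so bound methods stored in a dictionary play that role here; MemFS, UpperFS, make_ops, and vfs_call are hypothetical names, and this is not how the Linux structures are actually laid out.

```python
class MemFS:
    """Hypothetical in-memory filesystem."""
    def __init__(self):
        self.files = {}
    def read(self, name):
        return self.files[name]
    def write(self, name, data):
        self.files[name] = data

class UpperFS:
    """Hypothetical filesystem that stores everything upper-cased."""
    def __init__(self):
        self.files = {}
    def read(self, name):
        return self.files[name]
    def write(self, name, data):
        self.files[name] = data.upper()

def make_ops(fs):
    # Same keys for every filesystem type; the functions behind them differ.
    return {"read": fs.read, "write": fs.write}

mounts = {"/mem": make_ops(MemFS()), "/upper": make_ops(UpperFS())}

def vfs_call(op, path, *args):
    """Find the mount point for path and dispatch through its table."""
    for mount_point, ops in mounts.items():
        if path.startswith(mount_point + "/"):
            return ops[op](path[len(mount_point):], *args)
    raise FileNotFoundError(path)

vfs_call("write", "/mem/a.txt", "hello")
vfs_call("write", "/upper/a.txt", "hello")
print(vfs_call("read", "/mem/a.txt"), vfs_call("read", "/upper/a.txt"))
```

The caller uses the same operation name for both mounts and never needs to know which filesystem type serves the request.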

Figure 12.4 - Schematic view of a virtual file system.

Directory Implementation

There are a number of algorithms by which directories can be implemented.
The choice of directory implementation algorithm may
significantly affect the performance of the system.
The directory implementation algorithms are classified according to the data structure
they use. Two algorithms are mainly used these days.

1. Linear List

In this algorithm, all the files in a directory are maintained as a singly linked list. Each file
entry contains pointers to the data blocks assigned to it and to the next file in the
directory.

Characteristics


1. When a new file is created, the entire list is checked to see whether the new file
name matches an existing file name. If it does not, the file can be added at the
beginning or at the end. Searching for a unique name is therefore a significant
cost, because traversing the whole list takes time.

2. The list needs to be traversed for every operation (creation, deletion,
updating, etc.) on the files, so the system becomes inefficient.

2. Hash Table

To overcome the drawbacks of the singly linked list implementation of directories, there is
an alternative approach: the hash table. This approach uses a hash table
along with the linked lists.
A key-value pair is generated for each file in the directory and stored in the hash
table. The key is determined by applying the hash function to the file name, while
the value points to the corresponding file stored in the directory.

Searching now becomes efficient, because the entire list is not
searched on every operation. Only the hash table entries are checked using the key, and if an
entry is found, the corresponding file is fetched using the value.
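
The two directory implementations can be compared side by side in a short sketch. The class names, the bucket count of 8, and the use of a starting block number as the stored value are assumptions made for this example; Python's built-in hash( ) stands in for the file system's hash function.

```python
class LinearDirectory:
    """Directory kept as a list of (name, first_block) pairs."""
    def __init__(self):
        self.entries = []

    def lookup(self, name):
        for entry_name, block in self.entries:   # scans the whole list
            if entry_name == name:
                return block
        return None

    def create(self, name, block):
        if self.lookup(name) is not None:        # uniqueness check scans the list
            raise FileExistsError(name)
        self.entries.append((name, block))

class HashedDirectory:
    """Directory with a fixed number of buckets; collisions chain in a list."""
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, name):
        return self.buckets[hash(name) % len(self.buckets)]

    def lookup(self, name):
        for entry_name, block in self._bucket(name):  # scans one short chain
            if entry_name == name:
                return block
        return None

    def create(self, name, block):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self._bucket(name).append((name, block))

for directory in (LinearDirectory(), HashedDirectory()):
    directory.create("a.txt", 17)
    directory.create("b.txt", 42)
    print(type(directory).__name__, directory.lookup("b.txt"), directory.lookup("zzz"))
```

Both classes give the same answers; the difference is that the hashed version inspects only the entries that share a bucket with the name being looked up.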

Disk Management in Operating System


 Last Updated : 06 Oct, 2021
The range of services and add-ons provided by modern operating systems is constantly
expanding, but four basic management functions are implemented by all
operating systems. The four main operating system management functions (each of which
is dealt with in more detail elsewhere) are:

 Process Management
 Memory Management
 File and Disk Management
 I/O System Management
Most computer systems employ secondary storage devices (magnetic disks, tape, optical media,
flash drives, etc.), which provide low-cost, non-volatile storage for programs and data.
Programs and the user data they use are kept in named units of storage called files. The
operating system is responsible for allocating space for files on secondary storage media as
needed.
There is no guarantee that files will be stored in contiguous locations on physical disk drives,
especially large files. It depends greatly on the amount of space available. When the disk is nearly full,
new files are more likely to be recorded in multiple locations. However, as far as the user is
concerned, the file abstraction provided by the operating system hides the fact that the file is
fragmented into multiple parts.

The operating system needs to track the location on the disk of every part of every file.
In some cases, this means tracking hundreds of thousands of files and file fragments on a
single physical disk. Additionally, the operating system must be able to locate each file and
perform read and write operations on it whenever it needs to. Therefore, the operating system is
responsible for configuring the file system, ensuring the safety and reliability of read and
write operations to secondary storage, and keeping access times (the time required to write data
to or read data from secondary storage) acceptable.

Disk management of the operating system includes:


 Disk Format
 Booting from disk
 Bad block recovery

The low-level format or physical format:

 Divides the disk into sectors before any data is stored, so that the disk controller can read and write
them. Each sector holds header information, data, and an error-correcting code (ECC). A sector
typically holds 512 bytes of data, though on some disks the size is selectable.

To use a disk to hold files, the operating system still needs to record its own data structures
on the disk.

 This is conducted in two stages:

1. Partition the disk into one or more groups of cylinders. Each partition is treated as a logical disk.

2. Logical format, or “create a file system”. The OS stores the initial file-system data
structures on the disk, including maps of free and allocated space.

For efficiency, most file systems group blocks into clusters. Disk I/O is done in blocks; file I/O
is done in clusters.

 Boot block:

 When the computer is turned on or restarted, the initial bootstrap program stored in
ROM finds the OS kernel on the disk, loads the kernel into memory, and
starts the OS.
 Changing bootstrap code held in ROM requires changing the ROM hardware chip, so only a
small bootstrap loader program is stored in ROM.
 The full bootstrap code is stored in the “boot block” of the disk.
 A disk with a boot partition is called a boot disk or system disk.

 Bad Blocks:

 Disks are error-prone because they have moving parts with small tolerances.
 Most disks even come from the factory with bad blocks, which are handled in a variety
of ways.
 The controller maintains a list of bad blocks.
 The controller can be instructed to replace each bad sector logically with one of the spare
sectors. This scheme is known as sector sparing or forwarding.
 A soft error triggers the data recovery process.
 However, unrecoverable hard errors may result in data loss and require manual
intervention.
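
Sector sparing can be pictured as a small translation table kept by the controller. The sketch below is only a model: the Controller class, the spare sector numbers 1000 and 1001, and the method names are hypothetical, and real controllers do this in firmware.

```python
class Controller:
    """Toy disk controller with a bad-block list and a pool of spare sectors."""
    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)
        self.remap = {}                    # bad logical sector -> spare sector

    def mark_bad(self, sector):
        """Record a bad sector and assign it the next unused spare."""
        if sector in self.remap:
            return self.remap[sector]
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)
        return self.remap[sector]

    def translate(self, sector):
        """Return the physical sector actually used for a logical sector."""
        return self.remap.get(sector, sector)

ctrl = Controller(spare_sectors=[1000, 1001])
ctrl.mark_bad(87)
print(ctrl.translate(87), ctrl.translate(88))
```

Requests for the bad sector are silently redirected to its spare, while every other sector number passes through unchanged.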

Block allocation

File systems have to keep track of which blocks belong to each file; they also have to keep track
of which blocks are available for use. When a new file is created, the file system finds an
available block and allocates it. When a file is deleted, the file system makes its blocks available
for re-allocation.

The goals of the block allocation system are:

 Speed: Allocating and freeing blocks should be fast.


 Minimal space overhead: The data structures used by the allocator should be small, leaving as
much space as possible for data.
 Minimal fragmentation: If some blocks are left unused, or some are only partially used, the
unused space is called “fragmentation”.
 Maximum contiguity: Data that is likely to be used at the same time should be physically
contiguous, if possible, to improve performance.

It is hard to design a file system that achieves all of these goals, especially since file system
performance depends on “workload characteristics” like file sizes, access patterns, etc. A file
system that is well tuned for one workload might not perform as well for another.
Free Space Management

A file system is responsible for allocating free blocks to files, so it has to keep
track of all the free blocks on the disk. There are mainly two approaches to
managing the free blocks on the disk.

1. Bit Vector

In this approach, the free-space list is implemented as a bit map, or bit vector. It contains
one bit per block.

If the block is free, the bit is 1; otherwise it is 0. Initially all the blocks are free,
so every bit in the bit vector is 1.

As space allocation proceeds, the file system allocates blocks to files and
sets the respective bits to 0.
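
A minimal sketch of the bit-vector scheme follows, using a single Python integer as the bit map. The class name BitVector and the policy of always handing out the lowest-numbered free block are assumptions for illustration.

```python
class BitVector:
    """Free-space bit map: bit i is 1 when block i is free, 0 when allocated."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = (1 << num_blocks) - 1    # all blocks start free

    def allocate(self):
        if self.bits == 0:
            raise OSError("disk full")
        block = (self.bits & -self.bits).bit_length() - 1  # lowest set bit
        self.bits &= ~(1 << block)           # mark the block as allocated
        return block

    def free(self, block):
        self.bits |= 1 << block              # mark the block as free again

    def is_free(self, block):
        return bool((self.bits >> block) & 1)

bv = BitVector(8)
first, second = bv.allocate(), bv.allocate()
bv.free(first)
third = bv.allocate()
print(first, second, third, format(bv.bits, "08b"))
```

Finding a free block is a search for the first 1 bit, which hardware can often do with a single instruction on a machine word.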

2. Linked List

It is another approach to free space management. This approach links
together all the free blocks and keeps a pointer in the cache that points to the first
free block.

All the free blocks on the disk are therefore linked together by pointers.
Whenever a block is allocated, it is unlinked from the list: the free block before it is
linked to the free block after it, or, if it was the first free block, the head pointer
moves to the next free block.
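
The linked-list scheme can be sketched in the same style. Here a dictionary stands in for the pointer that each free block would store on disk; the class name FreeList and the choice to always allocate from the head of the list are assumptions.

```python
class FreeList:
    """Free blocks chained together; each free block records the next free one."""
    def __init__(self, num_blocks):
        self.next_free = {}     # block -> next free block (the on-disk pointer)
        self.head = None        # pointer to the first free block
        for block in reversed(range(num_blocks)):
            self.free(block)    # builds the chain 0 -> 1 -> ... -> n-1

    def allocate(self):
        if self.head is None:
            raise OSError("disk full")
        block = self.head
        self.head = self.next_free.pop(block)   # head moves to the next free block
        return block

    def free(self, block):
        self.next_free[block] = self.head       # freed block points at the old head
        self.head = block

fl = FreeList(4)
a, b = fl.allocate(), fl.allocate()
fl.free(a)
c = fl.allocate()
print(a, b, c, fl.head)
```

Allocation and release touch only the head of the list, but reaching the n-th free block means following n pointers, each of which would be a disk read in a real system.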

Physical and Logical File Systems


 Last Updated : 27 Apr, 2022
1. Physical files: Physical files contain the actual data that is stored on an iSeries system, and a
description of how data is to be presented to or received from a program. They contain only one
record format and one or more members. Records in database files can be described using either
a field-level description or a record-level description. A field-level description describes the
fields in the record to the system. Database files that are created with field-level descriptions are
referred to as externally described files. A record-level description describes only the length of
the record, and not the contents of the record. Database files that are created with record-level
descriptions are referred to as program-described files. This means that your ILE C/C++ program
must describe the fields in the record.
An ILE C/C++ program can use either externally described or program-described files. If it uses
an externally described file, the ILE C/C++ compiler can extract information from the externally
described file, and automatically include field information in your program. Your program does
not need to define the field information. For further information see “Using Externally Described
Files in Your Programs”. A physical file can have a keyed sequence access path. This means that
data is presented to an ILE C/C++ program in a sequence that is based on one or more key fields
in the file. 

2. Logical files: Logical files do not contain data. They contain a description of records that are
found in one or more physical files. A logical file is a view or representation of one or more
physical files. Logical files that contain more than one format are referred to as multi-format
logical files. If your program processes a logical file that contains more than one record format,
you can use the _Rformat() function to set the format you wish to use. Some operations cannot
be performed on logical files. If you open a logical file for stream file processing with open
modes W, W+, WB, or WB+, the file is opened but not cleared. If you open a logical file for
record file processing with open modes WR or WR+, the file is opened but not cleared. Records
in iSeries database files can be described using either a field-level description or a record-level
description. The field-level description of the record includes a description of all fields and their
arrangement in this record. Since the description of the fields and their arrangement is kept
within a database file and not in your ILE C/C++ program, database files created with a field-
level description are referred to as externally described files. 
Physical versus Logical Files :
 Physical File: A collection of bytes stored on a disk or tape.
 Logical File: A “Channel” (like a telephone line) that hides the details of the file’s location
and physical format to the program.
When a program wants to use a particular file, “data”, the operating system must find the
physical file called “data” and connect it to the program by assigning a logical file to it. This logical
file has a logical name, which is what is used inside the program.

Physical File | Logical File

It occupies a portion of memory. It contains the original data. | It does not occupy memory space. It does not contain data.

A physical file contains one record format. | It can contain up to 32 record formats.

It can exist without a logical file. | It cannot exist without a physical file.

If there is a logical file for the physical file, the physical file cannot be deleted until and unless we delete the logical file. | If there is a logical file for a physical file, the logical file can be deleted without deleting the physical file.

CRTPF command is used to make such an object. | CRTLF command is used to make such an object.

Physical files represent the real data saved on an iSeries system and describe how the data is to be displayed to or retrieved from a program. | The logical file represents one or multiple physical files. It also has a description of the records found in one or multiple physical files.
