File Management: Objectives
File management
Objectives
1. Long-term storage of data
2. Allow creation and deletion of files – automatic management of secondary storage
3. Allow for file reference using symbolic names
4. Protect against unauthorised access (access control) – allow sharing of files when required
5. Protect files against system failure
Files
A file is a uniform logical unit of information created by a process. It also
has an address space, but one mapped onto a mass storage unit instead of RAM.
Basically, a named collection of related information that is recorded on
secondary storage (e.g. the set of lines in a program, or the set of
words in a text document).
Used for storing large amounts of data in the long-term
Allows processes to access the data concurrently
Naming
Motivation: the user does not need numerical addresses; files can be accessed using a user-friendly name.
Different OSs enforce different file naming conventions, but most follow a common pattern.
Many OSs, e.g. Windows and Unix, support file names of up to about 255 characters (Windows traditionally limits a full path to 260 characters).
There are restrictions on the characters that can be used, e.g. “?” is invalid in Windows but valid in Unix.
Some OSs distinguish between upper and lower case. Windows is not case sensitive, while UNIX is.
Extensions can be useful to tell the user and OS what types of data the file contains.
In MS-DOS only 3 characters were allowed for extensions, in Unix the size is up to the user.
In Unix, extensions are not enforced by the OS, but some programs expect them, for example the C compiler needs a “.c” extension to compile.
GUI-based OSs usually attach meanings to extensions: they try to associate applications with file extensions (e.g. docx with MS Word).
Issue: it is easy to trick the system and corrupt files by modifying extensions. macOS uses a more sophisticated approach:
it examines the file and tries to work out its type from the “look” of its contents.
Sequential access is still interesting today because of the locality principle. Random access is essential for most
applications.
Sequential access – read next/write next
Random access – Read n/write n (n = relative block number)
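The two access methods can be sketched in Python as follows. This is only an illustration: the 4-byte block size is an assumption chosen to keep the example small, an in-memory file (io.BytesIO) stands in for a file on secondary storage, and read_next and read_block are hypothetical helper names, not OS calls.

```python
import io

BLOCK_SIZE = 4  # assumed block size in bytes, chosen small for illustration


def read_next(f):
    """Sequential access: read the block at the current position."""
    return f.read(BLOCK_SIZE)


def read_block(f, n):
    """Random access: jump straight to relative block n and read it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


# An in-memory file stands in for a file on secondary storage.
f = io.BytesIO(b"AAAABBBBCCCCDDDD")
first = read_next(f)       # block 0
second = read_next(f)      # block 1: the position advanced automatically
fourth = read_block(f, 3)  # block 3: no need to read block 2 first
```

Sequential access only ever moves forward from the current position, while random access computes the position from the relative block number.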
Directories
Most filing systems allow files to be grouped together into
directories (or folders), resulting in a more logical organisation
• Allows operations to be performed in bulk on groups of files, e.g. copy files or set one of their attributes
• Allows different files to have the same filename as long as they are in different directories
• Each directory is managed via a special file, which contains:
• a file descriptor table with descriptors for each file under that directory,
corresponding to specific entries in the global file table
Two-level:
Separate directory for each user
Letters indicate owners of the directories and files
Pros: can have the same file name for different users
Cons: limited grouping capability
Hierarchical directory systems:
Directories in a tree-like structure
Pros: grouping capabilities, can have same name for files in different directories
Cons: requires a method to browse and locate files
Sector 0 is the master boot record (MBR), used to boot the computer via a boot block from a specified partition,
from which the OS is loaded.
The super block contains information about the partition (e.g. the number of blocks).
Contiguous allocation
Drives are split into blocks of fixed size, e.g. 1KB – a file of 50KB would occupy 50 blocks.
Contiguous blocks are assigned to each file.
Advantages:
1. Simple implementation: only the first block address and the length of the file need to be stored
2. Good performance: the blocks of a file can be read in one sequential pass
3. Allows easy random access
4. Resilient to drive faults: damage to a single block results in only localised loss of data.
Disadvantages:
1. The size of a file must be known when it is created
2. Files cannot grow
3. Fragmentation: as files are deleted, holes are left between the remaining files.
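A minimal sketch of the arithmetic behind contiguous allocation, assuming 1 KB blocks and a hypothetical file that starts at block 200. The function names are illustrative only.

```python
BLOCK_SIZE = 1024  # assumed block size: 1 KB


def blocks_needed(file_size_bytes):
    """Number of fixed-size blocks needed to hold a file (rounded up)."""
    return (file_size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def physical_block(first_block, length, n):
    """Random access: relative block n lives at first_block + n."""
    if not 0 <= n < length:
        raise IndexError("block outside the file")
    return first_block + n


size = blocks_needed(50 * 1024)        # a 50 KB file
where = physical_block(200, size, 7)   # file assumed to start at block 200
```

Because the blocks are adjacent, the first block address and the length are enough to locate any block with a single addition, which is why random access is easy.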
Linked List Allocation
Files are stored as a linked list of blocks. The first bytes of each block are used as a pointer. Each block points to
the next block and the final block contains a null pointer.
Advantages:
1. Every block can be used
2. File size does not have to be known beforehand
3. Files can grow
4. No external fragmentation
5. No internal fragmentation except for last block
Disadvantages:
1. Random access is very slow: reaching block n means following the pointers of all the blocks before it
2. The pointer takes up space in each block that could otherwise hold data
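The chain of blocks can be sketched as below. The disk contents, the block numbers and the value used for the null pointer are all assumptions made up for the illustration.

```python
NULL = -1  # assumed marker for "no next block"

# Hypothetical disk: block number -> (pointer to next block, data).
disk = {
    4: (7, "file "),
    7: (2, "data "),
    2: (NULL, "here"),
}


def read_file(first_block):
    """Follow the chain from the first block to the null pointer."""
    parts = []
    block = first_block
    while block != NULL:
        next_block, data = disk[block]
        parts.append(data)
        block = next_block
    return "".join(parts)


def nth_block(first_block, n):
    """Random access costs n pointer hops: there is no shortcut to block n."""
    block = first_block
    for _ in range(n):
        block = disk[block][0]
    return block


content = read_file(4)
third = nth_block(4, 2)
```

The blocks need not be adjacent on the drive (here 4, 7, 2), which is what removes external fragmentation, and nth_block shows why random access is slow.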
I-Nodes
Used in UNIX-type OSs. Each file is associated with an i-node (index-node) listing
all the attributes and drive/disk addresses of the file's blocks.
With the i-node it is possible to find all the blocks that correspond to the file.
Advantages:
1. Only the i-node of the file needs to be in memory
2. And only when the corresponding file is opened
Disadvantages:
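1. An i-node has room for only a fixed number of disk addresses, so a file may grow beyond the limits of its i-node
2. To handle this, the last disk address must point to an address block (a block of further disk addresses) instead of a data
block.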
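A minimal sketch of how an i-node maps a relative block number to a disk address, including the address block used once the direct addresses run out. The layout is an assumption for illustration: three direct addresses and one indirect address, with made-up block numbers. Real i-nodes hold more addresses and may use several levels of indirection.

```python
NUM_DIRECT = 3  # assumed number of direct addresses in the i-node

# Hypothetical address blocks on disk: block number -> list of data block addresses.
address_blocks = {90: [40, 41, 42]}

# Hypothetical i-node: file length, direct addresses, and one indirect address.
inode = {
    "size_blocks": 5,
    "direct": [10, 11, 12],
    "indirect": 90,
}


def block_address(inode, n):
    """Map relative block n of the file to its disk address."""
    if not 0 <= n < inode["size_blocks"]:
        raise IndexError("block outside the file")
    if n < NUM_DIRECT:
        return inode["direct"][n]
    # Past the direct addresses: look in the address block instead.
    return address_blocks[inode["indirect"]][n - NUM_DIRECT]


direct_hit = block_address(inode, 1)
indirect_hit = block_address(inode, 4)
```

Only this one i-node has to be in memory while the file is open; the address block is fetched from disk when a block beyond the direct addresses is requested.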
To open a file, the path name is used to locate its directory entry.
The directory entry provides a mapping from a filename/file
descriptor to the disk blocks that contain the data.
The directory entry contains all the information needed to find the disk blocks for a given file:
Contiguous allocation – the disk address of the entire file (first block and length)
Linked list allocation – the address of the first disk block
i-node implementation – the i-node number.
It also allows access to the file's attributes.
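For the i-node case, resolving a path name to a directory entry can be sketched as below. The directory contents, the i-node numbers and the choice of i-node 2 for the root directory are assumptions made up for the illustration.

```python
# Hypothetical directories: i-node number of directory -> {name: i-node number}.
directories = {
    2: {"home": 5},
    5: {"alice": 9},
    9: {"notes.txt": 17},
}
ROOT_INODE = 2  # assumed i-node number of the root directory


def lookup(path):
    """Resolve an absolute path to an i-node number, one component at a time."""
    inode = ROOT_INODE
    for name in [part for part in path.split("/") if part]:
        inode = directories[inode][name]  # KeyError if the name does not exist
    return inode


found = lookup("/home/alice/notes.txt")
```

Each directory maps names to i-node numbers, so opening a file means one directory lookup per path component, after which the file's i-node gives access to its attributes and blocks.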
A journaling file system uses a special disk area to make a log entry listing the actions to be completed before carrying them out.
In the event of a system failure the log is used to bring the disk back into a consistent state and complete all pending
actions.
Log entries are erased once the operations complete successfully.
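The log-then-act cycle can be sketched as below. This is a toy model under stated assumptions: a Python list stands in for the log area, a dictionary stands in for the disk, and every action is a simple "set key to value" so that repeating it after a crash is harmless. The function names are hypothetical.

```python
journal = []   # stands in for the special log area on disk
disk = {}      # stands in for the file system proper


def begin(actions):
    """Write the log entry before touching the disk."""
    journal.append(list(actions))


def apply(entry):
    """Carry out every action in an entry. Setting a key to a value twice
    gives the same result as doing it once, so replaying is safe."""
    for key, value in entry:
        disk[key] = value


def commit(entry):
    """Erase the log entry once its actions have completed."""
    journal.remove(entry)


def recover():
    """After a failure, replay whatever is still in the log."""
    for entry in list(journal):
        apply(entry)
        commit(entry)


# Simulate a failure after the log entry is written but before the actions finish.
begin([("dir_entry", "report.txt"), ("inode_17", "allocated")])
disk["dir_entry"] = "report.txt"   # first action done, then the system fails
recover()
```

After recover() both actions are on the disk and the log is empty again, which is the consistent state the journal is meant to restore.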
RAID 0 (striping) – distributes data across several disks in a way which gives improved speed and full capacity.
No redundancy: if one disk fails, the data is lost.
RAID 1 (mirroring) – uses more than one disk, each storing the same data. No speed gain on writes and only the capacity
of a single disk, but the data survives a disk failure.
RAID 1+0 – stripes data across mirrored disks, taking advantage of both.
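How the two basic levels place blocks on disks can be sketched as below. Lists of block labels stand in for real disks, and the function names are illustrative only.

```python
def stripe(blocks, num_disks):
    """RAID 0: deal blocks out across the disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def mirror(blocks, num_disks):
    """RAID 1: every disk holds a full copy of the data."""
    return [list(blocks) for _ in range(num_disks)]


data = ["b0", "b1", "b2", "b3"]
raid0 = stripe(data, 2)   # each disk holds half the blocks
raid1 = mirror(data, 2)   # each disk holds all the blocks
```

With striping, losing either disk loses half of every file; with mirroring, either disk alone still holds all four blocks.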