0% found this document useful (0 votes)
12 views17 pages

Lecture 6

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views17 pages

Lecture 6

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

NEHRU ARTS AND SCIENCE COLLEGE

COIMBATORE -641 105

DEPARTMENT OF INFORMATION TECHNOLOGY


Course: Data Structures

Facilitator : Dr.S.Saraswathi
File Organizations
File organization

• Sequential

• Random

• Linked organization

• Inverted files
Sequential File Organization
• The easiest method for file Organization is Sequential method. In this
method the file are stored one after another in a sequential manner. There
are two ways to implement this method:

• Pile File Method – This method is quite simple, in which we store the
records in a sequence i.e one after other in the order in which they are
inserted into the tables.
• Sorted File Method –In this method, As the name itself suggest
whenever a new record has to be inserted, it is always inserted in a sorted
(ascending or descending) manner.

• Sorting of records may be based on any primary key or any other key.
Pros
• Fast and efficient method for huge amount of data.
• Simple design.
• Files can be easily stored in magnetic tapes i.e cheaper storage
mechanism.
Cons
• Time wastage as we cannot jump on a particular record that is
required, but we have to move in a sequential manner which
takes our time.
• Sorted file method is inefficient as it takes time and space for
sorting records.
Random File Organization
• Records are stored at random locations on the disk.

• This randomization could be achieved by any of several techniques: direct


addressing, directory lookup, hashing.

• Direct addressing: in direct addressing with equi-size records, available


disk space is divided out into nodes large enough to hold a record.

• Numeric value of primary key is used to determine the node into which a
particular record is to be stored.
• Directory lookup: the index is not direct access type but is a dense index
maintained using a structure suitable for index operations.

• Retrieving a record involves searching the index for the record address
and then accessing the record itself.

• The storage management scheme will depend on whether fixed size or


variable size nodes are being used.

• It requires more accesses for retrieval and update, since index searching
will generally require more than one access.

• In both direct addressing and directory lookup, some provision must be


made to handle collisions.
• Hashing: the available file space is divided into buckets and slots. Some
space may have to be set aside for an overflow area in case chaining is
being used to handle overflows.
• When variable size records are present, the no. of slots per bucket will be
only rough indicator of no. of records a bucket can hold.
• The actual no. will vary dynamically with the size of records in a particular
bucket.
• Random organization on the primary key using any of the above three
techniques overcomes the difficulties of sequential organizations.
• Insertion, deletions become easy.
• But batch processing of queries becomes inefficient as records are not
maintained in order of primary key.
• Handling range queries becomes very inefficient except in case of
directory lookup.
Linked File Organization
• Linked organizations differ from sequential organizations essentially in that the
logical sequence of records is generally different from the physical sequence.

• In sequential ith record is placed at location li, then the i+1st record is placed at
li + c where c is the length of ith record or some fixed constant.

• In linked organization the next logical record is obtained by following link value
from present record.

• Linking in order of increasing primary key eases insertion deletion.


• Searching for a particular record is difficult since no index is available, so
only sequential search possible.

• We can facilitate indexes by maintaining indexes corresponding to ranges


of employee numbers eg. 501-700, 701-900. all records with same range
will be linked together i a list.

• We can generalize this idea for secondary key level also. We just set up
indexes for each key and allow records to be in more than one list.

• This leads to the multilist structure for file representation.


Inverted File Organization
• Inverted files are similar to multilists.

• Multilists records with the same key value are linked together with link
information being kept in individual record.

• In case of inverted files the link information is kept in index itself.

• EG. We assume that every key is dense.

• Since the index entries are variable length, index maintenance becomes
complex fro multilists. Benefits being Boolean queries require only one access
per record satisfying the query. Queries of type k1=xx and k2=yy can be
handled similarly by intersecting two lists.
• The retrieval works in two steps.

• In the first step, the indexes are processed to obtain a list of records satisfying
the query and in the second, these records are retrieved using the list.

• The no. of disk accesses needed is equal to the no. of records being retrieved +
the no. to process the indexes.

• Inverted files represent one extreme of file organization in which only the index
structures are important.

• The records themselves can be stored in any way.

• Inverted files may also result in space saving compared with other file
structures when record retrieval doesn’t require retrieval of key fields.

• In this case key fields may be deleted from the records unlike multilist
structures.
Thank you

You might also like