Lecture 5.Pptx 2
Lecture 5.Pptx 2
Lecture 5.Pptx 2
Facilitator : Dr.S.Saraswathi
Index Techniques
• Indexed sequential access file combines both sequential file and direct
access file organization.
• This file have multiple keys. These keys can be alphanumeric in which the
records are ordered is called primary key.
• The data can be access either sequentially or randomly using the index. The
index is stored in a file and read into memory when the file is opened.
Advantages of Indexed sequential access file
organization
• In indexed sequential access file, sequential file and random file access is
possible.
• It accesses the records very fast if the index table is properly organized.
• It accesses the records very fast if the index table is properly organized.
• Dense Index:
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
Sparse Index:
• The index record appears only for a few items in the data file. Each item points to a block as shown.
• To locate a record, we find the index record with the largest search key value less than or equal to the
search key value we are looking for.
• We start at that record pointed to by the index record, and proceed along with the pointers in the file
(that is, sequentially) until we find the desired record.
• Hashed Indexes
• Trie Indexing
Cylinder Surface indexing
• This index structure is for files stored in hard disks. It needs the records to
be sorted on some field.
• An index of the first record in each cylinder is maintained using the field on
which the records are sorted as the key field. This is the cylinder index.
• The key values of this index defines the ranges of the key value present in
each cylinder, so that for a search of a particular record the cylinder in
which the record may be present can be found out.
• So the search for the record can be performed only in that cylinder.
• Again within a cylinder, records are stored starting from the first surface
and through the assumed sequence of surfaces.
• Here an index of the first record in each surface is maintained using the
same key field. This is the surface index.
• There is a cylinder index for the entire file and one surface index for each
cylinder that the file spreads over.
• The idea of cylinder surface index can be used even without
considering cylinders and surfaces.
• If the records are sorted and kept in a disk file, then an index
containing the (key-value, offset) of equally spaced records records
can help a sequential search by requiring the search first in the index,
and then in the proper segment of the disk file.
Hashed indexing
• Hash functions and overflow handling techniques are used for hashing.
• Since the index is to be maintained on disk and disk access times are generally
several orders of magnitude larger than internal memory access times, much
consideration is given to hash table design and the choice of an overflow handling
technique.
1. rehashing
3. chaining
• Expected number of bucket accesses when s=1 is roughly same for method
rehashing, random and quadratic probing.
• Since the hash table is on a disk and these overflow techniques tend to
randomize the use of hash table, we can expect each bucket access to take
almost the maximum seek time.
• At each level the root of the tree can contain upto m - 1 key values
that define the partition boundaries.
• The trie contains two types of nodes. First type called the branch
node and second information node.
• Searching a trie for a key value X requires breaking up X into its
consequent characters and following the branching patterns
determined by these characters.