Basic File Structure, Hashing & Indexing
Databases are stored physically as files of records, which are typically stored on magnetic
disks. In addition to the primary organization of records in physical database file structures,
auxiliary data structures called INDEXES speed up the retrieval of records.
The collection of data that makes up a computerized database must be stored physically on some
computer storage medium. The DBMS software can then retrieve, update, and process this data as
needed. Computer storage media form a storage hierarchy that includes two main categories:
(A database, which is a collection of records, must be stored on some physical medium such as
a hard disk; only then can we retrieve the data.)
Primary storage. This category includes storage media that can be operated
on directly by the computer’s central processing unit (CPU), such as the
computer’s main memory and smaller but faster cache memories.
Provides: fast access to data, but limited storage capacity.
Secondary and tertiary storage. This category includes magnetic disks,
optical disks (CD-ROMs, DVDs, and other similar storage media), and
tapes. Hard-disk drives are classified as secondary storage, whereas
removable media such as optical disks and tapes are considered tertiary storage.
Provides: slower access to data, larger capacity, lower cost.
Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be
copied into primary storage and then processed by the CPU.
Files, Fixed-Length Records, and Variable-Length Records
In many cases, all records in a file are of the same
record type. If every record in the file has exactly the same size (in bytes), the file is
said to be made up of fixed-length records. If different records in the file have
different sizes, the file is said to be made up of variable-length records. A file may have
variable-length records for several reasons:
■ The file records are of the same record type, but one or more of the fields are
of varying size (variable-length fields). For example, the Name field of
EMPLOYEE can be a variable-length field.
■ The file records are of the same record type, but one or more of the fields
may have multiple values for individual records; such a field is called a
repeating field and a group of values for the field is often called a repeating
group.
■ The file records are of the same record type, but one or more of the fields are
optional; that is, they may have values for some but not all of the file records
(optional fields).
■ The file contains records of different record types and hence of varying size
(mixed file). This would occur if related records of different types were
clustered (placed together) on disk blocks; for example, the GRADE_REPORT
records of a particular student may be placed following that STUDENT’s
record.
The fixed-length EMPLOYEE records in Figure 17.5(a) have a record size of 71 bytes.
Every record has the same fields, and field lengths are fixed, so the system can
identify the starting byte position of each field relative to the starting position of the
record. This facilitates locating field values by programs that access such files. Notice
that it is possible to represent a file that logically should have variable-length records
as a fixed-length records file.
Optional field case: a special null value is stored in a file record when the record has no value
for that field.
(a) A fixed-length record with six fields and size of 71 bytes.
(b) A record with two variable-length fields and three fixed-length fields.
(c) A variable-field record with three types of separator characters.
For variable-length fields, each record has a value for each field, but we do not know
the exact length of some field values. To determine the bytes within a particular
record that represent each field, we can use special separator characters (such as ? or
% or $) that do not appear in any field value to terminate variable-length fields.
A file of records with optional fields can be formatted in different ways. If the total
number of fields for the record type is large, but the number of fields that actually
appear in a typical record is small, we can include in each record a sequence of
<field-name, field-value> pairs rather than just the field values. Three types of separator
characters are then needed: one separating the field name from the field value, one separating
one field from the next field, and one terminating the record. A more practical option is to
assign a short field type code—say, an integer number—to each field and include in each record
a sequence of <field-type, field-value> pairs rather than <field-name, field-value> pairs.
A repeating field needs one separator character to separate the repeating values of
the field and another separator character to indicate termination of the field.
Finally, for a file that includes records of different types, each record is preceded by a
record type indicator.
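The formatting options above can be sketched as follows; the separator characters, field type codes, and example field values are hypothetical choices for illustration, not a prescribed layout.

```python
# Sketch: encoding a variable-length record as <field-type, field-value> pairs.
# The type codes (1 = Name, 2 = Salary, 3 = Courses) and separators are
# illustrative assumptions, not fixed by any standard.
FIELD_SEP = "%"   # separates one field from the next
VALUE_SEP = "="   # separates the field type code from its value
REPEAT_SEP = ","  # separates repeated values of a repeating field

def encode_record(fields):
    """fields: list of (type_code, value) pairs; a list value is a repeating field."""
    parts = []
    for type_code, value in fields:
        if isinstance(value, list):               # repeating field: join its values
            value = REPEAT_SEP.join(map(str, value))
        parts.append(f"{type_code}{VALUE_SEP}{value}")
    return FIELD_SEP.join(parts)

record = encode_record([(1, "Smith"), (2, 30000), (3, ["DB", "OS"])])
print(record)  # 1=Smith%2=30000%3=DB,OS
```

Only the fields actually present in a record are encoded, which is what makes this layout economical when most optional fields are absent.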
The records of a file must be allocated to disk blocks because a block is the unit of
data transfer between disk and memory. When the block size is larger than the
record size, each block will contain numerous records, although some files may have
unusually large records that cannot fit in one block.
Suppose the block size is B bytes and the file has fixed-length records of size R bytes,
with B ≥ R. Then we can fit bfr = ⌊B/R⌋ records per block, where ⌊x⌋ (the floor function)
rounds the number x down to an integer; bfr is called the blocking factor for the file.
In general, R may not divide B exactly, so we have some unused space in each block equal to
B − (bfr * R) bytes
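As a quick check of the formula, using the 71-byte EMPLOYEE records from above and an assumed block size of 512 bytes:

```python
# Blocking factor for fixed-length records: bfr = floor(B / R).
B = 512   # block size in bytes (illustrative assumption)
R = 71    # record size in bytes, as in the EMPLOYEE example
bfr = B // R                 # records per block (floor division)
unused = B - bfr * R         # unused bytes per block (unspanned organization)
print(bfr, unused)           # 7 records per block, 15 bytes unused
```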
To utilize this unused space, we can store part of a record on one block and the rest
on another. A pointer at the end of the first block points to the block containing the
remainder of the record in case it is not the next consecutive block on disk. This
organization is called spanned because records can span more than one block.
Whenever a record is larger than a block, we must use a spanned organization. If
records are not allowed to cross block boundaries, the organization is called
unspanned.
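The two organizations can be compared by block count: unspanned organization needs ⌈r/bfr⌉ blocks for r records, while spanned organization needs only about ⌈(r · R)/B⌉. A sketch with illustrative figures:

```python
import math

# Blocks needed for r records under each organization (figures are assumptions).
r, R, B = 10000, 71, 512
bfr = B // R                          # 7 records per block
unspanned = math.ceil(r / bfr)        # records never cross block boundaries
spanned = math.ceil(r * R / B)        # records may span two blocks
print(unspanned, spanned)             # 1429 vs 1387 blocks
```

The spanned organization saves the 15 wasted bytes per block at the cost of an extra block access for records that straddle a boundary.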
File Headers
A file header or file descriptor contains information about a file that is needed by
the system programs that access the file records.
The header includes:
disk addresses of the file blocks
record format descriptions = field lengths and the order of fields within a record for
fixed-length unspanned records, and field type codes, separator characters, and
record type codes for variable-length records.
To search for a record on disk, one or more blocks are copied into main memory
buffers. Programs then search for the desired record or records within the buffers,
using the information in the file header. If the address of the block that contains the
desired record is not known, the search programs must do a linear search through the
file blocks. Each file block is copied into a buffer and searched until the record is
located or all the file blocks have been searched unsuccessfully. This can be very
time-consuming for a large file. The goal of a good file organization is to locate the
block that contains a desired record with a minimal number of block transfers.
Operations on Files
Operations on files are usually grouped into retrieval operations and update operations.
■ Open. Prepares the file for reading or writing.
■ Reset. Sets the file pointer of an open file to the beginning of the file.
■ Find (or Locate). Searches for the first record that satisfies a search condition.
■ Read (or Get). Copies the current record from the buffer to a program variable
in the user program.
■ FindNext. Searches for the next record in the file that satisfies the search condition.
■ Delete. Deletes the current record and (eventually) updates the file on disk to reflect the
deletion.
■ Modify. Modifies some field values for the current record and (eventually) updates the file
on disk to reflect the modification.
■ Insert. Inserts a new record in the file by locating the block where the record is to be
inserted, transferring that block into a main memory buffer, writing the record into the
buffer, and (eventually) writing the buffer to disk.
■ Close. Completes the file access by releasing the buffers and performing any
other needed cleanup operations.
■ Scan. If the file has just been opened or reset, returns the first record; otherwise
returns the next record.
■ FindAll. Locates all the records in the file that satisfy a search condition.
■ Find (or Locate) n. Searches for the first record that satisfies a search condition and
then continues to locate the next n − 1 records satisfying the same condition.
■ FindOrdered. Retrieves all the records in the file in some specified order.
■ Reorganize. Starts the reorganization process (for example, sorting the records on
some field).
Files of Unordered Records (Heap Files)
In this type of organization, records are placed in the file in the order in which they are
inserted, so new records are inserted at the end of the file. Such an organization is called a
heap or pile file.
It is also used to collect and store data records for future use.
Inserting a new record is very efficient: the last disk block of the file is copied into a
buffer, the new record is added, and the block is then rewritten back to disk. The address of
the last file block is kept in the file header. However, searching for a record using any
search condition involves a linear search through the file block by block—an expensive
procedure.
If only one record satisfies the search condition, then, on the average, a program will read
into memory and search half the file
blocks before it finds the record.
To read all records in order of the values of some field, we create a sorted copy of the
file. Sorting is an expensive operation for a large disk file, and special techniques for
external sorting are used.
For a file of unordered fixed-length records using unspanned blocks and contiguous
allocation, it is straightforward to access any record by its position in the file.
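Under this contiguous, unspanned layout, record number i lives in block ⌊i/bfr⌋, at position i mod bfr within that block. A minimal sketch (the 0-based numbering is an assumption for illustration):

```python
# Locating record number i (0-based) in a contiguous, unspanned file.
def locate(i, bfr):
    block = i // bfr          # which block holds the record
    slot = i % bfr            # position of the record within that block
    return block, slot

print(locate(23, 7))  # record 23 sits in block 3, slot 2
```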
Files of Ordered Records (Sorted Files)
We can physically order the records of a file on disk based on the values of one of
their fields—called the ordering field. This leads to an ordered or sequential file
If the ordering field is also a key field of the file—a
field guaranteed to have a unique value in each
record—then the field is called the ordering key for
the file
Ordered files have some advantages over
unordered files.
1. reading the records in order of the ordering key
values becomes extremely efficient because no sorting
is required
2. finding the next record from the current one in order
of the ordering key usually requires no additional block
accesses because the next record is in the same block
as the current one (unless the current record is the last
one in the block).
3. using a search condition based on the value of an ordering key field results in faster
access when the binary search technique is used, which constitutes an improvement over
linear searches.
A binary search for disk files can be done on the blocks rather than on the records
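The block-level binary search described above can be sketched as follows; `read_block` and the in-memory block lists stand in for real disk I/O, which is an assumption of this illustration:

```python
# Sketch: binary search over the blocks of an ordered file.
# Each block's record list is sorted, and blocks are in ordering-key order.
def binary_search_blocks(read_block, num_blocks, key):
    lo, hi = 0, num_blocks - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        records = read_block(mid)          # one block transfer
        if key < records[0]:
            hi = mid - 1                   # key can only lie in an earlier block
        elif key > records[-1]:
            lo = mid + 1                   # key can only lie in a later block
        else:
            return mid if key in records else None
    return None

# Usage: 4 blocks of ordering-key values, 3 records per block.
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
print(binary_search_blocks(lambda i: blocks[i], len(blocks), 23))  # 2
```

Each loop iteration costs one block transfer, so the search takes about log2(b) accesses for b blocks, versus b/2 on average for a linear search.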
DISADVANTAGES
1. Inserting and deleting records are expensive operations for an ordered file because
the records must remain physically ordered. To insert a record, we must find its
correct position in the file, based on its ordering field value, and then make space in the
file to insert the record in that position.
2. For a large file this can be very time-consuming because, on the average, half the
records of the file must be moved to make space for the new record. This means that
half the file blocks must be read and rewritten after records are moved among them.
3. For record deletion, the problem is less severe if deletion markers and periodic
reorganization are used.
Hashing Techniques
Another type of primary file organization is based on hashing, which provides very
fast access to records under certain search conditions.
This organization is usually called a hash file.
The search condition must be an equality condition on a single field, called the hash field.
In most cases, the hash field is also a key field of the file, in which case it is called the
hash key.
The idea behind hashing is to provide a function h, called a hash function or randomizing
function, which is applied to the hash field value of a record and yields the address of the
disk block in which the record is stored.
Internal Hashing
For internal files, hashing is typically implemented as a hash table through the use
of an array of records. Suppose that the array index range is from 0 to M – 1, as
shown in Figure 17.8(a); then we have M slots whose addresses correspond to the
array indexes. We choose a hash function that transforms the hash field value into
an integer between 0 and M − 1. One common hash function is the h(K) = K mod
M function, which returns the remainder of an integer hash field value K after division by M;
this value is then used for the record address.
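A minimal sketch of such an internal hash table, ignoring collisions for the moment (the keys and table size are arbitrary examples):

```python
# Internal hashing: an M-slot table addressed by h(K) = K mod M.
M = 10
table = [None] * M            # M slots with addresses 0 .. M-1

def h(K):
    return K % M              # maps an integer hash field value to a slot address

for K in (123, 456, 789):     # these three keys happen not to collide
    table[h(K)] = K

print(table[h(456)])  # 456
```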
External Hashing for Disk Files
Hashing for disk files is called external hashing. To suit the characteristics of disk
storage, the target address space is made of buckets, each of which holds multiple
records. A bucket is either one disk block or a cluster of contiguous disk blocks. The
hashing function maps a key into a relative bucket number, rather than assigning an
absolute block address to the bucket. A table maintained in the file header converts
the bucket number into the corresponding disk block address.
A collision occurs when the hash field value of a record that is being inserted hashes
to an address that already contains a different record. In this situation, we must
insert the new record in some other position, since its hash address is occupied. The
process of finding another position is called collision resolution.
There are numerous methods for collision resolution, including the following:
■ Open addressing. Proceeding from the occupied position specified by the
hash address, the program checks the subsequent positions in order until an
unused (empty) position is found. Algorithm 17.2(b) may be used for this
purpose.
■ Chaining. For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. Additionally, a
pointer field is added to each record location. A collision is resolved by
placing the new record in an unused overflow location and setting the pointer of
the occupied hash address location to the address of that overflow location.
A linked list of overflow records for each hash address is thus maintained, as
shown in Figure 17.8(b).
■ Multiple hashing. The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
Each collision resolution method requires its own algorithms for insertion,
retrieval, and deletion of records.
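The chaining method above can be sketched with plain arrays; the slot counts and the use of array indexes as "pointers" are illustrative assumptions, not a prescribed layout:

```python
# Sketch of chaining: a main area of M slots plus an overflow area,
# with each record location carrying a pointer (an index) to the next
# overflow record in its chain.
M, OVERFLOW = 5, 5
slots = [None] * (M + OVERFLOW)    # [0, M) main area, [M, M+OVERFLOW) overflow
nxt = [-1] * (M + OVERFLOW)        # -1 marks the end of an overflow chain
free = M                           # next unused overflow position

def insert(K):
    global free
    i = K % M
    if slots[i] is None:           # home slot empty: store the record directly
        slots[i] = K
        return
    while nxt[i] != -1:            # otherwise walk the chain to its end
        i = nxt[i]
    slots[free] = K                # place the record in an overflow location
    nxt[i] = free                  # link it from the last record in the chain
    free += 1

for K in (7, 12, 3):               # 7 and 12 both hash to slot 2 and collide
    insert(K)
print(slots[2], slots[nxt[2]])  # 7 12
```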
The collision problem is less severe with buckets, because as many records as will fit
in a bucket can hash to the same bucket without causing problems. However, we
must make provisions for the case where a bucket is filled to capacity and a new
record being inserted hashes to that bucket. We can use a variation of chaining in
which a pointer is maintained in each bucket to a linked list of overflow records for
the bucket. The pointers in the linked list should be record pointers, which include both a
block address and a relative record position within the block.
Hashing provides the fastest possible access for retrieving an arbitrary record given
the value of its hash field. Although most good hash functions do not maintain
records in order of hash field values, some functions, called order preserving, do
maintain this order.
The hashing scheme described so far is called static hashing because a fixed number
of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose
that we allocate M buckets for the address space and let m be the maximum number
of records that can fit in one bucket; then at most (m * M) records will fit in the allo
cated space. If the number of records turns out to be substantially fewer than
(m * M), we are left with a lot of unused space. On the other hand, if the number of
records increases to substantially more than (m * M), numerous collisions will
result and retrieval will be slowed down because of the long lists of overflow
records.
Dynamic Hashing. A precursor to extendible hashing was dynamic hashing, in
which the addresses of the buckets were either the n high-order bits or n − 1
high-order bits, depending on the total number of keys belonging to the respective
bucket. The eventual storage of records in buckets for dynamic hashing is somewhat
similar to extendible hashing. The major difference is in the organization of the
directory. Whereas extendible hashing uses the notion of global depth (high-order d
bits) for the flat directory and then combines adjacent collapsible buckets into a
bucket of local depth d − 1, dynamic hashing maintains a tree-structured directory
with two types of nodes:
■ Internal nodes that have two pointers—the left pointer corresponding to the
0 bit (in the hashed address) and a right pointer corresponding to the 1 bit.
■ Leaf nodes—these hold a pointer to the actual bucket with records.
Extendible Hashing. In extendible hashing, a type of directory—an array of 2^d
bucket addresses—is maintained, where d is called the global depth of the directory. The
integer value corresponding to the first (high-order) d bits of a hash value
is used as an index to the array to determine a directory entry, and the address in
that entry determines the bucket in which the corresponding records are stored.
However, there does not have to be a distinct bucket for each of the 2^d directory
locations. Several directory locations with the same first d bits for their hash values
may contain the same bucket address if all the records that hash to these locations fit
in a single bucket. A local depth d′—stored with each bucket—specifies the number
of bits on which the bucket contents are based. Figure 17.11 shows a directory with
global depth d = 3.
The value of d can be increased or decreased by one at a time, thus doubling or
halving the number of entries in the directory array. Doubling is needed if a bucket,
whose local depth d′ is equal to the global depth d, overflows. Halving occurs if
d > d′ for all the buckets after some deletions occur. Most record retrievals require two
block accesses—one to the directory and the other to the bucket.
To illustrate bucket splitting, suppose that a new inserted record causes overflow in
the bucket whose hash values start with 01—the third bucket in Figure 17.11. The
records will be distributed between two buckets: the first contains all records whose
hash values start with 010, and the second all those whose hash values start with
011. Now the two directory locations for 010 and 011 point to the two new distinct
buckets. Before the split, they pointed to the same bucket. The local depth d′ of the
two new buckets is 3, which is one more than the local depth of the old bucket.
If a bucket that overflows and is split used to have a local depth d′ equal to the global
depth d of the directory, then the size of the directory must now be doubled so that
we can use an extra bit to distinguish the two new buckets. For example, if the
bucket for records whose hash values start with 111 in Figure 17.11 overflows, the
two new buckets need a directory with global depth d = 4, because the two buckets
are now labeled 1110 and 1111, and hence their local depths are both 4. The
directory size is hence doubled, and each of the other original locations in the directory
is also split into two locations, both of which have the same pointer value as did the
original location.
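The directory lookup described above, taking the first (high-order) d bits of the hash value as an index, can be sketched as follows; the 8-bit hash width and the bucket labels are assumptions for illustration:

```python
# Sketch: extendible-hashing directory lookup using the first
# (high-order) d bits of a hash value as the directory index.
HASH_BITS = 8                      # width of the hash values (an assumption)

def directory_index(hash_value, d):
    return hash_value >> (HASH_BITS - d)   # keep only the top d bits

# Global depth d = 3: a directory of 2**3 = 8 bucket addresses.
d = 3
directory = [f"bucket_{i:03b}" for i in range(2 ** d)]

h = 0b01101001                     # an example 8-bit hash value
print(directory[directory_index(h, d)])  # bucket_011
```

Several directory entries may hold the same bucket address; the array here just labels each entry so the indexing is visible.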
The main advantage of extendible hashing that makes it attractive is that the
performance of the file does not degrade as the file grows, as opposed to static external
hashing, where collisions increase and the corresponding chaining effectively
increases the average number of accesses per key. Additionally, no space is allocated
in extendible hashing for future growth, but additional buckets can be allocated
dynamically as needed. The space overhead for the directory table is negligible. The
maximum directory size is 2^k, where k is the number of bits in the hash value.
Another advantage is that splitting causes minor reorganization in most cases, since
only the records in one bucket are redistributed to the two new buckets. The only
time reorganization is more expensive is when the directory has to be doubled (or
halved). A disadvantage is that the directory must be searched before accessing the
buckets themselves, resulting in two block accesses instead of one in static hashing.
This performance penalty is considered minor and thus the scheme is considered
quite desirable for dynamic files.