0% found this document useful (0 votes)
26 views

Data Structures Unit 2

Uploaded by

ganaroy2001
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views

Data Structures Unit 2

Uploaded by

ganaroy2001
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

Aurora’s Technological and Research Institute

1
Department of Computer Science and Engineering

Data Structures
R18 CSE/IT II year

By
D. Subhashini
Associate Professor
Department of Computer Science and Engineering
Aurora’s Technological and Research Institute
Hyderabad

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
2
Department of Computer Science and Engineering

UNIT II
Dictionaries: linear list representation, skip list
representation, operations - insertion, deletion and
searching.
Hash Table Representation: hash functions, collision
resolution-separate chaining, open
addressing-linear probing, quadratic probing, double
hashing, rehashing, extendible hashing.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
3
Department of Computer Science and Engineering

Dictionaries
A dictionary is a general-purpose data structure for storing a group of objects. A
dictionary has a set of keys and each key has a single associated value. When presented
with a key, the dictionary will return the associated value.
For example, the results of a classroom test could be represented as a dictionary with
pupil's names as keys and their scores as the values.
A dictionary is also called a hash, a map, a hashmap in different programming languages.
They're all the same thing: a key-value store.
Typically, the keys in a dictionary must be simple types (such as integers or strings) while
the values can be of any type. Dictionaries are often implemented as hash tables.
Keys in a dictionary must be unique; an attempt to create a duplicate key will typically
overwrite the existing value for that key.
Note that there is a difference (which may be important) between a key not existing in a
dictionary, and the key existing but with its corresponding value being null.

Implementations of Dictionaries
Dictionaries can be implemented mainly in the following two ways.
 Linear List Representation
There are two method of representing dictionaries using linear list.
1. Sorted Array- An array data structure is used to implement the dictionary.
2. Sorted Chain- A linked list data structure is used to implement the dictionary
 Skip List Representation

Main operations on dictionaries


Dictionaries typically support several operations:

 Retrieve a value
 Insert or update a value (typically, if the key does not exist in the dictionary, the key-
value pair is inserted; if the key already exists, its corresponding value is overwritten
with the new one)
 Remove a key-value pair
 Test for existence of a key

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
4
Department of Computer Science and Engineering

Skip List Data Structure


A skip list is a data structure that is used for storing a sorted list of items with a help of
hierarchy of linked lists that connect increasingly sparse subsequences of the items. A
skip list allows the process of item look up in efficient manner. The skip list data structure
skips over many of the items of the full list in one step, that’s why it is known as skip list.

Complexity
Average
Case Worst Case

Space O(n) O(nlogn)

Search O(logn) O(n)

Insert O(logn) O(n)

Delete O(logn) O(n)

Structure of Skip List


A skip list is built up of layers. The lowest layer (i.e. bottom layer) is an ordinary
ordered linked list. The higher layers are like ‘express lane’ where the nodes are
skipped (observe the figure).

Insert an element in Skip list


Deciding nodes level
Each element in the list is represented by a node, the level of the node is chosen
randomly while insertion in the list. Level does not depend on the number of
elements in the node.
MaxLevel is the upper bound on number of levels in the skip list.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
5
Department of Computer Science and Engineering

Node Structure
Each node carries a key and a forward array carrying pointers to nodes of a different
level. A level i node, carries i forward pointers indexed through 0 to i.

Insertion in Skip List


We will start from highest level in the list and compare key of next node of the current
node with the key to be inserted. Basic idea is If
 Key of next node is less than key to be inserted then we keep on moving forward
on the same level

 Key of next node is greater than the key to be inserted then we store the pointer
to current node i at update[i] and move one level down and continue our
search.

At the level 0, we will definitely find a position to insert given key.


Consider this example where we want to insert key 17:

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
6
Department of Computer Science and Engineering

Skip List Searching and Deletion


Searching an element in Skip list
Searching an element is very similar to approach for searching a spot for inserting an
element in Skip list. The basic idea is if:
 Key of next node is less than search key then we keep on moving forward on the
same level.

 Key of next node is greater than the key to be inserted then we store the pointer
to current node i at update[i] and move one level down and continue our
search.

At the lowest level (0), if the element next to the rightmost element has key equal to
the search key, then we have found key otherwise failure.
Consider this example where we want to search for key 17:

Deleting an element from the Skip list


Deletion of an element k is preceded by locating element in the Skip list using above
mentioned search algorithm. Once the element is located, rearrangement of pointers is
done to remove element form list just like we do in singly linked list. We start from
lowest level and do rearrangement.
After deletion of element there could be levels with no elements, so we will remove
these levels as well by decrementing the level of Skip list.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
7
Department of Computer Science and Engineering

Consider this example where we want to delete element 6:

Here at level 3, there is no element (arrow in red) after deleting element 6. So we will
decrement level of skip list by 1.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
8
Department of Computer Science and Engineering

HASH TABLE REPRESENTATION

 Hash table is a data structure used for storing and retrieving data very quickly.
Insertion of data in the hash table is based on the key value. Hence every entry
in the hash table is associated with some key.

 Using the hash key the required piece of data can be searched in the hash table
by few or more key comparisons. The searching time is then dependent upon the
size of the hash table.

 The effective representation of dictionary can be done using hash table. We can
place the dictionary entries in the hash table using hash function.

HASH FUNCTION

 Hash function is a function which is used to put the data in the hash table. Hence
one can use the same hash function to retrieve the data from the hash table.

 The integer returned by the hash function is called hash key.

For example: Consider that we want place some employee records in the hash table.
The record of employee is placed with the help of key: employee ID. The employee ID
is a 7 digit number for placing the record in the hash table. To place the record 7 digit
number is converted into 3 digits by taking only last three digits of the key.

If the key is 496700 it can be stored at 0th position. The second key 8421002, the
record of those key is placed at 2nd position in the array.
Hence the hash function will be: H(key) = key%1000
Where key%1000 is a hash function and key obtained by hash function is called hash
key.

 Bucket and Home bucket: The hash function H(key) is used to map several
dictionary entries in the hash table. Each position of the hash table is called bucket.

The function H(key) is home bucket for the dictionary with pair whose value is key.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
9
Department of Computer Science and Engineering

TYPES OF HASH FUNCTION

There are various types of hash functions that are used to place the record in the
hash table-

 Division Method:

The hash function depends upon the remainder of division. Typically the divisor is
table length.

For ex: If the record 54, 72, 89, 37 is placed in the hash table and if the table size
is 10 then h(key) = record % table size
0
1
54%10=4 2 72
72%10=2 3
89%10=9 4 54
37%10=7 5
6
7 37
8
9 89
 Mid Square:

In the mid square method, the key is squared and the middle or mid part of the result
is used as the index. If the key is a string, it has to be preprocessed to produce a
number.
Consider that if we want to place a record 3111 then

31112 = 9678321
for the hash table of size 1000 H(3111) = 783 (the middle 3 digits)

 Multiplicative hash function:

The given record’s key is multiplied by some constant value. The formula for computing
the hash key is:

H(key) = floor(p *(fractional part of key*A)) where p is integer constant and A is


constant real number.

Donald Knuth suggested to use constant A = 0.61803398987


If key 107 and p=50 then

H(key) = floor(50*(107*0.61803398987))
= floor(3306.4818458045)
= 3306
At 3306 location in the hash table the record 107 will be placed.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
10
Department of Computer Science and Engineering

 Digit Folding:

The key is divided into separate parts and using some simple operation these parts are
combined to produce the hash key.
For ex: consider a record 12365412 then it is divided into separate parts as 123 654 12
and these are added together

H(key) = 123+654+12
= 789
The record will be placed at location 789

 Digit Analysis:

The digit analysis is used in a situation when all the identifiers are known in advance.
We first transform the identifiers into numbers using some radix, r. Then examine the
digits of each identifier. Some digits having most skewed distributions are deleted. This
deleting of digits is continued until the number of remaining digits is small enough to
give an address in the range of the hash table. Then these digits are used to calculate
the hash address.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
11
Department of Computer Science and Engineering

COLLISION

The hash function is a function that works on key and returns a value, using which the
record can be placed in the hash table. Thus this function helps us in placing the record
in the hash table at appropriate position and due to this we can retrieve the record
directly from that location. This function need to be designed very carefully and it
should not return the same hash key address for two different records. This is an
undesirable situation in hashing.

Definition:
The situation in which the hash function returns the same hash key (home bucket) for
more than one record is called collision and two same hash keys returned for different
records is called synonym.

Similarly when there is no room for a new pair in the hash table then such a situation
is called overflow. Sometimes when we handle collision, it may lead to overflow
conditions. Collision and overflow show the poor hash functions.

For example, 0
1 131
Consider a hash function. 2
3 43
H(key) = recordkey%10 having the hash table size of 10 4 44
5
The record keys to be placed are 6 36
7 57
131, 44, 43, 78, 19, 36, 57 and 77 8 78
131%10=1 9 19
44%10=4
43%10=3
78%10=8
19%10=9
36%10=6
57%10=7
77%10=7

Now if we try to place 77 in the hash table then we get the hash key to be 7 and at
index 7 already the record key 57 is placed. This situation is called collision. From the
index 7 as we find that there is no room to place 77 in the hash table. This situation is
called overflow.

COLLISION RESOLUTION TECHNIQUES

If collision occurs then it should be handled by applying some techniques. Such a


technique is called collision handling technique.
 Chaining
 Open addressing (linear probing)
 Quadratic probing
 Double hashing
 Double hashing
 Rehashing

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
12
Department of Computer Science and Engineering

CHAINING

In collision handling method chaining is a concept which introduces an additional field


with data i.e. chain. A separate chain table is maintained for colliding data. When
collision occurs then a linked list(chain) is maintained at the home bucket.

For ex: Consider the keys to be placed in their home buckets are 131, 3, 4, 21, 61, 7,
97, 8, 9

then we will apply a hash function as H(key) = key % D

Where D is the size of table. The hash table will be:

Here D = 10

0
1 131 21 61 NULL

3 NULL

NULL
4

7 97 NULL

A chain is maintained for colliding elements. for instance 131 has a home bucket (key)
1. similarly key 21 and 61 demand for home bucket 1. Hence a chain is maintained at
index 1.

OPEN ADDRESSING – LINEAR PROBING

This is the easiest method of handling collision. When collision occurs i.e. when two
records demand for the same home bucket in the hash table then collision can be
solved by placing the second record linearly down whenever the empty bucket is found.
When use linear probing (open addressing), the hash table is represented as a one-
dimensional array with indices that range from 0 to the desired table size-1.
Before inserting any elements into this table, we must initialize the table to represent
the situation where all slots are empty. This allows us to detect overflows and collisions
when we inset elements into the table. Then using some suitable hash function the
element can be inserted into the hash table.

For example:
Consider that following keys are to be inserted in the hash table 131, 4, 8, 7, 21, 5,

31, 61, 9, 29

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
13
Department of Computer Science and Engineering

Initially, we will put the following keys in the hash table.


We will use Division hash function. That means the keys are placed using the formula

H(key) = key % tablesize H(key) = key % 10

For instance the element 131 can be placed at H(key) = 131 % 10 = 1

Index 1 will be the home bucket for 131. Continuing in this fashion we will place 4, 8,

7. Now the next key to be inserted is 21. According to the hash function

H(key)=21%10
H(key) = 1

But the index 1 location is already occupied by 131 i.e. collision occurs. To resolve this
collision we will linearly move down and at the next empty location we will prob the
element. Therefore 21 will be placed at the index 2. If the next element is 5 then we
get the home bucket for 5 as index 5 and this bucket is empty so we will put the
element 5 at index 5.

Index Key Key Key

0 NULL NULL NULL

1 131 131 131

2 NULL 21 21

3 NULL NULL 31

4 4 4 4

5 NULL 5 5

6 NULL NULL 61

7 7 7 7

8 8 8 8

9 NULL NULL NULL

after placing keys 31, 61

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
14
Department of Computer Science and Engineering

The next record key is 9. According to decision hash function it demands for the home
bucket 9. Hence we will place 9 at index 9. Now the next final record key 29 and it
hashes a key 9. But home bucket 9 is already occupied. And there is no next empty
bucket as the table size is limited to index 9. The overflow occurs. To handle it we
move back to bucket 0 and is the location over there is empty 29 will be placed at 0th
index.

Problem with linear probing:


One major problem with linear probing is primary clustering. Primary clustering is a
process in which a block of data is formed in the hash table when collision is resolved.
Key

39
19%10 = 9 cluster is formed
18%10 = 8 29
39%10 = 9 8
29%10 = 9
8%10 = 8

rest of the table is empty

this cluster problem can be solved by quadratic probing.


18

QUADRATIC PROBING: 19

Quadratic probing operates by taking the original hash value and adding successive
values of an arbitrary quadratic polynomial to the starting value. This method uses
following formula.

H(key) = (Hash(key) + i2) % m)

where m can be table size or any prime number.


for ex: If we have to insert following elements in the hash table with table size 10:
37, 90, 55, 22, 17, 49, 87 0 90
1 11
37 % 10 = 7 2 22

90 % 10 = 0 3
55 % 10 = 5 4
22 % 10 = 2 5 55
11 % 10 = 1 6
7 37
Now if we want to place 17 a collision will occur as 8
17%10 = 7 and bucket 7 has already an element 37. 9
Hence we will apply quadratic probing to insert this record in the hash table.
Hi (key) = (Hash(key) + i2) % m
Consider i = 0 then (17 + 02) % 10 = 7
D. Subhashini, Assoc. prof
Aurora’s Technological and Research Institute
15
Department of Computer Science and Engineering

(17 + 12) % 10 = 8, when i =1

The bucket 8 is empty hence we will place the element at index 8. 0 90


Then comes 49 which will be placed at index 9. 1 11
2 22
49 % 10 = 9 3
4
5 55
6
7 37
8 17
9 49

Now to place 87 we will use quadratic probing.


0 90
(87 + 0) % 10 = 7 1 11
(87 + 1) % 10 = 8… but already occupied 2 22
(87 + 22) % 10 = 1.. already occupied 3
(87 + 32) % 10 = 6 4
It is observed that if we want place all the necessary elements in 5 55
the hash table the size of divisor (m) should be twice as large as 6 87
total number of elements. 7 37
8 17
9 49
DOUBLE HASHING
Double hashing is technique in which a second hash function is applied to the key when a collision
occurs. By applying the second hash function we will get the number of positions from the point of
collision to insert.
There are two important rules to be followed for the second function:
 it must never evaluate to zero.
 must make sure that all cells can be probed.
The formula to be used for double hashing is

H1(key) = key mod tablesize


Key
H2(key) = M – (key mod M)
90

where M is a prime number smaller than the size of the table.


22
Consider the following elements to be placed in the hash table of size 10
37, 90, 45, 22, 17, 49, 55
Initially insert the elements using the formula for H1(key).
Insert 37, 90, 45, 22 45

H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0 37
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
H1(49) = 49 % 10 = 9 49

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
16
Department of Computer Science and Engineering

Now if 17 to be inserted then


Key
H1(17) = 17 % 10 = 7 90
H2(key) = M – (key % M)
17

Here M is prime number smaller than the size of the table. 22


Prime number smaller than table size 10 is 7

Hence M = 7 , H2(17) = 7-(17 % 7)


=7–3=4 45

That means we have to insert the element 17 at 4 places from 37. e to take 4
In short we ha v jumps. Therefore the 17 will be placed at index 1. 37

Now to insert number 55


49

H1(55) = 55 % 10 =5 Collision
Key

H2(55) = 7-(55 % 7) 90
=7–6=1
17

That means we have to take one jump from index 5 to place 55. 22

Finally the hash table will be -

45

55

37

49

Comparison of Quadratic Probing & Double Hashing

 The double hashing requires another hash function whose probing efficiency
is same as some another hash function required when handling random collision.
 The double hashing is more complex to implement than quadratic probing.
 The quadratic probing is fast technique than double hashing.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
17
Department of Computer Science and Engineering

REHASHING

Rehashing is a technique in which the table is resized, i.e., the size of table is doubled
by creating a new table. It is preferable if the total size of table is a prime number.
There are situations in which the rehashing is required.

 When table is completely full


 With quadratic probing when the table is filled half.
 When insertions fail due to overflow.

In such situations, we have to transfer entries from old table to the new table by re
computing their positions using hash functions.

Consider we have to insert the elements 37, 90, 55, 22, 17, 49, and 87. the table size
is 10 and will use hash function.,

H(key) = key mod tablesize

37 % 10 = 7
90 % 10= 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 Collision solved by linear probing 49 % 10 = 9

Now this table is almost full and if we try to insert more elements collisions will occur
and eventually further insertions will fail. Hence we will rehash by doubling the table
size. The old table size is 10 then we should double this size for new table, that
becomes 20. But 20 is not a prime number, we will prefer to make the table size as
23. And new hash function will be

H(key) key mod 23

37 % 23 = 14
90 % 23 = 21
55 % 23 = 9
22 % 23 = 22
17 % 23 = 17
49 % 23 = 3
87 % 23 = 18

Now the hash table is sufficiently large to accommodate new insertions.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
18
Department of Computer Science and Engineering

Advantages:
1. This technique provides the programmer a flexibility to enlarge the table size if
required.
2. Only the space gets doubled with simple hash function which avoids occurrence
of collisions.

EXTENSIBLE HASHING

 Extensible hashing is a technique which handles a large amount of data. The data
to be placed in the hash table is by extracting certain number of bits.
 Extensible hashing grow and shrink similar to B-trees.
 In extensible hashing referring the size of directory the elements are to be placed
in buckets. The levels are indicated in parenthesis.

For ex: Directory

0 1
Levels
(0) (1)
001 111
data to be placed in bucket
010

 The bucket can hold the data of its global depth. If data in bucket is more than
global depth then, split the bucket and double the directory.

Consider we have to insert 1, 4, 5, 7, 8, 10. Assume each page can hold 2 data
entries (2 is the depth).

Step 1: Insert 2, 4

1 = 001
0
4 = 100
(0)
001
100

Insert 5. The bucket is full. Hence double the directory.

D. Subhashini, Assoc. prof


Aurora’s Technological and Research Institute
19
Department of Computer Science and Engineering

1 = 001
0 1
4 = 100
(0) (1)
5 = 101
100 001
101 Based on last bit the data is inserted.

Step 2: Insert 7
7 = 111
But as depth is full we can not insert 7 here. Then double the directory and split the bucket. After
insertion of 7. Now consider last two bits.

00 01 10 11
(1) (2) (2)

100 001 111


101

Step 3: Insert 8 i.e. 1000

00 01 10 11
(2)
(1)
001 111
100
101
1000

Step 4: Insert 1 0

D. Subhashini, Assoc. prof


Thus the data is inserted using extensible hashing.

Deletion Operation:

If we wan to delete 10 then, simply make the bucket of 10 empty.

00 01 10 11

(1) (2) (2)

100 001 111


1000 101

Delete 7.

00 01 10 11
(1) (1)

100 001 Note that the level was increased when we insert 7.
1000 101 Now on deletion
of 7, the level should get decremented.

Delete 8. Remove entry from directory 00.

00 00 10 11

(1) (1)
100 001
101
Applications of hashing:

1. In compilers to keep track of declared variables.


2. For online spelling checking the hashing functions are used.
3. Hashing helps in Game playing programs to store the moves made.
4. For browser program while caching the web pages, hashing is used.
5. Construct a message authentication code (MAC)
6. Digital signature.
7. Time stamping
8. Key updating: key is hashed at specific intervals resulting in new key

You might also like