Dsa 3
Dsa 3
Dsa 3
Set : well-defined collection of distinct objects. An element of a set is any object in the set.
Sets
- belongs to or is an element of
Terminology. Operations. Set-Based ADTs. Implementations. ADT Dictionary. Direct Access Tables. Hash Tables. MApping ADT. Priority Queue ADT. Partially Ordered Trees. Heaps.
i) every element of S is also an element of T, and ii) every element of T is also an element of S.
Set terminology
A subset of a set is a part of the set.
Set terminology
It is often convenient to assume that elements are linearly ordered by a relation, usually denoted by '<' (read "less than" or "precedes"). A linear order on a set S satisfies two properties:
S is a subset of T if each element of S is also an element of T. S = T if and only if S T and T S. S is a proper subset of T if S is a subset of T and S T.
is a proper subset of any non-empty set. Any non-empty set is an improper subset of itself. The power set (S) of a set S is the set containing
all the subsets of S. |(S)| = number of subsets of S = 2|S|.
DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 3
For any a and b in S, exactly one of a < b, a = b, or b < a is true. For all a, b, and c in S, if a < b and b < c, then a < c (transitivity).
Set operations
complement ( ) not
Set operations
difference (\,) but not
A = {x U : x A} | A |=| U | | A |
intersection () and
A \ B = {x U : x A x B} = A B | A \ B |=| A | | A B |
symmetric difference ( ,) exclusive or
A B = {x U : x A x B}
union () or
A B = ( A \ B ) ( B \ A) = ( A B) ( A B)
Two sets A and B are disjoint if A B = .
A B = {x U : x A x B} | A B |=| A | + | B | | A B |
DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 5
For insertion we successively examine the hash table looking for an unoccupied slot The slots we check depend on the key we wish to insert
10
Hash functions
A hash function is usually specified as the composition of two functions: Hash code: h1: keys integers Compression function: h2: integers [0, N 1]
The hash code is applied first, and the compression function is applied next on the result, i.e.,
A hash function h maps keys of a given type to integers in a fixed interval [0, N 1] Example:
h(x) = h2(h1(x))
The goal of the hash function is to disperse the keys in an apparently random way
Hash codes
Component sum:
Polynomial accumulation:
We reinterpret the memory address of the key object as an integer Good in general, except for numeric and string keys We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int, and float in C)
Polynomial p(x) can be We partition the bits of the key evaluated in O(n) time into a sequence of components of using Horners rule:
fixed length (e.g., 8, 16 or 32 bits)
a0 a1 an1
Integer cast:
The following polynomials are successively computed, each from the previous one in O(1) time
13
Compression Functions
Division: h2 (y) = y mod m The size m of the hash table is usually chosen to be a prime The reason has to do with number theory and is beyond the scope of this course Multiply, Add and Divide (MAD): h2 (y) = (ay + b) mod m a and b are nonnegative integers such that a mod m 0 Otherwise, every integer would map to the same value b
Rehashing Strategies
Linear hashing
0 i m-1; checks B[h'(k)], then B[h'(k)+1], , B[m-1] primary clustering effect: two keys that hash onto different values compete for same locations in successive hashes
Quadratic hashing
h: an auxiliary hash function; 0 i m-1; c1 0 and c2 0: auxiliary constants checks B[h'(k)]; next checked locations depend quadratically on i secondary clustering effect: two different keys that hash onto same locations compete for succcessive hash locations Note: i is the number of trial (0 for first)
15
16
Rehashing Strategies
Double hashing
Probability of collision after first rehash: Probability of at least i collisions: Average number of probes for insertion
h2(k) mod m away from the previous positions (sequence depends in two ways on key k) h2(k) and m must be relatively prime (to allow for the whole table to be searched). To ensure this condition:
take m=2k and make h2(k) generate an odd number or take m prime make h2(k) return a positive integer m smaller
than m
Conclusion: to fill the table completely (M=m) requires an average of ln m, or m ln m probes in total
DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 17 DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 18
The expected running time of all the dictionary ADT operations in a hash table is O(1) In practice, hashing is very fast provided the load factor is not close to 100% Applications of hash tables:
19
20
Mapping ADT
Mapping (associative store): a function from elements of one type, called the domain type to elements of another type (possibly the same) type, called the range type. Operations:
createEmpty(A): initializes the mapping A by making each domain element have no assigned range value assign (A, d, r) defines A(d) to be r compute (A, d, r) returns true and sets r to A(d) if A(d) is defined; false is returned otherwise.
insert and deletemin (as well as the usual createEmpty for initialization of the data structure). min() returns, but does not remove, an entry with smallest key size() isEmpty()
Entry ADT: An entry in a priority queue is simply a (key, value) pair Priority queues store entries to allow for efficient insertion and removal based on keys Operations for Entry ADT:
Keys in a priority queue can be arbitrary objects on which an order is defined Two distinct entries in a priority queue can have the same key
DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 23
key(): returns the key for this entry value(): returns the value associated with this entry
24
A comparator encapsulates the action of comparing two objects according to a given total order relation
A generic priority queue uses an auxiliary comparator The comparator is external to the keys being compared When the priority queue needs to compare two keys, it uses its comparator compare(x, y): Returns an integer i such that i < 0 if a < b, i = 0 if a = b, and i > 0 if a > b; an error occurs if a and b cannot be compared.
25
26
insert in a POT
new element as far left as possible on the lowest level
Binary tree At the lowest level, where some leaves may be missing, we require that all missing leaves are to the right of all leaves that are not on the lowest level. Tree is partially ordered: the priority of node v is no greater than the priority of the children of v
Initial tree
27
28
Heaps
Complete binary tree of height, h, iff:
Heaps
it is empty or its left subtree is complete of height h-1 and its right subtree is completely full of height h-2 or its left subtree is completely full of height h-2 and its right subtree is complete of height h-1. all the leaves are either on the same level or two adjacent ones, and all nodes at the lowest level are as far to the left as possible.
it is empty or the key in the root is larger than that in either child
and both subtrees have the heap property.
highest priority item is at the root value of the heap structure: we can both extract
the highest priority item and insert a new one in O(log n) time
29 DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 30
Properties of a complete tree lead to a very efficient storage mechanism using n sequential locations in an array
DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 31 DSA - lecture 3 - T.U. Cluj-Napoca - M. Joldos 32
Heap operations
33
34
Heap operations
35
36
Summary
Sets in general Abstract data types based on sets Mapping ADT
Reading
Implementations Implementations Operations: insert, deleteMin Operations: insert, deleteMax Implementation in arrays
operations Implementations
lists
Dictionary ADT
Implementations
Heaps
AHU, chapter 5, sections 5.1, 5.2 CLR, chapters: 12, 7.1, 7.2, 7.3, 7.5 Preiss, chapters: Hashing, Hash Tables and Scatter Tables. Heaps and Priority Queues. Notes
37
38