Lecture 2: Arrays, Linked Lists, and Recursion
31/7/2014
Topic: Stack and heap memory
Array
Contiguous uniform storage
Pros:
Constant time access to indexed memory location: O(1)
Memory efficient
Cons:
Resizing requires copying to new array
Worst case for a sorted insert: must shift the entire array
Linked List
Pros:
Insertion: O(1)
Cons:
Does not allow access via index; must traverse the list: O(n)
Uses more memory
Types:
Singly linked list
Doubly linked list
o Easy to insert new node before a certain node
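A minimal singly linked list sketch (illustrative names, not from the course), showing O(1) insertion at the head and O(n) access by position:
// Minimal singly linked list: O(1) insertion at the head,
// O(n) access by position because the list must be traversed.
public class SinglyLinkedList<E> {
    private static class Node<E> {
        E element;
        Node<E> next;
        Node(E element, Node<E> next) { this.element = element; this.next = next; }
    }

    private Node<E> head;
    private int size;

    public void addFirst(E e) {          // O(1): just relink the head
        head = new Node<>(e, head);
        size++;
    }

    public E get(int index) {            // O(n): must walk from the head
        Node<E> cur = head;
        for (int i = 0; i < index; i++) cur = cur.next;
        return cur.element;
    }

    public int size() { return size; }
}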
Recursion
Linear recursion
Tail recursion
Binary recursion
Multiple recursion
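Small illustrative Java examples of the first three kinds (my own examples, not from the slides):
public class RecursionExamples {
    // Linear recursion: one recursive call per activation.
    static int sum(int[] a, int n) {
        if (n == 0) return 0;
        return sum(a, n - 1) + a[n - 1];
    }

    // Tail recursion: the recursive call is the last action, carrying an accumulator.
    static int sumTail(int[] a, int n, int acc) {
        if (n == 0) return acc;
        return sumTail(a, n - 1, acc + a[n - 1]);
    }

    // Binary recursion: two recursive calls per activation.
    static int fib(int n) {
        if (n <= 1) return n;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sum(a, a.length));          // 10
        System.out.println(sumTail(a, a.length, 0));   // 10
        System.out.println(fib(6));                    // 8
    }
}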
Lecture 3: Abstract Data Types
31/7/2014
Model of data structure that specifies:
o Type of data stored
o Operations supported and types of parameters
Specifies what each operation does, but not how
In Java: interface
Factory Pattern can be used to allow different ADT implementations to be
instantiated at runtime
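One way this might look in Java (a hedged sketch; SimpleStack, ArrayBackedStack, LinkedStack and StackFactory are invented names, not from the course):
// Client code depends only on the ADT (a Java interface);
// a factory chooses the concrete implementation at runtime.
interface SimpleStack<E> {
    void push(E e);
    E pop();
    boolean isEmpty();
}

// Array-backed implementation (fixed capacity, for brevity).
class ArrayBackedStack<E> implements SimpleStack<E> {
    private final Object[] data = new Object[1024];
    private int top = 0;
    public void push(E e) { data[top++] = e; }
    @SuppressWarnings("unchecked")
    public E pop() { return isEmpty() ? null : (E) data[--top]; }
    public boolean isEmpty() { return top == 0; }
}

// Linked-list-backed implementation.
class LinkedStack<E> implements SimpleStack<E> {
    private final java.util.LinkedList<E> list = new java.util.LinkedList<>();
    public void push(E e) { list.addFirst(e); }
    public E pop() { return list.pollFirst(); }
    public boolean isEmpty() { return list.isEmpty(); }
}

class StackFactory {
    // The implementation is selected from a runtime value (e.g. a config string).
    static <E> SimpleStack<E> create(String kind) {
        return "array".equals(kind) ? new ArrayBackedStack<E>() : new LinkedStack<E>();
    }
}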
Lecture 4: Counting Primitive Operations
4/8/2014
Primitive Operations
Assumed to take constant time
Big-O
f(n) is O(g(n)) if there are positive constants c and n0 such that f(n) ≤ c·g(n) for n ≥ n0
Describes an upper bound
Big-Ω (Big Omega)
f(n) is Ω(g(n)) if there are positive constants c and n0 such that c·g(n) ≤ f(n) for n ≥ n0
Describes a lower bound
Big-Θ (Big Theta)
f(n) is Θ(g(n)) if there are positive constants c1, c2 and n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for n ≥ n0
Describes a tight bound: f(n) is asymptotically equal to g(n)
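A quick worked example (my own, not from the slides): f(n) = 3n + 2 is O(n), since 3n + 2 ≤ 4n for all n ≥ 2 (take c = 4, n0 = 2). It is also Ω(n), since 3n + 2 ≥ n for all n ≥ 1, and therefore Θ(n).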
Lecture 5: Asymptotic Analysis
7/8/2014
Lecture 6:
7/8/2014
Lecture 7: The Stack ADT and Array-based Stacks
11/8/2014
The stack ADT:
Insertion/deletion: LIFO
Main operations:
o push(object)
o pop()
Auxiliary operations:
o top()
o size()
o isEmpty()
For empty stack, pop and top return null
Java Virtual Machine (JVM)
Keeps track of chain of active methods with a stack
Performance and limitations
Let n be the number of elements and N ≥ n the fixed size of the array
Space: O(N)
Time: O(1)
Trying to push an element onto a full stack throws an exception
Note: Using linked list instead would give space O(n)
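A minimal array-based stack sketch along these lines (fixed capacity, O(1) operations; the class name and details are illustrative):
// Array-based stack: fixed capacity N, O(N) space, O(1) time per operation.
public class ArrayStack<E> {
    private final Object[] data;   // fixed-size array of length N
    private int t = -1;            // index of the top element (-1 when empty)

    public ArrayStack(int capacity) { data = new Object[capacity]; }

    public void push(E e) {
        if (t == data.length - 1)
            throw new IllegalStateException("stack is full");  // full stack -> exception
        data[++t] = e;
    }

    @SuppressWarnings("unchecked")
    public E pop() {
        if (isEmpty()) return null;        // empty stack: pop returns null
        E e = (E) data[t];
        data[t--] = null;                  // help garbage collection
        return e;
    }

    @SuppressWarnings("unchecked")
    public E top() { return isEmpty() ? null : (E) data[t]; }

    public int size() { return t + 1; }
    public boolean isEmpty() { return t < 0; }
}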
Lecture 8: The Queue ADT and Array-based Queues
14/8/2014
Queue ADT:
Insertion/deletion: FIFO
Main operations:
o enqueue(object)
o dequeue()
Auxiliary operations:
o first()
o size()
o isEmpty()
Array-based queue
Array size N
Two variables keep track of the front index f and the size sz
Array location r = (f + sz) mod N is the first empty slot past the rear of the queue
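A sketch of the circular array-based queue described above, using the front index f, the size sz and r = (f + sz) mod N (the class name and full-queue handling are illustrative):
// Circular array-based queue: front index f and size sz;
// the rear slot is r = (f + sz) mod N.
public class ArrayQueue<E> {
    private final Object[] data;   // array of fixed size N
    private int f = 0;             // index of the front element
    private int sz = 0;            // number of stored elements

    public ArrayQueue(int capacity) { data = new Object[capacity]; }

    public void enqueue(E e) {
        if (sz == data.length) throw new IllegalStateException("queue is full");
        int r = (f + sz) % data.length;   // first empty slot past the rear
        data[r] = e;
        sz++;
    }

    @SuppressWarnings("unchecked")
    public E dequeue() {
        if (isEmpty()) return null;
        E e = (E) data[f];
        data[f] = null;
        f = (f + 1) % data.length;        // advance the front circularly
        sz--;
        return e;
    }

    @SuppressWarnings("unchecked")
    public E first() { return isEmpty() ? null : (E) data[f]; }
    public int size() { return sz; }
    public boolean isEmpty() { return sz == 0; }
}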
List and Iterator ADTs and Amortisation
Growable array-based list:
push(o) adds o to the end of list
o When array is full, replace array with a larger one
Incremental strategy: Increase size by constant
Doubling strategy: double the size
o Worst time complexity: O(n). Best: O(1)
Amortised time of a push operation is the average time taken by a push over a series of n pushes: T(n)/n
The doubling strategy turns out to be better: amortised O(1) per push, versus O(n) for the incremental strategy
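A sketch of push with the doubling strategy (the initial capacity and class name are illustrative):
import java.util.Arrays;

// Growable array list using the doubling strategy: when the array is full,
// replace it with one twice the size. Each push is amortised O(1).
public class GrowableArrayList<E> {
    private Object[] data = new Object[1];  // illustrative initial capacity
    private int size = 0;

    public void push(E e) {
        if (size == data.length)
            data = Arrays.copyOf(data, 2 * data.length);  // doubling: O(n) copy, but rare
        data[size++] = e;
    }

    @SuppressWarnings("unchecked")
    public E get(int i) { return (E) data[i]; }
    public int size() { return size; }
}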
Positional list ADT:
Accessor
o first()
o last()
o before(p)
o after(p)
Update
o addFirst(e)
o addLast(e)
o addBefore(p,e)
o addAfter(p,e)
o set(p,e)
o remove(p)
Natural implementation: doubly-linked list
Iterator
next()
hasNext()
Lecture 9: Trees
18/8/2014
Tree: Hierarchical structure of nodes with parent-child relation
Traversal
Preorder traversal
o A node is visited before its descendants
Postorder traversal
o A node is visited after its descendants
Binary tree
Each node has at most two children: left child and right child
Proper binary tree: each node has exactly zero or two children
Inorder traversal
o A node is visited after its left subtree and before its right subtree
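Illustrative recursive traversals for a simple binary tree node (the Node class here is an assumption, not the course's Position-based interface):
// Binary tree node with the three recursive traversals.
public class BinaryTreeTraversals {
    static class Node {
        int element;
        Node left, right;
        Node(int element, Node left, Node right) {
            this.element = element; this.left = left; this.right = right;
        }
    }

    // Preorder: visit the node before its descendants.
    static void preorder(Node v) {
        if (v == null) return;
        System.out.print(v.element + " ");
        preorder(v.left);
        preorder(v.right);
    }

    // Postorder: visit the node after its descendants.
    static void postorder(Node v) {
        if (v == null) return;
        postorder(v.left);
        postorder(v.right);
        System.out.print(v.element + " ");
    }

    // Inorder (binary trees only): left subtree, then the node, then the right subtree.
    static void inorder(Node v) {
        if (v == null) return;
        inorder(v.left);
        System.out.print(v.element + " ");
        inorder(v.right);
    }

    public static void main(String[] args) {
        Node root = new Node(1, new Node(2, null, null), new Node(3, null, null));
        preorder(root);   // 1 2 3
        System.out.println();
        inorder(root);    // 2 1 3
        System.out.println();
        postorder(root);  // 2 3 1
    }
}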
Lecture 10: Trees continued
22/8/2014
Evaluate Arithmetic Expressions
Euler Tour Traversal
Preorder, postorder, inorder traversals are special cases of this
Walk around tree and visit each node three times
Linked Structure for Trees
Node stores:
o Element
o Parent node
o Sequence of children nodes
For binary tree, node stores:
o Element
o Parent node
o Left child node
o Right child node
Nodes implement the Position ADT
Array-Based Binary Tree
Node v is stored at A[rank(v)]
o rank(root) = 0
o Left child: rank(node) = 2 * rank(parent(node)) + 1
o Right child: rank(node) = 2 * rank(parent(node)) + 2
Worst-case space: O(2^n), e.g. when all nodes are on the right side (a single rightmost path)
May be OK to use if every row is filled; saves a little space by not needing links
Priority Queue
Priority Queue ADT
Entry: pair(key, value)
Main methods:
o insert(k, v)
o removeMin(): removes and returns entry with smallest key, or null if
empty
Additional methods:
o min()
o size()
o isEmpty()
Ordering
Comparability: x ≤ y or y ≤ x
Antisymmetric property: x ≤ y and y ≤ x implies x = y
Transitive property: x ≤ y and y ≤ z implies x ≤ z
Entry ADT
Methods:
getKey()
getValue()
Comparator ADT
compare(a, b) returns i such that:
i < 0 if a < b
i = 0 if a = b
i > 0 if a > b
Sequence-based priority queue
Unsorted:
o Insert: O(1)
o removeMin and min: O(n)
Sorted:
o Insert: O(n)
o removeMin and min: O(1)
Selection-Sort
PQ-sort function implemented with unsorted sequence
Running time
o n insert operations take O(n)
o n removeMin operations take O(n + (n - 1) + ... + 1) = O(n^2)
o Runs in O(n^2) time
Insertion-Sort
PQ-sort function implemented with sorted sequence
Runs in O(n^2) time
In-place insertion-sort
Use swaps
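A sketch of in-place insertion-sort using swaps, written for an int array for brevity:
// In-place insertion-sort: repeatedly swap the new element backwards
// until it sits in sorted position. O(n^2) time, O(1) extra space.
public class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j--) {
                int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;  // swap backwards
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 1, 3};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));  // [1, 2, 3, 4, 5]
    }
}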
Lecture 11: Heap
25/8/2014
Heap
A binary tree storing keys at nodes and satisfying properties:
o Heap-Order: for every internal node v other than the root, key(v) ≥ key(parent(v))
o Complete Binary Tree: filled top to bottom, left to right; there are 2^i nodes at depth i for i < h
The last node of a heap is the rightmost node of maximum depth
Height of heap: O(log n)
Insertion: Insert after last node, then restore heap-order by swapping with
parent
Removal: Replace root with last node, then set new last node. Restore
heap-order (downheap)
Updating the last node:
o If the previous last node is a left child, the new last node is its parent's right child. Otherwise:
o Go up from the previous last node until a left child or the root is reached
o If a left child is reached, go to its right sibling
o Go down left until a leaf is reached
o If the root is reached, go down left until the new row is reached
o The height is traversed at most twice, so this takes O(log n)
Heap-Sort
Space: O(n)
Insert and removeMin: O(log n)
Size, isEmpty, min: O(1)
Heap-sort: O(n log n)
Array-based heap implementation
Can represent heap with n keys by array of length n
For the node at rank i, the left child is at rank 2i + 1 and the right child at rank 2i + 2
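An illustrative array-based min-heap sketch with upheap and downheap, following the heap-order and rank rules above (int keys and a growable backing list are simplifications of my own):
import java.util.ArrayList;

// Array-based min-heap: the node at index i has children at 2i+1 and 2i+2
// and parent at (i-1)/2. insert and removeMin are O(log n).
public class ArrayHeap {
    private final ArrayList<Integer> a = new ArrayList<>();

    public void insert(int key) {
        a.add(key);                 // add after the last node
        upheap(a.size() - 1);       // restore heap-order by swapping with parents
    }

    public Integer removeMin() {
        if (a.isEmpty()) return null;
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);         // replace the root with the last node
            downheap(0);            // restore heap-order
        }
        return min;
    }

    private void upheap(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (a.get(i) >= a.get(parent)) break;   // heap-order holds
            swap(i, parent);
            i = parent;
        }
    }

    private void downheap(int i) {
        while (2 * i + 1 < a.size()) {
            int child = 2 * i + 1;                                           // left child
            if (child + 1 < a.size() && a.get(child + 1) < a.get(child)) child++;  // smaller child
            if (a.get(child) >= a.get(i)) break;
            swap(i, child);
            i = child;
        }
    }

    private void swap(int i, int j) {
        int tmp = a.get(i); a.set(i, a.get(j)); a.set(j, tmp);
    }
}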
Merging two heaps
Merge two heaps and key k by storing k as root with the two heaps as
subtrees, then perform downheap
Bottom-up heap construction
Constructs a heap from n given keys in O(n) time
Lecture 12: Adaptable Priority Queue
29/08/2014
Adaptable Priority Queue ADT:
replaceKey(e, k): returns the previous key of entry e
replaceValue(e, v): returns the previous value
Location-aware list
Each entry stores:
o key
o value
o position (or rank)
Maps and Hashtables
Map
Searchable collection of key-value entries
Multiple entries with the same key are not allowed
Map
ADT
get(k)
put(k,v)
remove(k)
size(), isEmpty()
entrySet()
keySet()
values()
Hash function
Usually specified as the composition of a hash code (keys → integers) and a compression function (integers → [0, N − 1])
To minimise collisions when combining a sequence of components, use polynomial accumulation: p(z) = a0 + a1·z + a2·z^2 + ..., evaluated at a fixed constant z, e.g. z = 33
Polynomials can be evaluated in O(n) time with Horner's rule
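A small sketch of what this can look like in Java (the method names, the division-based compression and the table size N = 101 are illustrative choices, not from the course):
// Polynomial hash code evaluated with Horner's rule, then compressed
// into [0, N-1] with the division method.
public class PolyHash {
    static int polyHash(String s, int z) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * z + s.charAt(i);   // Horner's rule: O(n) multiplications and additions
        }
        return h;
    }

    static int compress(int hashCode, int N) {
        return Math.floorMod(hashCode, N);   // map to a table index in [0, N-1]
    }

    public static void main(String[] args) {
        int N = 101;                                    // illustrative table size
        System.out.println(compress(polyHash("stack", 33), N));
    }
}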
Lecture 13
1/9/2014
Collision handling
Separate chaining: let each cell in table point to linked list of entries
Open addressing: place colliding item in different cell
Linear probing
o Place colliding item in next (circularly) available table cell
o Future collisions cause longer sequences of probes (inspections)
o remove(k): if entry (k,o) is found replace with DEFUNCT and return
element o, else return null
Double hashing
o Have secondary hash function d(k)
o If there is a collision, place the item in the first available cell (h(k) + j·d(k)) mod N, for j = 1, 2, 3, ...
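A hedged sketch of open addressing with linear probing, including the DEFUNCT marker used by remove(k); the class, the int-key restriction, the sentinel value and the fixed table size are all illustrative simplifications:
// Open addressing with linear probing. Removed slots are marked DEFUNCT
// so that later probe sequences are not broken.
public class LinearProbingMap {
    private static final int N = 16;                    // illustrative fixed table size
    private static final Integer DEFUNCT = Integer.MIN_VALUE;  // illustrative sentinel
    private final Integer[] keys = new Integer[N];
    private final String[] values = new String[N];

    private int hash(int k) { return Math.floorMod(k, N); }

    public void put(int k, String v) {
        // Simplified: a full version would first check whether k is already
        // present further along the probe sequence before reusing a DEFUNCT cell.
        for (int j = 0; j < N; j++) {
            int i = (hash(k) + j) % N;                  // probe the next cell circularly
            if (keys[i] == null || keys[i].equals(DEFUNCT) || keys[i] == k) {
                keys[i] = k;
                values[i] = v;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public String get(int k) {
        for (int j = 0; j < N; j++) {
            int i = (hash(k) + j) % N;
            if (keys[i] == null) return null;           // empty cell: key is absent
            if (!keys[i].equals(DEFUNCT) && keys[i] == k) return values[i];
        }
        return null;
    }

    public String remove(int k) {
        for (int j = 0; j < N; j++) {
            int i = (hash(k) + j) % N;
            if (keys[i] == null) return null;
            if (!keys[i].equals(DEFUNCT) && keys[i] == k) {
                String v = values[i];
                keys[i] = DEFUNCT;                      // mark the cell DEFUNCT
                values[i] = null;
                return v;
            }
        }
        return null;
    }
}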
Performance of hashing
Worst case search/insert/remove on a hash table is O(n) (i.e. everything hashes to the same spot)
Load factor: α = n/N
Assuming hash values are like random numbers, the expected number of probes for an insertion with open addressing is 1/(1 − α)
Settle for O(1) expected (average) running time
In practice hashing is very fast provided load factor is not close to 100%
Set, Multiset, Multimap, Skip Lists
10/9/2014
Set
Set ADT
addAll(T): S ← S ∪ T (union)
retainAll(T): S ← S ∩ T (intersection)
removeAll(T): S ← S − T (difference)
Implementations
Sorted list
o Generic merging: runs in O(nA + nB), provided the auxiliary comparison methods (<, >, ==) run in O(1)
Generic Merging
Auxiliary methods aIsLess, bIsLess, bothAreEqual can be redefined to compute e.g. intersection or union
Runs in O(nA + nB) provided the auxiliary methods run in O(1)
Multimap
Similar to map, but can have multiple entries with same key
Skip Lists
A data structure for maps that uses a randomized insertion algorithm
A series of lists S0, S1, ..., Sh
Sh contains only the two special keys
E.g. search for 78
Implementation
Quad-node: stores entry, link to nodes prev, next, below, above
Define special keys PLUS_INF, MINUS_INF
Space Usage
Expected number of nodes: Σ (i = 0 to h) n/2^i < 2n, which is O(n)
Height
With high probability, the height is O(log n)
Time
Search time is proportional to the number of drop-downs + the number of scan-forwards
Drop-downs are bounded by the height, O(log n); scan-forwards are also O(log n)
Expected search, insertion and deletion time is O(log n) (with high probability)
Binary search
Can perform nearest neighbour queries on an ordered map that is
implemented with an array, sorted by key
At each step, the number of candidate items is halved, so the search takes O(log n) time
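An illustrative binary search sketch over a sorted int array:
// Binary search on a sorted array: halves the candidate range each step, O(log n).
public class BinarySearch {
    static int indexOf(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;          // avoids overflow for large indices
            if (sorted[mid] == key) return mid;
            if (sorted[mid] < key) lo = mid + 1;   // discard the left half
            else hi = mid - 1;                     // discard the right half
        }
        return -1;                                 // not found
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 8, 12, 16, 23};
        System.out.println(indexOf(a, 12));   // 3
        System.out.println(indexOf(a, 7));    // -1
    }
}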
Search tables
Ordered map implemented by a sorted sequence
Search using binary search: O(log n)
Insert/remove: O(n)
Good for small maps or when search is the most common operation
Binary search tree
Binary tree storing keys/key-value entries at its internal nodes
Let u be a node in the left subtree of v and w a node in the right subtree of v; then key(u) ≤ key(v) ≤ key(w)
External nodes do not store items
Deletion
Performance
For an ordered map with n items implemented by a binary search tree of height h:
o Space used is O(n)
o get, put and remove take O(h) time
o Height h is O(n) in the worst case (nodes arranged in a line) and O(log n) in the best case
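A small illustrative sketch of searching a binary search tree by key (O(h) time, where h is the height; the Node layout is an assumption):
// Binary search tree lookup: follow left/right links according to the key.
public class BstSearch {
    static class Node {
        int key;
        String value;
        Node left, right;
        Node(int key, String value) { this.key = key; this.value = value; }
    }

    static String get(Node root, int key) {
        Node v = root;
        while (v != null) {
            if (key == v.key) return v.value;
            v = (key < v.key) ? v.left : v.right;  // smaller keys are in the left subtree
        }
        return null;                               // reached an external position: not found
    }
}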
Search Tree
8/9/2014
AVL Tree
Binary search tree with height balance: for every internal node v, heights
of children of v can differ by at most 1
Height of an AVL tree storing n keys is O(log n)
Fixing the height-balance property after an insertion: trinode restructuring
Search Trees
11/9/2014
Multi-way search tree
(2, 4) tree
A multi-way search tree with:
o Node-Size property: Every internal node has at most four children
o Depth property: All external nodes have the same depth
Height h ≤ log(n + 1), so searching takes O(log n) time
Splay tree
18/10/2014
Is a binary search tree, but after each operation do splaying
Does not enforce logarithmic upper bound on tree height
o Worst-case time complexity of search, insert and delete is O(n)
o Amortised time complexity is O(log n)
Efficient for looking up frequently accessed elements thanks to splaying
Splaying
Move a node to root using rotation
Performed after all operations
See slides for
o Which node to splay
o Zig/zag operations. Note: easier to understand with actual numbers
for nodes
Splaying costs
o O(h) rotations, each O(1). Still O(n) worst-case
o Amortised cost is O(log n): the time T(n) needed to perform a series of n splay operations, divided by n, i.e. T(n)/n
Red-Black Trees
18/9/2014
A red-black tree is a representation of a (2,4) tree by means of a binary
search tree whose nodes are coloured red or black
o Has same logarithmic worst-case asymptotic time complexity for
search, insert, removal
o Simpler implementation with a single node type
Is a binary search tree with:
o Root property: root is black
o External property: every leaf is black
o Internal property: the children of a red node are black
o Depth property: all leaves have the same black depth
Height
Height is O(log n)
o Proof: height is at most twice the height of associated (2,4) tree,
which is O(log n)
Search works the same way as for binary search tree. O(log n)
Insertion
Colour the newly inserted node red, unless it is the root.
If there is a double red violation, fix it by restructuring or recolouring (see slides)
Sorting & Selection
22/9/2014
Merge-sort
Divide and conquer
Like heap-sort, has O(n log n) time
Unlike heap-sort
o Does not use an auxiliary priority queue
o It accesses data in a sequential manner (suitable for sorting data on
a disk)
Algorithm
o Divide: partition S into S1 and S2 of about n/2 elements each
o Recur: recursively merge-sort S1 and S2
o Conquer: merge S1 and S2 into a unique sorted sequence
Merging two sorted sequences, each of length n/2 and implemented as doubly linked lists, takes O(n). The height of the merge-sort tree is O(log n), since at each recursive call the sequence length is halved. Thus the total running time of merge-sort is O(n log n).
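A sketch of the merge step on two sorted lists (using java.util.LinkedList as a stand-in for the doubly linked lists above):
import java.util.LinkedList;

// Merge step of merge-sort: repeatedly move the smaller front element
// of the two sorted input lists to the output. O(n) for total length n.
public class MergeStep {
    static LinkedList<Integer> merge(LinkedList<Integer> s1, LinkedList<Integer> s2) {
        LinkedList<Integer> out = new LinkedList<>();
        while (!s1.isEmpty() && !s2.isEmpty()) {
            if (s1.peekFirst() <= s2.peekFirst()) out.addLast(s1.removeFirst());
            else out.addLast(s2.removeFirst());
        }
        out.addAll(s1);   // at most one of these is non-empty
        out.addAll(s2);
        return out;
    }

    public static void main(String[] args) {
        LinkedList<Integer> a = new LinkedList<>(java.util.List.of(1, 4, 7));
        LinkedList<Integer> b = new LinkedList<>(java.util.List.of(2, 3, 9));
        System.out.println(merge(a, b));   // [1, 2, 3, 4, 7, 9]
    }
}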
Quick-Sort
Is a randomised sorting algorithm based on divide-and-conquer paradigm
Algorithm
o Divide: pick a random element x (called pivot) and partition S into
L: elements less than x
E: elements equal x
G: elements greater than x
Note: each insert/remove is at beginning/end of sequence, so is
O(1). Thus partition takes O(n)
o Recur: sort L and G
o Conquer: join L, E and G
Worst-case running time is O(n^2), which occurs when the pivot is always the minimum or maximum element
Expected height is O(log n), and the work done at nodes of the same depth is O(n)
Expected running time of quick-sort is O(n log n)
In-place quick-sort
See slides
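The slides are the reference for in-place quick-sort; the following is a common in-place variant (random pivot, Lomuto-style partition), offered only as an illustrative sketch and not necessarily the scheme in the slides:
import java.util.Random;

// In-place quick-sort: partition the subarray around a randomly chosen pivot,
// then recurse on the two parts.
public class InPlaceQuickSort {
    private static final Random RAND = new Random();

    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    // Moves elements <= pivot to the left part and returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + RAND.nextInt(hi - lo + 1), hi);  // random pivot moved to the end
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 4, 8};
        quickSort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a));  // [1, 3, 4, 7, 8, 9]
    }
}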
Shortest Paths
13/10/2014
Dijkstra's Algorithm
Solves single-source shortest path problem
Similar to uniform cost search, but no goal vertex
Used on weighted graph with non-negative edge weights
Grow a cloud of vertices, starting at the source vertex s. At each step, add the vertex outside the cloud with the smallest distance label d
Edge relaxation: d(z) ← min{ d(z), d(u) + weight(e) }
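A compact sketch of Dijkstra's algorithm using the edge-relaxation rule above, with an adjacency-list graph and Java's PriorityQueue (the Edge record and the graph representation are my own illustrative choices):
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra's algorithm on a weighted graph with non-negative edge weights.
// Repeatedly pulls the vertex with the smallest distance label into the "cloud"
// and relaxes its outgoing edges: d(z) = min(d(z), d(u) + weight(u, z)).
public class Dijkstra {
    record Edge(int to, int weight) {}

    static int[] shortestPaths(List<List<Edge>> adj, int source) {
        int n = adj.size();
        int[] d = new int[n];
        Arrays.fill(d, Integer.MAX_VALUE);
        d[source] = 0;

        // Priority queue of (distance, vertex) pairs, smallest distance first.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[]{0, source});

        boolean[] inCloud = new boolean[n];
        while (!pq.isEmpty()) {
            int u = pq.poll()[1];
            if (inCloud[u]) continue;       // stale entry: u was already finalised
            inCloud[u] = true;
            for (Edge e : adj.get(u)) {
                if (d[u] + e.weight() < d[e.to()]) {      // edge relaxation
                    d[e.to()] = d[u] + e.weight();
                    pq.add(new int[]{d[e.to()], e.to()});
                }
            }
        }
        return d;
    }

    public static void main(String[] args) {
        List<List<Edge>> adj = new ArrayList<>();
        for (int i = 0; i < 4; i++) adj.add(new ArrayList<>());
        adj.get(0).add(new Edge(1, 2));
        adj.get(0).add(new Edge(2, 5));
        adj.get(1).add(new Edge(2, 1));
        adj.get(2).add(new Edge(3, 2));
        System.out.println(Arrays.toString(shortestPaths(adj, 0)));  // [0, 2, 3, 5]
    }
}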
Text Processing: Tries
20/10/2014
Trie
Trie: A compact data structure for representing a set of strings, e.g. all the
words in a text
Supports pattern matching queries in time proportional to the pattern size
Standard trie: ordered tree with each node labelled with a character.
Children alphabetically ordered
Compressed trie: Every internal node is at least degree 2. Obtained from
standard trie by compressing chains of redundant nodes.
See slides for time complexity
Compact representation
Compact representation of compressed trie, store ranges of indices at
nodes instead of strings.
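A minimal standard-trie sketch for a set of words (illustrative; it uses a HashMap for children rather than alphabetically ordered lists, for brevity):
import java.util.HashMap;
import java.util.Map;

// Standard trie for a set of words: each child link is labelled with a character;
// a lookup walks one node per pattern character, so it is proportional to the pattern size.
public class StandardTrie {
    private static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isWord;                      // marks the end of a stored word
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node v = root;
        for (char c : word.toCharArray()) {
            v = v.children.computeIfAbsent(c, k -> new Node());
        }
        v.isWord = true;
    }

    public boolean contains(String word) {   // O(length of word)
        Node v = root;
        for (char c : word.toCharArray()) {
            v = v.children.get(c);
            if (v == null) return false;
        }
        return v.isWord;
    }
}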
Suffix Trie