CS5114: Theory of Algorithms
Sharath Raghvendra
Associate Professor, Dept. of Computer Science,
Virginia Tech
Fall 2022
Agenda
▪ Today’s topic
▪ Dynamic Programming (Continued)
▪ RNA Secondary Structure Determination
▪ Edit Distance
▪ Shortest path with negative edge costs
▪ After that…
▪ NP-Completeness and reductions between problems (Chapter 8)
Principles of Dynamic Programming
1. Define polynomial number of sub-problems. One of them should be the
final solution
2. There is a natural ordering of problems from smallest to largest such that
we can obtain solution to the larger problem by combining solutions of
smaller ones.
3. Optimal value of a problem can be computed from optimal value of its
sub-problems
RNA Secondary Structure
▪ RNA : is a string 𝑏1 𝑏2 , … . , 𝑏𝑛 over alphabet {A, C, G, U}
▪ Secondary Structure: RNA is a single-stranded, therefore it loops back and
forms base pairs with itself. Determining this 2-dimensional structure is
essential for predicting its function in the body.
Constraints on Secondary Structure
▪ Given a string, we need to compute a set of base pairings S = 𝑏𝑖 , 𝑏𝑗
that satisfy:
▪ [Watson-Crick Pairings] S is a matching, i.e., no alphabet of the string is involved in
multiple pairings. Furthermore, this pairing is A—U , U—A , C—G or G—C
▪ [No Sharp Turns] The ends of each pair are separated by at least four bases. If
𝑏𝑖 , 𝑏𝑗 ∈ 𝑆, then 𝑖 < 𝑗 − 4
▪ [Non-crossing] If 𝑏𝑖 , 𝑏𝑗 and 𝑏𝑘 , 𝑏𝑙 are in S, then we cannot have 𝑖 < 𝑘 < 𝑗 < 𝑙
▪ Maximizing the size of S ensures that the structure is stable.
▪ Objective: Find a base pairing S which is of the largest size.
Examples
Optimal Substructure
▪ What is the optimal substructure property here?
▪ Results in two sub-problems, one is not in the desired form.
▪ Sub-problem 1 𝑏1 , 𝑏2 , 𝑏3 , … , 𝑏𝑡−1 = 𝑂𝑃𝑇(𝑡 − 1)
▪ Sub-problem 2 𝑏𝑡+1 , 𝑏𝑡+2 , … , 𝑏𝑛−1
Dynamic Programming – False Start
▪ Lets try defining sub-problems similar to weighted interval scheduling
▪ OPT(j) = Max number of base pairings in a secondary structure of the sub-
string 𝑏1 , 𝑏2 , … , 𝑏𝑗
▪ We need to express solution to OPT(j) as solution to smaller sub-problems
▪ Suppose 𝑏𝑛 matches to 𝑏𝑡 . What sub-problems do we get?
▪ Results in two sub-problems, one is not in the desired form.
▪ Sub-problem 1 𝑏1 , 𝑏2 , 𝑏3 , … , 𝑏𝑡−1 = 𝑂𝑃𝑇(𝑡 − 1)
▪ Sub-problem 2 𝑏𝑡+1 , 𝑏𝑡+2 , … , 𝑏𝑛−1
Dynamic Programming – Sub-Problems
▪ Define sub-problems over intervals
▪ OPT(i, j) = max number of base pairs in a secondary structure of the sub-
string 𝑏𝑖 , 𝑏𝑖+1 , … , 𝑏𝑗
▪ Case 1: If 𝑖 ≥ 𝑗 − 4
▪ 𝑂𝑃𝑇 𝑖, 𝑗 = 0
▪ Case 2: Base 𝑏𝑗 is not involved in a pair
▪ 𝑂𝑃𝑇 𝑖, 𝑗 = 𝑂𝑃𝑇(𝑖, 𝑗 − 1)
▪ Case 3: Base 𝑏𝑗 maps with 𝑏𝑡 for some 𝑖 ≤ 𝑡 < 𝑗 − 4
▪ Problem splits into 𝑂𝑃𝑇(𝑖, 𝑡 − 1) and 𝑂𝑃𝑇(𝑡 + 1, 𝑗 − 1)
Ordering Sub-Problems
▪ We need to still define an order for solving sub-problems
▪ We will solve 𝑂𝑃𝑇 𝑖, 𝑗 in increasing order of their interval length
▪ Note, we have expressed optimal for interval 𝑖, 𝑗 as a combination of
solution to sub-problems of smaller intervals.
Efficiency
▪ There are 𝑂 𝑛2 intervals, each corresponds to a sub-problem
▪ To compute the solution to one sub-problem, we need to compute max
over 𝑂 𝑛 smaller sub-problems
▪ Therefore, total time taken is 𝑂(𝑛3 )
▪ Retrieving the base pairings can be done via backtracking using the table
▪ You can figure out a procedure for retrieving solution on your own.
Edit Distance
Edit Distance
▪ The Edit Distance (or Levenshtein distance) is a distance function
between two strings. It is used to measure their similarity.
▪ Edit Distance = minimum number of edits needed to transform one string into the
other.
▪ Applications:
▪ spell checkers,
▪ natural language translation,
▪ bioinformatics.
Application
How does google know that “dynamic” and
Dymanic are similar?
Edit Distance
▪ Given initial string x and final string y, what is the minimum number of edits
required to convert string x to string y.
▪ Valid edits include:
▪ Insert a character to x
▪ Delete a character from x
▪ Change a character in x to another character
Example
Alternately, we can ask how well the words line-up? Such line-ups are called alignments.
Example
Alternately, we can ask how well the words line-up? Such line-ups are called alignments.
Representing edits as alignments
▪ An alignment is a matching of positions in strings x and y given by
▪ Sets 1 … 𝑚 , {1 … 𝑛} represent positions in x and y
▪ A matching of positions (or indices) is an alignment if there is no “crossing
pairs” in M: if 𝑖, 𝑗 , 𝑖 ′ , 𝑗 ′ ∈ 𝑀, 𝑖 < 𝑖′ then 𝑗 < 𝑗′
▪ Not every index needs to be matched in an alignment.
▪ Unmatched indices create a gap. Every gap has a gap penalty 𝛿
▪ 𝑖, 𝑗 ∈ 𝑀 is a mismatch if 𝑥𝑖 ≠ 𝑦j . Mismatch penalty is 𝛼𝑥𝑖 𝑦𝑗
▪ Goal: Given two strings obtain an alignment with minimum penalty.
Dynamic Programming – Observations
▪ So, when matching one word to another one, consider the last characters
of strings s and t.
▪ If we are lucky enough that they ALREADY match,
▪ then we can simply "cancel" and recursively find the edit distance between the two
strings left when we delete this last character from both strings.
▪ Otherwise, we MUST make one of three changes:
1) delete the last character of string x (this character matches to a “blank” on y)
2) delete the last character of string y (this character matches to a “blank” on x)
3) change the last character of string s to the last character of string t. (this will
correspond to a mismatch)
▪ Edit distance between an empty string and another string is the length of
the second string times insertion/deletion cost.
Defining Sub-Problems