Study guide:
General:
To do:
Repeat the problem out loud, slowly but confidently.
Ask questions, check assumptions, know what the constraints are.
Talk out the problem!
Work out examples on the whiteboard, devising some kind of brute force
algorithm along the way.
Don't stay silent here.

Start with naive simplest brute force solution.


Check edge cases and do null checks.
Work out examples using the code/algorithm; keep track of state by hand.
When debugging, don't rush. Don't guess and check—make sure I know
exactly what went wrong and how to fix it. Preventatively: don't try to be too
clever.
Know time/space complexity: http://bigocheatsheet.com/
Heuristics and guidelines:
Always consider hash tables (dictionaries) with their O(1)-ness.
"Tip: using a dictionary is the most common way to get from a brute
force approach to something more clever. It should always be your first
thought."
Sets are also extremely useful if all I care about is unique elements and
membership testing.
Don't be afraid of using one even for graph traversals. Worry about
optimizing space later.
If at all array-related, try sorting first.
If search-related, consider binary search.
"Binary search teaches us that when a list is sorted or mostly sorted:
The value at a given index tells us a lot about what's to the left and
what's to the right.
We don't have to look at every item in the list. By inspecting the middle
item, we can "rule out" half of the list.
We can use this approach over and over, cutting the problem in half
until we have the answer. This is sometimes called "divide and
conquer.""
For binary search, consider floor and ceiling as exclusive walls.
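A minimal sketch of this pattern (hypothetical helper; the walls start one step outside the list and the search space is everything strictly between them):

def binary_search(target, nums):
    # Floor and ceiling start just *outside* the list (exclusive walls).
    floor_index, ceiling_index = -1, len(nums)
    while floor_index + 1 < ceiling_index:
        # Guess halfway between the walls.
        guess_index = floor_index + (ceiling_index - floor_index) // 2
        if nums[guess_index] == target:
            return guess_index
        elif nums[guess_index] < target:
            # Target must be to the right; move the floor wall up.
            floor_index = guess_index
        else:
            # Target must be to the left; move the ceiling wall down.
            ceiling_index = guess_index
    return -1  # not found

binary_search(5, [1, 3, 5, 7, 9])  # -> 2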
Start with a brute force solution, look for repeat work in that solution, and
modify it to only do that work once.

"Instead, we started off by coming up with a slow (but correct) brute


force solution and trying to improve from there. We looked at what our
solution actually calculated, step by step, and found some repeat work.
Our final answer came from brainstorming ways to avoid doing that
repeat work."
Space-time trade-off! That is, for better time complexity, try using auxiliary
data structures. E.g., do something in a single pass over an array—O(N) time
—by using a hash table—O(N) space—vs. doing something in nested passes
—O(N^2)—without using any extra space—O(1).
What information can I store to save time? Don't be scared of using
tuples.
Counting sort (where the items are stored as keys/indices and the
frequency is the value) is a big example here.
Another example: O(1) get_max method for a Stack class stores extra
information (the max at and below each element) to save time (instead
of iterating through the stack O(N)).
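For instance, a minimal sketch of such a max-tracking stack (names illustrative):

class MaxStack:
    def __init__(self):
        # Each entry stores (value, max of this value and everything below it).
        self._items = []

    def push(self, value):
        current_max = max(value, self._items[-1][1]) if self._items else value
        self._items.append((value, current_max))

    def pop(self):
        return self._items.pop()[0]

    def get_max(self):
        # O(1): the running max is stored alongside the top element.
        return self._items[-1][1]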
Remember that I can use two pointers:
E.g.: to get the midpoint in a linked list by having one pointer go twice
as fast, or in a sum array problem by having the pointers work inward
from either end, or to test if a string is a palindrome.
Also, the "stick" approach where to get the first node in a cycle, you
start one pointer n nodes ahead of the other, where n is the length of
the cycle.
Don't forget that I can also start at the center and expand outward, like
in the palindromic substring problem:
https://leetcode.com/problems/longest-palindromic-substring/description/.
Try a greedy solution:
Iterate through the problem space taking the best solution "so far"
(running sum etc.) until the end.
Optimal if the problem has "optimal substructure," which means
stitching together optimal solutions to subproblems yields an optimal
solution.
Also usually necessary for O(N) solutions.
"Suppose we could come up with the answer in one pass through the
input, by simply updating the 'best answer so far' as we went.
What additional values would we need to keep updated as we looked
at each item in our input, in order to be able to update the 'best
answer so far' in constant time?"
Does solving the problem for size (N – 1) make solving it for size N any
easier? If so, try to solve recursively and/or with dynamic programming.


Using the max/min function can help a lot in recursive or dynamic
programming problems.
To use, always think about starting from the end, i.e., if I had an answer
for everything but the last something (element, cell, subproblem, etc.),
what more do I need? Basically the n+1 step of an induction proof.
Think: at step n, what happens if I *do* include this thing? What
happens if I don't? What do I need to compare and take the max
of? And how do I use my answer to step n-1?
Start with a simple cache! Then move on to a tabular building-from-
bottom-up approach.
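A toy illustration of cache-first-then-tabular using Fibonacci (the same shape applies to most DP problems):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # Top-down: recurse and let the cache remember subproblem answers.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    # Bottom-up: build from the smallest subproblems, keeping only what's needed.
    if n < 2:
        return n
    prev, curr = 0, 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr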
Always think recursion for binary trees. But remember that iterative
solutions are usually better because they avoid possible stack overflows
and can be more space-efficient.
A lot of problems can be treated as graph problems and/or use breadth-first
or depth-first traversal. And if the problem involves parsing or reversal in
some way, consider using a stack.
Any time you repeatedly have to take the min or max of a dynamic
collection, think heaps. (If you don’t need to insert random elements, prefer
a sorted array.)
Python has a useful heapq module (min-heap operations on plain lists, 0-based indexing).
Particularly useful if there are multiple such possible min/max values
and some of them are in the "past" (precluding only having variables
for the lowest 1-2 elements).
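A small usage sketch (heapq works on plain lists as min-heaps; negate values to simulate a max-heap):

import heapq

nums = [5, 1, 8, 3]
heapq.heapify(nums)             # O(N); nums now satisfies the min-heap property
heapq.heappush(nums, 2)         # O(log N) insert
smallest = heapq.heappop(nums)  # O(log N) extract-min -> 1
three_largest = heapq.nlargest(3, nums)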
If you have a lot of strings, try putting them in a prefix tree / trie.
To improve time complexity, also consider how various complexities match
to solution structures and try working backwards from a target runtime:
"If we're going to do better than O(N^2), we're probably going to do it
in either O(N lg N) or O(N). O(N lg N) comes up in sorting and
searching algorithms where we're recursively cutting the list in half. It's
not obvious that we can save time by cutting the list in half here. Let's
first see how well we can do by looping through the list only once."
Remember that time complexity is asymptotic, so don't worry about
iterating through a list multiple times—it's still O(N).
"Starting with a target runtime and working backward from there can
be a powerful strategy for all kinds of coding interview questions.":
We started with an O(N^2) brute force solution and wondered if
we could do better.
We knew to beat O(N^2), we'd probably do O(N) or O(N lg N), so
we started thinking of ways we might get an O(N lg N) runtime.


lg(N) usually comes from iteratively cutting stuff in half, so we arrived at the final algorithm by exploring that idea.
Try simplifying or otherwise restating what the question is asking for.
Sometimes an explicit restatement (instead of relying on implicit
assumptions about a plan of attack) makes things much easier.
"The requirement of "the difference between the depths of any two leaf
nodes is no greater than 1" implies that we'll have to compare the
depths of all possible pairs of leaves. That'd be expensive—if there
are n leaves, there are O(N^2) possible pairs of leaves.
But we can simplify this requirement to require less work. For example,
we could equivalently say:
"The difference between the min leaf depth and the max leaf
depth is 1 or less"
"There are at most two distinct leaf depths, and they are at most 1
apart""
Break up a problem into independent cases and draw out sample inputs for
the cases.
"Breaking things down into cases is another strategy that really helped
us here.
Notice how simple finding the second largest node got when we
divided it into two cases:
The largest node has a left subtree.
The largest node does not have a left subtree.
Whenever a problem is starting to feel complicated, try breaking it
down into cases.
It can be really helpful to actually draw out sample inputs for those
cases. This'll keep your thinking organized and also help get your
interviewer on the same page."
Not quite the same as N-1, but sometimes a divide-and-conquer approach is
what is necessary. If I know the answer for exclusive parts of the problem,
can I somehow combine to get the final answer?
Avoid passing copies of subarrays, instead pass low and high indices.
For puzzle problems or anything where we can enumerate all possible
solutions and there's a notion of a partial candidate solution, consider
backtracking.
"Backtracking is an important tool for solving constraint satisfaction
problems, such as crosswords, verbal arithmetic, Sudoku, and many
other puzzles. It is often the most convenient (if not the most efficient)
technique for parsing, for the knapsack problem and other
combinatorial optimization problems."

Backtracking is a powerful technique. It allows us to try all possible configurations/solutions, but in an optimal and incremental order, so
that we can eliminate wholesale bad approaches. Basically we generate
partial solutions at each iteration: if it passes the constraint check, we
recur deeper (DFS style); if not, we continue / "prune" that entire
branch and backtrack up the search tree to an unexplored branch. We
generate new solutions through a loop breadth-wise and explore them
further through DFS recursion. The n queens problem is a typical
example.
Just remember, we need a quick test to see if our candidate solution so
far satisfies our constraints, and then we use DFS to explore further.
Choose, explore, unchoose. The unchoose part is critical.
If it has anything to do with even/odd-ness, consider "canceling" even elements out. No need to track exact numbers.
Especially if a boolean function, think about how we can short-circuit and
return early.
Also important for breaking out of loops early.
For backtracking (not "real" backtracking) or reconstructing going backward,
think about what additional information we'd need.
"The tricky part was backtracking to assemble the path we used to
reach our end_node. In general, it's helpful to think of backtracking as
two steps:
Bitwise Operations:
Integers are represented as a string of bits (binary digits). Either big endian (akin
to most significant place value at left) or little endian (akin to most significant
place value at right). So three-hundred twenty-one as 321 or 123.
If unsigned integer, the number of bits used (the width) determines the numbers
that can be encoded: 2^n.
Unsigned means that the number is always positive; negatives aren't supported.

If signed integer, 4 methods; most common is 2's complement, where negating a number (whether positive or negative) is done by inverting all the bits and then adding 1.
This has the advantage of having only one representation of 0 (instead of
negative 0 and positive 0).
With 2's complement: "a signed integral type with n bits [...] represent[s]
numbers from −2^(n−1) through 2^(n−1) − 1." So with 8 bits, the numbers
from -128 to 127 can be encoded.
Java supports the following 2's complement signed numerical primitives (does not
support unsigned numbers really):
Also called integral data types.

byte -- 8 bits
E.g.: from −128 to 127.

short -- 16 bits
char -- unsigned 16 bits

int -- 32 bits
long -- 64 bits
Python supports 4 primary numeric types:
https://docs.python.org/2/library/stdtypes.html

int -- Implemented using C's long, giving at least 32 bits of precision.


float -- Implemented using C's double, number of bits of precision depends
on machine.
long -- Unlimited precision; expands to the limit of the available memory
(not limited by number of bits).
complex -- Have a real and imaginary part, each of which is a float.
There are also fractions -- that hold rationals -- and decimal -- that hold
floating-point numbers with user-definable precision.
Arithmetic:
Adding works as normal. Just remember that 1 + 1 = 10, so gotta carry the 1.
With subtraction, as normal, but be careful with borrowing from the left.
If subtracting 1 from 0 (0 - 1), borrow from the first place to the left that has a 1. If I have to borrow multiple times, the leftmost digit (the one really being borrowed from) becomes a 0, each intervening digit has the borrowed 10 (binary two) added to it and then, since it gets borrowed from in turn, 1 subtracted (so a 0 becomes 10 then 1, and a 1 becomes 11 then 10), and the original digit that needed the borrow becomes 10.
http://sandbox.mc.edu/~bennet/cs110/pm/sub.html

Multiplication can be done with left bitshifts. So because 3 * 5 is equivalent to 3 * (4 + 1), you can bitshift 3 to the left 2 places (which is 3 * 2^2 = 3 * 4) and then add 3 (3 * 1) back in.
Division can be done with right bitshifts, but just remember that it's integer
division--rounded down basically.
Bitwise operations:
& = AND
^ = XOR
| = (inclusive) OR
x << n = left-shifts x by n places (0s appended at the end). Equivalent to x *
2^n
x >> n = right-shifts x by n places using sign extension. Equivalent to x / 2^n
(integer division).
In Python, >> is an arithmetic shift (floor division by 2^n), so the sign is preserved; there is no >>> operator.

x >>> n = right-shifts x by n places where 0s are shifted into the leftmost spots.
Only in Java.

~ = flips bits (unary operator)


Bit facts & tricks:
Since operations are bitwise (occur bit by bit), these are all equivalent to their series-
equivalents. E.g.: x ^ 0 is the same as x ^ 0s. If a statement is true for a single bit, it's true for a
sequence of bits.

^ (XOR)
x ^ 0 = x
x ^ 1 = ~x
~ = negation
x ^ x = 0
& (AND)
x & 0 = 0
x & 1 = x
x & x = x
| (inclusive OR)
x | 0 = x
x | 1 = 1
x | x = x
Swapping two values without a temporary variable:
a = a ^ b
b = a ^ b
a = a ^ b
E.g.: a = 2, b = 3; a = 0010, b = 0011.
a = a ^ b = 0010 ^ 0011 = 0001
b = a ^ b = 0001 ^ 0011 = 0010
a = a ^ b = 0001 ^ 0010 = 0011
a = 3, b = 2.
Common (and not-so-common) bit tasks:
Get the value of the bit at i in num. Specifically, return whether the ith bit is
1.
boolean getBit(int num, int i) {
    return ((num & (1 << i)) != 0);
}
First left-shift 1 by i to get something like 00100000. Then AND with
num to clear (set to 0) all the bits except the ith bit. If that value is not
0, then it's 1 and return true. Else it's 0 so return false.
To be more precise, the ANDing doesn't quite clear. It just makes it so
that there are only two cases: all the bits are 0 or all the bits except the

ith bit is 0. In the first case, the ith bit must be 0; in the second, it must
be 1.
Set num's ith bit to 1.
int setBit(int num, int i) {
    return num | (1 << i);
}
First left-shift 1 by i to again get something like 00100000. Then OR
with num so that the ith bit will become 1 and the other values will not
change. Recall that x OR-ing with 1 will always result in 1 and that x
OR-ing with 0 will not change x.
Clear num's ith bit. Specifically, set the ith bit to 0.
int clearBit(int num, int i) {
    int mask = ~(1 << i);
    return num & mask;
}
Do something like the inverse of set (naturally). Left-shift 1 by i to again
get something like 00100000. Then invert it so it looks like 11011111.
AND with num: ANDing with 1 doesn't change anything, so the only bit
that will change is the ith one, which will become a 0 if it isn't already.
Update num's ith bit with v. Specifically, set the ith bit to v.
int updateBit(int num, int i, int v) {
    int mask = ~(1 << i);
    return (num & mask) | (v << i);
}
The first part is equivalent to the clear bit method. Create a mask that
looks like 11011111 then AND with num to clear the ith bit. Next, left-
shift v by i bits so that v becomes a number where the ith bit is equal
to what v was and all the other bits are 0. Finally, OR with the modified
num (ith bit cleared) so that num's ith bit becomes 1 if v is 1 or is
otherwise left as 0--recall that no other bits in num are modified since
ORing with a 0 does not change the original.
Clear (unset) rightmost set bit. That is, change rightmost 1 to 0.
int clearRightmost(int num) {
    return num & (num - 1);
}
So, e.g., a 12 (1100) becomes a 1000 since ANDing 1100 and 1011 is
1000.
This forms the basis of the operation to determine parity.


Calculate parity. That is, return 1 (true) if the number of 1s is odd and 0
otherwise.
EPI 5.1. #link http://www.geeksforgeeks.org/write-a-c-program-to-find-the-parity-of-an-unsigned-integer/ and
https://graphics.stanford.edu/~seander/bithacks.html#OperationCounting

Naive method:
boolean getParity(int num) {
    boolean parity = false;
    while (num != 0) {
        parity = !parity;
        num = num & (num - 1);
    }
    return parity;
}
Each time through the loop (until num is 0), clear the rightmost
set bit and invert parity. Every odd number of clears, parity will be
true / 1, so in the end, parity will be 1 if the number of 1s is odd
and 0 otherwise.
Time complexity is the number of set bits.
We can also get the rightmost bit and xor with a 0/1 result. If 0,
result doesn't change. Otherwise, result flips.
Not-so-naive method uses a precomputed lookup table. For example, the table could hold the parity of every 16-bit value. Then, to find the parity of a 64-bit number, you would go 16 bits at a time by right-shifting and XORing the four 16-bit parities together.
This works because bitwise operations are associative (i.e., the
grouping doesn't matter).
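A sketch of that table-driven approach, assuming a precomputed table of 16-bit parities (built here with the naive clear-rightmost-bit method):

def _parity_naive(x):
    p = 0
    while x:
        p ^= 1
        x &= x - 1
    return p

PARITY_TABLE = [_parity_naive(i) for i in range(1 << 16)]

def parity64(x):
    mask = (1 << 16) - 1
    # XOR the parities of the four 16-bit chunks (XOR is associative).
    return (PARITY_TABLE[x & mask]
            ^ PARITY_TABLE[(x >> 16) & mask]
            ^ PARITY_TABLE[(x >> 32) & mask]
            ^ PARITY_TABLE[(x >> 48) & mask])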
Get the Hamming distance.
https://leetcode.com/problems/hamming-distance/#/description

"The Hamming distance between two integers is the number of


positions at which the corresponding bits are different.
Given two integers x and y, calculate the Hamming distance."
def hammingDistance(self, x, y):
    # Result has 1 only in places where corresponding bits are different.
    z = x ^ y
    ct = 0
    # Count the number of 1s.
    while z:
        ct += 1
        # Unsets the rightmost 1.
        z &= z - 1
    return ct
Arrays, Dictionaries, and Strings:
Hash tables are important! Note that insertion and find/lookup proceed through
identical starting steps: hash the key and go to that table location. This enables
the O(1) complexity for both.
Chaining is the common way to handle collisions. Just means each position
is actually the head of a linked list.
Open addressing is the other method. Deletions are tricky with this.
Java has built-in hashcodes.
Python's dictionaries are implemented using hash tables. They are simply arrays
whose indexes are obtained using a hash function on the keys.
Motivation:
"Think of a hash map as a "hack" on top of an array to let us use
flexible keys instead of being stuck with sequential integer "indices."
All we need is a function to convert a key into an array index (an
integer). That function is called a hashing function."
For the common case where I increment a key's count if it exists but initialize it otherwise, use the get method with a default:
my_dict[some_key] = my_dict.get(some_key, 0) + 1
Arrays offer easy random access but hard modification (have to move elements
around).
In the worst case a larger array has to be reallocated too (but amortizes out).
In Java:
Use StringBuilder for expensive string concatenation. And for easy string
reversal.
Everything is initialized to 0, including arrays.
Remember that chars are also ints.
Python only really has lists; arrays are just thin wrappers over C arrays.
Always use lists unless I have a really good explicit reason to need C-style arrays.

Linked Lists:
All a linked list really is is the head node (which points to the next node, and so
on).
To reverse a linked list, just need to remember to store next and update prev and
curr pointers at each iteration:
nxt = curr.next

curr.next = prev

prev = curr
curr = nxt
To recursively reverse:
base case is when the node is None or node.next is None
remember to set head.next.next = head and then head.next = None
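A minimal recursive sketch (assuming a simple node class with a .next attribute):

def reverse_list(head):
    # Base case: empty list or single node.
    if head is None or head.next is None:
        return head
    new_head = reverse_list(head.next)
    # head.next is now the tail of the reversed remainder; hook head on after it.
    head.next.next = head
    head.next = None
    return new_head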
Deleting a node n is just setting the references to skip n: prev.next = n.next;
Can also delete a curr node even without a prev pointer: curr.data =
curr.next.data; curr.next = curr.next.next;
Basically, copy (and overwrite) next's data to curr and delete next.

Just make sure to check for the null pointer (!!) and be aware of the head pointer.
Linked lists offer easy modification but non-existent random access.
An important technique is the "runner" one:
Have two pointers iterating through the list, with one pointer either ahead
by a fixed amount or actually moving faster.
For example, to determine the midpoint of a linked list, have two pointers
such that one of them jumps 2 nodes at once and the other just 1. When the
fast pointer hits the end, the slow one will be in the middle of the list.
To determine if a linked list has a cycle, do the same thing where 1 pointer
moves ahead 2 nodes at a time. If there's a cycle, the runner will eventually
equal the normal curr node. If not, will hit null node.
https://www.wikiwand.com/en/Cycle_detection#/Tortoise_and_hare
def hasCycle(self, head):
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast: return True
    return False
then, once fast = slow, reset slow to head of list, and move forward
both pointers 1 at a time. once they equal again, that's the start of the
cycle. then move the fast pointer 1 at a time, once they equal for the
3rd time, that's the length of the cycle
can also use this to detect duplicates in an array given certain
constraints
http://keithschwarz.com/interesting/code/?dir=find-duplicate
https://leetcode.com/problems/find-the-duplicate-number/description/
def detectCycle(self, head):
    slow = fast = head
    has_cycle = False
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow == fast:
            has_cycle = True
            break
    if not has_cycle: return None
    # the point where they meet again, resetting slow to head, is the cycle start
    while head != fast:
        head = head.next
        fast = fast.next
    return head
or alternatively, if don't need the start index and just want cycle length,
move fast 1 node at a time until hits slow again
A fixed amount (also called the stick approach) use case would be to
determine the start of a cycle if you know the length of the cycle.
Recursion can often help.
Example problems and model code:
Sort linked list using O(1) space and O(n log n) time:
https://leetcode.com/problems/sort-list/description/

def sortList(self, head):
    # base case
    if not head or not head.next: return head
    # get middle (if even, get 1st middle)
    slow = fast = head
    while fast.next and fast.next.next:
        slow = slow.next
        fast = fast.next.next
    # split list in two and sort halves
    h2 = slow.next
    slow.next = None
    left = self.sortList(head)
    right = self.sortList(h2)
    # then merge sorted halves
    return self.merge_sorted_lists(left, right)

def merge_sorted_lists(self, h1, h2):
    if None in (h1, h2): return h1 or h2
    if h1.val <= h2.val:
        h1.next = self.merge_sorted_lists(h1.next, h2)
        return h1
    else:
        h2.next = self.merge_sorted_lists(h1, h2.next)
        return h2
Stacks and Queues:
Stacks are last-in first-out (LIFO), like a stack of dinner plates or trays.
Queues are first-in first-out (FIFO), like a queue at an amusement park ride.
Both are easily implemented with a linked list.
With a stack, there's only one access point: the top (the linked list's head), and
nodes point down / toward the tail.
With a queue, there are two: the first node (the head [of the line]) and the last (the
tail). Nodes point back / toward the tail and are added to the tail (!).
Priority queues are neat structures (technically ADTs) where ordering is
determined not by insertion time, but by some priority field:
Support insertion, find-min (or find-max), and delete-min (delete-max)
operations.
The idea is that if I consistently just want the highest priority element, the
priority queue will take care of that for me at insertion time, instead of my
having to manually resort at extraction time.
Backing data structures include heaps (and balanced binary trees).
Generally implemented by keeping an extra pointer to the highest priority
element, so on insertion, update iff new element is higher priority, and on
deletion, delete then use find-min (or find-max) to restore the pointer.
Popping everything from a stack and pushing it all to another stack reverses the
ordering.
Pushing everything back of course reverts the ordering.

Two stacks are Turing-complete.
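That reversal trick is exactly how a queue can be built from two stacks (amortized O(1) per operation); a sketch:

class QueueWithTwoStacks:
    def __init__(self):
        self._in, self._out = [], []

    def enqueue(self, value):
        self._in.append(value)

    def dequeue(self):
        if not self._out:
            # Reverse the "in" stack into the "out" stack, but only when needed.
            while self._in:
                self._out.append(self._in.pop())
        return self._out.pop()  # raises IndexError if the queue is empty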


(Binary) Trees:
A tree is in part defined as a node (the root), which holds some value, together
with references to other nodes (the children, themselves trees). Because this also
defines a directed graph, these conditions are added:
at most 1 reference can point to any given node (i.e., a node has no more
than 1 parent)
and no node in the tree points toward the root.
As such, a tree is really just a special case of a directed graph (digraph):

The graph definition: a connected (directed in this context, but undirected in, e.g., the context of a spanning tree) graph with no cycles.
Formally, directed means that the tree is a rooted tree.

Binary trees (branching factor = 2) are recursively built from nodes that have
pointers to a left and right child.
Binary trees aren't necessarily binary search trees (BSTs), which require that the left
children are less than or equal to the current node which is less than all the right
nodes.
Traversals:
Depth-first traversal (DFS) prefixes are in reference to when the root/current
node is visited. So pre-order traversal means first root, then left children,
then right; post-order means first left, then right, then root. In-order traversal
will give a sorted list if it's a valid BST. Make sure to recurse fully when
traversing.
DFS generally will hit leaves first/faster.
In-order traversal pseudocode is simple:
inorder(left)
visit(root)
inorder(right)
Same for the other traversals:
visit(root)
preorder(left)
preorder(right)
Only one kind of breadth-first traversal—the level order traversal. This
traversal visits nodes by levels from top to bottom and from left to right.
Uses a queue.
To choose between the two, we want to use DFS if we need to hit a leaf fast
and BFS if we're more concerned with a "middle" level closer to the root. But
if we need to print all leaf nodes, then by necessity they're the same.
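A sketch of the queue-based level order traversal mentioned above (assuming nodes with .val, .left, and .right):

from collections import deque

def level_order(root):
    if root is None:
        return []
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.val)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return order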
Balanced trees are better. Note that balanced doesn't mean perfectly balanced.
Depth and height:
A node's depth = the number of edges from the node to the tree's root
node. A root node will have a depth of 0.
A node's height = the number of edges on the longest path from the node
to a leaf. A leaf node will have a height of 0.
The depth and height of a tree are equal: both equal the number of edges on the longest root-to-leaf path.
A tree that is both full and complete has all leaves at the bottom level and every non-leaf node with exactly 2 children (sometimes called a perfect tree, as in http://www.cs.gettysburg.edu/~ilinkin/courses/Fall-2012/cs216/notes/bintree.pdf).

Rare because then must have exactly (2^n) - 1 nodes where n is the number of levels, or (2^
(h+1)) - 1 where h is the height.

Full = every node other than the leaves has 2 children. I.e., every node either
has 0 or 2 children.
Complete = every level (except maybe the last) is completely filled and all
nodes are as far left as possible. I.e., the tree is as compact as possible.
In a perfect tree, h = O(log n) because h =  log2 (n + 1) − 1 and we drop the
less significant terms.
Algo tips:
For DFS, use a stack as always.
To validate BST, in-order traversal and comparing to sorted version is O(N lg
N)*, but the better O(N) way is to recurse down the tree and set floor and
ceiling conditions.
*Actually we can check if a list is sorted in O(N) time (we don't need to
resort) so same thing.
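A sketch of that O(N) floor/ceiling recursion (strict inequalities; loosen one side if duplicates are allowed):

def is_valid_bst(node, floor=float('-inf'), ceiling=float('inf')):
    if node is None:
        return True
    # Every node must sit between the bounds inherited from its ancestors.
    if not (floor < node.val < ceiling):
        return False
    return (is_valid_bst(node.left, floor, node.val)
            and is_valid_bst(node.right, node.val, ceiling))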
For lowest common ancestor (LCA), the LCA is just the first node that is in
between the keys I'm looking for. E.g., if a root node is greater than both key
nodes, then I know the LCA has to be on the left side of the tree.
In-order successor:
Important in a lot of binary tree operations.
The next node in an in-order traversal; i.e., the node with the smallest key
that's still greater than the current node's key.
Pseudo-pseudocode:
If current node has a right child, go to it then as far left as possible.
Must be smallest key in right subtree.
Otherwise, must travel up the tree (its left child / subtree is by
definition smaller, i.e., can't be successor):
If current node is the left child of its parent, then the parent must
be the successor.
Otherwise, keep going up until escape the right subtree and hit
the middle. I.e., when the current node is a left child, its parent
must be the in-order successor. We're looking for its ("upper-
right") parent.
Heaps:
A natural implementation of the priority queue ADT (the other being a
balanced binary tree).
The highest priority element (whether min or max) recursively sits at
the top of the heap.
Works by maintaining a partial ordering, so less expensive than fully
sorted, but more valuable than no ordering.

A complete tree where the highest (or lowest) priority element is always
stored at the root—hence a heap. A max heap is one where a node is always
greater than or equal to its children; a min heap is one where a node is
always lesser than or equal to its children.
Are used to implement priority queues—queues where the ordering is
based on priority, not on position in queue—and to do heap sort.
There is no implied ordering within a level (that is, no ordering between
siblings) and so a heap is not a sorted structure.
Insertion and deletion (need to respect the heap ordering property) are both
O(log N) where N is the number of nodes.
(Binary) heaps are implemented in arrays.
Note that any binary tree can be so implemented, but that it makes
particular sense for heaps, because heaps are guaranteed to be
complete trees and thus the array implementation will be compact.
If a non-complete tree is implemented in an array, then there will be
empty slots because a node may have only 1 child and the tree may go
on quite a bit deeper than that level.
Saves space because can use array position to implicitly maintain the
ordering property, instead of storing pointers.
In the array implementation:
Let n be the number of elements in the heap and i be an arbitrary valid
index of the array storing the heap. If the tree root is at index 0, with
valid indices 0 through n − 1, then each element a at index i has:
children at indices 2i + 1 and 2i + 2
and its parent at floor((i − 1) ⁄ 2).
Alternatively, if the tree root is at index 1, with valid indices 1
through n, then each element a at index i has:
1-basing sacrifices a tiny amount of space to simplify index arithmetic.

children at indices 2i and 2i + 1


and its parent at index floor(i ⁄ 2).
Operations:
Both insert and remove — remove only removes the root since that's
all we care about — are done to an end of the heap to maintain the
compact shape. Then, to modify the ordering property, the heap is
traversed (respectively, up-heap / sift-up and down-heap / sift-down).
Both operations take O(log N) time.
Up-heap (or sift-up) is used, e.g., to restore the heap ordering when a
new element is added to the end/bottom:
Compare the added element with its parent; if they are in the
correct order, stop.

If not, swap the element with its parent and return to the previous
step.
Down-heap (or sift-down) is used, e.g., to restore the heap ordering
when the root is deleted/removed and replaced with the last element
(on the last level):
Compare the new root with its children; if they are in the correct
order, stop.
If not, swap the element with one of its children and return to the
previous step (for the newly ordered subtree). (Swap with its
smaller child in a min-heap and its larger child in a max-heap.)
So basically, swap with higher priority child.

Heapify, given a complete binary tree, turns the data structure into a
heap:
We want the heap property to obtain, so we need to ensure that
each parent node is greater (or lesser, if min heap) than its
children.
Starting at the last parent node (take the last node and use index
arithmetic to figure out the last parent), call down-heap on it. This
will make the subtree rooted at this parent node a heap.
Continue on for all the other parent nodes.
Where the heap is 1-based implemented:
Mostly from visualgo.

for (i = arr.length / 2; i >= 1; i--)
    DownHeap(arr[i])
Heapsort just entails copying elements to be sorted into an array, heapifying
it, then taking the max (root) out each time and putting it at the end of the
result array.
This can be done in-place (in the same array) where one section is for
the heap and the other for the sorted list. Thus, each iteration of the
algorithm will reduce the heap by 1 and increase the sorted list by 1.
Heapsort is similar to insertion sort in that each step of the algorithm
takes the max element from the unsorted section and moves it to the
sorted section.
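A sketch of sift-down, heapify, and heapsort on a 0-based array (max-heap, sorting in place into ascending order):

def sift_down(arr, i, end):
    # Push arr[i] down until the max-heap property holds within arr[:end].
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < end and arr[left] > arr[largest]:
            largest = left
        if right < end and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

def heapsort(arr):
    n = len(arr)
    # Heapify: sift down every parent, starting from the last one.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    # Repeatedly swap the max (root) to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)
    return arr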
Note that heaps are not binary search trees, so we can't efficiently find a
given key using binary search.
This is because the order property only works between parent and
children, not between siblings.
Self-balancing binary trees:
Binary trees have performance proportional to their height, so keeping the
height small maximizes performance. In the worst case, a tree approximates

a linked list; a maximally unbalanced tree is just a linked list — where the
height is 1 less than the number of nodes.
Specifically, a balanced tree has performance O(log n) (think binary
search), but an unbalanced tree has performance O(n) (think linked list).
Self-balancing trees thus guarantee balanced-ness and consequently
efficient performance.
Red-black trees:
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx

Guarantees searching, insertion, and deletion in O(log n).


Each node has a color, red or black, associated with it. How the tree is
"painted" ensures balance. When modified, the tree is re-arranged and
re-painted to keep the necessary color properties.
Originally an abstraction of symmetric binary B-trees, also called 2-3-4
or 2-4 trees. Red-black trees are really isomorphic to 2-4 trees; i.e., they
are equivalent data structures.
https://www.cs.princeton.edu/~rs/AlgsDS07/09BalancedTrees.pdf
https://www.wikiwand.com/en/2%E2%80%933%E2%80%934_tree

In a 2-4 tree, there are 3 kinds of nodes:


2-node: 1 key and (if internal etc.) 2 children
3-node: 2 keys and 3 children
4-node: 3 keys and 4 children
All leaves are at the same depth, the bottom level. So, every path
from root to leaf is the same length ensuring perfect balance.
Leaf nodes have no data.
Red-black trees have the following properties:
https://www.topcoder.com/community/data-science/data-science-tutorials/an-introduction-to-binary-search-and-red-black-trees/

1) every node is either red or black;


2) all leaves are black;
Leaves can either be explicit nodes with no data or just NIL nodes.

3) if a node is red, both its children are black; and


4) every path from a given node n to a descendant leaf has the
same number of black nodes (not counting node n). We call this
number the black height of n, which is denoted by bh(n).
These explicit properties give rise to this critical property: the path from
the root to the farthest leaf is no more than twice as long as the path
from the root to the nearest leaf, ensuring that the tree is roughly
height-balanced.
This limits the height of any red-black tree, ensuring that even in the
worst case, search, insert, and deletion are still efficient.

The balance property is guaranteed through the combination of properties 3 and 4:
For a red–black tree T, let B be the number of black nodes
in property 4 (since every path has the same number of black
nodes). Let the shortest possible path from the root of T to any
leaf consist of B black nodes. Longer possible paths may be
constructed by inserting red nodes. However, property 3 makes it
impossible to insert more than 1 consecutive red node. Therefore,
ignoring any black NIL leaves, the longest possible path consists
of 2*B nodes, alternating black and red (this is the worst case).
The shortest possible path has all black nodes, and the longest
possible path alternates between red and black nodes. Since all
maximal paths have the same number of black nodes, by property
4, this shows that no path is more than twice as long as any other
path.
A red-black tree can be converted to a 2-4 tree by remembering that
red nodes are part of ("horizontally" linked within) a logical 2-4 B-tree
node and black nodes separate the logical 2-4 nodes with normal
vertical links:
So to convert a red-black tree to a 2-4 form, just move the red
nodes up so that they become horizontally linked with their
parent black node to form a logical 2-4 node.
All leaf nodes will be at the same depth.
See http://www.wikiwand.com/en/Red%E2%80%93black_tree for
an example.
Search and other read-only operations are the same as those used to
BST. However, insertion and deletion are much more complex, because
they may violate red-black properties and then must be fixed with
color changes and rotations.
These fixes are quite quick in practice. E.g., the color changes are amortized O(1).

A tree rotation is a binary tree operation that changes the structure without affecting the order of the elements. That is, the in-order traversal is the same pre- and post-rotation.
http://www.wikiwand.com/en/Tree_rotation
Essentially, a right rotation (the root moves right) on a root node
X has X become the right child of the new root, which will be X's
former left child.
Order in the descendants is maintained through realizing that the
right child of a left child of a root node can become the left child
of the root without violating any binary tree constraints.
Insertion and deletion have a number of cases that must be handled.

AVL trees:
http://www.wikiwand.com/en/AVL_tree

More rigidly balanced than red-black trees, leading to slower insertion and removal, but faster search. So better for read-heavy use cases.
Critical property: the heights of the two child subtrees of any node
differ by at most one; if at any time they differ by more than one,
rebalancing (with tree rotations) is done to restore this property.
Retrieval, insertion, and deletion, as with red-black trees, are O(log n) in
the worst case.
A trie (or radix tree or prefix tree) is an n-ary tree variant in which characters are
stored at each node, such that each path down the tree may represent a word:
Unlike a binary search tree, no node in the tree stores the key associated
with that node; instead, its position in the tree defines the key with which it
is associated.
All the descendants of a node have a common prefix of the string associated
with that node, and the root is associated with the empty string.
https://www.wikiwand.com/en/Trie
Note that tries do not have to be keyed by character strings; other ordered
lists can work too, e.g., permutations on a list of digits.
Can just implement as a dictionary of (nested) dictionaries. Or of course using classes and OO. E.g., for dicts:
root = {}
for word in strs:
    node = root
    for char in word:
        if char not in node:
            node[char] = {}
        node = node[char]
    # mark end of word
    node[''] = 1

node = root
prefix = ''
# if there's more than 1 key, then divergence / not common ancestor
while len(node.keys()) == 1:
    # if reached the end of any word, the LCA can't be longer
    if '' in node: return prefix
    char = list(node.keys())[0]
    prefix += char
    node = node[char]
return prefix
Links:
http://www.wikiwand.com/en/Tree_(data_structure)
this is a pretty good general reference:
http://www.cs.cmu.edu/~adamchik/15-121/lectures/Trees/trees.html.
Graphs:
The graph ADT comprises a set of vertices (or nodes) together with a set of pairs
of vertices, the edges. A vertex can be any arbitrary abstract object.
Graphs are an extremely powerful concept. Linked lists, trees, state transition
diagrams, etc. are all just cases of graphs.
Terminology:
Direction:
Directed graph (or digraph) = a graph where the edges have a
direction associated with them. I.e., each edge is now an ordered pair
of vertices.
In conceptual terms, direction implies that relations between vertices
are not symmetric. For example, a graph of friends would be
undirected, since X being friends with Y implies that Y is friends with X.
And on the other hand, a graph of Twitter followers would be directed,
since X being a follower of Y does not imply that Y is a follower of X.
Directed acyclic graph (or DAG) = a digraph with no (directed) cycles.
https://www.wikiwand.com/en/Directed_acyclic_graph

Many important applications, e.g., systems of events or partial events, scheduling, causal structures.
A rooted tree, as in the ADT, is a DAG.
Workflowy is essentially a directed acyclic graph!
Topological sort (or topological ordering):
A topological sort of a digraph is just the linear ordering of
its vertices such that for every directed edge ab, a comes
somewhere before b.
Any DAG has at least one topological ordering.
The canonical case is scheduling: the vertices of the graph
may represent tasks to be performed, and the edges may
represent constraints that one task must be performed
before another (prerequisites). Thus, a topological ordering
is just a valid sequence for the tasks. 
A reverse postorder traversal provides a topological
ordering.
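A sketch of topological ordering via DFS reverse postorder (assuming a dict adjacency list of a DAG; cycle detection omitted):

def topological_sort(graph):
    # graph: {node: iterable of nodes it points to}
    visited, postorder = set(), []

    def dfs(node):
        visited.add(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                dfs(neighbor)
        postorder.append(node)  # appended only after all descendants

    for node in graph:
        if node not in visited:
            dfs(node)
    return postorder[::-1]  # reverse postorder = a valid topological order

# topological_sort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []})
# -> e.g. ['a', 'c', 'b', 'd']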
Multiple edges & loops:

Multiple (or parallel) edges = 2 or more edges that connect the same 2
vertices.
Loop = an edge that connects a vertex to itself.
Multigraph = a graph that can have multiple edges and (depending on
the convention) sometimes loops.
Simple graph = a graph that cannot have multiple edges or loops.
Connectivity:
Connected graph = graph with no vertices by themselves; i.e., there is
some path between every pair of vertices / every pair of the graph's
vertices is connected.
A connected component is a maximal subgraph that is (1) connected
and (2) not connected to any vertices in the rest of the supergraph.
Subgraph of a graph G = graph whose vertices are a subset of G's
vertices and whose edges are a subset of G's edges. Note that
this is subset, not proper subset.
Tracking the connected components of a changing graph is a
straightforward partitioning problem, which can be naturally
solved with the application of disjoint-set (or union-find) data
structures.
https://www.wikiwand.com/en/Disjoint-set_data_structure

Maximal such subgraph because otherwise there would be "overlapping" connected components. This constrains it such that
each vertex and each edge belongs to exactly one connected
component. Mathematically, connected components partition
their supergraph.
A digraph is strongly connected if every pair of vertices (x and y) has a
directed path both from x to y and from y to x. Otherwise, it is merely
(weakly) connected.
The strong components of a digraph are just the maximal
strongly connected subgraphs.
Kosaraju's algorithm and Tarjan's strongly connected components
algorithm are 2 popular ways to compute a digraph's strongly
connected components.
A vertex cut (or separating set) = the set of vertices whose removal
would make the graph disconnected.
Note that this is not the same as a cut.

Easy to compute connectivity: if the number of nodes visited through either BFS or DFS equals the number of vertices, then the graph is connected.
Weighted graphs have weights (or costs or distances) associated with their
edges. So 2 paths can have the same number of edges (same path length), but have different total weights (or distances) because their edges' weights differ.
Cut:
A partition of a graph's vertices into two disjoint subsets.
Also defines a cut-set, the set of edges that have 1 endpoint in each
subset of the partition. I.e., the edges that cross the cut (also called a
crossing edge). Sometimes a cut is identified as its cut-set.
The size / weight of a cut is the (total) weight of the cut-set, the edges
crossing the cut. If unweighted, the size is the number of such crossing
edges.
Degree of a node is the number of edges connected to it. If directed graph,
there is an indegree and an outdegree.
A Hamiltonian path is a path that visits each vertex exactly once. Similarly, a
Hamiltonian cycle (or circuit) is a Hamiltonian path that is a cycle.
Determining whether such a path exists in a graph is known as the NP-
complete Hamiltonian path problem.
Coloring:
A graph coloring is just assigning colors to the nodes in a graph.
A legal coloring is when no two adjacent / connected nodes have the
same color.
Finding the fewest number of colors we can use for a legal coloring
(the chromatic number) is an NP problem.
Three primary implementation methods / data structures:
https://www.wikiwand.com/en/Graph_(abstract_data_type) and
http://www.cs.rochester.edu/~nelson/courses/csc_173/graphs/implementation.html

Adjacency list:
https://www.wikiwand.com/en/Adjacency_list

Each vertex is associated with an unordered list that contains its neighbors.
More efficient than the adjacency matrix in returning a given vertex's
neighbors, O(1); less efficient in testing whether two vertices are
neighbors, O(|V|), because in the worst case the whole list of neighbors
has to be scanned.
|V| is the number of vertices.

Slow in removing a vertex / edge, because must find all vertices / edges.
To represent weights, each node in the neighbor list could also contain
the weight.
In practice the simplest way is this; in Python, just use defaultdicts:
Graph is a dict of nodes where each node is mapped to a set of
its neighbors.

Weights is a dict where each (origin, dest-node) tuple is mapped to the weight val. (Or could do it where the set of neighbors is a tuple of (neighbor, weight)).
But then the problem is I could have two edges between the
same nodes but with different weights. *Could* be desirable
though.
Code:
From https://leetcode.com/problems/evaluate-division/description/

from collections import defaultdict

# initialize graph where each node is mapped to a set of its neighbors
# and each node 2-tuple is mapped to its result
G, W = defaultdict(set), defaultdict(float)
for (a, b), val in zip(equations, values):
    G[a], G[b] = G[a].union({b}), G[b].union({a})
    W[a, b], W[b, a] = val, 1.0 / val
Adjacency matrix:
https://www.wikiwand.com/en/Adjacency_matrix

A matrix where each non-diagonal entry Aij is the number of edges from vertex i to vertex j, and the diagonal entry Aii, depending on the convention, is either once or twice the number of edges (loops) from vertex i to itself.
Undirected graphs often use the latter convention of counting loops twice,
whereas directed graphs typically use the former convention.

Sometimes the matrix value is just a boolean, if there can be no parallel edges (i.e., the graph is a simple graph, not a multigraph).
In a graph without loops, the diagonal in the matrix will have all zero
entries.
Adjacency matrices are space-efficient, because each matrix entry only
requires one bit. However, for a sparse graph (few edges), adjacency
lists use less space because they do not represent nonexistent edges.
More efficient than the adjacency list in determining whether two
vertices are neighbors, O(1); less efficient in getting all of a given
vertex's neighbors because the entire row must be scanned, O(|V|).
Slow to add or remove vertices, because the matrix must be resized /
copied.
To represent weights, the matrix value could be the weight of that
edge.
The third (less common but very simple) implementation method is just
objects and pointers:
Naturally, Nodes / Vertex[es] can be represented as objects, with pointers to neighbor nodes. If a list of those pointers is kept, it becomes very similar to an adjacency list (http://stackoverflow.com/questions/5886274/comparing-object-graph-representation-to-adjacency-list-and-matrix-representatio).
Also, Edges are another natural object that can be used.
Graph traversals:
http://www.wikiwand.com/en/Graph_traversal and for implementations in Python
http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/

Depth-first search (DFS):


Depth-first search is like walking through a corn maze. You explore one
path, hit a dead end, and go back and try a different one.
DFS visits the child nodes before visiting the sibling nodes; i.e., it
traverses the depth of any particular path before exploring its breadth.
Pro: usually less memory than BFS and easily implemented with
recursion.
Con: doesn't necessarily get shortest path.
Pseudo-pseudocode:
Visit node r and then iterate through each of r's unvisited
neighbors.
When visiting a neighbor n, immediately visit all of his neighbors.
And so on.
Thus depth-first. So neighbor n and its children are exhaustively searched
before r moves on to its other adjacent nodes.

If no neighbors, i.e., a dead end, backtrack (pop the stack) until reaching an unvisited node.
Python code:
http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/

Adjacency list implementation:


graph = {'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])}
Iterative:
def dfs(graph, start):
    visited, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            stack.extend(graph[vertex] - visited)
    return visited
Recursive:
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    for next in graph[start] - visited:
        dfs(graph, next, visited)
    return visited
Easy to iteratively implement with a stack. Just make sure to "mark"
visited nodes to prevent infinite loops.
Also easy to recursively implement (also via a stack, i.e., the program's
call stack).
Breadth-first search (BFS):
Breadth-first search is like throwing a stone in the center of a pond.
The nodes you explore "ripple out" from the starting point.
BFS explores the neighbor nodes first, before moving to the next
"level."
Pro: will always find the shortest path.
Con: usually requires more memory than DFS.
Pseudo-pseudocode:
Visit node r and then iterate through each of r's unvisited
neighbors.
When visiting a neighbor n, add its neighbors to queue, but don't
visit them yet.
Visit all of node r's neighbors before visiting any of r's "
(grand)children."
Python code:
http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/

Adjacency list implementation:


graph = {'A': set(['B', 'C']),
         'B': set(['A', 'D', 'E']),
         'C': set(['A', 'F']),
         'D': set(['B']),
         'E': set(['B', 'F']),
         'F': set(['C', 'E'])}


Iterative:
def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)
    return visited
Implement iteratively with a queue and mark visited as always.
"Remember that breadth-first uses a queue and depth-first uses
a stack (could be the call stack or an actual stack object). That's not just a
clue about implementation, it also helps with figuring out the differences in
behavior. Those differences come from whether we visit nodes in the order
we see them (first in, first out) or we visit the last-seen node first (last in, first
out)."
DFS is the easiest if want to visit every node.
But BFS can get shortest path(s) first.
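A sketch of getting a shortest path out of BFS by tracking predecessors (unweighted graph stored as a dict of neighbor sets, as above):

from collections import deque

def shortest_path(graph, start, end):
    # how_we_reached: node -> the node we came from (also doubles as "visited").
    how_we_reached = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Backtrack through predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = how_we_reached[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in how_we_reached:
                how_we_reached[neighbor] = node
                queue.append(neighbor)
    return None  # no path exists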
Both traversals (for the entire graph) take time linear to the size of the graph
(number of vertices + number of edges), O(|V| + |E|) and space linear to the
number of vertices.
In other words: runtime is O(N + M) where M is the number of edges
(we check each neighbor twice per edge).
Note that just as trees are a special case of graphs, tree traversals are a
special case of graph traversals.
Two common graph problems:
Shortest-path tree (or single-source shortest paths (SSSP) problem):
A shortest-path tree rooted at vertex v is a spanning tree T of the graph
G such that the path distance from v to any other vertex u in the tree is
the shortest path from v to u in G. I.e., the shortest-path tree rooted at
v is the tree that comprises the shortest paths from v to every node.
A spanning tree of G is a tree (a connected undirected graph with
no cycles) T such that (1) every vertex in G belongs to it and (2)
every edge in T belongs to G.
Must be a spanning tree because otherwise it wouldn't actually be
a tree of paths from the root node to all nodes.
Simpler, related version is finding the shortest path from one node to
another. But finding the shortest path from the former to all nodes
takes the same amount of time.

In an unweighted graph or a graph in which all edge weights are 1, the shortest-path tree is isomorphic to the BFS tree.
Be careful to remember that edge lengths or distances are the same as
edge weights. So if the graph is a weighted one, the path distance from
vertex a to b is actually the sum of the edge weights of that path.
If the weights are 1 or nonexistent, then the path distance is just
the number of edges from 1 node to another.
Algorithms:
Dijkstra's algorithm:
https://www.wikiwand.com/en/Dijkstra's_algorithm

Fastest shortest-path tree / single-source shortest paths algorithm for [di]graphs with nonnegative edge weights.
Pseudo-pseudocode:
http://www.egr.unlv.edu/~larmore/Courses/CSC269/pathing

The distance of node Y is the distance from the root node to Y. First assign some initial distance values and then seek to greedily improve them step by step.
1) Assign to every node an initial tentative distance
value: set it to 0 for our root node and to infinity
(float('inf')) for all other nodes.
2) Assign the root node as current. Create an unvisited
node set and put all nodes in it.
3) For the current node c, consider all of its unvisited
neighbors (e.g., d) and calculate their tentative
distances: tentativeDist(d) = tentativeDist(c) +
c.getEdgeLength(d). Compare the newly calculated
tentative distance to the current assigned value and
assign the smaller one. For example, if the current
node A is marked with a distance of 6, and the edge
connecting it with a neighbor B has length 2, then the
distance to B (through A) will be 6 + 2 = 8. If B was
previously marked with a distance greater than 8 then
change it to 8. Otherwise, keep the current value.
This process is called relaxation; we are relaxing
the edges by trying to find if the tentative
distances of c's neighbors could be improved by
diverting the path through c.
4) When we are done considering / updating all of the
neighbors of the current node, mark the current node
as visited and remove it from the unvisited set. A
visited node will never be checked again and it is now
in the shortest-path tree. Its tentative distance value is
the final distance value. I.e., it is now the actual
distance or cost of the shortest path from the source
root node to the current just-visited node.
5) The algorithm is done if the destination node has
been marked visited (for the simple case where we are
calculating shortest path between 2 nodes) or if the
unvisited set is empty or if the smallest tentative
distance among the nodes in the unvisited set is
infinity (occurs when there is no connection between
the initial node and remaining unvisited nodes).
6) Select the unvisited node that is marked with the
smallest tentative distance, and set it as the new
current node, then go back to step 3.
Implementation:
"The simplest implementation of the Dijkstra's
algorithm stores vertices of set Q in an ordinary linked
list or array, and operation Extract-Min(Q) is simply a
linear search through all vertices in Q. In this case, the
running time is O(V^2).
For sparse graphs, that is, graphs with much less than
V^2 edges, Dijkstra's algorithm can be implemented
more efficiently by storing the graph in the form of
adjacency lists and using a binary heap or Fibonacci
heap as a priority queue to implement the Extract-Min
function. With a binary heap, the algorithm requires
O((E+V)lgV) time, and the Fibonacci heap improves
this to O(E + VlgV)."
Pretty understandable Python implementation:
http://stackoverflow.com/questions/22897209/dijkstra
s-algorithm-in-python
Specifically the prof answer with helper functions.
More complicated but better:
https://gist.github.com/econchick/4666413
Need to have a predecessor or previous-like reference
for each node to actually build a tree. Otherwise, just
returns each node and its distance from the root node.
Or could build path as go through algorithm, as in the
gist implementation above.
Re: predecessors, getting a predecessor subgraph is a
pretty nice way to form the actual (shortest-path) tree:
The edges in such a subgraph are (preds[v],v),
where preds is the array of predecessors for each
vertex.
In the algorithm, whenever relaxation happens,
just add the correct predecessor. E.g., if vertex u's
distance value can be improved by going
through w, then preds[u] = w.
Then can form shortest-path tree easily at the
end.
https://www.cs.auckland.ac.nz/software/AlgAnim/
dijkstra.html
With a min-priority queue to get the unvisited node with the
smallest tentative distance, can run in O( |E| + |V| log |V| ).
Not capable of handling negative weights (Bellman-Ford
can, however).
Known as a greedy algorithm because at each stage, it
greedily selects the unvisited minimum-weight node.
Works on directed or undirected graphs, as long as edge weights are nonnegative.
Dijkstra's is a special case of A*.
A* (A star) is an extended version of Dijkstra's that uses
heuristics to guide its search, making it a best-first search.
http://www.wikiwand.com/en/A*_search_algorithm
Dijkstra's is just A* where the heuristic / evaluation
function for each node is null.
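A minimal heap-based sketch of Dijkstra's as described above (lazy deletion of stale queue entries; assumes an adjacency-list dict like graph = {node: [(neighbor, weight), ...]}, names illustrative):
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}; weights assumed nonnegative
    dist = {source: 0}
    preds = {source: None}
    pq = [(0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float('inf')):
            continue  # stale entry; node already finalized with a smaller distance
        for neighbor, weight in graph.get(node, []):
            new_dist = d + weight
            # relaxation: keep the smaller tentative distance
            if new_dist < dist.get(neighbor, float('inf')):
                dist[neighbor] = new_dist
                preds[neighbor] = node
                heapq.heappush(pq, (new_dist, neighbor))
    return dist, preds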
Bellman-Ford algorithm:
http://www.wikiwand.com/en/Bellman%E2%80%93Ford_algorithm
Slower than Dijkstra's but does handle negative edge
weights. And can report the existence of negative cycles.
A negative cycle means that there is no shortest or cheapest path,
because any presumably shortest path could be made shorter by
going around the cycle again.
Similar to Dijkstra's in that it depends on relaxation: each
node's distance starts out as an overestimate and is
iteratively improved through edge relaxation (diverting the
old path through a new node to improve the distance).
Different than Dijkstra's in that at each step the relaxation
process is done on all edges at once, instead of just on the
outgoing edges from some current node. Thus, this
relaxation is done |V|-1 times.
Implementation:
"Simply put, the algorithm initializes the distance to
the source to 0 and all other nodes to infinity. Then for
all edges, if the distance to the destination can be
shortened by taking the edge, the distance is updated
to the new lower value. At each iteration i that the
edges are scanned, the algorithm finds all shortest
paths of at most length i edges. Since the longest
possible path without a cycle can be |V|-1 edges, the
edges must be scanned |V|-1 times to ensure the
shortest path has been found for all nodes. A final scan
of all the edges is performed and if any distance is
updated, then a path of length |V| edges has been
found which can only occur if at least one negative
cycle exists in the graph."
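A short sketch of that, assuming the graph is given as a list of (u, v, weight) tuples (names illustrative):
def bellman_ford(vertices, edges, source):
    # vertices: list of nodes; edges: list of (u, v, weight) tuples
    dist = {v: float('inf') for v in vertices}
    dist[source] = 0
    # relax every edge |V| - 1 times
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one final scan: any further improvement means a negative cycle
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative cycle")
    return dist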
Minimum spanning tree (MST):
https://www.wikiwand.com/en/Minimum_spanning_tree and
http://algs4.cs.princeton.edu/43mst/
A minimum spanning tree is a spanning tree (a subgraph that is a tree
and that connects all the vertices) that has the lowest total weight.
Properties:
If all edge weights are the same, then every spanning tree is a
MST.
Conversely, if all edge weights are distinct, then there is 1
unique MST.
There are |V| - 1 edges in a spanning tree.
If unconnected graph, can still use MST algorithms to find the
MST for each connected component, together forming a
minimum spanning forest.
If weights are positive (don't have to be), then the MST is just the
subgraph with minimal weight that connects all the vertices. I.e.,
don't have to say it's a tree (which has no cycles), since those
subgraphs which have cycles are necessarily greater weight.
Algorithms:
Note that:
Cuts are very important, specifically, the cut property (the
basis of the algorithms): Given any cut in an edge-weighted
graph (with distinct weights), the crossing edge of minimum
weight is in the MST of the graph.
http://algs4.cs.princeton.edu/43mst/
Simple greedy algorithm:
The following method colors black all edges in the MST
of any connected edge-weighted graph with V vertices:
Starting with all edges colored gray, find a cut with no black
edges, color its minimum-weight edge black, and continue
until V-1 edges have been colored black.
http://algs4.cs.princeton.edu/43mst/
Prim's algorithm:
Prim's algorithm works by attaching a new edge to a single
growing tree at each step: Start with any vertex as a single-
vertex tree; then add V-1 edges to it, always taking next
(coloring black) the minimum-weight edge that connects a
vertex on the tree to a vertex not yet on the tree (a crossing
edge for the cut defined by tree vertices).
This is a relatively simple method, but the crucial step is
efficiently detecting the minimum-weight crossing edge to
be added at each step:
Lazy way:
Just maintain a priority queue (PQ) of the
crossing edges (priority is obviously by minimum
weight). Each time a crossing edge is added to
the MST, a vertex is also added. The new crossing
edges to be added to the PQ are just the edges
from the new vertex to all non-MST vertices.
However, there might be old edges in the PQ
which are now ineligible (i.e., they are non-
crossing and now connect two MST vertices). This
way is lazy since it leaves these edges in the PQ.
Eager way:
The first improvement is to delete ineligible
edges from the PQ, such that only crossing edges
from the growing MST to the rest of the graph
are present. But we can do even better. Since (1)
we are only concerned with the minimal crossing
edge and since (2) when we add a new edge /
vertex v to the MST, the only possible
improvement for each non-tree vertex w is that
adding v brings w closer to the tree, we really just
need to keep track of the minimum-weight edge
and check whether the addition of v to the tree
necessitates that we update that minimum
(because of an edge v-w that has lower weight),
which we can do as we process each edge. In
other words, we maintain on the priority queue
just one edge for each non-tree vertex: the
shortest edge that connects it to the tree.
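A sketch of the lazy way described above (priority queue of crossing edges, skipping ones that have become ineligible; assumes an undirected adjacency-list dict, names illustrative):
import heapq

def lazy_prim(graph, start):
    # graph: {node: [(neighbor, weight), ...]}, undirected and connected assumed
    in_tree = {start}
    mst_edges = []
    pq = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(pq)
    while pq and len(in_tree) < len(graph):
        weight, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue  # ineligible edge: both endpoints are already in the tree
        in_tree.add(v)
        mst_edges.append((u, v, weight))
        for nbr, w in graph[v]:
            if nbr not in in_tree:
                heapq.heappush(pq, (w, v, nbr))
    return mst_edges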
Linear time for a graph is O(N + M) where N is the number of nodes and M is the
number of edges.
E.g., it's the runtime for BFS *and* DFS.
Edge cases in graph problems are nodes with no edges, cycles, and loops (single-
node cycles).
DAGs and topological sorts are important concepts to know:
Topological sort is relatively easy to implement, just remember the definition
(for vertices u, v where u has a directed edge to v, u has to come before v in
the topological ordering) and use a stack.
Start at a given vertex, add to visited set, recursively (depth-first) visit and
explore unvisited neighbors, when have fully explored v's neighbors add v to
stack, and pop all at end for topologically sorted ordering.
Also think of it as a modification of DFS:
"We can modify DFS to find Topological Sorting of a graph. In DFS, we
start from a vertex, we first print it and then recursively call DFS for its
adjacent vertices. In topological sorting, we use a temporary stack. We
don’t print the vertex immediately, we first recursively call topological
sorting for all its adjacent vertices, then push it to a stack. Finally, print
contents of stack. Note that a vertex is pushed to stack only when all of
its adjacent vertices (and their adjacent vertices and so on) are already
in stack."
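A quick sketch of that DFS-plus-stack version (adjacency-list dict assumed; no cycle detection here):
def topological_sort(graph):
    # graph: {node: [neighbors, ...]}; assumes a DAG
    visited, stack = set(), []

    def dfs(node):
        visited.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                dfs(neighbor)
        # push only after all of node's descendants are fully explored
        stack.append(node)

    for node in graph:
        if node not in visited:
            dfs(node)
    # popping the stack (reversing it) gives a topological ordering
    return stack[::-1]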
Sorting and Searching:
Binary search is indispensable and also surprisingly tricky to get right:
https://leetcode.com/problems/search-insert-position/description/
consider l and r to be walls (i.e., exclusive not inclusive) around my target
value
that way can just set l or r to be the guess/median value
also to get the median, do floor + (distance between divided by two)
def searchInsert(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: int
"""
l, r = -1, len(nums)
while l + 1 < r:
m = l + ((r - l) / 2)
curr = nums[m]
if target == curr:
return m
elif target > curr:
l = m
else:
r = m
return l + 1
could also do:
while l <= r:
m = l + (r - l) / 2
curr = nums[m]
if target == curr:
return m
elif target > curr:
l = m + 1
else:
r = m - 1
return -1
Bubble sort and selection sort suck. Both O(n^2).
Merge sort and quick sort are both good and, in the average case, O(n Log(n)).
Merge sort:
Worst case is O(N log N).
From Algorithms class:
like quicksort in that it recursively splits array and builds it back in right
order (both also divide-and-conquer)
divide array into 2 halves, recursively sort halves, merge results
recursive calls then sorting action (contra quicksort); "on the way up";
only way sorting occurs is through these "on the way up" merges
pro: guaranteed N log N performance
con: extra space proportional to N (aux array for merging)
time efficiency: average, best, and worst are all O(N log N); does
require some aux space
All the work happens in the merging where consider both arrays and pick
the smallest element to copy to main array each time. Base case is merging
two 1-element arrays (or null arrays) because these are sorted by definition.
Simplified pseudocode:
if len(list) < 2: return list
sorted_first = mergesort(first_half)
sorted_second = mergesort(second_half)
return merge(sorted_first, sorted_second)
Most important part is merging the sorted lists. For this, just add smallest
element each time until one of the lists is empty, and then copy any
remaining elements over.
Usually stable.
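A sketch of that merge step (copy the smaller front element each time, then append whatever is left over):
def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # <= keeps the sort stable: on ties, take from the left list first
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # one of the lists is exhausted; copy the remaining elements over
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged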
Quicksort:
Worst case is O(n^2) when array already sorted, but small constant factor
and can be in-place. Can also mitigate / check for the worst degenerate case.
From Algorithms class:
array rearranged such that, when two subarrays sorted, the whole array
is ordered (contra merge where array is broken into 2 subarrays to be
sorted and then combined to make whole ordered array)
recursive calls happen after working on whole array
partition/pivot not necessarily in middle
Nor is the pivot necessarily the median value, which is what can lead to the worst-case performance.
improvements: insertion sort for tiny arrays, median of 3, randomize
beforehand
pros: average case N log N, in-place so small usage of memory (small
aux stack), shorter inner loop so fast in practice as well
cons: worst case is quadratic, happens with already sorted array (where
pivot = first item)
time efficiency: average & best are O(N log N), worst is O(N^2), small
constant factor
To partition, have two pointers at each end working toward each other. Stop
pointers when each reaches a value out of place. Swap them.
This results in all the smaller (than some x) values on the left side and
all the larger values on the right side—though each side is not itself
sorted yet.
Simplified pseudocode:
pivot = partition(list)
quicksort(list[:pivot])
quicksort(list[pivot:])
Usually not stable.
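An in-place sketch using the two-pointers-working-inward partition described above (middle element as pivot; illustrative, not a tuned implementation):
def quicksort(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return arr
    pivot = arr[(lo + hi) // 2]
    i, j = lo, hi
    # move pointers inward, stopping at values on the wrong side, and swap them
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1
    # recur on the two (not yet internally sorted) sides
    quicksort(arr, lo, j)
    quicksort(arr, i, hi)
    return arr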
A good resource is here: #link
http://www.algolist.net/Algorithms/Sorting/Quicksort
And also, of course, visualgo: http://visualgo.net/
Difference is that in mergesort, the sorting action happens on the way up (when
ordered subarrays are merged), whereas in quicksort, the sorting happens on the
way down when the (sub)array is split and the low and high elements are
separated and the pivot element is placed in its correct final position.
"The difference between the two is basically which linear operation they
leverage. QuickSort is efficient because you can partition a list into items that
come before and items that come after a given item in linear time.
MergeSort is efficient because given two already sorted lists you can
combine the two, and maintain sorted order, in linear time."
Dynamic Programming & Memoization:
Top-down recursion can be memory-intensive because of building up the call
stack. Can avoid by going bottom-up and using DP.
Current point.
DP often involves making a binary decision at each (bottom-up) step, e.g., do I
include this coin / item / character in my knapsack / sum / subsequence. If I do
include it, is that more optimal (value or stock market profit or so on) than if I
don't, where each calculation usually makes use of previously calculated optima or
solutions to subproblems.
To get those previous locally optimal calculations, we generally use a matrix
or list to store the previous solutions. But sometimes that can store more
state than we really need, see e.g. the bottom-up Fibonacci implementation
where we can store the answers to the previous subproblems we need in just
2 variables.
Always think: at step n, what happens if I *do* include this thing? What
happens if I don't? What do I need to compare and take the max of? And
how do I use my answer to step n-1?
Interview Cake:
"Dynamic programming is kind of like the next step up from greedy . You're
taking that idea of "keeping track of what we need in order to update the
best answer so far," and applying it to situations where the new best answer
so far might not just have to do with the previous answer, but
some other earlier answer as well.
So as you can see in this problem, we kept track of all of our previous
answers to smaller versions of the problem (called "subproblems") in a
big list called ways_of_doing_n_cents.
Again, same idea of keeping track of what we need in order to update the
answer as we go, like we did when storing the max product of 2, min
product of 2, etc in the highest product of 3 question. Except now the thing
we need to keep track of is all our previous answers, which we're keeping
in a list.
We built that list bottom-up, but we also talked about how we could do it
top-down and memoize. Going bottom-up is cleaner and usually more
efficient, but often it's easier to think of the top-down version first and try to
adapt from there."
When we memoize (cache a subproblem's answer / recursive function's
output so we don't need to recompute it again), the memoization dictionary
probably needs to be outside the function to save/share state. I.e., make a
class. Interview Cake #5:
https://www.interviewcake.com/question/python/coin. Don't always have to,
can sometimes just pass a default list or dict (be careful of the default
mutable gotcha here).
Memoization is top-down depth-first and DP is bottom-up breadth-first:
http://stackoverflow.com/questions/27609246/if-memoization-is-top-down-
depth-first-and-dp-is-bottom-up-breadth-first-what?rq=1
http://blog.racket-lang.org/2012/08/dynamic-programming-versus-
memoization.html
Tushar Roy on the 0-1 knapsack problem: https://www.youtube.com/watch?
v=8LusJS5-AGo
Advanced Algorithms & Data Structures & Math:
Incomplete.
Dynamic Programming overview:
General approach from recursion:
Find the recurrence relation.
Start with top-down recursion, then use a cache to memoize.
Then to save space, move to bottom-up DP with a table.
And then finally, try to use O(1) space with just variables.
https://leetcode.com/problems/house-robber/discuss/156523/From-
good-to-great.-How-to-approach-most-of-DP-problems.
Another one demonstrating the evolution:
https://leetcode.com/problems/decode-ways/discuss/30451/Evolve-
from-recursion-to-dp
Emblematic problems:
Easy:
https://leetcode.com/problems/house-robber/description/
code:
def rob(self, nums):
prev1 = prev2 = 0
for n in nums:
prev1, prev2 = max(prev2 + n, prev1), prev1
return prev1
Medium:
https://leetcode.com/problems/decode-ways/description/
code:
def numDecodings(self, s):
# dp[i] =
#          dp[i-1] if s[i] != '0'
#       +  dp[i-2] if 9 < int(s[i-2:i]) < 27
if not s or s[0] == '0': return 0
pre = cur = 1
for i, c in enumerate(s[1:]):
if c == '0': cur = 0
if 9 < int(s[i:i+2]) < 27:
cur = cur + pre
pre = cur - pre
else: pre = cur
return cur
Knuth-Morris-Pratt:
the key is that if we have a prefix == suffix match in the substring up to the
pattern[i] where pattern[i] stopped matching with text[j], then the suffix suf
must also be the last characters we saw in text, text[ j-len(suf) : j ]; and since
this substring suf is equal to some prefix pre, we can slide the pattern over,
match up pre with text[ j-len(suf) : j ], and start trying to match again at
pattern[ len(p) ] and text[j]
https://www.wikiwand.com/en/Knuth%E2%80%93Morris%E2%80%93Pratt_al
gorithm
https://www.youtube.com/watch?v=GTJr8OvyEVQ
Bloom filters:
http://prakhar.me/articles/bloom-filters-for-dummies/
Tells you either that a value is possibly in the set or that it's definitely not in
the set.
I.e., some false positives, but guaranteed no false negatives. Probabilistic
data structure.
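A toy sketch of the idea (fixed-size bit array plus k hash positions per item; salting Python's built-in hash() is only illustrative, real implementations use proper hash functions and carefully sized bit arrays):
class BloomFilter(object):
    def __init__(self, size=1000, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [0] * size

    def _positions(self, item):
        # salt the built-in hash to simulate k independent hash functions
        return [hash((i, item)) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True = "possibly in the set"; False = "definitely not in the set"
        return all(self.bits[pos] for pos in self._positions(item))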
Trie and suffix trees:
Union-find / disjoint-set:
Network connectivity, connected components
Basic idea is that each distinct subset is identified by its root, so if two
elements have the same root, they're in the same set; and to merge, just
need to make one subset's root point to the other subset's root.
https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
Useful for questions where I need to merge disjoint sets or network-like
problems etc. like https://leetcode.com/problems/number-of-islands-
ii/description/:
clear answer: https://leetcode.com/problems/number-of-islands-
ii/discuss/75459/JavaPython-clear-solution-with-UnionFind-Class-
(Weighting-and-Path-compression)
https://leetcode.com/problems/number-of-islands-
ii/discuss/75468/Compact-Python.
also https://leetcode.com/problems/friend-circles/submissions/
Union-find template code:
class UnionFind(object):
def __init__(self, n):
self.count = n
self.parent = [_ for _ in xrange(n)]
self.rank = [1 for _ in xrange(n)]
def find(self, k):
while k != self.parent[k]:
self.parent[k] = self.parent[self.parent[k]]
k = self.parent[k]
return k
def union(self, k1, k2):
k1, k2 = self.find(k1), self.find(k2)
if k1 == k2: return
# ensure k1 is smaller component
if self.rank[k1] > self.rank[k2]:
k1, k2 = k2, k1
self.parent[k1] = k2
self.rank[k2] += self.rank[k1]
self.count -= 1
A*:
LRU/LFU cache:
Least recently used item is invalidated / evicted when cache is full.
https://www.kunxi.org/blog/2014/05/lru-cache-in-python/
The idea here is to use an OrderedDict and when returning an item, pop and
re-insert to maintain the order invariant. popitem(last=False) lets us pop off
the oldest (front) entry, i.e., in FIFO order.
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()
def get(self, key):
if key not in self.cache: return -1
val = self.cache.pop(key)
self.cache[key] = val
return val
def set(self, key, value):
try:
self.cache.pop(key)
except KeyError:
if len(self.cache) == self.capacity:
self.cache.popitem(last=False)
self.cache[key] = value
Without OrderedDict, would have to create my own. Basically would have a
normal dictionary and then a doubly linked list. The dictionary would map
each key to its node; there would be dummy head and tail pointers; and I'd
add to the tail.prev and remove/invalidate from the head.next:
https://leetcode.com/problems/lru-cache/discuss/45926/Python-Dict-
+-Double-LinkedList
like this:
class LLNode(object):
def __init__(self, key=0, val=0):
self.key = key
self.val = val
self.prev = None
self.next = None
class LRUCache(object):
def __init__(self, capacity):
# dict mapping key to LLNode
self.cache = {}
# dummy nodes to simplify
# head -next-> node <-prev- tail
# remove from head, add to tail
self.head = LLNode()
self.tail = LLNode()
self.head.next = self.tail
self.tail.prev = self.head
self.capacity = capacity
def get(self, key):
if key not in self.cache: return -1
node = self.cache[key]
new_n = LLNode(key, node.val)
self._remove(node)
self._add(new_n)
self.cache[key] = new_n
return new_n.val
def put(self, key, value):
if key in self.cache:
self._remove(self.cache[key])
elif len(self.cache) >= self.capacity:
to_remove = self.head.next
self.cache.pop(to_remove.key)
self._remove(to_remove)
node = LLNode(key, value)
self._add(node)
self.cache[key] = node
def _remove(self, node):
p, n = node.prev, node.next
p.next = n
n.prev = p
def _add(self, node):
p = self.tail.prev
p.next = node
node.prev = p
node.next = self.tail
self.tail.prev = node
harder version is LFUCache where least frequently used item is evicted (so
need to keep track of frequency / # of accesses) *and* ties are broken by
least recent too
https://leetcode.com/problems/lfu-cache/description/
use two dicts, one of which is an OrderedDict
from collections import defaultdict, OrderedDict
class LFUNode(object):
def __init__(self, val, freq=1):
self.val = val
self.freq = freq
class LFUCache(object):
def __init__(self, capacity):
self.remain = capacity
# need OrderedDict to break freq ties w LRU
self.freq2node = defaultdict(OrderedDict)
self.least_freq = 1
self.key2node = defaultdict(LFUNode)
def _update(self, key, val=None):
node = self.key2node[key]
if not val: val = node.val
freq = node.freq
self.freq2node[freq].pop(key)
if not len(self.freq2node[self.least_freq]):
self.least_freq += 1
node.val, node.freq = val, freq + 1
self.freq2node[freq+1][key] = node
self.key2node[key] = node
def get(self, key):
if key not in self.key2node: return -1
self._update(key)
return self.key2node[key].val
def put(self, key, value):
if key in self.key2node:
self._update(key, value)
return
self.key2node[key] = LFUNode(value)
self.freq2node[1][key] = LFUNode(value)
if not self.remain:
# pop FIFO
evicted =
self.freq2node[self.least_freq].popitem(last=False)
self.key2node.pop(evicted[0])
else: self.remain -= 1
self.least_freq = 1
Minimum edit distance:
https://www.youtube.com/watch?v=We3YDTzNXEk
Key ideas are that the edit distance of two strings x, y has to be least the edit
distance of their prefixes, e.g. x[:-1], y; insertion and deletion are
"symmetric"; if the current characters are the same, the edit distance at that
point is just the edit distance of the two strings minus the current character;
if not, they're the min of the diagonal, previous column and previous row
edit distances + 1 (to represent the operation itself)
i.e., assume I did delete, so I look at the previous column for that edit
distance, then add 1 for the current deletion; if I replaced, then it's the
diagonal plus 1 etc.
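A bottom-up sketch of that recurrence, where dp[i][j] is the edit distance between x[:i] and y[:j]:
def edit_distance(x, y):
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # same character: no operation needed
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # 1 (the operation itself) + min of delete (previous row),
                # insert (previous column), replace (diagonal)
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]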
Fisher-Yates shuffle:
import random
def inplace_shuffle(lst):
n = len(lst)
if n < 2: return lst
for i in xrange(n - 1):
repl = random.randint(i, n - 1)
if i != repl: lst[i], lst[repl] = lst[repl], lst[i]
The naive algorithm of swapping every number with any other number fails
because the number of possible orderings it outputs is n^n, but the number
of actual distinct possible orderings is n!, so by the pigeonhole principle,
some distinct permutations / "buckets" will be overrepresented.
Dijkstra's:
Boyer-Moore majority vote algorithm:
O(n) time and O(1) space algo for finding majority element if it exists
used in LC #277 Find the Celebrity https://leetcode.com/problems/find-the-
celebrity/description/
https://www.wikiwand.com/en/Boyer%E2%80%93Moore_majority_vote_algor
ithm
Also in https://leetcode.com/problems/majority-element/description/
Basically have a candidate majority element and a count, increment the
count if curr == candidate, if count is 0 then curr is new candidate, else
decrement count.
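A sketch of that vote/count logic:
def majority_element(nums):
    candidate, count = None, 0
    for n in nums:
        if count == 0:
            candidate = n
        count += 1 if n == candidate else -1
    # if a majority element is guaranteed to exist, this is it; otherwise a
    # second pass should verify it really appears more than len(nums) // 2 times
    return candidate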
Reverse a number (integer):
I.e., how to turn a number into its reverse without converting to a string.
Basically use mod (%) and integer division (//) to find each digit and
construct the reverse number.
def reverse(num):
rev = 0
while num:
rev *= 10
rev += num % 10
num //= 10
return rev
E.g. useful for testing numerical palindromes without converting to a string.
Kadane's algorithm (maximum subarray problem):
https://www.wikiwand.com/en/Maximum_subarray_problem
def max_subarray(A):
max_ending_here = max_so_far = A[0]
for x in A[1:]:
max_ending_here = max(x, max_ending_here + x)
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
Sieve of Eratosthenes:
https://www.wikiwand.com/en/Sieve_of_Eratosthenes
https://leetcode.com/problems/count-primes/description/
def countPrimes(self, n):
if n < 2: return 0
arr = [1] * n
# from 2 to ceil(sqrt of n)
for i in xrange(2, int(n ** 0.5) + 1):
# if prime
if arr[i]:
# increment by i starting at i squared
for j in xrange(i ** 2, n, i):
arr[j] = 0
count = sum([1 for x in arr if x])
return count - 2
Heaps:
https://docs.python.org/2/library/heapq.html
Heaps can be very useful if I need to maintain some kind of min/max value,
but it needs to be dynamically updated over time among other possible
previous values. Notably of course, the invariant just applies to the start/min
of the list; the rest of the list remains unsorted.
Basically just initialize heap as a normal list (or use in-place heapify) and then
use heapq.heappush(arr, x), heappop, and heapreplace to push, pop, and
replace an item while maintaining min-heap invariant.
Meeting room variant where getting fewest # of rooms needed:
https://leetcode.com/problems/meeting-rooms-ii/description/
intervals.sort(key=lambda x: x.start)
# keep track of min end time
heap = []
for n in intervals:
# heap[0] is always the min end time
# this means we can reuse this room but just need to update end
if heap and n.start >= heap[0]:
heapreplace(heap, n.end)
# otherwise we need to allocate a new room
else:
heappush(heap, n.end)
return len(heap)
Common hard problem of merging k sorted (linked) lists (O(n k log(k))):
https://leetcode.com/problems/merge-k-sorted-lists/description/
dummy = curr = ListNode(0)
h = [(n.val, n) for n in lists if n]
heapify(h)
while h:
# don't pop yet, only change heap size if nec
val, n = h[0]
if not n.next:
heappop(h)
else:
# insert new val, n tuple while keeping invar
heapreplace(h, (n.next.val, n.next))
curr.next = n
curr = curr.next
return dummy.next
Backtracking:
Backtracking is a powerful technique. It allows us to try all possible
configurations/solutions, but in an optimal and incremental order, so that we
can eliminate wholesale bad approaches. Basically we generate partial
solutions at each iteration: if it passes the constraint check, we recur deeper
(DFS style); if not, we continue / "prune" that entire branch and backtrack up
the search tree to an unexplored branch. We generate new solutions through
a loop breadth-wise and explore them further through DFS recursion. The n
queens problem is a typical example.
Importantly, to differentiate from DFS, it's not enough to just continue, you
often need to actively and selectively prune. For example, just having a
visited set might not be enough, you might need to remove from the set
after you've tried out that path (recurred deeper). See the word search
problem.
For board/matrix problems, another way of doing it would be
modifying the board itself. I.e., mark the board spot with a special
character, DFS/recur deeper, then reset the board spot to original
value.
Good summary of backtracking pattern/approach for
subsets/permutations/etc. questions:
https://leetcode.com/problems/permutations/discuss/18239/A-
general-approach-to-backtracking-questions-in-Java-(Subsets-
Permutations-Combination-Sum-Palindrome-Partioning)
Choose, (recursively) explore, un-choose (backtrack!)
https://www.youtube.com/watch?v=78t_yHuGg-0
Word search:
https://leetcode.com/problems/word-search/description/
def exist(self, board, word):
for i, row in enumerate(board):
for j, cell in enumerate(row):
if cell != word[0]: continue
visited = set()
stk = [(i, j, 0, False)]
while stk:
x, y, word_i, backtrack = stk.pop()
# prune branch
if backtrack:
visited.remove((x, y))
continue
if board[x][y] != word[word_i]: continue
if word_i + 1 == len(word): return True
visited.add((x, y))
# add backtracking node before forward nodes
stk.append((x, y, word_i, True))
cands = []
for a, b in [(-1,0), (0,-1), (0,1), (1,0)]:
xa, yb = x + a, y + b
if (xa < 0 or xa >= len(board) or yb < 0 or
yb >= len(board[0])
or (xa, yb) in visited or board[xa][yb] !=
word[word_i+1]):
continue
stk.append((xa, yb, word_i+1, False))
return False
N queens:
https://www.geeksforgeeks.org/n-queen-problem-backtracking-3/
http://www.thecodenote.com/2017/05/beginners-guide-to-solving-n-
queens.html
code:
def isSafe(board, row, col):
# Check this row on left side
for i in range(col):
if board[row][i] == 1:
return False
# Check upper diagonal on left side
for i,j in zip(range(row,-1,-1), range(col,-1,-1)):
if board[i][j] == 1:
return False
# Check lower diagonal on left side
for i,j in zip(range(row,N,1), range(col,-1,-1)):
if board[i][j] == 1:
return False
return True
def solveNQUtil(board, col):
# base case: If all queens are placed
# then return true
if col >= N:
return True
# Consider this column and try placing
# this queen in all rows one by one
for i in range(N):
if isSafe(board, i, col):
# Place this queen in board[i][col]
board[i][col] = 1
# recur to place rest of the queens
if solveNQUtil(board, col+1) == True:
return True
# If placing queen in board[i][col]
# doesn't lead to a solution, then
# remove queen from board[i][col]
board[i][col] = 0
# if the queen can not be placed in any row in
# this column col, then return false
return False
Android unlock patterns:
https://leetcode.com/submissions/detail/175474501/
code:
def numberOfPatterns(self, m, n):
# keep track of used keys and also can use math to figure out allowed jumps
flags = [False] * 10
# for index i + 1, disallowed 1-step moves
illegal = [(3, 7, 9), (8,), (1, 7, 9), (6,), (-1,), (4,), (1, 3, 9), (2,), (1, 3, 7)]
def dfs(prev, level):
# base case
if level == 1: return 1
num_patterns, flags[prev] = 0, True
for i in xrange(1, 10):
# if satisfy constraint, recur deeper
if not flags[i] and (i not in illegal[prev - 1] or
flags[(i + prev) / 2]):
num_patterns += dfs(i, level - 1)
# otherwise continue / abandon branch (ie all solutions w this key as a successor)
flags[prev] = False
return num_patterns
# many keys are symmetric
return sum([dfs(1, i) * 4 + dfs(2, i) * 4 + dfs(5, i) for i in
xrange(m, n + 1)])
DFS in detail:
Very common in island or flood fill-type problems. Make sure that I'm
resetting the node or point to avoid infinite loops. I.e., once something
passes the constraint, make sure I mark it in some way (could also just be a
visited set or flags[i] boolean arr).
https://leetcode.com/problems/flood-fill/description/
code:
def floodFill(self, image, sr, sc, newColor):
def dfs(x, y, prev_color):
# make sure we're within bounds
if x < 0 or x >= len(image) or y < 0 or y >=
len(image[0]):
return
curr_color = image[x][y]
# make sure we don't loop indefinitely
if image[x][y] != prev_color or image[x][y] ==
newColor:
return
# fill ..
image[x][y] = newColor
# .. recursively
dfs(x - 1, y, prev_color)
dfs(x, y - 1, prev_color)
dfs(x, y + 1, prev_color)
dfs(x + 1, y, prev_color)
dfs(sr, sc, image[sr][sc])
return image
For island problems where I have to count, make sure I'm not double-
counting. Be careful with how I'm updating the return value. Either pass in
the value each time or just add incrementally, but not both.
https://leetcode.com/problems/max-area-of-island/description/
code:
def maxAreaOfIsland(self, grid):
def dfs(i, j):
if i < 0 or i >= len(grid) or j < 0 or j >= len(grid[0]) or
not grid[i][j]:
return 0
grid[i][j] = 0
return 1 + dfs(i-1, j) + dfs(i, j-1) + dfs(i, j+1) + dfs(i+1,
j)
res = 0
for i, row in enumerate(grid):
for j, cell in enumerate(row):
if grid[i][j]:
res = max(res, dfs(i, j))
return res
Fenwick or binary indexed trees:
Used to store running prefix sums. Otherwise, if I want to both update the
element at i and get the sum of elements up to i, a plain array gives O(1) updates
but O(n) prefix sums (or vice versa with a precomputed prefix-sum array).
With BITs, can reduce to O(log n) for both.
Based on the idea that any integer (index) can be represented as a sum of
powers of 2.
"For example 19 can be represented as 16 + 2 + 1. Every node of BI
Tree stores sum of n elements where n is a power of 2. For example, in
the above first diagram for getSum(), sum of first 12 elements can be
obtained by sum of last 4 elements (from 9 to 12) plus sum of 8
elements (from 1 to 8). The number of set bits in binary representation
of a number n is O(Logn). Therefore, we traverse at-most O(Logn)
nodes in both getSum() and update() operations. Time complexity of
construction is O(nLogn) as it calls update() for all n elements."
https://www.wikiwand.com/en/Fenwick_tree
https://www.geeksforgeeks.org/binary-indexed-tree-or-fenwick-tree-2/
https://www.topcoder.com/community/data-science/data-science-
tutorials/binary-indexed-trees/
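A small sketch (1-indexed internally; i & -i isolates the lowest set bit, which is what drives the O(log n) hops):
class FenwickTree(object):
    def __init__(self, n):
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, i, delta):
        # add delta to element i (1-indexed); O(log n)
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i  # jump to the next node responsible for i

    def prefix_sum(self, i):
        # sum of elements 1..i; O(log n)
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total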
Cyclic replacements / circle shifting w GCD:
https://leetcode.com/articles/rotate-array/
https://www.geeksforgeeks.org/array-rotation/
https://stackoverflow.com/questions/876293/fastest-algorithm-for-circle-
shift-n-sized-array-for-m-position
Binary GCD / Stein's algorithm:
https://www.geeksforgeeks.org/steins-algorithm-for-finding-gcd/
Matrix search etc.:
To find an element in a row- and column-wise sorted matrix, just start at
bottom left element and move up if target is smaller, else move right. The
intuition here is that these are basically two arrays sorted in reverse order.
https://www.geeksforgeeks.org/search-in-row-wise-and-column-wise-
sorted-matrix/
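A sketch of that bottom-left staircase walk:
def search_sorted_matrix(matrix, target):
    # rows and columns each sorted ascending; start at the bottom-left corner
    if not matrix or not matrix[0]:
        return False
    row, col = len(matrix) - 1, 0
    while row >= 0 and col < len(matrix[0]):
        val = matrix[row][col]
        if val == target:
            return True
        elif val > target:
            row -= 1  # everything to the right in this row is even bigger
        else:
            col += 1  # everything above in this column is even smaller
    return False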
Saddleback search:
http://typeocaml.com/2015/03/31/pearl-no-3-saddleback-search/
If it's a fully element sorted matrix (ie even more strictly sorted), can still do
the same thing, but can also apply a binary search variant for O(log m + lg n)
time. Basically binary search in a middle row, if not there, can know it's
between last two elements and can eliminate the top left and bottom right
"rectangles." https://www.geeksforgeeks.org/search-element-sorted-matrix/
Can use similar intuition for kth smallest questions, including for k pairs with
smallest sums where can transform array sums into matrix:
https://leetcode.com/problems/find-k-pairs-with-smallest-
sums/description/ (reminder that heaps are great!)
Merge k sorted lists (k-way merge):
https://www.wikiwand.com/en/K-way_merge_algorithm
Using a heap is kinda cheating, but can just use merge 2 linked lists function
recursively
code:
def mergeKLists(self, lists):
if not lists: return None
if len(lists) == 1: return lists[0]
def merge(l1, l2):
dummy = curr = ListNode(0)
while l1 and l2:
if l1.val <= l2.val:
curr.next = l1
l1 = l1.next
else:
curr.next = l2
l2 = l2.next
curr = curr.next
curr.next = l1 or l2
return dummy.next
mid = len(lists) / 2
l = self.mergeKLists(lists[:mid])
r = self.mergeKLists(lists[mid:])
return merge(l, r)
important for external sort when arrays can't fit into main memory:
https://www.wikiwand.com/en/External_sorting#/External_merge_sort
One example of external sorting is the external merge sort algorithm,
which is a K-way merge algorithm. It sorts chunks that each fit in RAM,
then merges the sorted chunks together.[1][2]
The algorithm first sorts M items at a time and puts the sorted lists
back into external memory. It then recursively does a k-way merge on
those sorted lists. To do this merge, B elements from each sorted list
are loaded into internal memory, and the minimum is repeatedly
outputted.
For example, for sorting 900 megabytes of data using only 100
megabytes of RAM:
Read 100 MB of the data in main memory and sort by some
conventional method, like quicksort.
Write the sorted data to disk.
Repeat steps 1 and 2 until all of the data is in sorted 100 MB
chunks (there are 900MB / 100MB = 9 chunks), which now need
to be merged into one single output file.
Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted
chunk into input buffers in main memory and allocate the
remaining 10 MB for an output buffer. (In practice, it might
provide better performance to make the output buffer larger and
the input buffers slightly smaller.)
Perform a 9-way merge and store the result in the output buffer.
Whenever the output buffer fills, write it to the final sorted file
and empty it. Whenever any of the 9 input buffers empties, fill it
with the next 10 MB of its associated 100 MB sorted chunk until
no more data from the chunk is available. This is the key step that
makes external merge sort work externally -- because the merge
algorithm only makes one pass sequentially through each of the
chunks, each chunk does not have to be loaded completely;
rather, sequential parts of the chunk can be loaded as needed.
Topological sort for DAGs / Kahn's algorithm:
https://en.wikipedia.org/wiki/Topological_sorting
First, find a list of "start nodes" which have no incoming edges and
insert them into a set S; at least one such node must exist in a non-
empty acyclic graph. Then:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
remove edge e from the graph
if m has no other incoming edges then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
If the graph is a DAG, a solution will be contained in the list L (the
solution is not necessarily unique). Otherwise, the graph must have at
least one cycle and therefore a topological sorting is impossible.
https://leetcode.com/problems/alien-dictionary/
code:
def alienOrder(self, words):
# a -> b
adj = defaultdict(set)
# in-degree
deg = {c: 0 for w in words for c in w}
for i, w1 in enumerate(words[:-1]):
w2 = words[i + 1]
for c1, c2 in zip(w1, w2):
if c1 == c2: continue
if c2 not in adj[c1]: deg[c2] += 1
adj[c1].add(c2)
break
res = ''
# start w 0 indegree nodes
q = deque([c for c in deg if not deg[c]])
while q:
c = q.popleft()
res += c
for n in adj[c]:
deg[n] -= 1
if not deg[n]: q.append(n)
return res if len(res) == len(deg) else ''
Reservoir sampling:
https://www.wikiwand.com/en/Reservoir_sampling
O(n) code:
S has items to sample, R will contain the result
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
The algorithm creates a "reservoir" array of size k and populates it with
the first k items of S. It then iterates through the remaining elements
of S until S is exhausted. At the ith element of S, the algorithm
generates a random number j between 1 and i. If j is less than or equal
to k, the jth element of the reservoir array is replaced with
the ith element of S. In effect, for all i, the ith element of S is chosen to
be included in the reservoir with probability k/i. Similarly, at each
iteration the jth element of the reservoir array is chosen to be replaced
with probability 1/k * k/i, which simplifies to 1/i. It can be shown that
when the algorithm has finished executing, each item in S has equal
probability (i.e. k / len(S)) of being chosen for the reservoir.
Sliding window approach:
example problem: https://leetcode.com/problems/fruit-into-baskets/
(basically max subarray with 2 distinct elements)
def totalFruit(self, tree):
# sliding window approach
window = defaultdict(int)
max_len = j = 0
for i, fr in enumerate(tree):
window[fr] += 1
# j will automatically stop before i
while len(window) > 2:
prev = tree[j]
window[prev] -= 1
if not window[prev]:
del window[prev]
j += 1
max_len = max(i - j + 1, max_len)
return max_len
template explained here: https://leetcode.com/problems/find-all-anagrams-
in-a-string/discuss/92007/sliding-window-algorithm-template-to-solve-all-
the-leetcode-substring-search-problem
note that for the anagram problem (https://leetcode.com/problems/find-all-
anagrams-in-a-string/) the window stays a constant size as it slides to the
right. for the fruit problem, each time the window moves 1 to the right, if the
window no longer fits the criteria, we shrink the window starting from the
left end of it
Combinatorics:
permutations = order matters, arrangements
n permute r = n! / (n-r)!
combinations = order does not matter, selections of elements
n choose r = n! / (r! * (n-r)!)
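Quick sketch with factorials (e.g. 5 permute 2 = 20, 5 choose 2 = 10):
from math import factorial

def n_permute_r(n, r):
    # ordered arrangements of r items out of n
    return factorial(n) // factorial(n - r)

def n_choose_r(n, r):
    # unordered selections of r items out of n
    return factorial(n) // (factorial(r) * factorial(n - r))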
Common or Noteworthy Problems / (Named) Algorithms
Knuth-Morris-Pratt:
https://leetcode.com/problems/implement-strstr
class Solution(object):
def strStr(self, haystack, needle):
if not needle: return 0
if not haystack: return -1
next_arr = self.create_next(needle)
i = j = 0
while i < len(haystack) and j < len(needle):
if haystack[i] == needle[j]:
# Matched, so return the haystack's match start index.
if j == len(needle) - 1:
return i - len(needle) + 1
i, j = i + 1, j + 1
else:
# Slide pattern over.
if j: j = next_arr[j-1]
else: i += 1
return -1
# Build next jump table.
def create_next(self, pattern):
next_arr = [0] * len(pattern)
pre_i, suf_i = 0, 1
while suf_i < len(pattern):
# Found prefix-suffix match.
if pattern[pre_i] == pattern[suf_i]:
next_arr[suf_i] = pre_i + 1
pre_i, suf_i = pre_i + 1, suf_i + 1
else:
if pre_i:
pre_i = next_arr[pre_i-1]
else:
next_arr[suf_i] = 0
suf_i += 1
return next_arr
Making change:
https://www.interviewcake.com/question/python/coin
Task:
Write a function that, given:
an amount of money
a list of coin denominations
computes the number of ways to make amount of money with coins of
the available denominations.
Code:
Bottom-up DP memoization:
def change_possibilities_bottom_up(amount, denominations):
ways_of_doing_n_cents = [0] * (amount + 1)
ways_of_doing_n_cents[0] = 1
for coin in denominations:
for higher_amount in xrange(coin, amount + 1):
higher_amount_remainder = higher_amount - coin
ways_of_doing_n_cents[higher_amount] += ways_of_doing_n_cents[higher_amount_remainder]
return ways_of_doing_n_cents[amount]
Maximum sum subarray / Kadane's algorithm:
https://leetcode.com/problems/best-time-to-buy-and-sell-stock/
Task:
find the contiguous subarray which has the largest sum
Algorithm:
"a scan through the array values, computing at each position the
maximum (positive sum) subarray ending at that position. This subarray
is either empty (in which case its sum is zero) or consists of one more
element than the maximum subarray ending at the previous position.
The algorithm only needs to keep track of the ending position because
the implied starting position is just after the last position at which the
sum went negative; a higher sum can always be found by dropping any
negative-sum prefix."
https://www.wikiwand.com/en/Maximum_subarray_problem
"look for all positive contiguous segments of the array
(max_ending_here is used for this). And keep track of maximum sum
contiguous segment among all positive segments (max_so_far is used
for this). Each time we get a positive sum compare it with max_so_far
and update max_so_far if it is greater than max_so_far"
http://www.geeksforgeeks.org/largest-sum-contiguous-subarray/
Basically, keep accumulating sum (max_ending_here) until the sum
becomes negative, at which point you start over again (e.g., set back to
0). Then max_so_far is max_ending_here if the latter is greater than the
previous max_so_far. The key is that no solution will include a negative
sum prefix (because the sum could be larger if it just left that prefix off).
Pseudocode:
Complexity: O(N).
Wiki:
def max_subarray(A):
max_ending_here = max_so_far = 0
for x in A:
max_ending_here = max(0, max_ending_here + x)
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
Not as clean:
def maxSubArraySum(a,size):
max_so_far = -maxint - 1
max_ending_here = 0
for i in range(0, size):
max_ending_here = max_ending_here + a[i]
if (max_so_far < max_ending_here):
max_so_far = max_ending_here
if max_ending_here < 0:
max_ending_here = 0
return max_so_far
Examples:
https://leetcode.com/problems/best-time-to-buy-and-sell-stock/
Say you have an array for which the ith element is the price of a
given stock on day i.
If you were only permitted to complete at most one transaction
(ie, buy one and sell one share of the stock), design an algorithm
to find the maximum profit.
Example 1:
Input: [7, 1, 5, 3, 6, 4]
Output: 5
max. difference = 6-1 = 5 (not 7-1 = 6, as selling price needs
to be larger than buying price)
Example 2:
Input: [7, 6, 4, 3, 1]
Output: 0
In this case, no transaction is done, i.e. max profit = 0.
def maxProfit(self, prices):
"""
:type prices: List[int]
:rtype: int
"""
max_curr = max_so_far = 0
for i, price in enumerate(prices[1:]):
max_curr = max(0, max_curr + price - prices[i])
max_so_far = max(max_so_far, max_curr)
return max_so_far
Maximum size subarray sum equals k:
https://leetcode.com/problems/maximum-size-subarray-sum-equals-k/
Task:
Given an array nums and a target value k, find the maximum length of
a subarray that sums to k. If there isn't one, return 0 instead.
Note: The sum of the entire nums array is guaranteed to fit within the
32-bit signed integer range.
Example 1:
Given nums = [1, -1, 5, -2, 3], k = 3,
return 4. (because the subarray [1, -1, 5, -2] sums to 3 and is the
longest)
Example 2:
Given nums = [-2, -1, 2, 1], k = 1,
return 2. (because the subarray [-1, 2] sums to 1 and is the
longest)
Follow Up: Can you do it in O(n) time?
Code:
def maxSubArrayLen(self, nums, k):
"""
:type nums: List[int]
:type k: int
:rtype: int
"""
sum_dict = {0: -1}
ans = acc = 0
for i, num in enumerate(nums):
acc += num
if acc not in sum_dict: sum_dict[acc] = i
if acc-k in sum_dict: ans = max(ans, i - sum_dict[acc-k])
return ans
Maximum depth of binary tree:
https://leetcode.com/problems/maximum-depth-of-binary-tree/
Task:
Given a binary tree, find its maximum depth.
The maximum depth is the number of nodes along the longest path
from the root node down to the farthest leaf node.
Code:
Recursive:
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, x):
#         self.val = x
#         self.left = None
#         self.right = None
class Solution:
# @param {TreeNode} root
# @return {integer}
def maxDepth(self, root):
# Base case: no node.
if not root: return 0
# Recursive case: the maximum of the depths of the
left and right sides plus the root depth (1).
return max(Solution.maxDepth(self, root.left),
Solution.maxDepth(self, root.right)) + 1
Iterative:
from collections import deque
def maxDepth(self, root):
"""
:type root: TreeNode
:rtype: int
"""
# Iterative method: count number of levels.
if not root: return 0
q, depth = deque(), 0
q.append(root)
while(True):
nodect = len(q)
if not nodect: return depth
depth += 1
while nodect:
node = q.popleft()
if node.left: q.append(node.left)
if node.right: q.append(node.right)
nodect -= 1
Binary tree paths:
https://leetcode.com/problems/binary-tree-paths/
Task:
Given a binary tree, return all root-to-leaf paths.
For example, given the following binary tree:
   1
 /   \
2     3
 \
  5
All root-to-leaf paths are:
["1->2->5", "1->3"]
Code:
DFS recursive:
def binaryTreePaths(self, root):
"""
:type root: TreeNode
:rtype: List[str]
"""
if not root: return []
res = []
self.dfs(root, "", res)
return res
def dfs(self, root, ret_str, res):
if not root.left and not root.right:
res.append(ret_str + str(root.val))
if root.left:
self.dfs(root.left, ret_str + str(root.val) + '->', res)
if root.right:
self.dfs(root.right, ret_str + str(root.val) + '->', res)
DFS iterative (with stack):
def binaryTreePaths1(self, root):
if not root:
return []
res, stack = [], [(root, "")]
while stack:
node, ls = stack.pop()
if not node.left and not node.right:
res.append(ls+str(node.val))
if node.right:
stack.append((node.right, ls+str(node.val)+"->"))
if node.left:
stack.append((node.left, ls+str(node.val)+"->"))
return res
BFS (with queue):
def binaryTreePaths2(self, root):
if not root:
return []
res, queue = [], collections.deque([(root, "")])
while queue:
node, ls = queue.popleft()
if not node.left and not node.right:
res.append(ls+str(node.val))
if node.left:
queue.append((node.left, ls+str(node.val)+"->"))
if node.right:
queue.append((node.right, ls+str(node.val)+"->"))
return res
Single number:
https://leetcode.com/problems/single-number/
Task:
Given an array of integers, every element appears twice except for one.
Find that single one.
Note:
Your algorithm should have a linear runtime complexity. Could you
implement it without using extra memory?
Code:
My non-xor one:
def singleNumber(self, nums):
"""
:type nums: List[int]
:rtype: int
"""
# 1, 1, 2, 3, 3
# 1, 2, 2
# 1, 1, 2, 2, 3
nums.sort()
for i, n in enumerate(nums):
# If odd number of elements and on last element,
# must be odd man out.
if len(nums) % 2 != 0 and i == len(nums) - 1: return n
# Compare every pair.
if i % 2 != 0: continue
if n != nums[i+1]: return n
XOR solutions (xor of the same numbers returns 0; xor of a nonzero
number and zero returns the nonzero number):
def singleNumber(self, nums):
res = 0
for num in nums:
res ^= num
return res
def singleNumber(self, nums):
return reduce(lambda x, y: x ^ y, nums)
Reverse linked list:
https://leetcode.com/problems/reverse-linked-list/
Task:
Reverse a singly linked list.
Hint: A linked list can be reversed either iteratively or recursively. Could
you implement both?
Code:
Concise iterative:
def reverseList(self, head):
"""
:type head: ListNode
:rtype: ListNode
"""
prev = None
curr = head
# Change one link at a time.
while(curr is not None):
# Hold on to next node since will redirect link.
next = curr.next
# Reverse link direction.
curr.next = prev
# Update prev pointer.
prev = curr
# Update curr pointer for next iteration.
curr = next
# Not curr since will be None.
return prev
Concise recursive:
def reverseList(self, head, prev=None):
"""
:type head: ListNode
:rtype: ListNode
"""
if not head: return prev
# Reverse links, making sure to hold on to the next node.
curr = head.next
head.next = prev
return self.reverseList(curr, head)
Worked out example:
#           4 -> 1 -> 3 -> None
#   None <- 4 <- 1 <- 3
# =         3 -> 1 -> 4 -> None
#
# curr = 1
# head.next = 4.next = None  -- None <- 4; curr = 1 -> 3 ->
None
# return reverseList(curr, head) = reverseList(1 -> 3 -> None,
None <- 4)
# curr = 3
# head.next = 1.next = 4  -- None <- 4 <- 1; curr = 3 ->
None
# return reverseList(curr, head) = reverseList(3 -> None,
None <- 4 <- 1)
# curr = None
# head.next = 3.next = 1  -- None <- 4 <- 1 <- 3; curr =
None
# return reverseList(None, None <- 4 <- 1 <- 3)
# not head: return prev = None <- 4 <- 1 <- 3
Naive brute force iterative:
def reverseList(self, head):
"""
:type head: ListNode
:rtype: ListNode
"""
if not head: return None
ll_arr = []
node = head
while node:
ll_arr.append(node.val)
node = node.next
r_cur = r_head = ListNode(ll_arr[-1])
for ele in list(reversed(ll_arr[:-1])):
r_cur.next = ListNode(ele)
r_cur = r_cur.next
return r_head
Valid parentheses:
https://leetcode.com/problems/valid-parentheses/
Code:
def isValid(self, s):
map = {'(': ')', '{': '}', '[': ']'}
# Use a list as a stack.
stack = []
for i, c in enumerate(s):
if c in map:
stack.append(c)
elif c in map.values():
# Also fails if stack is empty -> too many closing parens.
if not stack or map[stack.pop()] != c: return False
return not stack
Remove duplicates from unsorted singly linked list:
CTCI 2.1
Code:
With buffer / auxiliary data structure:
from linkedlist import Node
def remove_dupes(head):
# Edge case where single node list.
if not head.next_node: return head
curr, prev = head, Node()
dupes = {}
while curr:
if curr.data in dupes:
prev.next_node = curr.next_node
else:
dupes[curr.data] = 1
prev = curr
curr = curr.next_node
Without buffer:
# Now do it without another data structure.
def remove_dupes2(head):
    if not head.next_node: return head
    curr, runner = head, head.next_node
    # Curr, not curr.next, because otherwise would miss last node.
    while curr:
        runner = curr
        while runner.next_node:
            # Compare using runner.next because no prev pointer.
            if curr.data == runner.next_node.data:
                runner.next_node = runner.next_node.next_node
            else:
                runner = runner.next_node
        curr = curr.next_node
Two sum:
https://leetcode.com/problems/two-sum/

Task:
Given an array of integers, return indices of the two numbers such that
they add up to a specific target.
You may assume that each input would have exactly one solution.
Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].
Code:
def twoSum(self, nums, target):
    """
    :type nums: List[int]
    :type target: int
    :rtype: List[int]
    """
    nums_tups = []
    for i, num in enumerate(nums):
        diffs = [n[1] for n in nums_tups]
        if num in diffs:
            # diffs and nums are in sync
            return [diffs.index(num), i]
        else:
            nums_tups.append((num, target - num))
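A single-pass dictionary variant, as a sketch in line with the hash-table heuristic above (seen maps each value to its index; assumes exactly one solution exists):
def twoSum(self, nums, target):
    seen = {}  # value -> index
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i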
Add two numbers:
https://leetcode.com/problems/add-two-numbers/

Task:
You are given two non-empty linked lists representing two non-
negative integers. The digits are stored in reverse order and each of
their nodes contain a single digit. Add the two numbers and return it as
a linked list.
You may assume the two numbers do not contain any leading zero,
except the number 0 itself.
Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8
Code:
def addTwoNumbers(self, l1, l2):
"""
:type l1: ListNode
:type l2: ListNode
:rtype: ListNode
"""
l1_str = ''
l2_str = ''
while l1.next != None:
l1_str += str(l1.val)
l1 = l1.next
l1_str += str(l1.val)
while l2.next != None:
l2_str += str(l2.val)
l2 = l2.next
l2_str += str(l2.val)
l1_num, l2_num = int(l1_str[::-1]), int(l2_str[::-1])
sum = l1_num + l2_num
sum_str_rev = str(sum)[::-1]
# Need a copy of root node to be able to return at the end since modifying while iterating.
root = ListNode(0)
curr = root
for c in sum_str_rev:
l_curr = ListNode(int(c))
curr.next = l_curr
curr = curr.next
return root.next
Sparse matrix multiplication:
https://leetcode.com/problems/sparse-matrix-multiplication/

Task:
Given two sparse matrices A and B, return the result of AB.
You may assume that A's column number is equal to B's row number.
Example:
A=[
[ 1, 0, 0],
[-1, 0, 3]
]
B=[
[ 7, 0, 0 ],
[ 0, 0, 0 ],
[ 0, 0, 1 ]
]
     |  1 0 0 |   | 7 0 0 |   |  7 0 0 |
AB = | -1 0 3 | x | 0 0 0 | = | -7 0 3 |
                  | 0 0 1 |
Code:
def multiply(self, A, B):
"""
:type A: List[List[int]]
:type B: List[List[int]]
:rtype: List[List[int]]
"""
# Matrix is a list of lists where each list is a row.
if not A or not B: return None
AB = [[0 for _ in B[0]] for _ in A]
# Iterate through rows of A.
for i, row_a in enumerate(A):
# 0, 1
#Iterate through cols of A.
for j, ele_a in enumerate(row_a):
# 0, 1, 2
if ele_a:
# Iterate through row j of B (i.e., across B's columns).
for k, ele_b in enumerate(B[j]):
# 0, 1, 2
if ele_b:
AB[i][k] += ele_a * ele_b  # i.e., A[i][j] * B[j][k]
return AB
ZigZag conversion:
https://leetcode.com/problems/zigzag-conversion/

Task:
The string "PAYPALISHIRING" is written in a zigzag pattern on a given
number of rows like this: (you may want to display this pattern in a
fixed font for better legibility)
P  A  H  N
A P L S I I G
Y  I  R
And then read line by line: "PAHNAPLSIIGYIR"
Write the code that will take a string and make this conversion given a
number of rows:
string convert(string text, int nRows);
convert("PAYPALISHIRING", 3) should return "PAHNAPLSIIGYIR".
Code:
def convert(self, s, numRows):
# The key is that the distance down then up (or up then down) to get to the next letter on the current row is symmetric regardless of the horizontal distance traveled / the diagonal.
linenum = 1
ret_str = ''
ind = 0
# Have to alternate strategy for middle lines.
mid_cycle_index = 1
if numRows == 1: return s
for i in xrange(len(s)):
ret_str += s[ind]
# First line.
if linenum == 1:
ind += 2 * (numRows - 1)
elif linenum == numRows:
ind += 2 * (linenum - 1)
else:
if mid_cycle_index % 2 == 1:
ind += 2 * (numRows - linenum)
else:
ind += 2 * (linenum - 1)
mid_cycle_index += 1
# Go to next line.
if ind >= len(s):
linenum += 1
ind = linenum - 1  # ind is 0-based, linenum 1-based
mid_cycle_index = 1
return ret_str
Meeting rooms:
https://leetcode.com/problems/meeting-rooms/

Task:
Given an array of meeting time intervals consisting of start and end
times [[s1,e1],[s2,e2],...] (si < ei), determine if a person could attend all
meetings.
For example,
Given [[0, 30],[5, 10],[15, 20]],
return false.
Code:
# Definition for an interval.
# class Interval(object):
#     def __init__(self, s=0, e=0):
#         self.start = s
#         self.end = e
def canAttendMeetings(self, intervals):
"""
:type intervals: List[Interval]
:rtype: bool
"""
if not intervals or len(intervals) == 1: return True
intvl_st = sorted(intervals, key=lambda x: x.start)
for i, intvl in enumerate(intvl_st[1:]):
# i, not i-1, since already skipped first one in intvl_st[1:].
prev = intvl_st[i]
if intvl.start < prev.end: return False
return True
Iterative inorder traversal:
https://leetcode.com/problems/closest-binary-search-tree-value/

def inorder_traversal(root):
stack = [root]
while stack:
curr = stack.pop()
if not curr.left and not curr.right:
print curr.val
else:
if curr.right: stack.append(curr.right)
# Dummy leaf node so next time will print right away.
stack.append(Node(curr.val))
if curr.left: stack.append(curr.left)
Rotate a matrix 90 degrees:
A[:] = zip(*A[::-1])
Reverse then transpose (rows as cols and vice versa).
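A quick worked example (Python 2, where zip returns a list of tuples):
A = [[1, 2],
     [3, 4]]
A[:] = zip(*A[::-1])
# A is now [(3, 1), (4, 2)] -- the matrix rotated 90 degrees clockwise.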

Lowest common ancestor:


Can just use BST properties, one node needs to be in left subtree (< root
value) and the other in the right subtree (> root)
Trickier for binary tree
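A minimal sketch of the BST version (assuming nodes have val, left, and right attributes):
def lowestCommonAncestor(root, p, q):
    # The first node whose value falls between p and q (inclusive) splits them, so it's the LCA.
    while root:
        if p.val < root.val and q.val < root.val:
            root = root.left
        elif p.val > root.val and q.val > root.val:
            root = root.right
        else:
            return root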
Get linked list of binary tree leaves in-place:
http://www.geeksforgeeks.org/connect-leaves-doubly-linked-list/
Python Misc.:
s = "".join(c for c in s if c not in ('a', 'c', 'e'))


Easy to convert a string to a list of characters: list(thestr)
The converse is just ''.join(lst_str)
Conditionals in list comprehensions:
alst = ['%20' if c == ' ' else c for c in alst]
This returns a (possibly modified) element for each element in the list
conditional on what the element is.
alst = ['%20' for c in alst if c == ' ']
This filters, i.e., picks what elements of the list the comprehension will
operate on.
Any/all can be helpful:
Check if all elements of a list are x:
if all(c == ' ' for c in alst): return -1
Check if any tuples contain a negative value:
if any(x < 0 or y < 0 for (x, y) in list_ranges):
raise Exception("Negative start/end times don't make sense.")
list[::-1] reverses the list
list.reverse() is in-place; reversed(list) returns the reversed list (technically an
iterator).
Similarly, list.sort() is in-place, sorted(list) returns the sorted list.
Can supply arguments: sorted(list, key=lambda x: x[1], reverse=True) returns
a reversed sorted list of tuples sorted by the second element in each tuple.
For a secondary sort, use a tuple like: sorted(list, key=lambda x: (x[1], x[0]))
If want to reverse but keep the indices the same, use xrange's arguments:
for i in xrange(len(int_list) - 1, -1, -1):
Time complexity of Python data structures:
https://wiki.python.org/moin/TimeComplexity
list membership is O(n) so if need to do membership tests, use a (frozen)set.
if elem not in arr

Use a frozenset if need immutable / hashable set (e.g., if using as a dict key or a nested set).
set membership checking is O(1) since it's implemented as a dummy dict
dict membership is O(1) for keys
string constants can be useful:
if char in string.letters
if char in string.punctuation
space = set(string.whitespace)
if str_digit in string.digits

Can easily swap variables without using a temporary variable:


a, b = b, a
E.g.: this works:
a = 1; b = 2
a, b = b + 1, a + 1
# a = 3, b = 2; instead of e.g. a = 3, b = 4
More practically, to reverse a linked list, just do:
def reverse_ll(head):
curr, prev = head, None
while curr:
curr.next, prev, curr = prev, curr,  curr.next
return prev
Object oriented / classes:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def another_method(self, arg):
pass
To add to a linked list in a loop, need to set a head node at the start to be able to
reference the entire thing at the end (because cur_node will be constantly
updated while iterating through). Sometimes might need a dummy head node in
which case will need to return dummy.next at the end.
E.g.:
def reverseList(self, head):
"""
:type head: ListNode
:rtype: ListNode
"""
if not head: return None
ll_arr = []
node = head
# This excludes last node, need to add back.
while node.next:
ll_arr.append(node.val)
node = node.next
ll_arr.append(node.val)
r_cur = r_head = ListNode(ll_arr[-1])


for ele in list(reversed(ll_arr[:-1])):
r_cur.next = ListNode(ele)
r_cur = r_cur.next
return r_head
When iterating through a linked list, use:
while node:
Not while node.next since that will exclude the last node.

def reverseList(self, head):


"""
:type head: ListNode
:rtype: ListNode
"""
if not head: return None
ll_arr = []
node = head
while node:
ll_arr.append(node.val)
node = node.next
r_cur = r_head = ListNode(ll_arr[-1])
for ele in list(reversed(ll_arr[:-1])):
r_cur.next = ListNode(ele)
r_cur = r_cur.next
return r_head
If deleting node in singly linked, consider using runner.next.
If need to reverse or parse or do tree/graph traversal in a problem, consider using
a stack:
"Two common uses for stacks are:
parsing (like in this problem)
tree or graph traversal (like depth-first traversal)
So remember, if you're doing either of those things, try using a stack!"
c.isalnum() tests whether alphanumeric character
s.strip() removes leading and trailing whitespace
To initialize a list of (empty) lists, do [[] for i in range(3)]
result: [[], [], []]
Also, if I need a visited grid or some kind of matrix, do:
visited = [[False for x in xrange(len(grid[0]))] for y in xrange(len(grid))]
or: res = [[1 for i in xrange(len(A[0]))] for i in xrange(len(A))]


[0 for i in xrange(5)] —> [0, 0, 0, 0, 0]
Possible to do an if elif else ternary in one line:
return num % 9 if num % 9 else 9 if num else 0
Digital root.

Stacks are easily implemented with lists: just use append() for push and pop() for
pop.
Append adds to end of list, pop (without explicit index) removes and returns from end of list.

Queues, however, should be implemented with deques, with append() for queue
and popleft() for dequeue.
Re: popleft(), Imagine the list going from right-to-left, where the right is the
end. (I.e., add elements at the right).
To initialize a deque with a tuple, need to do deque([(x, y)]):
but can append with just q.append((x, y))
If don't want to use a deque, can just do pop(0).
equivalent to popleft()
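For example, a level-order (BFS) traversal sketch using a deque as the queue (assuming binary tree nodes with val/left/right):
from collections import deque

def level_order(root):
    q = deque([root])
    order = []
    while q:
        node = q.popleft()     # dequeue from the left
        if not node: continue
        order.append(node.val)
        q.append(node.left)    # enqueue at the right
        q.append(node.right)
    return order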

When updating / incrementing a dictionary, use get to simplify the null check
(whether the key already exists):
for c in s:
freqs[c] = freqs.get(c, 0) + 1
or: my_dict[some_key] = my_dict.get(some_key, 0) + 1
the set equivalent is my_dict.setdefault(some_key, default), but most of the
time, can either use (preferably) defaultdict or get
Dictionary comprehension and sorting:
iterating over a dict only iterates over its keys
i.e., for i in dict: i will be each key, not each key-value pair
similarly, to test for key membership, just do: if key in dict
to get the actual key-value items:
for i, j in dict.iteritems()
applies similarly for comprehensions: new_d = {k: v+1 for k, v in
d.iteritems()}
sorted list of items (k-v tuples) by key:
sorted(d.iteritems())
sorted list of keys:
sorted(d)
sorted list of items (tuples) by value:
sorted(d.iteritems(), key=lambda x: x[1])
sorted list of keys by value:
sorted(d, key=lambda x: d[x])

Sometimes might need positive or negative infinity values:


-float('Inf')
Logical operator evaluation:
http://thomas-cokelaer.info/tutorials/python/boolean.html

The evaluation using the and and or operators follow these rules:


and and or evaluate expressions from left to right.
with and, if all values are truthy, returns the last evaluated value. If any value is falsy, returns the first falsy value.
or returns the first truthy value. If all are falsy, returns the last value.
E.g.:
not x: Returns True if x is falsy, False otherwise
x and y: Returns x if x is falsy, y otherwise
x or y: Returns y if x is falsy, x otherwise
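A few concrete evaluations:
1 and 2        # 2  (all truthy, so the last evaluated value)
0 and 2        # 0  (the first falsy value)
0 or '' or 3   # 3  (the first truthy value)
None or []     # [] (all falsy, so the last value)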
To remove duplicates from a list but maintain order, can just do
list(OrderedDict.fromkeys(arr)):
more hacky way would be using a set
http://www.martinbroadhurst.com/removing-duplicates-from-a-list-while-
preserving-order-in-python.html
seen = set()
return [x for x in arr if not (x in seen or seen.add(x))]
To prepend, use list.insert(0, x), but if doing this, consider just using a deque which
has appendleft and popleft.
To extend but at the start, can do a[:0] = b.
If using a deque and extendleft, remember that it reverses the order of the
iterable passed in.
Heaps can be very useful if I need to maintain some kind of min/max value, but it
needs to be dynamically updated over time among other possible previous values:
https://docs.python.org/2/library/heapq.html

Basically just initialize heap as a normal list and then use heapq.heappush(arr,
x), heappop, and heapreplace to push, pop, and replace an item while
maintaining min-heap invariant.
https://leetcode.com/problems/meeting-rooms-ii/description/
intervals.sort(key=lambda x: x.start)
# keep track of min end time
heap = []
for n in intervals:
# heap[0] is always the min end time
# this means we can reuse this room but just need to update the end time
if heap and n.start >= heap[0]:

heapq.heapreplace(heap, n.end)
# otherwise we need to allocate a new room
else:
heapq.heappush(heap, n.end)
return len(heap)
When using reduce, might need to have an initializer / default value (e.g., when it's
an empty list):
prod = prev_prod * reduce(lambda x, y: x*y, list_ints[i+1:], 1)
Deep vs. shallow copying: https://www.wikiwand.com/en/Object_copying
https://www.cs.utexas.edu/~scottm/cs307/handouts/deepCopying.htm
Shallow copies duplicate as little as possible. A shallow copy of a collection is
a copy of the collection structure, not the elements. With a shallow copy, two
collections now share the individual elements. A change in the underlying
elements will change the copy as well.
Deep copies duplicate everything. A deep copy of a collection is two
collections with all of the elements in the original collection duplicated. This
means that the elements are unique to each copy.
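A small sketch with the copy module:
import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)      # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)     # inner lists are duplicated too
a[0].append(99)
# shallow[0] is now [1, 2, 99] (shared inner list); deep[0] is still [1, 2]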
Can't use a list as a dict key (can use tuple, but can't have a mutable type inside
like a list):
Keys need to be immutable to be hashable.

But can use the string representation.


while amount_left >= 0:
    if (amount_left, repr(denominations[current_index+1:])) in memoize:
        rest_poss = memoize[(amount_left, repr(denominations[current_index+1:]))]
    else:
        rest_poss = change_possibilities_top_down(amount_left, denominations, current_index + 1)
        memoize[(amount_left, repr(denominations[current_index+1:]))] = rest_poss
    num_possibilities += rest_poss
    amount_left -= current_coin
Can also convert the list to a tuple or frozenset.
Dynamic programming & memoization:
When we memoize (cache a subproblem's answer / recursive function's
output so we don't need to recompute it again), the memoization dictionary
probably needs to be outside the function to save/share state. I.e., make a
class. Interview Cake #5:
https://www.interviewcake.com/question/python/coin
Top-down recursion can be memory-intensive because of building up the


call stack. Can avoid by going bottom-up.
To use, always think about starting from the end, i.e., if I had an answer for
everything but the last something (element, cell, subproblem, etc.), what
more do I need? Basically the n+1 step of an induction proof.
Interview Cake:
"Dynamic programming is kind of like the next step up from greedy .
You're taking that idea of "keeping track of what we need in order to
update the best answer so far," and applying it to situations where the
new best answer so far might not just have to do with the previous
answer, but some other earlier answer as well.
So as you can see in this problem, we kept track of all of our previous
answers to smaller versions of the problem (called "subproblems") in a
big list called ways_of_doing_n_cents.
Again, same idea of keeping track of what we need in order to update
the answer as we go, like we did when storing the max product of 2,
min product of 2, etc in the highest product of 3 question. Except now
the thing we need to keep track of is all our previous answers, which
we're keeping in a list.
We built that list bottom-up, but we also talked about how we could
do it top-down and memoize. Going bottom-up is cleaner and usually
more efficient, but often it's easier to think of the top-down version
first and try to adapt from there."
Memoization is top-down depth-first and DP is bottom-up breadth-first:
http://stackoverflow.com/questions/27609246/if-memoization-is-top-
down-depth-first-and-dp-is-bottom-up-breadth-first-what?rq=1
http://blog.racket-lang.org/2012/08/dynamic-programming-versus-
memoization.html
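A minimal bottom-up sketch of the coin problem discussed above (assuming an integer amount and a list of denominations):
def change_possibilities_bottom_up(amount, denominations):
    # ways_of_doing_n_cents[i] = number of ways to make i cents with the coins seen so far.
    ways_of_doing_n_cents = [0] * (amount + 1)
    ways_of_doing_n_cents[0] = 1
    for coin in denominations:
        for higher_amount in xrange(coin, amount + 1):
            ways_of_doing_n_cents[higher_amount] += ways_of_doing_n_cents[higher_amount - coin]
    return ways_of_doing_n_cents[amount]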
Don't be scared of adding null elements/children/etc. to a container, can always
check for it when popping from the container and just ignore or continue if null.
To convert a letter to a number/index, can use ord. ord('a') = 97, so just subtract
that.
Note that tuple comprehensions don't exist—using parentheses creates a
generator. Instead, just do a list comprehension and convert it to a tuple
afterward.
In a for-else, the else clause executes when the loop terminates naturally, i.e.,
without a break.
http://python-
notes.curiousefficiency.org/en/latest/python_concepts/break_else.html
http://book.pythontips.com/en/latest/for_-_else.html
while-else can be useful too and is equivalent to


while condition:
handle_true()
else:
# condition is false now, handle and go on with the rest of the
program
# didn't break
handle_false()
so basically the else always happens unless I exit the loop through a break
useful here https://leetcode.com/problems/asteroid-collision/submissions/
def asteroidCollision(self, asteroids):
"""
:type asteroids: List[int]
:rtype: List[int]
"""
stk = []
for ast in asteroids:
if not stk or ast > 0:
stk.append(ast)
continue
while stk and stk[-1] >= 0:
prev = -1 * stk[-1]
if prev < ast:
break
elif prev == ast:
stk.pop()
break
else:
stk.pop()
else:
stk.append(ast)
return stk
List appends and extends modify the list (and don't return anything). If I want to
return something, just use + (i.e. concatenation):
This doesn't work:
curr = [1, 1]
for i in xrange(2, k + 1):
c1 = [0].extend(curr)
c2 = curr.append(0)
curr = [c1 + c2 for c1, c2 in zip(c1, c2)]
This does work:
for i in xrange(2, k + 1):
c1 = [0] + curr
c2 = curr + [0]
Can just convert a string into a set of the chars directly: str_set = set(the_str)
In any case where I need to do 2 for loops going forward then backward (2
pointers), remember that I can always construct the reverse pointer index from the
"forward" one, i.e.:
for i in xrange(len(nums)):
j = -1 - i
will give me the opposite end pointer each time
can also do: new_j = -j - 1
Be careful when initializing a matrix:
do: self.board = [[None] * n for _ in xrange(n)]
not: self.board = [[None] * n] * n
The second way makes copies of / references to the inner list such that any
modification to one row changes the other rows
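A quick demonstration of the gotcha:
board = [[None] * 2] * 2
board[0][0] = 'X'
# board is now [['X', None], ['X', None]] -- every "row" is the same list object.

board = [[None] * 2 for _ in xrange(2)]
board[0][0] = 'X'
# board is now [['X', None], [None, None]] -- independent rows.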
To modify a list in place, have to use nums[:], if use nums =, then that generates
new list.
One common Python gotcha involves default mutable arguments:
If we try to do something like: def func(x, arr=[]):
"A new list is created once when the function is defined, and the same list is
used in each successive call.
Python’s default arguments are evaluated once when the function is defined,
not each time the function is called (like it is in say, Ruby). This means that if
you use a mutable default argument and mutate it, you will and have
mutated that object for all future calls to the function as well."
http://docs.python-guide.org/en/latest/writing/gotchas/

Instead, do:
def func(x, arr=None):
if not arr: arr = []
To compare dictionaries (i.e., whether the keys and values are the same), just use
==. Dictionaries are not ordered so it works.
or shortcircuits, so can do things like: cur.next = l1 or l2
This will add the rest of linked list 1 if it's not null, otherwise add l2.
Stack of iterators is a nice design pattern for iterative DFS:


http://garethrees.org/2016/09/28/pattern/
def search(root):
stack = [iter([root])]
while stack:
for node in stack[-1]:
stack.append(iter(children(node)))
break
else:
stack.pop()
This avoids the three problems above:
Pushing and popping a list is faster than calling a function in Python.
Lists can grow without limit, unlike the function call stack.
Since there’s no recursion, you can just return when you have a result:
def search(root):
stack = [iter([root])]
while stack:
for node in stack[-1]:
if target(node):
return node
stack.append(iter(children(node)))
break
else:
stack.pop()
"The control flow here might seem a bit tricky if you’re not used to the way
that Python’s for ... else construct interacts with break. The pattern works by
maintaining a stack of iterators that remember the position reached in the
iteration over the children of the node at each level of the search. After
pushing a new iterator on the stack, the break exits the for loop, bypasses
the else: clause, and goes round the while loop again, so that it picks up the
new iterator from stack[-1]. When there are no more children in the current
iteration, the for loop exits via the else: clause and pops the stack. Then the
next iteration of the while loop picks up the iteration at the previous level
from where it left off."
count function is useful:
if s.count(c) == 1:
also works for arrays of nums!
freqs1 = {i: nums1.count(i) for i in nums1}
repr() is intended to be unambiguous; str() intended to be readable.


is / is not compares object references (tests for object identity); == / != compares
values (tests for object value equality).
is will return True if two variables point to the same object; == if the objects referred to by the
variables are equal.
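For example:
a = [1, 2]
b = a
c = [1, 2]
a == c   # True  -- same value
a is c   # False -- different objects
a is b   # True  -- same object (b points to a)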

Beware of using IndexError to bounds check, since negative indices are valid and
will just wrap around.
Note that abstract data types (e.g., stack, queue) are distinct from data structures
(e.g., array, heap). The former is a mathematical model, whereas the latter is a
concrete representation of data from the perspective of the implementer, not the
user.
From Wiki.

For example, a list is not the same sort of thing as a linked list -- one is an
ADT, the other a data structure.
Further, abstract data types specify a sort of contract: what operations can
be done (e.g., pop, push), what constraints exist on those operations (e.g.,
dequeue returns the first element), and what sort of data on which the
operations act (e.g., nodes). Data structures, then, are just the
implementations of ADTs.
Remember that algorithm efficiency analysis (i.e., Big O) represents the upper
bound of an algorithm's running time. Only the highest order term matters,
disregarding constants. For example, natural log running time has the same
complexity as logarithmic (base 10) running time: O(log n).
I.e., the runtime is expressed in terms of how quickly it grows relative to the
input, as the input gets arbitrarily large.
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-
exactly
Logarithms of all bases grow at the same asymptotic rate.
http://stackoverflow.com/questions/17121006/confusion-around-big-o-
with-natural-logs
Remember that parameters are passed by assignment (by reference but
references passed as values), so if pass a mutable argument (like list), can mutate
it in a function and have the changes reflected elsewhere / outside the scope.
http://stackoverflow.com/questions/986006/how-do-i-pass-a-variable-by-reference

In a nested list comprehension, the order of for- and if-expressions is the same as
it would be in a loop format.
I.e., if trying to duplicate:
for row in grid:
for x in row
would just be:
[x for row in grid for x in row]

Strings are immutable, so to do in-place need to convert to a list and do


something like s[:] = list(s).
getting a copy of s, basically.

Shallow vs deep copy:


http://stackoverflow.com/questions/17246693/what-exactly-is-the-
difference-between-shallow-copy-deepcopy-and-normal-assignm
Instead of -(i + 1) (i.e., -i - 1), can use ~:
def is_palin(self, s):
for i in xrange(len(s) / 2):
if s[i] != s[~i]: return False
return True
Generators are nice if need to know state while iterating through a list or if the list
is potentially very large / infinite:
https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-
generators-explained/
binary tree DFS generator iterator:
https://leetcode.com/problems/maximum-width-of-binary-tree/discuss/106707/Python-
Straightforward-BFS-and-DFS-solutions

def widthOfBinaryTree(self, root):


def dfs(node, depth = 0, pos = 0):
if node:
yield depth, pos
yield from dfs(node.left, depth + 1, pos * 2)
yield from dfs(node.right, depth + 1, pos * 2 + 1)
left = {}
right = {}
ans = 0
for depth, pos in dfs(root):
left[depth] = min(left.get(depth, pos), pos)
right[depth] = max(right.get(depth, pos), pos)
ans = max(ans, right[depth] - left[depth] + 1)
return ans
Use list(itertools.chain(*listoflists)) to flatten a nested list.
In a dict comprehension, when using a ternary operator, the 'else' is just for the
value:
d = {k: str(v) if k in str_keys else v for k, v in d.iteritems()}
not: d = {k: str(v) if k in str_keys else k: v for k, v in d.iteritems()}
If implementing a boolean method (has_next) for a list:
use: return bool(list)

or, more hacky, return not not list


To time a function:
def gen_permutes(s):
stuff
if __name__ == '__main__':
from timeit import timeit
def wrapper(func, *args, **kwargs):
def wrapped():
return func(*args, **kwargs)
return wrapped
s = 'aabfcd'
wrapped = wrapper(gen_permutes, s)
wrapped2 = wrapper(gen_permutes2, s, 0, 5)
g1 = round(timeit(wrapped, number=10000), 3)
print 'gen_permutes of {} took {} seconds to get:'.format(s, str(g1))
Default arguments can be useful (see, e.g.,
https://docs.python.org/2/tutorial/controlflow.html), but be careful when the
default is a mutable object like a list, since default values are only evaluated once.
Can get ceiling with integer division by using negative sign, since Python [2]
always rounds down / does floor. I.e.: 8 / 3 = 2 but -8 / 3 = -3. So to get positive,
just add another negative: times = -(-len(B) / len(A))
Portable shebang line: #!/usr/bin/env python
Java Misc.:
Length: array.length; ArrayList.size(); string.length();
HashMap<T>.put(key, value); map.get(key);
ArrayList:
add(element)
contains(element)
get(index)
LinkedList:
add(element)
With or without an additional index argument. Without, adds to tail.

remove(element/index)
get(index)
getFirst()
Dequeue if implementing queue.

list.size()
Stack:
empty()
peek()
pop()
push(element)
search(element)
Returns offset from top of stack.

Arrays.sort(primitiveArray);
Collections.sort(arrayList);
Primitive arrays have to be initialized with a capacity, ArrayLists don't -- they
dynamically grow.
currChar = theStr.charAt(i);
From char to String and back:
String s = String.valueOf(c);
char c = s.charAt(0);
char[] chars = s.toCharArray();
From int to String and back:
String str = String.valueOf(foo);
Note that this is the same as above!

int foo = Integer.parseInt(str)


The conditional operator can seem weird:
int x = booleanExpr ? thisNum : thatNum;
Note that it has to return a value.

But it's just equivalent to this:


if (booleanExpr) x = thisNum;
else { x = thatNum; }
Data Engineering:
SQL:
Cheatsheet: http://files.zeroturnaround.com/pdf/zt_sql_cheat_sheet.pdf
Use HAVING when you want to filter on aggregate columns, WHERE won't
work for that.
WHERE comes before GROUP BY; HAVING after.
HAVING requires aggregation, like "HAVING sum(population) >
256000".
"SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.a_id WHERE b IS NULL;"
gets you the records that are only in table a; the IS NULL check (on b's join key, e.g., b.a_id) keeps only the rows with no match in the right-hand table.
https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
Order of clauses:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
When using a subquery with conditional logic, ensure the subquery is either
scalar (returns just 1 value) or the operator is IN:
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE Date = (SELECT MIN(date)
FROM tutorial.sf_crime_incidents_2014_01
)
or
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE Date IN (SELECT date
FROM tutorial.sf_crime_incidents_2014_01
ORDER BY date
LIMIT 5
)
Case details:
Useful to "bucket" values in a column (e.g., output 1 when the value is
X, 0 otherwise).
The CASE statement always goes in the SELECT clause
CASE must include the following components: WHEN, THEN,
and END. ELSE is an optional component.
You can make any conditional statement using any conditional
operator (like WHERE) between WHEN and THEN. This includes
stringing together multiple conditional statements using AND and OR.
You can include multiple WHEN statements, as well as
an ELSE statement to deal with any unaddressed conditions.
Example:
SELECT player_name,
CASE WHEN year = 'FR' AND position = 'WR' THEN 'frosh_wr'
WHEN year = 'SO' AND position = 'WR' THEN 'soph_wr'
ELSE NULL END AS sample_case_statement
FROM benn.college_football_players

With aggregate functions:

But what if you also wanted to count a couple other conditions?


Using the WHERE clause only allows you to count one condition.
Here’s an example of counting multiple conditions in one query:
SELECT CASE WHEN year = 'FR' THEN 'FR'
WHEN year = 'SO' THEN 'SO'
WHEN year = 'JR' THEN 'JR'
WHEN year = 'SR' THEN 'SR'
ELSE 'No Year Data' END AS year_group,
COUNT(1) AS count
FROM benn.college_football_players
GROUP BY 1
Window functions:
https://www.compose.com/articles/metrics-maven-window-functions-
in-postgresql/
Provide the ability to perform computations across sets of rows that
are related to the current query row, like GROUP BY, but without having
to collapse the set of rows into a single output row and with the ability
to look at arbitrary overlapping windows of rows (vs. exclusive
"groups").
E.g., this gives us the total national population alongside each state's
population:
SELECT name AS state_name,
popestimate2015 AS state_population,
SUM(popestimate2015) OVER() AS national_population
no partition by so this is over all the data

FROM population
WHERE state > 0 -- only state-level rows
ORDER BY name
Without window function, would have to either get the national
population by itself (without the state-level information) or use a
subquery:
SELECT name AS state_name,
popestimate2015 AS state_population,
x.national_population
FROM population,
(
SELECT SUM(popestimate2015) AS national_population
FROM population
WHERE state > 0 -- only state-level rows

)x
WHERE state > 0 -- only state-level rows
ORDER BY name
Can apply conditions with PARTITION BY (a la GROUP BY) and ORDER
BY:
PARTITION BY specifies the subset of rows considered (in the
window).
ORDER BY orders the rows in the specified window frame, useful
for RANK() and running sum-type information.
ORDER BY can act as an implicit partition:
Query:
SELECT start_time, duration_seconds,
SUM(duration_seconds) OVER (ORDER BY
start_time) AS running_total
FROM tutorial.dc_bikeshare_q1_2012
Results:
start_time | duration_seconds | running_total
00:04:00   | 475              | 475
00:10:00   | 1162             | 2782
00:10:00   | 1145             | 2782
00:15:00   | 485              | 3738
00:15:00   | 471              | 3738
00:17:00   | 358              | 4096
00:18:00   | 1754             | 5850
Also note that the ordering "goes up to" the current query
row based on the column being ordered by; that is, the sum
is the sum of the current row plus the previous rows (hence
running total).
Evaluated after the join, group, and having clauses, at the same time as
other select statements—meaning window functions can't refer to
other fields in the select statement.
Running average of daily active users example:
Daily active users (DAU):
select date(created_at), count(distinct user_id)
from activity
group by 1
Lots of noise, so let's do a daily running average:
select
date(created_at) d,
avg(count(distinct user_id)) over (
order by d
rows between 7 preceding and current row
)
from activity
group by 1
Specific unique window functions:
RANK vs DENSE_RANK vs ROW_NUMBER:
| V | ROW_NUMBER | RANK | DENSE_RANK |
|---|------------|------|------------|
| a | 1          | 1    | 1          |
| a | 2          | 1    | 1          |
| a | 3          | 1    | 1          |
| b | 4          | 4    | 2          |
| c | 5          | 5    | 3          |
| c | 6          | 5    | 3          |
| d | 7          | 7    | 4          |
| e | 8          | 8    | 5          |
NTILE:
You can use window functions to identify what percentile (or
quartile, or any other subdivision) a given row falls into. The
syntax is NTILE(*# of buckets*). In this case, ORDER BY determines
which column to use to determine the quartiles (or whatever
number of ‘tiles you specify).
LAG / LEAD:
It can often be useful to compare rows to preceding or following
rows, especially if you’ve got the data in an order that makes
sense. You can use LAG or LEAD to create columns that pull
values from other rows—all you need to do is enter which column
to pull from and how many rows away you’d like to do the pull.
LAG pulls from previous rows and LEAD pulls from following rows:
Using count and case:
SELECT
COUNT(CASE WHEN rsp_ind = 0 then 1 ELSE NULL END) as "New",
COUNT(CASE WHEN rsp_ind = 1 then 1 ELSE NULL END) as "Accepted"
from tb_a
Unions vs. joins:
SQL joins allow you to combine two datasets side-by-side, but UNION
allows you to stack one dataset on top of the other. Put differently,
UNION allows you to write two separate SELECT statements, and to
have the results of one statement display in the same table as the
results from the other statement.
SELECT *
FROM tutorial.crunchbase_investments_part1
UNION
SELECT *
FROM tutorial.crunchbase_investments_part2
If every record in A necessarily includes information in B, A should have a
foreign key that references a primary key on B to maintain referential
integrity. E.g., if a row in A is inserted but doesn't have the B-related columns
(the FK), it should be rejected.
There can be multiple unique keys, but just one primary key. Unique
keys are nullable, primary keys aren't.
In a 1:many relationship, the foreign key is the "anchor" on the many side
(the primary key that it references is the "anchor" on the 1 side).
1:many = PK:FK (multiple records on FK side have PK values).

1:1 relationships are rare and should usually just merge the tables
together. But if not, could do a primary:unique foreign key setup.
1:1 = PK:FK (one record on FK side has PK values).

many:many relationships should call for a 3rd (junction) table


containing the primary keys of the two tables.
With junction table, 1:many and many:1.

Normalized (no dupes) schema is better for writes; denormalized better for
reads.
LIMIT:
SELECT * FROM tbl LIMIT 5,10;
returns rows 6-15
The offset (the first number) is 0-indexed.

Get second highest value:


SELECT MAX( col )
FROM table
WHERE col < ( SELECT MAX( col ) FROM table );
Delete duplicate rows:
Where cols 1-3 uniquely identify a row in the table.

DELETE FROM MyTable


WHERE RowId NOT IN
(SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3);
Order of operations:
FROM clause
WHERE clause
This is why you can't reference a SELECT alias in the WHERE clause.

GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
You can reference a SELECT alias in the ORDER BY.

In JOIN, can filter one table but not the other:


https://community.modeanalytics.com/sql/tutorial/sql-joins-where-vs-on/
Putting the condition in the WHERE clause would filter it out
completely. Putting the condition in the JOIN just removes it from the
specified table.
VARCHAR vs. CHAR:
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
MySQL single quotes vs. backticks vs. double quotes:
http://stackoverflow.com/questions/11321491/when-to-use-single-
quotes-double-quotes-and-backticks-in-mysql
SQL Cookbook:
emp:
create table emp (
empno int unsigned,
ename varchar(128),
job varchar(128),
mgr int unsigned,
hiredate date,
sal int unsigned,
comm int unsigned,
deptno int unsigned
);
insert into emp (empno, ename, job, mgr, hiredate, sal, comm,
deptno) values
(7369, 'smith', 'clerk', 7902, '1980-12-17', 800, null, 20),
(7499, 'allen', 'salesman', 7698, '1981-02-20', 1600, 300, 30),
(7369, 'smith', 'clerk', 7902, '1980-12-17', 800, null, 20),

MapReduce:
purpose:
scalable, parallelizable computations on big data
abstracts away looping
simple to write programs, just need to supply mapper and reducer
functions
algorithm:
take a bunch (a list) of data items
system/master program splits list and gives a sublist to each (mapper)
node
mapper takes its list and applies some map function to it that
transforms the list of items into a list of key, value pairs
list of (k,v) pairs output to some buffer
system takes the lists of (k,v) pairs and filters them such that it can give
each reducer node a list where all the keys are the same
reducer aggregates values (since system ensured they all share the
same key)
example:
operates on a file in HDFS in a similar way. in HDFS, a file is split into chunks,
or blocks. many mapper nodes can work on different blocks of the same file.
then, reducer nodes can do the same. scales essentially linearly
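A toy in-memory word-count sketch of the mapper/shuffle/reducer contract described above (illustrative only, not a real Hadoop job):
from collections import defaultdict

def mapper(line):
    # map: one input item -> list of (key, value) pairs
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # the framework groups values by key before handing them to reducers
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # reduce: aggregate all values that share the same key
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog"]
pairs = [kv for line in lines for kv in mapper(line)]
print [reducer(k, vs) for k, vs in shuffle(pairs).items()]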
Tez is Apache's MapReduce successor:
better performance
intermediate data doesn't need to be written to HDFS (can be
cached in-memory)
YARN containers can be reused
simpler "API"
computations expressed as workflow DAG vs multiple MR jobs
supports Hive, Pig, etc.
Hive:
Performance tips:
https://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/

use the Tez execution engine (vs Map-Reduce)


set hive.execution.engine=tez;
use ORCFile storage format (better than Parquet or RCFile)


https://www.dropbox.com/s/irhhjl6ivgiak0z/Screenshot%202017-04-
22%2017.52.50.png?dl=0

create table driver_mileage


stored as orc
as
select driverid, sum(miles) as total_miles
from truck_mileage
group by driverid;
use vectorization (performs operations in batches of 1024 rows)
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
use cost-based optimization (CBO)
Meta:
Hive optimizes each query’s logical and physical execution
plan before submitting for final execution. These
optimizations can include the cost of the query if CBO is
enabled.
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
then:
analyze table tweets compute statistics for columns;
avoid joins where possible
Consider a click-stream event table:
CREATE TABLE clicks (
timestamp date, sessionID string, url string, source_ip string
) STORED as ORC tblproperties (“orc.compress” =
“SNAPPY”);
Each record represents a click event, and we would like to find the
latest URL for each sessionID.
Could use a join:
SELECT clicks.* FROM clicks inner join
(select sessionID, max(timestamp) as max_ts from clicks
group by sessionID) latest
ON clicks.sessionID = latest.sessionID and
clicks.timestamp = latest.max_ts;
In the above query, we build a sub-query to collect the timestamp


of the latest event in each session, and then use an inner join to
filter out the rest.
Can use over and rank to rewrite more efficiently as:
SELECT * FROM
(SELECT *, RANK() over (partition by sessionID
order by timestamp desc) as rank
FROM clicks) ranked_clicks
WHERE ranked_clicks.rank=1;
In Hive, columns can store arrays or maps. We can use what are called table-
generating functions to "unpack" these multivalued columns.
Example built-in table-generating functions include explode and stack:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+U
DF#LanguageManualUDF-Built-inTable-GeneratingFunctions(UDTF)
Often used in conjunction with lateral view as a type of join:
"Lateral view is used in conjunction with user-defined table
generating functions such as explode(). As mentioned in Built-in
Table-Generating Functions, a UDTF generates zero or more
output rows for each input row. A lateral view first applies the
UDTF to each row of base table and then joins resulting output
rows to the input rows to form a virtual table having the supplied
table alias."
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Lateral
View

Lateral view is basically a lateral.. table view, made by expanding /


unpacking a column or columns into multiple rows.
http://www.ericlin.me/how-to-use-hive-lateral-view-in-your-query
Imagine we start out with an answer table like:
qId  cId  vId
1    1    v1
1    2    v1
1    2    v2
1    2    v3
We can combine vId values like so:
qId  cId  vIds
1    1    ["v1"]
1    2    ["v1", "v2", "v3"]
And then selectively unpack (with a "where" filter):
SELECT qId, cId, vId
FROM answer
LATERAL VIEW explode(vIds) visitor AS vId
WHERE cId = 2
“vIds” is the field we're exploding, “visitor” is the LATERAL VIEW
TABLE alias and “vId” is the new column alias so that it can be
selected in the query.
To get:
qId  cId  vId
1    2    v1
1    2    v2
1    2    v3
Spark:
Spark is in-memory, more functional since it uses RDDs that distinguish
between transformations and actions (enabling lazy execution/evaluation),
and has playback abilities (owing to the lazy evaluation).
In-memory as in it doesn't write intermediate results to disk/HDFS like map-
reduce does.
Dimensional Modeling:
Facts vs. dimensions:
Facts are (almost always) numeric measurements / low-level metrics we
use for calculation purposes
Represent a granular measurement event
Dimensions are the (usually non-numeric) attributes we use to filter or
group by
The "by" nouns, e.g., "revenue by product", so product is a
dimension
Facts = business process metrics; dimensions = descriptive attributes
A fact table has foreign keys to dimension tables
In a star schema, key assumption is that the fact::dimension relationship is
many::1, e.g., many transactions can have a shared product dimension
If many::many, then aggregations will be wonky (think about the
cardinalities in the join: 1 * N = N, but M * N = MN)
Generally, need to represent many::many relationships as two many::1 and
1::many relationships using a bridge/junction table:
https://www.dropbox.com/s/kra73kh4l9tm468/Screenshot%202017-03-
04%2016.24.31.png?dl=0

"Think about a simple relationship like the one between Authors and
Books. An author can write many books. A book could have many
authors. Now, without a bridge table to resolve the many-to-many
relationship, what would the alternative be? You'd have to add multiple
Author_ID columns to the Books table, one for each author. But how
many do you add? 2? 3? 10? However many you choose, you'll
probably end up with a lot of sparse rows where many of the Author_ID
values are NULL and there's a good chance that you'll run across a case
where you need "just one more." So then you're either constantly
modifying the schema to try to accommodate or you're imposing some
artificial restriction ("no book can have more than 3 authors") to force
things to fit."
In other words, with a many::many relationship, there's no meaningful
parent-child relationship, no way to place foreign keys that makes
sense. So with a bridge/junction/linking table, each record of the
bridge table represents each unique combination of, say, author and
book. There can be many authors and many books, but each
combination only shows up once in the bridge.
select *
from author a join author_book ab on a.id = ab.author_id
join book b on b.id = ab.book_id;
Database modeling notation:
https://www.dropbox.com/s/dsld9gs3frocxxi/Screenshot%202017-03-
04%2016.34.02.png?dl=0
https://www.dropbox.com/s/24l3nwsogzxgxhm/Screenshot%202017-
03-04%2016.33.27.png?dl=0
https://www.smartdraw.com/entity-relationship-diagram/

Snowflake vs. star schema:


Star schema has a fact table and then dimension tables directly
associated with it through foreign keys. Denormalized for BI query
performance. Simpler queries as well.
Looks like, well, a star: with the dimension tables surrounding a central fact table.

Snowflake schema normalizes the dimension tables, e.g., nests


categorical / hierarchical data. Less duplication and more referential
integrity protection. Also uses less space.
The dimension tables branching out is reminiscent of a snowflake.

http://www.vertabelo.com/blog/technical-articles/data-warehouse-
modeling-star-schema-vs-snowflake-schema
https://www.dropbox.com/s/l680pyvr8gxn9ue/Screenshot%20201
7-03-04%2016.55.32.png?dl=0
https://www.dropbox.com/s/573602bdme4vex2/Screenshot%202
017-03-04%2016.55.57.png?dl=0
Dealing with Slowly Changing Dimensions (SCDs):
https://www.dropbox.com/s/xfvf0556idivn75/Screenshot%202017-03-
05%2011.59.38.png?dl=0

5 methods / types of change tracking:

Type 0 - The passive method (just keep the original value)


Type 1 - Overwriting the old value
Type 2 - Creating a new additional record (use date effective
columns)
Type 3 - Adding a new column
Type 4 - Using historical table (or split off the rapidly changing
dimension columns into a mini dimension table, the primary key
of which will be a FK in the fact table)
Type 6 - Combine approaches of types 1,2,3 (overwrite current
value columns in historical records, new current row (current
value col = historical value col), there's a current flag column
and/or effective date columns)
6=1+2+3

Conformed dimensions = master dimensions that can be used across


multiple fact tables (meaning is identical, think date dimension)
Representing inheritance / subtype relationships (think FB actions:
comments, likes, shares (of, say, a news story)):
http://stackoverflow.com/questions/3579079/how-can-you-represent-
inheritance-in-a-database
3 methods:
1 table = just 1 base / parent table, lots of null values for the attributes
that only apply to some subtypes
3 tables (assuming 3 subtypes) = 1 table for each subtype / child table,
hard to get the set of all elements across all subtypes
4 tables (assuming 3 subtypes) = 1 base table and 3 derived subtype
tables, base table contains common attributes and child tables have an
FK and PK that is the base table's PK
Base table passes down common attributes and PK to subtype child tables.

http://www.vertabelo.com/blog/technical-articles/inheritance-in-a-
relational-database
http://searchsqlserver.techtarget.com/feature/Supertype-and-subtype-
tables-in-SQL-Server
Conditional join on column value:
http://stackoverflow.com/questions/25255240/sql-conditional-join-
column
http://www.mysqldiary.com/conditional-joins-in-mysql/
basically, do left joins then combine results into 1 column to get rid of
nulls
Freeform text comments should arguably be in a dimension table
Modeling groups (e.g., FB users and groups of users):

just model a many::many relationship using a bridge table like


UserGroup
then User::UserGroup would be 1::Many (UserGroup FK to User PK) and
Group::UserGroup would also be 1::Many (UserGroup FK to Group PK)
Clickstream data dimensional modeling:
Unique extra dimensions: visitor/user, page, event/action, session,
referral/origin
https://www.slideshare.net/mobile/alberthui/albert-hui-
clickstreamdwturningclicksintocustomers-19174248
ETL optimization:
Extract operational data from business sources, transform into required end format (e.g., facts
& dimensions), load into data warehouse or mart.

Incremental loading:
https://dwbi.org/etl/etl/53-methods-of-incremental-loading-in-data-
warehouse
System Design & Distributed Systems:
HiredInTech:
https://www.hiredintech.com/classrooms/system-design/lesson/59#
A strong process is crucial to successfully solving system design questions.
We broke it down into four steps:
Scope the problem: Don't make assumptions; Ask questions;
Understand the constraints and use cases.
Sketch up an abstract design that illustrates the basic components of
the system and the relationships between them.
Think about the bottlenecks these components face when the system
scales.
Address these bottlenecks by using the fundamentals principles of
scalable system design.
Make sure to start with asking questions, what constraints are we under?
Typical access patterns? How much data? How many requests? Etc.
(Database) Sharding:
Just partition data by hashing the primary key.
https://www.hiredintech.com/classrooms/system-design/lesson/60
http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-
database-design-the-coming-of-the.html
http://www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-
lessons.html
https://www.percona.com/blog/2009/08/06/why-you-dont-want-to-shard/
To fix the problem of hotspots (incoming data not evenly distributed), use a
good hash function. To fix the problem of node failures or of being able to
scale up (with naive hashing, would be forced to migrate tons of data /
nearly all data to new nodes), use consistent (ring) hashing:
https://www.paperplanes.de/2011/12/9/the-magic-of-consistent-
hashing.html
http://michaelnielsen.org/blog/consistent-hashing/
"But unlike naive hashing, consistent hashing requires only a
relatively small amount of data to be moved: if you add a
machine to the cluster, only the data that needs to live on that
machine is moved there; all the other data stays where it is."
https://akshatm.svbtle.com/consistent-hash-rings-theory-and-
implementation
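A minimal consistent-hash-ring sketch (hypothetical node names; real implementations add many virtual nodes per machine for smoother balancing):
import bisect
import hashlib

class HashRing(object):
    def __init__(self, nodes):
        # Place each node at a point on the ring, sorted by its hash.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    def _hash(self, key):
        return int(hashlib.md5(key).hexdigest(), 16)

    def get_node(self, key):
        # Walk clockwise: the first node at or after the key's hash owns it (wrapping around).
        hashes = [h for h, _ in self.ring]
        i = bisect.bisect(hashes, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(['node-a', 'node-b', 'node-c'])
print ring.get_node('user:42')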
Could also use "dynamic" sharding where there's a specialized metadata
locator service that points an incoming query to the right partition (basically
just key:value store of key range:partition). Think, e.g., HDFS' NameNode.
Easy to modify mappings, but single point of failure.
Partitioning is vertical; sharding is horizontal on some attribute of the data, e.g., for users, sharded into a-m and n-z datasets; federation is splitting the data completely on some obvious data attribute, like putting all user data on one node and all purchases data on another.
CAP:
A dismal guide to concurrency:
https://www.facebook.com/notes/facebook-engineering/a-dismal-guide-to-
concurrency/379717628919/

"A Consistent/Available system means that reading and writing always


works the way you expect, but requires a majority or quorum of nodes
to be running in order to function. Think of a parliament that must
have more than half of members present in order to hold a vote. If too
many can't make it, say because a flood washes out the bridge, a
quorum can't be formed and business can't proceed. But when enough
members are in communication the decision-making process is fast
and unambiguous.
Impossible in larger-scale distributed systems.

Consistent/Partitionable means that the system can recover from


failures, but requires so much extra coordination that it collapses under
heavy use. Imagine having to send and receive a status report for every
decision made at your company. You'll always be current, and when
you come back from vacation you will never miss a thing, but making
actual progress would be very slow.
Available/Partitionable means that you can always read and write
values, but the values you read might be out of date. A classic example
is gossip: at any point you might not know the latest on what Judy said
to Bill but eventually word gets around. When you have new gossip to
share you only have to tell one or two people and trust that in time it
will reach everyone who cares. Spreading gossip among computers is a
bit more reliable because they are endlessly patient and (usually) don't
garble messages."
Eventually Consistent — Amazon.com paper
http://queue.acm.org/detail.cfm?id=1466448

"Eventual consistency. This is a specific form of weak consistency; the


storage system guarantees that if no new updates are made to the
object, eventually all accesses will return the last updated value. If no
failures occur, the maximum size of the inconsistency window can be
determined based on factors such as communication delays, the load
on the system, and the number of replicas involved in the replication
scheme. The most popular system that implements eventual
consistency is DNS (Domain Name System). Updates to a name are
distributed according to a configured pattern and in combination with
time-controlled caches; eventually, all clients will see the update.
The eventual consistency model has a number of variations that are
important to consider:
Causal consistency. If process A has communicated to process B
that it has updated a data item, a subsequent access by process B
will return the updated value, and a write is guaranteed to
supersede the earlier write. Access by process C that has no
causal relationship to process A is subject to the normal eventual
consistency rules.
Read-your-writes consistency. This is an important model where
process A, after it has updated a data item, always accesses the
updated value and will never see an older value. This is a special
case of the causal consistency model.
Session consistency. This is a practical version of the previous
model, where a process accesses the storage system in the
context of a session. As long as the session exists, the system
guarantees read-your-writes consistency. If the session terminates
because of a certain failure scenario, a new session needs to be
created and the guarantees do not overlap the sessions.
Monotonic read consistency. If a process has seen a particular
value for the object, any subsequent accesses will never return
any previous values.
Monotonic write consistency. In this case the system guarantees
to serialize the writes by the same process. Systems that do not
guarantee this level of consistency are notoriously hard to
program."

ACID vs BASE:

ACID = atomicity, consistency (not the same as CAP consistency), isolation, durability
BASE = basically available, soft state, eventually consistent
Basically available = the system does guarantee availability, in terms of
the CAP theorem.
Soft state = the state of the system may change over time, even
without input (because of eventual consistency)
Eventual consistency = the system will become consistent over time,
given that the system doesn't receive input during that time
Two ends of the CAP continuum: ACID focuses on consistency, while BASE focuses on availability and partition tolerance, often relaxing consistency guarantees (e.g., to eventual consistency). Maps pretty well to SQL vs NoSQL and to why the latter usually scales better.
Asynchronous vs synchronous:
sync means that processes must wait for each other to complete before
beginning execution
async means that other stuff can happen while a process is executing
the meaning of synchronous/asynchronous is in the context of a global
clock: operations are synchronous when they use the same global clock/lock,
and asynchronous when they operate on their own clocks (i.e., so they don't
have to wait for each other)
http://stackoverflow.com/questions/748175/asynchronous-vs-synchronous-
execution-what-does-it-really-mean
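As a minimal sketch of the difference (using Python's asyncio; the fetch coroutine and its delays are made up for illustration), the asynchronous version below lets both simulated I/O calls be in flight at once instead of waiting for each other:
```python
import asyncio
import time

async def fetch(name, delay):
    # Stand-in for an I/O-bound call (e.g., a network request).
    await asyncio.sleep(delay)
    return f"{name} done after {delay}s"

async def main():
    start = time.time()
    # Asynchronous: both "requests" run concurrently, so total wall-clock
    # time is ~max(delays) instead of their sum (which is what a
    # synchronous, one-after-the-other version would cost).
    results = await asyncio.gather(fetch("a", 1), fetch("b", 2))
    print(results, f"elapsed ~{time.time() - start:.1f}s")  # ~2s, not 3s

asyncio.run(main())
```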
Case studies:
Pastebin or bit.ly:
https://github.com/donnemartin/system-design-
primer/blob/master/solutions/system_design/pastebin/README.md
http://blog.gainlo.co/index.php/2016/03/08/system-design-interview-
question-create-tinyurl-system/
Paxos and other consensus methods:
https://www.quora.com/Distributed-Systems-What-is-a-simple-explanation-
of-the-Paxos-algorithm
Numbers every programmer should know about latency, memory, etc.:
Numbers:
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 100 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 10,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from network 10,000,000 ns
Read 1 MB sequentially from disk 30,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns
Note that:
1 ns = 10^-9 seconds (nanosecond)
1 us = 10^-6 seconds = 1,000 ns (microsecond)
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns (millisecond)
Things to notice:
Notice the magnitude differences in the performance of different
options.
Datacenters are far away so it takes a long time to send anything
between them.
Memory is fast and disks are slow.
By using a cheap compression algorithm, a lot of network bandwidth (roughly a factor of 2) can be saved.
Writes are 40 times more expensive than reads.
Global shared data is expensive. This is a fundamental limitation of
distributed systems. The lock contention in shared heavily written
objects kills performance as transactions become serialized and slow.
Architect for scaling writes.
Optimize for low write contention.
Optimize wide. Make writes as parallel as you can.
https://gist.github.com/jboner/2841832
http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-
envelope-calculations-to-choo.html
https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html
http://highscalability.com/blog/2013/1/15/more-numbers-every-awesome-
programmer-must-know.html
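A quick back-of-envelope sketch in Python using the table above (the constants are just the latency values copied over; real hardware varies):
```python
# Latency values from the table above, in nanoseconds.
READ_1MB_MEMORY_NS = 250_000
READ_1MB_DISK_NS = 30_000_000
DISK_SEEK_NS = 10_000_000

def ns_to_s(ns):
    return ns / 1e9

# Reading 1 GB sequentially: memory is ~100x faster than disk.
print("1 GB from memory:", ns_to_s(1024 * READ_1MB_MEMORY_NS), "s")  # ~0.26 s
print("1 GB from disk:  ", ns_to_s(1024 * READ_1MB_DISK_NS), "s")    # ~30.7 s

# Random access hurts: 1,000 disk seeks alone cost ~10 s.
print("1,000 disk seeks:", ns_to_s(1000 * DISK_SEEK_NS), "s")
```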
Designing Data-Intensive Applications notes:
logs (not narrow concept of application logs, but any immutable append-only key:value sequence structure) are a crucial abstraction for distributed systems:
https://engineering.linkedin.com/distributed-systems/log-what-every-
software-engineer-should-know-about-real-time-datas-unifying
LSM-trees (log-structured merge trees) and SSTables (sorted string tables)
are an important under-the-hood concept:
write to memtable (in-memory [ordered] balanced tree structure like a
red-black or AVL tree)
when full, write to disk as an SSTable file/segment, becomes most
recent segment of DB; while being written to disk, writes continue on
new memtable instance
an SSTable is just sorted key:value pairs, but with some offsets
stored in an index file. so when searching, can check against index
to narrow down block that I have to look through
to serve reads, first try to find key in memtable, then in most recent on-
disk SSTable segment, then next most recent. can use a bloom filter to
avoid having to check all segments for nonexistent key
to avoid losing recent writes to the memtable in event of a DB crash, keep a separate log on disk where every write is immediately appended (the append-only immutable log again); it's not in sorted order bc its only purpose is to restore the memtable. each time the memtable is successfully written to disk, discard the log
in background, from time to time, run merging and compaction
process to combine segments and discard overwritten/deleted values
(think tombstone for deletions)
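A toy sketch of the read/write path above. Everything here (class name, flush threshold, segment format) is illustrative; a real LSM engine also keeps the crash-recovery log, sparse per-segment indexes, bloom filters, and background compaction:
```python
import bisect

class TinyLSM:
    # Toy LSM-tree: an in-memory memtable plus sorted "on-disk" segments
    # (here just sorted lists of (key, value) pairs held in memory).

    def __init__(self, memtable_limit=4):
        self.memtable = {}       # stand-in for a balanced tree (red-black/AVL)
        self.segments = []       # SSTable segments, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the memtable out as a sorted segment (an "SSTable").
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i}", i)
print(db.get("k3"), db.get("missing"))  # -> 3 None
```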
Links:
General:
https://www.palantir.com/2011/10/how-to-ace-a-systems-design-
interview/
https://www.youtube.com/watch?v=-W9F__D3oY4
Focusing on scalability.

https://github.com/donnemartin/system-design-primer
Concurrency:
https://www.facebook.com/notes/facebook-engineering/a-dismal-
guide-to-concurrency/379717628919/
http://cs.lmu.edu/~ray/notes/introconcurrency/
Machine Learning & Stats:
Algorithms:
Meta:
cheatsheets:


https://machinelearningmastery.com/a-tour-of-machine-learning-
algorithms/
sci-kit learn flowchart: http://scikit-
learn.org/stable/tutorial/machine_learning_map/index.html
SAS more readable / general:
https://blogs.sas.com/content/subconsciousmusings/2017/04/12/
machine-learning-algorithm-use/
multi-class classification:
just use one-vs-all (aka one-vs-rest)
train a model for each class where the negative labels are all the
other classes
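A small sketch of one-vs-rest done by hand, assuming scikit-learn is available (the iris dataset and LogisticRegression are just convenient stand-ins for any dataset and binary learner):
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes: 0, 1, 2

# One binary model per class: that class as positive, everything else negative.
models = {}
for cls in np.unique(y):
    binary_labels = (y == cls).astype(int)
    models[cls] = LogisticRegression(max_iter=1000).fit(X, binary_labels)

# Predict by taking the class whose model is most confident.
classes = sorted(models)
scores = np.column_stack([models[c].predict_proba(X)[:, 1] for c in classes])
predictions = np.array(classes)[scores.argmax(axis=1)]
print("training accuracy:", (predictions == y).mean())
```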

regression:
linear vs logistic: linear predicts a continuous value, logistic predicts a categorical label (via a probability)
finding coefficients for standard linear equation: y = a + bx
for logistic, we use logistic function (s-shaped curve) to convert output
into 0 - 1 value, then we can apply a rule to get a binary label
like >0.5 then yes else no
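As a tiny sketch of that step (the coefficients below are made up, not fitted):
```python
import math

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

a, b, x = -4.0, 2.0, 2.5          # illustrative coefficients for y = a + b*x
probability = sigmoid(a + b * x)  # linear output converted to a 0-1 value
label = 1 if probability > 0.5 else 0
print(round(probability, 2), label)  # ~0.73 -> 1
```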
decision trees:
start with most important feature, split based on that, then continue
splitting train set on successive features (where each split has the same
value for that feature) until get to leaf nodes (which are the output,
e.g., the yes/no decision)
just like any binary tree
https://medium.com/@chiragsehra42/decision-trees-explained-easily-
28f23241248
prone to overfitting (high variance), so some ways to account for that
are:
bagging:
similar to bootstrapping, construct multiple tree models for
different samples of input, then for each new data point,
average all the predictions
random forests:
bagging plus extra randomness: at each split, only consider a random subset of the features instead of always selecting the single best split over all features
boosting (think gradient boosted decision trees / GBDT):
combine weak learners
each successive learner attempts to correct for the errors of the previous one
place more weight on data points where the prev learner went wrong
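A hedged sketch comparing these ideas with scikit-learn defaults on synthetic data (the dataset, models, and parameters are illustrative, not a benchmark):
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "single tree (high variance)": DecisionTreeClassifier(random_state=0),
    "bagging (average trees fit on bootstrap samples)": BaggingClassifier(random_state=0),
    "random forest (bagging + random feature subsets)": RandomForestClassifier(random_state=0),
    "gradient boosting (sequential weak learners)": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```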
collaborative filtering:
based on assumption that people like things similar to other things
they like, and things that are liked by other people with similar taste
types of filtering:
A user-item filtering takes a particular user, finds users who are
similar to that user based on similarity of ratings, and recommend
items that those similar users liked. In contrast, item-item filtering
will take an item, find users who liked that item, and find other
items that those users or similar users also liked. It takes items
and outputs other items as recommendations.
https://towardsdatascience.com/various-implementations-of-collaborative-
filtering-100385c6dfe0

User-Item Collaborative Filtering: “Users who are similar to you also liked …”
Item-Item Collaborative Filtering: “Users who liked this item also liked …”
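A toy user-item filtering sketch on a made-up ratings matrix (cosine similarity over raw ratings; real recommenders normalize ratings, handle sparsity, and work on far larger matrices):
```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# User-item filtering: find the user most similar to user 0 and recommend
# the item user 0 hasn't rated that the similar user rated highest.
target = 0
sims = [cosine_sim(ratings[target], ratings[u]) for u in range(len(ratings))]
most_similar = sorted(range(len(ratings)), key=lambda u: -sims[u])[1]  # skip self
unrated = np.where(ratings[target] == 0)[0]
recommendation = max(unrated, key=lambda item: ratings[most_similar, item])
print(f"most similar user: {most_similar}, recommend item: {recommendation}")
```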
SVM or support vector machines:
basically trying to find the hyperplane that best separates the input into separate classes. do this by maximizing the margin, i.e., the distance from the hyperplane to the closest data points (the support vectors). with the kernel trick, the resulting decision boundary can be non-linear in the original feature space
k-means clustering:
not to be confused with k-nearest neighbors (a supervised method), which is quite different
https://www.wikiwand.com/en/K-nearest_neighbors_algorithm
algorithm:
iterative, start with k randomly assigned centroid points, then
iterate by:
for each point, assign to the closest centroid
for each centroid, update/move to the middle (mean) of the
points assigned to it
stop iterating when no more point assignments or centroid
movements need to be made
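A minimal NumPy sketch of that loop (centroids initialized from random data points; no handling of empty clusters, which a production implementation would need):
```python
import numpy as np

def k_means(points, k, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen data points as centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its closest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([points[assignments == c].mean(axis=0) for c in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # no centroid moved: converged
        centroids = new_centroids
    return centroids, assignments

points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
print(k_means(points, k=2))
```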
Feature engineering:
cross-validation
1-hot encoding
Performance metrics / evaluation:
confusion matrix:


actual vs predicted
https://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix_f
iles/confusion_matrix_1.png
columns = predicted (P, N), rows = actual
Actual P: TP FN
Actual N: FP TN
recall or sensitivity (true positive rate) = TP / (TP + FN)
of all actual positive labels, how many did we successfully predict?
precision = TP / (TP + FP)
of all predicted positive labels, how many were actually positive?
false positive rate = FP / (FP + TN)
specificity (true negative rate) = TN / (TN + FP)
F1 is the harmonic mean of precision and recall
2TP / (2TP + FP + FN)
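The same definitions as a small sketch (the TP/FP/FN/TN counts below are arbitrary example values):
```python
def classification_metrics(tp, fp, fn, tn):
    return {
        "recall / sensitivity / TPR": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "false positive rate": fp / (fp + tn),
        "specificity / TNR": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }

# Example confusion matrix counts: 40 TP, 5 FP, 10 FN, 45 TN.
print(classification_metrics(tp=40, fp=5, fn=10, tn=45))
```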
ROC curve is true positive rate (recall or sensitivity) vs false positive rate
we want the curve to be at the top left—this would indicate 0 FPs
sensitivity as a function of fallout (another name for FPR)
AUC (area under [ROC] curve):
https://medium.com/greyatom/lets-learn-about-auc-roc-curve-
4a94b4d88152
ROC curve is a graph of recall/sensitivity vs (1 - specificity), which is the false positive rate
in other words, graph of true positives vs false positives
so area under the curve measures how well the positive probabilities
are separated from negative
intuitively, measures the probability that the classifier will rank a
randomly chosen positive example higher than a randomly chosen
negative example
0.5 = random predictor
better for skewed or imbalanced classes, where it will reward
discriminative classifiers over representative (eg punishes a model that
only outputs ham for problem where 90% is ham)
also it's a measure of the entire classifier and not dependent on the
specific threshold you choose
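A sketch of that pairwise interpretation, computed directly over all positive/negative pairs (O(P*N), fine for illustration only; the labels and scores below are made up):
```python
from itertools import product

def auc(labels, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p, n in product(positives, negatives))
    return correct / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auc(labels, scores))  # 8 of 9 pairs ranked correctly -> ~0.89
```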
Learning to rank:
used for ranking a list of items, eg search engine results or FB newsfeed
differs from traditional ML classification/regression bc prediction is on more than a single instance at a time:
https://www.quora.com/What-is-the-intuitive-explanation-of-Learning-to-Rank-and-
algorithms-like-RankNet-LambdaRank-and-LambdaMART-In-what-types-of-data-
variables-can-these-techniques-be-used-What-are-their-strengths-and-
limitations/answer/Nikhil-Dandekar

Traditional ML solves a prediction problem (classification or regression) on a single instance at a time. E.g. if you are doing spam detection on
email, you will look at all the features associated with that email and
classify it as spam or not. The aim of traditional ML is to come up with
a class (spam or no-spam) or a single numerical score for that instance.
LTR solves a ranking problem on a list of items. The aim of LTR is to
come up with optimal ordering of those items. As such, LTR doesn't
care much about the exact score that each item gets, but cares more
about the relative ordering among all the items.
3 main approaches: pointwise, pairwise, and listwise
https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwise-
learning-to-rank-80a8fe8fadfd
RankNet, LambdaRank and LambdaMART all transform ranking into
a pairwise classification or regression problem. That means you look at
pairs of items at a time, come up with the optimal ordering for that pair
of items, and then use it to come up with the final ranking for all the
results.
common loss function is minimizing # of inversions: cases where a
pair's order should be inverted
Normalized Discounted Cumulative Gain (NDCG) is a measure of ranking
quality:
https://www.wikiwand.com/en/Discounted_cumulative_gain
emphasizes highly relevant documents appearing earlier in list
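A short sketch of DCG/NDCG with the standard log2 position discount (the relevance grades below are made up):
```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevant items early in the list count most.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    ideal = sorted(ranked_relevances, reverse=True)  # best possible ordering
    return dcg(ranked_relevances) / dcg(ideal)

# Relevance of results in the order the ranker returned them (3 = most relevant).
print(round(ndcg([3, 2, 3, 0, 1, 2]), 3))  # ~0.96
```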
Optimization:
Gradient descent:
https://machinelearningmastery.com/gentle-introduction-mini-batch-
gradient-descent-configure-batch-size/
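A minimal mini-batch gradient descent sketch for linear regression on synthetic data (learning rate, batch size, and epoch count are arbitrary illustrative choices):
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)
learning_rate, batch_size = 0.1, 32
for epoch in range(50):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        error = X[batch] @ w - y[batch]
        w -= learning_rate * (X[batch].T @ error / len(batch))  # MSE gradient step
print(w)  # close to [2.0, -1.0, 0.5]
```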
Ranking ML reading materials:
Literature
Machine Learning Open Course/Reading Materials
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?
course=MachineLearning
http://web.stanford.edu/class/cs246/handouts.html
http://cs229.stanford.edu/materials.html


Andrew Ng lectures - https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599
http://ocw.mit.edu/courses/electrical-engineering-and-computer-
science/6-867-machine-learning-fall-2006/index.htm
https://work.caltech.edu/lectures.html#lectures
Survey
Recommender systems survey
http://www.sciencedirect.com/science/article/pii/S0950705113001044
Towards the Next Generation of Recommender Systems: A Survey of
the State-of-the-Art and Possible Extensions
http://www.stanford.edu/class/ee378b/papers/adomavicius-recsys.pdf
A survey of collaborative filtering techniques
http://dl.acm.org/citation.cfm?id=1722966
The YouTube video recommendation system
http://dl.acm.org/citation.cfm?id=1864770
Sparse Models
http://static.googleusercontent.com/media/research.google.com/en/us
/pubs/archive/41159.pdf
http://soihub.org/group/26/hawaii-2-linkedin.pptx
http://www.ideal.ece.utexas.edu/seminar/Deepak_slides.pdf
Learning to Rank
Learning to rank using gradient descent http://dl.acm.org/citation.cfm?
id=1102363
Learning to Rank for Information Retrieval
http://research.microsoft.com/en-us/um/beijing/events/lr4ir-
2008/proceedings-lr4ir%202008.pdf
Learning to rank: from pairwise approach to listwise approach
http://dl.acm.org/citation.cfm?id=1273513
CLiMF: learning to maximize reciprocal rank with collaborative less-is-
more filtering
Learning to rank recommendations with the k-order statistic loss
Latent Collaborative Retrieval
Efficient Retrieval
Fast top-k retrieval for model based recommendation
Item-based top-n recommendation algorithms
Exploration/Exploitation:
A contextual-bandit approach to personalized news article
recommendation
Hierarchical exploration for accelerating contextual bandits

An Empirical Evaluation of Thompson Sampling


Document/Language Representation
Word2Vec
https://code.google.com/p/word2vec/
Efficient Estimation of Word Representations in Vector Space
Distributed Representations of Words and Phrases and their
Compositionality
Linguistic Regularities in Continuous Space Word Representations
Collaborative topic modeling for recommending scientific articles
Representing Documents Through Their Readers
TagSpace (EMNLP 2014)
Distributed Representations of Sentences and Documents (ICML 2014)
Large-Scale High-Precision Topic Modeling on Twitter (KDD 2014)
Collaborative Filtering/Matrix Factorization:
Matrix factorization techniques for recommender systems
A matrix factorization technique with trust propagation for
recommendation in social networks
Probabilistic matrix factorization
Pairwise interaction tensor factorization for personalized tag
recommendation
fLDA: matrix factorization through latent dirichlet allocation
Factorization Meets the Neighborhood: a Multifaceted Collaborative
Filtering Model
Nonlinear Latent Factorization by Embedding Multiple User Interests
(RecSys 2013)
Scalable Recommendation with Poisson Factorization
Factorization Machines
Latent Factor Models
Regression-based latent factor models
Latent factor models with additive and hierarchically-smoothed user
preferences
Latent Structured Ranking
User Modeling
Learning Relevance from a Heterogeneous Social Network and Its
Application in Online Targeting (Rajat Raina's work on our data)
Understanding the Interaction between Interests, Conversations and
Friendships in Facebook
Turning down the noise in the blogosphere

Style in the Long Tail: Discovering Unique Interests with Latent Variable
Models in Large Scale Social E-commerce (KDD 2014)
Evaluation:
Large-scale validation and analysis of interleaved search evaluation
Deep Learning
Learning Deep Architectures for AI
Wide & Deep Learning for Recommender Systems
Deep Neural Networks for YouTube Recommendations
General Machine Learning
Learning Classifiers from Only Positive and Unlabeled Data (KDD 2008,
useful for lookalike)
Behavioral:
Situation -> Task -> Action -> Result

context -> challenge -> action / solution -> results

Haseeb:
http://haseebq.com/how-to-break-into-tech-job-hunting-and-interviews/

Almost every question you’ll be asked will be a permutation of one of these four:
What’s your story / walk me through your resume / why’d you leave
your last job? (these are essentially the same question)
Why us?
Tell me about a challenging bug you faced and how you solved it.
Tell me about an interesting project you worked on.
The first question is particularly important. Essentially, they want to hear your
personal narrative. Your answer will strongly influence their perception of
you.
This really just comes down to storytelling. Consider yourself a character in a
story, and structure the story with a beginning, middle, and end. There
should be inflection points, characterization, and easy to understand
motivations. Keep it as short as possible, while preserving color and what
makes you interesting. Try not to be negative. Frame your story around
seeking challenge and wanting to better yourself, rather than rejecting or
disliking things.
You will tell this narrative again and again. If you interview enough,
eventually it will congeal into a script. The best way to improve at it is to
literally practice it out loud and listen to a recording of it. Also try to get
feedback from someone whose judgment you trust, and ask them to be as
ruthlessly critical as possible.
For the remaining three questions, you should have pre-crafted answers. If you’re at a loss for stories, it may help to sit down and just
brainstorm every relevant story you can remember (for example, every bug
you remember working on), and then narrow it down from a larger list.
It’s hard to practice this effectively in a vacuum, so this is a good thing to
work with someone else on.
Interview Cake:
https://www.interviewcake.com/coding-interview-tips#chitchat

Chitchat like a pro.


Before diving into code, most interviewers like to chitchat about your
background. They're looking for:
Metacognition about coding. Do you think about how to code well?
Ownership/leadership. Do you see your work through to completion?
Do you fix things that aren't quite right, even if you don't have to?
Communication. Would chatting with you about a technical problem
be useful or painful?
You should have at least one:
example of an interesting technical problem you solved
example of an interpersonal conflict you overcame
example of leadership or ownership
story about what you should have done differently in a past project
piece of trivia about your favorite language, and something you do and
don't like about said language
question about the company's product/business
question about the company's engineering strategy (testing, Scrum,
etc)
Nerd out about stuff. Show you're proud of what you've done, you're amped
about what they're doing, and you have opinions about languages and
workflows.
Questions to ask:
https://www.reddit.com/r/cscareerquestions/comments/4ce2s3/resource_int
erview_questions_my_massive/
http://treycausey.com/data_science_interviews.html:
What does success look like for this position? How will I know if I am
accomplishing what is expected of me?
What is the last project you shipped? What was the goal, how long did
it take, what were the stumbling blocks, what tools did you use, etc.
What will my first 90 days in this role look like? First 180 days?
Who will I report to and how many people report to that person? Do
they have regular 1:1 with their team members?
Why did the last person who quit this team leave? The company?

If a startup, how long is your runway? How are financial decisions made?
What would be my first project here? Has someone already been
working on this or is this in the aspirational stage?
What is the current state of the data infrastructure? How much work
needs to be done on getting the infrastructure and pipeline into shape
before we start analyzing that data?
What's the biggest technical challenge or problem on this team currently?
What makes this team unique compared to other teams?
What do engineers on this team do day to day? How are projects to work on
decided? How closely do people work with other teams or functions (PM, DS,
etc.)?
What does the ramp-up process look like? Is there a formal mentoring
program?
Time and Space Complexity:
http://bigocheatsheet.com/
O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)
lg n usually means log base 2, but the base doesn't matter for big-O because changing the base only multiplies by a constant.

Time complexity of recursive algorithms that make multiple calls per call is often
exponential, think of the fib algorithm: f(n) = f(n-1) + f(n-2).
This isn't just O(2n) since 2 calls are being made for each call, not 2 calls for
the whole thing. So it's O(2^n) instead.
On a similar note, even if a function doesn't take any extra space with local
variables, if it makes a recursive call, that state has to be added to the call stack.
(The caller function "pauses" and the callee is pushed to the call stack. When the
callee is done executing, it's popped and control is returned to the caller.)
If the recursion is tail recursive (last call of function is the recursive call),
however, then can be tail call optimized so that the last call acts as a GOTO
and there's no space overhead—i.e., don't need to push frame to call stack.
Space used for a recursive function is proportional to the maximum depth of its
recursion tree, in other words, the length of the longest path in the tree.
https://www.dropbox.com/s/x7emb4iy4f8vsz6/Screenshot%202017-03-11%2010.30.52.png?
dl=0

It's not the number of function calls since the call stack is popping function
stack frames as it finishes executing each.
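A quick sketch contrasting the naive and memoized versions (lru_cache is one easy way to memoize; an explicit dictionary works just as well):
```python
from functools import lru_cache

def fib_naive(n):
    # Two recursive calls per call -> O(2^n) time; O(n) space from the
    # maximum call-stack depth (the longest root-to-leaf path).
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Each value is computed once -> O(n) time; still O(n) stack depth.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(60))      # returns instantly
# fib_naive(60) would make on the order of 2^60 calls -- don't run it.
```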
