Great Algorithms are poetry of computation.
Great Algorithms are poetry of computation.
Preface
Several months ago, I began to learn and review Algorithms in Coursera, and took some notes about it.
As time passes, I am aware that I had exactly walked a long way, and this “note” is not just a note about Coursera’s Algorithm — I dedicated almost my whole spare time to it, and it was out of bound as a simple note for the course. Thus I decide to build the collection of Algorithm, based on this “note” — that’s what it is right now, a collection of poetry, poetry of computation.
At this time, the collection still has many bugs, and it is obviously incomplete. I would still dedicate to polish it — that’s probably a long-term workload. And if you find any bug or have some great idea, take it easy to open the issue or pull request for source repository of this blog.
If you want to report/fix any issue about the collection or just skim the source code, please jump to the following repository:
Algorithms
I also want to concentrate on maintaining a repository just for pure algorithms ( Any programming language even pseudo-code is fine ), and as you see, it is still at an entirly initial stage. If you have any great idea, DM me in Twitter or just contribute in the following repository:
AlgorithmsCollections
Tips
If you wanna run programs in each part(folder), it is necessary to add -cp [classpath]
flag when compiling.
For example:
1 | java -cp [classpath] [filename].java |
Algorithms in Computer Science
Computer Graphics
APP | Solution | Type |
---|---|---|
Video Game Shoot Aid | Kd Tree | Symbol Table |
Event-driven Simulation | Priority Queue | Binary Heap |
Computer Network
APP | Solution | Type |
---|---|---|
Web Crawler | Breadth First Search | Directed Graph |
Routing[1] | Dijkstra Algorithm | Directed Weighted Graph |
Routing[2] | Bellman Ford | Directed Weighted Graph |
Choose IP in Routing Table | Longest prefix | Trie |
P2P network search | Patrica Trie | Trie |
Operating System
APP | Solution | Type |
---|---|---|
FileSystem | Bitmap | Symbol Table |
Scheduling | Topological Sort | DAG |
Search & Word Processing | Suffix Sort & Tries | String |
Stack Frame | Stack | Stack |
Page Table | R-Way Trie | Trie |
Programming Language (Compiler)
APP | Solution | Type |
---|---|---|
Mark-sweep Garbage Collection | Depth First Search | Digraph |
Cyclic inheritance | Depth First Search | DAG |
Symbol Table | Symbol Table | String |
javac | RE & NFA | String |
Regular Expression Engine | NFA | String |
Control Flow | DCG | DFS |
Compiler Register Allocation | BFS | Linear Programming |
Async Concurrency | Automata(DFA/NFA) | Pattern Match |
Database
APP | Solution | Type |
---|---|---|
File Organization | B+ Tree | Symbol Table |
Search & Autocomplete | Patricia Trie | Trie |
AI
APP | Solution | Type |
---|---|---|
Supervised Learning: Classification | KNN | Minimum Spanning Tree |
Unsupervised Learning: Clustering | K-Mean | Minimum Spanning Tree |
NLP | LRS | Suffix Sort |
Application
APP | Solution | Type |
---|---|---|
Flooding | Depth First Search | Photo Processing |
Seam Carving | Dijkstra’s Algorithm | Photo Processing |
Compressed quad-tree for N-Body simulation | Patricia Trie | Trie |
Grep | Generalized regular expression print | RE |
Achiever | LZW/Huffman/Run-length | Data Compression |
Tree | DFS | Graph |
Union-Find
Union Find could be modelized as a dynamic connectivity problem.
Model Union Find
- public class Union Find
- union(p, q): connect p and q
- find(p, q): find whether p-q is connected.
Idea
- Reflexive: p is connected to p.
- Symmetric: if p-q is connected, also q-p.
- Transmitive: if p-q, q-r, thus p-r.
- connected components.
Quick Find
Connected node i’s id[i]
are all equal.
1 | public boolean find(int p, int q){ |
algo | init | union | find |
---|---|---|---|
quick find | N | N | 1 |
Quick Union
Interpreter
id[i]
is i’s parent.- root of i is
id[id[...id[i]...]]
union(p,q) => id[root(p)] = root(q);
Quick Union+
Weighten quick union
It could avoid the worst case.
use sz[i]
in union.
O() < lgN
Paths compression
use id[i] = id[id[i]]
in root()
.
O() < N + MlgN
<- Proved by Tjan, and I would discuss about it in my blog later.
algo | init | union | find |
---|---|---|---|
quick union(worst) | N | N | N |
quick union+ | N | N+MlgN | N+MlgN |
The Analysis of Algorithms
Reasons to analyze algorithms
- Predict performance.
- Compare algorithms.
- Provide guarantees.
- Understand theoretical basis.
Scientific Method
- Observe some feature of the nature world.
- Hypothesize a model that is consistent with the observation.
- Predict events using the hypothesis.
- Verify the predictions by making further observations.
- Validate by repeating until the hypothesis and observations.
Observation
1 | Stopwatch();//Princeton STL |
Mathematic Model
$$
RunningTime = Frequency \times Cost
$$
Simplifying the calculations
- Cost Model
Use some basic operation as a proxy for running time. - Tilde Notation
Estimate running time (or memory) as a function of input size N.
Ignore lower order terms.
- when N is large, terms are negligible.
- when N is small, we don’t care.
Estimate as Discrete Math Model:
$$
\sum_N i = \int_0^N xdx = \sim N^2
$$
$$
\sum_N\sum_N\sum_N i= \int_0^N (\int_0^z (\int_0^yxdx)dy)dz = \sim N^4
$$
Order-of-growth
The small set of functions:
$1, \log N, N, NlogN, N^2, N^3,$ and $2^N$
suffices to describe ordered-of-growth of typical algorithms.
- Common order-of-growth classifications
Order of Growth | Name | Typical code framework | description | example |
---|---|---|---|---|
1 | Constant | a += b |
statement | add two numbers |
$\log N$ | Logarithmic | while(N > 1){N = N / 2;...} |
divide in half | BinarySearch |
N | Linear | for(i = 0; i < N){...;} |
loop | find max/min |
$N\log N$ | Linearithmic | mergesort | divide and conquer | mergesort |
$N^2$ | Quadratic | double loop | double loop | check all pairs |
$N^3$ | Cubic | triple loop | triple loop | check all triples |
$2^N$ | exponential | combinatorial search | exhaustive search | check all subsets |
Theory of Algorithms
Upper Bound(Big Oh) |
---|
Optimal Algo(Tilde Notation, Big theta) |
Lower Bound(Big Omega) |
Memory
Typical memory usage for objects in Java
- Object overhead. 16 bytes
- Reference. 8 bytes
- Padding. Each object uses a multiple of 8 bytes.
Typical Memory usage summary
Total memory usage for a data type value:
- Primitive type: 4 bytes for int, 8 bytes for double,…
- Object reference: 8 bytes.
- Array: 24 bytes + memory for each array entry
- Object: 16 bytes + memory for each instance variable + 8 bytes if inner class(for pointer to enclosing class)
- Padding: round up to multiple of 8 bytes.
Shallow memory usage: Don’t count referenced objects.
Deep memory usage: If array entry or instance variable is a reference, add memory(recursively) for referenced object.
Stacks and Queues
Stack
push()
: Tail is NULL
Implementation
Resizing Array –> Space Efficiency
Linked-List –> Time Efficiency
- Resizing Array
Resize stragedy:
- When N == s.length, double;
- When N == s.length/4, half;
- s.length / 4 < N < s.length;
- Point:
- push()
1 | public void push(T item){ |
- pop()
1 | public String pop(){ |
- resize()
1 | public void resize(int cap){ |
- Linked List
- Point:
- push(item):
1 | public void push(T item){ |
- pop()
1 | public void pop(){ |
Queue
Idea:
- 2 Nodes: first & last
- if(IsEmpty()): (At least one node left)
- Enqueue(): first = last;
- Dequeue(): last = first;
JAVA’s Generic
Sort of similar with C++’s Template.
E: Element in Set
T: Class
1 |
|
Good Code has Zero Cast.
Primitives all have Wrapper Type.
i.e.: int has a wrapper type called Integer
AutoBoxing: Automatic cast between primitives and wrappers
Syntactic Sugar: Behind-the-scenes casting
Java’s API
1 | import java.util.List; |
Main Application: Adding items to a collection and iterating.
1 | public class Bag<item> implements Iterable<item>{ |
Dijkstra’s Two Stack Algorithms
Ubiquitously used in Compiler Design
- Left Parenthesis: Ignore
- Right Parenthesis: Operate
- Value: push into value stack
- Operator: push into operator stack
Syntax Memo
1 | s[N++] = 10; |
Sort
Element Sort
Two useful abstractions
- Helper functions: Refer to data through compares and exchanges.
- Less: Is item v less than w?
- Exchange: Swap item in array a[] at index i with the one at index j.
compareTo()
- Totality
- Transitivity
- Antisymmetry
Selection Sort
- In iteration i, Find index min of smallest remaining entry.
- swap a[min] and a[i].
- O(n^2)
- Defect: It’s a waste of time for partially-sorted array.
Insertion Sort
- In iteration i, swap a[i] with each larger entry to its left.
- Partially-sorted array: O(N)–> (Because N(inv) < cN; N(Exchange) = N(inv), N(inv) < N(Compare) < N(inv) + N - 1)
- Random array: O(N^2)
Shell Sort(Insertion sort)
- h-sort: First, define increment; Second, insertionsort every h increments.
1 | while( h > 0 )//change increment |
Why insertion sort?
- Big increments => small subarray
- Small increments => nearly in order
Which increment sequence to use?
Problem
- Why increment h = 3h + 1 ?
- Why O(N^3/2) ?
Shuffling
- In iteration i, pick integer r between 0 and i uniformly at random, swap a[i] and a[r].
Convex Hull
- Graham Scan
MergeSort
Divide and Conquer
- Recurrence(Extra Space)
- Divide array into two halves.
- Recursively sort each half.
- Merge two halves.
- Bottom-up
- Pass through array, merging subarrays of size 1.
- Repeat for subarrays of size 2, 4, 8…
Quicksort
Basic Plan
- Shuffle the array
- Partition so that, for some j
- Entry a[j] is in place.
- No larger entry to the left of j
- No smaller entry to the right of j
- Sort each piece recursively
Partition
Repeat until i and j pointers cross
- Scan i from left to right so long as (a[i] < a[lo]).
- Scan j from right to left so long as (a[j] > a[lo]).
- Exchange a[i] with a[j].
When pointers cross
- Exchange a[lo] with a[j].
Dijkstra 3-Way Quicksort
Java’s sort
- Tuned QuickSort for Primitives, Tuned MergeSort for objects.
1 | import java.util.Arrays; |
JAVA Syntax
Implement “swap” func in JAVA
- Definite out of Method(in the class)
- Definite in obj(
int[], String[]
)
Priority queue
Collection:
Insert and delete items.
Delete:
- Stack: Remove the item most recently added
- Queue: Remove the item least recently added
- Randomized Queue: Remove a random item
- Priority queue: Remove the largest(or Samllest) item
Priority Queue
- Unordered: Insert: O(1), Delmax: O(N), Max: O(N)
- Ordered: Insert: O(N), Delmax: O(1), Max: O(1)
Binary Heap
- Binary Tree & Parents >(<) Children
Basic Plan
- Swim(int children)
- Sink(int Parent)
Non Comparison Sort
- Radix Sort
- Counting Sort
- Bucket Sort
Radix Sort
Counting Sort
Bucket Sort
Search & Symbol Table
Symbol Table
Key-value pair abstraction.
- Insert a value with specified key.
- Given a key, search for the corresponding value.
Convention
- Values are not null.
- Method
get()
returnsnull
if key not present. - Method
put()
overwrites old value with new value.
Intended Consequences
- Easy to implement
contains()
- Can implement lazy version of
delete()
. - Equal (x.equal(y)(PS: Non-Null)) for objects
Elementary Table
Only 1 to 1
- Associative array Abstractions
Sequential search in a linked list
Symbol Table <– Also Known as key-value pair abstraction.
In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier(or symbol) in a program’s source code is associated with information relating to its declaration or appearance in the source. In other words, the entries of a symbol table store the information related to the entry’s corresponding symbol.
Binary Search Trees
- Del: Hibbard del
- Get
- Put
Correspondence betweeen QuickSort’s Partition & BinarySearch Tree: 1-1
2-3 Tree and Red-Black Tree (The Derivative of 2-3 Tree)[Weak Balance]
In there, I just wanna talk about Red Black Tree.
Basic Idea
- Rotate left: Right child Red.
- Rotate Right: Left child & left-left grandchild Red.
- Flip Red: Both sides of children Red.
Problem
Red Black Tree and 2-3 Tree only support insert and search functions which could guarantee the balance. The deletion machanism is implemented by Hibbard deletion, which tends to make balance crash.
Update: You could perfectly trick this problem by using
moveRedLeft(Node)
andmoveRedRight(Node)
API.
Deletion
The discussion about how to delete in RedBlack BST is here.
AVL Tree [Rough Balance]
- Evaluate Tree’s Balance.
Recursive Definition
N.Height equals:
- if N is a leaf:
1 + max(N.left.Height, N.right.Height)
1 |
|
AVL Property
AVL trees maintain the following property:
For all nodes N,
|N.left.Height - N.right.Height| <= 1
Implementation
1 | AVLInsert(k, R) { |
Geometric Application of BST
1d Range Search
Unordered Array:
Range Count & Search: O(N) <– Sequential Search
Ordered Array:
Range Count & Search: O(logN) <– Binary Search for lo
& hi
.
Line Segment Intersection
Sweep-Line Analysis
Sweep vertical line from left to right.
- x-Coordinates define events.
- h-segment(left endpoint): insert y-coordinate into BST.
- h-segment(right endpoint): remove y-coordinate from BST.
- v-segment: range search for interval of y-endpoints.
Kd Tree: Video Game
Space partitioning trees
- Partitioning Nodes by direction of Segments(Vertical or Horizonal).
Interval Search Trees
Basic Idea
- Left Endpoint is Key.
Rectangle Intersection Search: Microprocessor Design
Hash Table
Hash Table is other type of Symbol Table implementation which is distinct with BSTs.
Uniform Hash Assumptions
Each key is equally likely to hash to an integer between 0 and M-1.
Hash Functions
JAVA Implementation: hash()
& hashCode()
1 | public final class PseudoString { |
- “31x + y” Rule
Hash Collision
Seperate Chainning
Linear Probing
Rehashing
Hash Table & BSTs API in JAVA
- Hash Table:
java.util.HashMap
,java.util.IdentityHashMap
- Red-Black Tree:
java.util.TreeMap
,java.util.TreeSet
Graph & Search
Some Graph-Processing Problems
- Path. Is there a path between
s
andt
? - Shortest path. What is the shortest path between
s
andt
? - Cycle. Is there a cycle in the graph ?
- Euler tour. Is there a cycle that uses each edge exactly once ?
- Hamilton tour. Is there a cycle that uses each vertex exactly once?
- Connectivity. Is there a way to connect all of the vertices ?
- MST. What is the best way to connect all of the vertices ?
- Biconnectivity. Is there a vertex whose removal disconnects the graph ?
- Planarity. Can you draw the graph in the plane with no crossing edges ?
- Graph isomorphism. Do two adjacency lists represent the same graph ?
Undirected Graph
Graph API
1 | public class Graph { |
Graph-Processing Algorithms
- Design Pattern For Graph Processing
Decouple graph data type from graph processing.
- Create a Graph object.
- Pass the Graph to a graph-processing routine.
- Query the graph-processing routine for information.
API: Paths
1 | public class Paths { |
Depth-First Search(Put unvisited vertices on a stack)
- Goal: Systematically search through a graph.
- Idea: Mimic Maze Exploration.
1 | DFS(to visit a vertex v) |
- Typical application:
Find all vertices connected to a given source vertex;
Find a path between two vertices.
Breadth-First Search(Put unvisited vertices on a queue)
- Shortest path(Bellman-Ford): Find path from s to t that uses fewest number of edges.
1 | BFS(From source vertex s) |
Connected Components
- Goal. Partition vertices into connected components.
1 | Connected Components |
Directed Graph (Digraph)
Digraph API is almost same as Graph API besides the addEdge(int, int)
part.
Direected BFS (Same as Undirected Graph)
Application
- Multisource shortest Path
- Route
Web Crawler: Bare-bones web crawler.
Goal. Crawl web, starting from some root web page.
Solution.[BFS with implicit digraph]
- Choose root web page as source s;
- Maintain a Queue of websites to explore;
- Maintain a SET of discovered websites;
- Dequeue the next website and enqueue websites to which it links. (Provide you haven’t done so before.)
Directed DFS (Same as Undirected Graph)
Application
- Reachability APP: Program Control-flow Analysis; Mark-sweep Garbage Collection
- Path finding
- Directed Cycle detection
- Topological Sort
Topological Sort
Precedence scheduling
Goal. Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?
Digraph model. vertex = task; edge = precedence constraint.
Topological sort
DAG. Directed acyclic graph.
Topological sort. Redraw DAG so all edges point upwards.
1 | Topological sort |
Directed Cycle detection Application
- Precedence scheduling
- Java Compiler: Cyclic inheritance
- Microsoft Excel: Spreadsheet recalculation.
Strongly-connected components
Strongly connected
Def. Vertices v and w are strongly connected if there is both a directed path from v to w and a directed path from w to v.
Key propety. Strong connectivity is an equivalence relation:
- v is strongly connected to v.
- if v->w, w->v.
- if v->x, x->w, thus v->w.
Strong component
Def. A strong component is a maximal subset of strongly-connected vertices.
Kosaraju-Sharir Algorithm: intuition
Reverse Graph. Strong components in G are same as G’.
Kernal DAG. Contract each strong component into a single vertex.
Idea.
- Compute topological order(reverse postorder) in kernel DAG.
- Run DFS, considering vertices in reverse topological order.
1 | Kosaraju-Sharir Algorithm |
PS. The DFS in the first phase is crucial. The phase 2 is trival as long as the algorithms could mark the set of vertices reachable.
Graph & Optimize
Now let’s looking at Optimal Problem in Graph, which is much important in Discrete Math & Computer Science.
Minimum Spanning Tree
MinCut and Maxflow
String & Sort
Counting Sort
String & Trie: Search
Part 1: String Symbol Table & Trie
Part 2: Substring Search
Keypoint: Backup, misamtched characters.
Knuth-Morris-Pratt
KMP is just a clever way to implement Brute Force. The essence is that utilizing restart state X to store next loop of Brute Force. (Avoid backup)
It is just a simple Brute Force(Equivalent) which async next step state.
DFA
Mealy State Machine. (Transition based on current state and input)
- Finite number of states (including start and halt).
- Exactly one transition for each char in alphabet.
- Accept if sequence of transitions leads to halt state.
1 | if in state j, reading char C: |
The implementation of NFA version’s KMP:
1 | void GetNext(string Pat, vector<int>& Next) { |
String: Pattern Match
NFA
Nondeterministic Finate state Automaton.
Moore State Machine.(Transition based on state.)
Building an NFA corresponding to an RE
- States. Include a state for each symbol in the RE, plus an accept state.
- Concatenation. Add match-transition edge from state corresponding to characters in the alphabet to next state.
- Parentheses. Add epsilon-transition edge from parentheses to next state.
- closure. Add three epsilon-transition edge for each * operator.
- Or. Add two epsilon-transition edges for each | operator.
NFA construction
Goal. Write a program to build the epsilon-transition digraph.
Challenges. Remember left parentheses to implement closure and or; remember | to implement or.
Solution. Maintain a stack. (Like Dijkstra Machine)
- ( symbol: push ( onto stack.
- | symbol: push | onto stack.
- ) symbol: pop corresponding ( and any intervening |; add epsilon-transition edges for closure/or.
String: Compression
Bitmaps
Huffman Trie and Algorithms
LZW Compression
Burrows Wheeler
FFT
Appendix
JAVA Syntax Memo
-> About Iterable
<T>
& Iterator<T>
1 | public interface Iterable<T> { |
Interface like this is convenient to definite ReversedIterator & Iterator in sequential data structures.
Summary
Iterable
- In Java Containers, all Subclass of Collection would use
Iteratable
interface to implementfor each
functions.
1 |
|
Generic
- Generic Array: When you wanna initialize a generic array, you should use mandatory casting like that:
1 |
|
- Use Generic Object to new Generic Object:
1 |
|
Queue
<T>
is just a implements;
To utilize Queue<T>
, We should initialize it like following:
1 | Queue<T> q = new LinkedList<T>(); |
Same Package’s Compile
When you wanna compile file in the same package, you should add classpath when compiling. i.e:
1 | javac -cp [classpath] xxx.java |
Reverse Set
You could use Collections.reverse()
to reverse.
1 | import java.util.Collections; |
Java Memory Management
- You could only be allowed to change object when you invoke it.
1 | /* FalseCase */ |
Java Programming Method
- Recursive-instance method.
- Static: for all classes, not just current class.(Not allowed to use generic creation.)