Skip to content

Commit 9bc2007

Browse files
author
Ziming M
committed
update English version of Union-Find
1 parent 7078491 commit 9bc2007

File tree

2 files changed

+301
-294
lines changed

2 files changed

+301
-294
lines changed
Lines changed: 301 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,301 @@
1+
# Detailed Explanation of Union-Find
2+
3+
Today I will talk about the Union-Find algorithm, which is often referred to as the Disjoint-Set algorithm, mainly to solve the problem of "dynamic connectivity" in graph theory. Nouns look a little confusing, but they ’re really easy to understand. We will explain it later. Moreover, the application of this algorithm is also very interesting.
4+
5+
Speaking of this Union-Find, it should be my "Enlightenment Algorithm", this algorithm was introduced at the beginning of *Algorithms 4th edition*, I have to say that this algorithm shocked me! Later I discovered that leetcode also has related topics and is very interesting. Moreover, the solution given in *Algorithms 4th edition* can be further optimized. With only a small modification, the time complexity can be reduced to O (1).
6+
7+
8+
First, I will explain what is meant by dynamic connectivity.
9+
10+
### Ⅰ. Problem Introduction
11+
12+
Briefly, dynamic connectivity to a fact can be abstracted to connect a graph with lines. For example the following figure depicts a total of 10 nodes, they are disconnected, respectively numerals 0 to 9:
13+
14+
![](../pictures/unionfind/1.jpg)
15+
16+
Now our Union-Find algorithm mainly needs to implement these two APIs:
17+
18+
```java
19+
class UF {
20+
/* Connecting the p and q */
21+
public void union(int p, int q);
22+
/* Determine whether p and q are connected */
23+
public boolean connected(int p, int q);
24+
/* Returns the number of connected components in the graph */
25+
public int count();
26+
}
27+
```
28+
29+
Here's "connectivity" is an equivalence relation, that has the following three properties:
30+
31+
1. reflexivity: `p` and` p` node is connected.
32+
33+
2. Symmetry: If `p` and` q` node communication, and then `q`` p` also in communication.
34+
35+
3. Transitivity: If the nodes `p` and` q` are connected, and `q` and` r` are connected, then `p` and` r` are also connected.
36+
37+
For example, in the previous picture, any two **different points** from 0 to 9 are not connected, and calling `connected` will return false with 10 connected components.
38+
39+
Now,if you call `union (0, 1)`, the 0 and 1 are connected and the connected components are reduced to 9.
40+
41+
then,when we call `union (1, 2)`, the 0,1,2 are connected. Calling `connected (0, 2)` will also return true, and the connected components will become 8.
42+
43+
![](../pictures/unionfind/2.jpg)
44+
45+
This "equivalent relationship" judgment is very useful, such as the compiler to judge different references to the same variable, or to count the number of friends in social networks, etc.
46+
47+
Now, you probably understand what dynamic connectivity is. The key to the Union-Find algorithm is the efficiency of the `union` and` connected` functions. So what model should we use to represent the connected state of this graph? And what data structure is more appropriate to implement the code?
48+
49+
### Ⅱ. Motivation
50+
51+
Note that I just separated the "model" from the specific "data structure" because we use forests (several trees) to represent the dynamic connectivity of the graph, and use arrays to implement this forest.
52+
53+
How to use forest to represent connectivity? We set each node of the tree to have a pointer to its parent node, and if it is the root node, this pointer points to itself. For example, the graph of 10 nodes just now didn't communicate with each other at the beginning, like this:
54+
55+
![](../pictures/unionfind/3.jpg)
56+
57+
```java
58+
class UF {
59+
// recording connected components
60+
private int count;
61+
// the parent of node x is parent [x]
62+
private int[] parent;
63+
64+
/* construct a function where n is the total number of nodes in the graph */
65+
public UF(int n) {
66+
// disconnected at first
67+
this.count = n;
68+
// parent node pointer points to itself
69+
parent = new int[n];
70+
for (int i = 0; i < n; i++)
71+
parent[i] = i;
72+
}
73+
74+
/* Other functions */
75+
}
76+
```
77+
78+
**If two nodes are already connected, then connect the root node of one node to the root node of the other node:**
79+
80+
![](../pictures/unionfind/4.jpg)
81+
82+
```java
83+
public void union(int p, int q) {
84+
int rootP = find(p);
85+
int rootQ = find(q);
86+
if (rootP == rootQ)
87+
return;
88+
// Merge two trees into one
89+
parent[rootP] = rootQ;
90+
// parent[rootQ] = rootP
91+
count--; // Combine two components into one
92+
}
93+
94+
/* Returns the root node of a node x */
95+
private int find(int x) {
96+
// parent[x] == x
97+
while (parent[x] != x)
98+
x = parent[x];
99+
return x;
100+
}
101+
102+
/* Returns the number of the current connected components */
103+
public int count() {
104+
return count;
105+
}
106+
```
107+
108+
**In this way, if the nodes `p` and` q` are connected, they must have the same root node:**
109+
110+
![](../pictures/unionfind/5.jpg)
111+
112+
```java
113+
public boolean connected(int p, int q) {
114+
int rootP = find(p);
115+
int rootQ = find(q);
116+
return rootP == rootQ;
117+
}
118+
```
119+
120+
At this point, the Union-Find algorithm is basically complete. Isn't it amazing? We only use arrays to simulate a forest, and cleverly solve this more complicated problem!
121+
122+
So what is the complexity of this algorithm? We found that the complexity of the main API `connected` and` union` is caused by the `find` function, so they are the same complexity as` find`.
123+
124+
The main function of `find` is to traverse from a certain node to the root of the tree, and its time complexity is the height of the tree. We may customarily think that the height of the tree is `logN`, but this is not necessarily the case. The height of `logN` exists only in a balanced binary tree. For general trees, extreme imbalance may occur, causing the“ tree ”to almost degenerate into a“ linked list ”. In the worst case, the tree height may become` N`.
125+
126+
![](../pictures/unionfind/6.jpg)
127+
128+
So the above solution, the time complexity of `find`,` union`, `connected` is O (N). This complexity is very unsatisfactory. What you want graph theory to solve is the problem of huge data scales such as social networks. The calls to `union` and` connected` are very frequent, and each call requires linear time completely unbearable.
129+
130+
**The point is, how do you find ways to avoid tree imbalances?**
131+
132+
### Ⅲ. Balance optimization
133+
134+
We know that in the process of `union` imbalances may arise either case:
135+
136+
```java
137+
public void union(int p, int q) {
138+
int rootP = find(p);
139+
int rootQ = find(q);
140+
if (rootP == rootQ)
141+
return;
142+
// Merge two trees into one
143+
parent[rootP] = rootQ;
144+
// parent[rootQ] = rootP also works
145+
count--;
146+
```
147+
148+
At the beginning, we simply and rudely connected the tree where `p` was located under the root node of the tree where` q` was located. Then, a "top-heavy" imbalance situation may occur here, such as the following situation:
149+
150+
![](../pictures/unionfind/7.jpg)
151+
152+
Over time, the tree may grow imbalanced. **We actually hope that the smaller trees are connected to the larger ones, so that we can avoid top-heavy and more balanced**. The solution is to use an additional `size` array to record the number of nodes in each tree. We might as well call it "weight":
153+
154+
```java
155+
class UF {
156+
private int count;
157+
private int[] parent;
158+
// Added an array record tree "weight"
159+
private int[] size;
160+
161+
public UF(int n) {
162+
this.count = n;
163+
parent = new int[n];
164+
// Since there is only one node per tree initially, the
165+
// weight should be initialized to 1
166+
size = new int[n];
167+
for (int i = 0; i < n; i++) {
168+
parent[i] = i;
169+
size[i] = 1;
170+
}
171+
}
172+
/* Other function */
173+
}
174+
```
175+
176+
For instance, `size [3] = 5` means that the tree rooted at node` 3` has a total of `5` nodes. This way we can modify the `union` method:
177+
178+
```java
179+
public void union(int p, int q) {
180+
int rootP = find(p);
181+
int rootQ = find(q);
182+
if (rootP == rootQ)
183+
return;
184+
185+
// The small tree is more balanced under the big tree
186+
if (size[rootP] > size[rootQ]) {
187+
parent[rootQ] = rootP;
188+
size[rootP] += size[rootQ];
189+
} else {
190+
parent[rootP] = rootQ;
191+
size[rootQ] += size[rootP];
192+
}
193+
count--;
194+
}
195+
```
196+
197+
Like this, by comparing the weight of the tree, you can ensure that the growth of the tree is relatively balanced, and the height of the tree is roughly on the order of `logN`, which greatly improves the execution efficiency.
198+
199+
At this time, the time complexity of `find`,` union`, and `connected` has been reduced to O (logN), even if the data size is hundreds of millions, the time required is very small.
200+
201+
### Ⅳ. Path compression
202+
203+
This step of optimization is particularly simple and clever. Can we further compress the height of each tree so that the tree height remains constant at all times?
204+
205+
![](../pictures/unionfind/8.jpg)
206+
207+
`Find` can result in O (1) time to find a root node, corresponding,` connected` `union` and complexity are reduced to O (1).
208+
209+
To do this, simply add a line in `find` Code:
210+
211+
```java
212+
private int find(int x) {
213+
while (parent[x] != x) {
214+
// Path compression
215+
parent[x] = parent[parent[x]];
216+
x = parent[x];
217+
}
218+
return x;
219+
}
220+
```
221+
222+
This operation is a little tricky to see the GIF to understand its role (for clarity, this tree in extreme conditions).
223+
224+
![](../pictures/unionfind/9.gif)
225+
226+
Therefore, each time the `find` function is called to traverse to the root of the tree, the tree height is shortened by hand, and eventually all the heights will not exceed 3 (the height may reach 3 when` union`).
227+
228+
PS: The reader may ask, after the find process of this GIF graph is completed, the tree height is exactly equal to 3, but if there is a higher tree, the height after compression will still be greater than 3, what should we do? This GIF scenario was edited by me to make it easy for everyone to understand path compression, but in practice, path compression is performed every time it is found, so the tree could not have grown to such a high level, and this worry is unnecessary.
229+
230+
### Ⅴ. Summary
231+
232+
Let's take a look at the whole code:
233+
234+
```java
235+
class UF {
236+
// Number of connected components
237+
private int count;
238+
// Store a tree
239+
private int[] parent;
240+
// Record the "weight" of the tree
241+
private int[] size;
242+
243+
public UF(int n) {
244+
this.count = n;
245+
parent = new int[n];
246+
size = new int[n];
247+
for (int i = 0; i < n; i++) {
248+
parent[i] = i;
249+
size[i] = 1;
250+
}
251+
}
252+
253+
public void union(int p, int q) {
254+
int rootP = find(p);
255+
int rootQ = find(q);
256+
if (rootP == rootQ)
257+
return;
258+
259+
// The small tree is more balanced under the big tree
260+
if (size[rootP] > size[rootQ]) {
261+
parent[rootQ] = rootP;
262+
size[rootP] += size[rootQ];
263+
} else {
264+
parent[rootP] = rootQ;
265+
size[rootQ] += size[rootP];
266+
}
267+
count--;
268+
}
269+
270+
public boolean connected(int p, int q) {
271+
int rootP = find(p);
272+
int rootQ = find(q);
273+
return rootP == rootQ;
274+
}
275+
276+
private int find(int x) {
277+
while (parent[x] != x) {
278+
// Path compression
279+
parent[x] = parent[parent[x]];
280+
x = parent[x];
281+
}
282+
return x;
283+
}
284+
285+
public int count() {
286+
return count;
287+
}
288+
}
289+
```
290+
291+
The complexity of the Union-Find algorithm can be analyzed as follows: the constructor initializes the data structure requires O (N) time and space complexity. However, the time complexity required for `union`,` connected` and `count` is O (1).
292+
293+
**The algorithm is committed to make it clear! Welcome to follow us on WeChat public account labuladong for more easy-to-understand articles**:
294+
295+
![labuladong](../pictures/labuladong.png)
296+
297+
[Previous: How to schedule candidates' seats](../高频面试系列/座位调度.md)
298+
299+
[NextApplication of Union-Find algorithm](../算法思维系列/UnionFind算法应用.md)
300+
301+
[Contents](../README.md#目录)

0 commit comments

Comments
 (0)