0% found this document useful (0 votes)
22 views

Girvan-Newman Algorithm

Uploaded by

Sang Nguyễn
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
22 views

Girvan-Newman Algorithm

Uploaded by

Sang Nguyễn
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 77

1. How to compute betweenness?

2. How to select the number of


clusters?

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1


 Want to compute  Breath first search
betweenness of starting from �:
paths starting at
node � 0

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 2


 Count the number of shortest paths from
� to all other nodes of the network:

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 3


 Compute betweenness by working up the
tree: If there are multiple paths count them
fractionally
The algorithm:
•Add edge flows:
-- node flow =
1+∑child edges 1+1 paths to H
-- split the flow up Split evenly
based on the parent
value
• Repeat the BFS 1+0.5 paths to J
Split 1:2
procedure for each
starting node �
1 path to K.
Split evenly
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 4
 Compute betweenness by working up the
tree: If there are multiple paths count them
fractionally
The algorithm:
•Add edge flows:
-- node flow =
1+∑child edges 1+1 paths to H
-- split the flow up Split evenly
based on the parent
value
• Repeat the BFS 1+0.5 paths to J
Split 1:2
procedure for each
starting node �
1 path to K.
Split evenly
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 5
Graph Partitioning
• Methods to break a network into sets of
connected components called regions
• Many general approaches
– Divisive methods: Repeatedly identify and
remove edges connecting densely connected
regions
– Agglomerative methods: Repeatedly identify and
merge nodes that likely belong in the same region
[Girvan-Newman ‘02]

 Divisive hierarchical clustering based on the


notion of edge betweenness:
Number of shortest paths passing through the edge
 Girvan-Newman Algorithm:
§ Undirected unweighted networks
§ Repeat until no edges are left:
§ Calculate betweenness of edges
§ Remove edges with highest betweenness
§ Connected components are communities
§ Gives a hierarchical decomposition of the network

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 7


Girvan-Newman Algorithm
• Divisive method Proposed by Girvan and
Newman in 2002
• Uses edge betweenness to identify edges to
remove
• Edge betweenness: Total amount of “flow” an
edge carries between all pairs of nodes where a
single unit of flow between two nodes divides
itself evenly among all shortest paths between the
nodes (1/k units flow along each of k shortest
paths)
Girvan-Newman Algorithm
1. Calculate betweenness of all edges
2. Remove the edge(s) with highest betweenness
3. Repeat steps 1 and 2 until graph is partitioned
into as many regions as desired
Girvan-Newman Algorithm
Girvan-Newman Algorithm

v The idea is that it is likely that edges connecting


separate modules have high edge betweenness as
all the shortest paths from one module to another
must traverse through them.
v So if we gradually remove the edge with the
highest edge betweenness score we will get a
hierarchical map, a rooted tree, called a
dendrogram of the graph.
§ The leafs of the tree are the individual vertices
§ The root of the tree represents the whole graph
Girvan-Newman Algorithm
v Step 1. Find the edge of highest betweenness - or
multiple edges of highest betweenness, if there is a tie -
and remove these edges from the graph.
§ This may cause the graph to separate into multiple
components. If so, this is the first level of regions in
the partitioning of the graph.
v Step 2. Recalculate all betweennesses, and again remove
the edge or edges of highest betweenness.
§ This may break some of the existing components into
smaller components; if so, these are regions nested
within the larger regions.
v Step 3. Proceed in this way as long as edges remain in
graph, in each step recalculating all betweennesses and
removing the edge or edges of highest betweenness.
Girvan-Newman Algorithm
v The method gives us only a succession of splits of
the network into smaller and smaller
communities, but it gives no indication of which
splits are best.
v One way to find the best split is via the the
modularity concept: “The spectral modularity
maximization community detection algorithm”
Girvan-Newman: Example1
Girvan-Newman: Example1
Girvan-Newman: Example1

Ø edge.betweenness(mynetwork)
Ø [1] 2.0 3.0 3.5 2.5 3.5 5.5 5.0
Example1: Delete edge (4,5)

Øedge.betweenness(mynetwork)
Ø[1] 4 1 9 4 8 5
Example1: Delete edge (2,3)

Øedge.betweenness(mynetwork)
Ø[1] 1 1 1 2 2
Example1: Delete edge (3,4) and (4,6)

Øedge.betweenness(mynetwork)
Ø[1] 1 1 1
Example1: Delete edge (2,1), (5,1) and (5,2)

Øedge.betweenness(mynetwork)
Ønumeric(0)
Example1: An example via package igraph: the
edge.betweenness.community() function
Example1: An example via package igraph: the
edge.betweenness.community() function
Øedge.betweenness.community(mynetwork)
ØIGRAPH clustering edge betweenness,
Øgroups: 2, mod: 0.2 + groups:
Ø$`1`
Ø[1] "1" "2" "5"
Ø$`2`
Ø[1] "3" "4" "6"
Example1: Plot dendrogram for hierarchical
methods via igraph function plot_dendrogram()
12
1
33
49

Need to re-compute
betweenness at
every step

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 24


Step 1: Step 2:

Step 3: Hierarchical network decomposition:

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 25


Edge Betweenness
1 Example 10

Calculate total flow


2 3 over edge 7-8 9 11

7 8

4 6 12 13

5 14
1 10

One unit flows over 7-8


2 3 to get from 1 to 8 9 11

7 8

4 6 12 13

5 14
1 10

One unit flows over 7-8


2 3 to get from 1 to 9 9 11

7 8

4 6 12 13

5 14
1 10

One unit flows over 7-8


2 3 to get from 1 to 10 9 11

7 8

4 6 12 13

5 14
1 10

2 3 9 11

7 8

4 6 12 13

5 7 total units flow over


7-8 to get from 1 to
14
nodes 8-14
1 10

2 3 9 11

7 8

4 6 12 13

5 7 total units flow over


7-8 to get from 2 to
14
nodes 8-14
1 10

2 3 9 11

7 8

4 6 12 13

5 7 total units flow over


7-8 to get from 3 to
14
nodes 8-14
7 x 7 = 49 total units
flow over 7-8 from
1 nodes 1-7 to 8-14 10

2 3 9 11

7 8

4 6 12 13

5 14
Edge betweenness = 49
1 10

2 3 9 11

7 8

4 6 12 13

5 14
1 10
Calculate betweenness
for edge 3-7
2 3 9 11

7 8

4 6 12 13

5 14
1 10
3 units flow from
1-3 to each 4-14 node,
so total =
2 3 3 x 11 = 33 9 11

7 8

4 6 12 13

5 14
Betweenness = 33
1 for each 10
symmetric edge

2 3 9 11
33 33

7 8
33 33

4 6 12 13

5 14
1 Calculate betweenness 10
for edge 1-3

2 3 9 11

7 8

4 6 12 13

5 14
1 Carries all flow to node 10
1 except from node 2,
so betweenness = 12

2 3 9 11

7 8

4 6 12 13

5 14
1 betweenness = 12 10
12 for each 12
symmetric edge

2 12
3 9 12
11

7 8

12 12
4 6 12 13
12 12
5 14
1 Calculate betweenness 10
for edge 1-2

2 3 9 11

7 8

4 6 12 13

5 14
1 Only carries flow 10
from 1 to 2, so
betweenness = 1

2 3 9 11

7 8

4 6 12 13

5 14
1 betweenness = 1 10
1 for each symmetric edge 1

2 3 9 11

7 8

4 6 12 13
1 1

5 14
1 Edge with highest 10
1 12 betweenness 12 1

2 12
3 9 12
11
33 33

49
7 8
33 33
12 12
4 6 12 13
1 12 12 1

5 14
Communities in physics collaborations
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 45
 Zachary’s Karate club:
Hierarchical decomposition

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 46


Node Betweenness
• Betweenness also defined for nodes
• Node betweenness: Total amount of “flow” a
node carries when a unit of flow between each
pair of nodes is divided up evenly over shortest
paths
• Nodes and edges of high betweenness perform
critical roles in the network structure
Girvan-Newman Algorithm
1. Calculate betweenness of all edges
2. Remove the edge(s) with highest
betweenness
3. Repeat steps 1 and 2 until graph is
partitioned into as many regions as desired

How much computation does this require?


Newman (2001) and Brandes (2001) independently developed
similar algorithms that reduce the complexity from O(mn2) to
O(mn) where m = # of edges, n = # of nodes
Method 2: Computing Edge
Betweenness Efficiently
For each node N in the graph
1. Perform breadth-first search of graph starting at
node N
2. Determine the number of shortest paths from N
to every other node
3. Based on these numbers, determine the
amount of flow from N to all other nodes that
use each edge
Divide sum of flow of all edges by 2

Method developed by Brandes (2001) and Newman (2001)


F
Example Graph

B C I

A D G K

E H J
Method 2: Computing Edge
Betweenness Efficiently
For each node N in the graph
1. Perform breadth-first search of graph starting at
node N
2. Determine the number of shortest paths from N
to every other node
3. Based on these numbers, determine the
amount of flow from N to all other nodes that
use each edge
Divide sum of flow of all edges by 2
Breadth-first search
from node A A

B C D E

F G H

I J

K
Method 2: Computing Edge
Betweenness Efficiently
For each node N in the graph
1. Perform breadth-first search of graph starting at
node N
2. Determine the number of shortest paths from N
to every other node
3. Based on these numbers, determine the
amount of flow from N to all other nodes that
use each edge
Divide sum of flow of all edges by 2
A

1 B C 1 D 1 E 1
add add

F 2 G 1 H 2
add add

I 3 J 3

add

K 6
Method 2: Computing Edge
Betweenness Efficiently
For each node N in the graph
1. Perform breadth-first search of graph starting at
node N
2. Determine the number of shortest paths from N
to every other node
3. Based on these numbers, determine the
amount of flow from N to all other nodes that
use each edge
Divide sum of flow of all edges by 2
A

1 B C 1 D 1 E 1

F 2 G 1 H 2

I 3 J 3

Work from bottom-up


starting with K
K 6
A

1 B C 1 D 1 E 1

F 2 G 1 H 2

I 3 J 3

K gets 1 unit; equal, so ½ ½


evenly divide 1 unit K 6
A

1 B C 1 D 1 E 1

F 2 G 1 H 2
1 ½
I keeps 1 unit &
passes along ½ I 3 J 3
unit; gets 2 times ½ ½
as much from F
K 6
A

1 B C 1 D 1 E 1

F 2 G 1 H 2
1 ½ ½ 1
J keeps 1 unit &
I 3 J 3 passes along ½
½ ½
unit; gets 2 times
as much from H
K 6
A

1 B C 1 D 1 E 1
1 1

F 2 G 1 H 2
1 ½ ½ 1
F keeps 1 unit &
passes along 1 I 3 J 3
unit; equal, so ½ ½
divide evenly
K 6
A

1 B C 1 D 1 E 1
1 1 2

F 2 G 1 H 2
1 ½ ½ 1
G keeps 1 unit &
passes along 1 I 3 J 3
unit ½ ½

K 6
A

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1
H keeps 1 unit &
I 3 J 3 passes along 1
unit; equal, so
½ ½
divide evenly
K 6
B keeps 1 & A
passes 1
2

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1

I 3 J 3
½ ½

K 6
C keeps 1 & A
passes 1
2 2

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1

I 3 J 3
½ ½

K 6
D keeps 1 &
A
passes along 3
2 2 4

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1

I 3 J 3
½ ½

K 6
A E keeps 1 &
passes along 1
2 2 4 2

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1

I 3 J 3
½ ½

K 6
No flow yet…
A
2 2 4 2

1 B C 1 D 1 E 1
1 1 2 1 1

F 2 G 1 H 2
1 ½ ½ 1

I 3 J 3
½ ½

K 6
Computing Edge
Betweenness Efficiently
For each node N in the graph Repeat for B, C, etc.
1. Perform breadth-first search of graph starting at
node N
2. Determine the number of shortest paths from N
to every other node
3. Based on these numbers, determine the amount
of flow from N to all other nodes that use each
edge
Divide sum of flow of all edges by 2
Since sum includes flow from A  B and B  A, etc.
 Example

b d

a g
Compute
c e #geodesics from
every node to g.
f

Breadth-first search – means for doing manythings.


 Example

b d d=0
w=1
a
g
c e

Breadth-first search – means for doing many things.


Example
d=1
w=1
b d d=0
w=1
a
g
c e d=1
w=1

f
Breadth-first search – means for doing many things.
 Example
d=2 d=1
w=2 w=1
b d d=0
w=1
a
g
c e d=1
d=2 w=1
w=2 d=2
f
w=2

Breadth-first search – means for doing many things.


 Example
d=2 d=1 Have all info.
w=2 w=1 we need for
b d d=0 edge
d=3
w=4 w=1 betweenness
now.
a
g
c e d=1
d=2 w=1
w=2 d=2
f
w=2

Breadth-first search – means for doing many things.


 Example
d=2 d=1 Note: a and f
w=2 w=1 are like leaves:
b d d=0 no geodesic
d=3 2/4
w=4 w=1 to g from other
1/2 nodes passes
a
2/4 g through them.
c e d=1
d=2 1/2
w=1
w=2 d=2
f
w=2
Breadth-first search – means for doing many things.
 An Example
d=2 d=1 Note: a and f
w=2 w=1 are like leaves:
b ½(1+2/4) d d=0 no geodesic
d=3 2/4
w=4
½
(1 w=1 to g from other
1/2
+2
a
/4 nodes passes
g
)
2/4 through them.
c e d=1
/4) ½(1+2/4)
+ 2 d=2 1/2
w=1
(1
½ w=2 d=2
f
w=2

Breadth-first search – means for doing many things.


 Example 1/1[ 1+½(1+2/4)+1/2(1+2/4)+1/2]
d=2 d=1
w=2 Note: a and f
w=1
b ½(1+2/4) d d=0 are like leaves:
d=3 2/4 no geodesic
w=4
½
(1 w=1
+2
1/2 to g from other
a
/4
g nodes passes
)
2/4
c e d=1 through them.
/4) ½(1+2/4)
+ 2 d=2 1/2
w=1
(1
½ w=2 d=2
f
w=2

Breadth-first search – means for doing many things.


 Edge betweenness for all edges can be
computed in time �(��) (�=#edges,
�=#nodes). [Newman 2001] – details
soon.
 Recalculation makes algorithm �(�2�),
so not feasible for large networks.

You might also like