Inf3380 Floyd
Inf3380 Floyd
Inf3380 Floyd
Floyds algorithm p. 1
Overview
Chapter 6 from Michael J. Quinn, Parallel Programming in C with MPI and
OpenMP
Floyds algorithm p. 2
Each edge is of a direction and has a length if theres path from vertex i to j , there may not be path from vertex j to i path length from vertex i to j may be different than path length from vertex j to i Objective: nding the shortest path between every pair of vertices (i j ) Application: table of driving distances between city pairs
Floyds algorithm p. 3
Adjacency matrix
There are n vertices The direct path length from vertex i to vertex j is stored as a[i, j ] An n n adjacency matrix a keeps the entire connectivity info 0 1 2 3 4 5 0 0 2 5 7 1 8 1 0 4 2 0 3 0 3 3 4 2 0 4 0 5 5 2 If a[i, j ] is , it means there is no direct path from vertex i to vertex j
Floyds algorithm p. 4
Floyds algorithm p. 5
Floyds algorithm
Input: n number of vertices a adjacency matrix Output: Transformed a that contains the shortest path lengths for k 0 to n 1 for i 0 to n 1 for j 0 to n 1 a[i, j ] min(a[i, j ], a[i, k ] + a[k, j ]) endfor endfor endfor
Floyds algorithm p. 6
Some observations
Floyds algorithm is an exhaustive and incremental approach The entries of the a-matrix are updated n rounds a[i, j ] is compared with all n possibilities, that is, against a[i, k ] + a[k, j ], for 0 k n 1 n3 of comparisons in total
Floyds algorithm p. 7
Source of parallelism
During the k th iteration, the work is (in C syntax)
for (i=0; i<n; i++) for (j=0; j<n, j++) a[i][j] = MIN( a[i][j], a[i][k]+a[k][j] );
Can all the entries in a be updated concurrently? Yes, because the k th column and the k th row remain the same during the k th iteration! Note that a[i][k]=MIN(a[i][k],a[i][k]+a[k][j]) will be the same as a[i][k] Note that a[k][j]=MIN(a[k][j],a[k][k]+a[k][j]) will be the same as a[k][j]
Floyds algorithm p. 8
Floyds algorithm p. 9
Floyds algorithm p. 10
Communication pattern
Recall that in the k th iteration: a[i, j ] min(a[i, j ], a[i, k ] + a[k, j ]) Since entries of a are divided into rowwise blocks, so a[i, k ] is also in the memory of the MPI process that owns a[i, j ] However, a[k, j ] is probably in another MPI processs memory Communication is therefore needed! Before the k th iteration, the MPI process that owns the k th row of the a matrix should broadcast this row to everyone else
Floyds algorithm p. 11
int **B, *Bstorage, i; ... Bstorage=(int*)malloc(m*n*sizeof(int)); B=(int**)malloc(m*sizeof(int*)); for (i=0; i<m; i++) B[i] = &Bstorage[i*n];
The underlying storage is contiguous, making it possible to send and receive an entire 2D array.
Floyds algorithm p. 12
Floyds algorithm p. 13
Floyds algorithm p. 14
Matrix input
Recall that each MPI process only stores a part of the a matrix When reading a from a le, we can let only process p 1 do the input once the number of rows needed by process i are read in, they are sent from process p 1 to process i using MPI Send process i must issue a matching MPI Recv The above simple strategy is not parallel Parallel I/O can be done using MPI-2 commands
Floyds algorithm p. 15
Matrix output
For example, we let only process 0 do the output Each process needs to send its part of a to process 0 To avoid many processes sending its entire subdata to process 0 at the same time Process 0 communicates with the other processes in turn Each process waits for a hint (a short message) from process 0 before sending its data (a large message)
Floyds algorithm p. 16
Deadlock
Typical deadlock example 1
Analysis
Serial algorithm time usage: n3 Parallel algorithm non-communication time usage: n2 n/p communication (broadcast) time usage: nlog2 p( + 4n/ ) assuming each entry of matrix a needs 4 bytes assuming as communication latency assuming as communication bandwidth (# bytes per second) Read Section 6.7 for a more detailed analysis that allows overlap between computation and communication
Floyds algorithm p. 18
Exercises
Write an MPI program that uses p processes to produce a JPEG picture of n n pixels. The picture should have white background and a black circle (of radius n/4) in the middle. (The existing C code collection
http://heim.ifi.uio.no/xingca/inf-verk3830/simple-jpeg.tar.gz
can be used.) Implement the complete Floyds algorithm and try it on a large enough adjacency matrix.
Floyds algorithm p. 19