Fast Sparse Matrix Multiplication
Raphael Yuster (Department of Mathematics, University of Haifa at Oranim, Tivon 36006, Israel. Email: raphy@research.haifa.ac.il)
Uri Zwick (Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Email: zwick@cs.tau.ac.il)

A preliminary version of the paper is to appear in the proceedings of the 12th Annual European Symposium on Algorithms (ESA'04).
Abstract
Let A and B be two n × n matrices over a ring R (e.g., the reals or the integers), each containing at most m non-zero elements. We present a new algorithm that multiplies A and B using O(m^{0.7} n^{1.2} + n^{2+o(1)}) algebraic operations (i.e., multiplications, additions and subtractions) over R. The naive matrix multiplication algorithm, on the other hand, may need to perform Ω(mn) operations to accomplish the same task. For m ≤ n^{1.14}, the new algorithm performs an almost optimal number of only n^{2+o(1)} operations. For m ≤ n^{1.68}, the new algorithm is also faster than the best known matrix multiplication algorithm for dense matrices, which uses O(n^{2.38}) algebraic operations. The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea and existing fast rectangular matrix multiplication algorithms. We also obtain improved algorithms for the multiplication of more than two sparse matrices. As the known fast rectangular matrix multiplication algorithms are far from being practical, our result, at least for now, is only of theoretical value.
1 Introduction
The multiplication of two n × n matrices is one of the most basic algebraic problems and considerable effort was devoted to obtaining efficient algorithms for the task. The naive matrix multiplication algorithm performs O(n^3) operations. Strassen [Str69] was the first to show that the naive algorithm is not optimal, giving an O(n^{2.81}) algorithm for the problem. Many improvements then followed. The currently fastest matrix multiplication algorithm, with a complexity of O(n^{2.38}), was obtained by Coppersmith and Winograd [CW90]. More information on the fascinating subject of matrix multiplication algorithms and its history can be found in Pan [Pan85] and Bürgisser et al. [BCS97]. An interesting new group-theoretic approach to the matrix multiplication problem was recently suggested by Cohn and Umans [CU03]. For the best available lower bounds see Shpilka [Shp03] and Raz [Raz03].
Matrix multiplication has numerous applications in combinatorial optimization in general, and in graph algorithms in particular. Fast matrix multiplication algorithms can be used, for example, to obtain fast algorithms
for finding simple cycles in graphs [AYZ95, AYZ97, YZ04], for finding small cliques and other small subgraphs
[NP85], for finding shortest paths [Sei95, SZ99, Zwi02], for obtaining improved dynamic reachability algorithms [DI00, RZ02], and for matching problems [MVV87, RV89, Che97]. Other applications can be found in
[Cha02, KS03], and this list is not exhaustive.
In many cases, the matrices to be multiplied are sparse, i.e., the number of non-zero elements in them is
negligible compared to the number of zeros in them. For example, if G = (V, E) is a directed graph on n vertices containing m edges, then its adjacency matrix A_G is an n × n matrix with only m non-zero elements (1s in this case). In many interesting cases m = o(n^2). Unfortunately, the fast matrix multiplication algorithms mentioned above cannot utilize the sparsity of the matrices multiplied. The complexity of the algorithm of Coppersmith and Winograd [CW90], for example, remains O(n^{2.38}) even if the multiplied matrices are extremely sparse. The naive matrix multiplication algorithm, on the other hand, can be used to multiply two n × n matrices, each with at most m non-zero elements, using O(mn) operations (see next section). Thus, for m = O(n^{1.37}), the sophisticated matrix multiplication algorithms of Coppersmith and Winograd [CW90] and others do not provide any improvement over the naive matrix multiplication algorithm.
In this paper we show that the sophisticated algebraic techniques used by the fast matrix multiplication algorithms can nevertheless be used to speed up the computation of the product of even extremely sparse matrices. More specifically, we present a new algorithm that multiplies two n × n matrices, each with at most m non-zero elements, using O(m^{0.7} n^{1.2} + n^{2+o(1)}) algebraic operations. (The exponents 0.7 and 1.2 are derived, of course, from the current 2.38 bound on the exponent of matrix multiplication, and from bounds on other exponents related to matrix multiplication, as will be explained in the sequel.) There are three important things to notice here:

(i) If m ≥ n^{1+ε}, for any ε > 0, then the number of operations performed by the new algorithm is o(mn), i.e., less than the number of operations performed, in the worst case, by the naive algorithm.

(ii) If m ≤ n^{1.14}, then the new algorithm performs only n^{2+o(1)} operations. This is very close to optimal, as all n^2 entries in the product may be non-zero, even if the multiplied matrices are very sparse.

(iii) If m ≤ n^{1.68}, then the new algorithm performs only o(n^{2.38}) operations, i.e., fewer operations than the fastest known matrix multiplication algorithm.

In other words, the new algorithm improves on the naive algorithm even for extremely sparse matrices (i.e., m = n^{1+ε}), and it improves on the fastest matrix multiplication algorithm even for relatively dense matrices (i.e., m = n^{1.68}).
The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea,
implicit in Eisenbrand and Grandoni [EG03] and Yuster and Zwick [YZ04], with the fast matrix multiplication
algorithm of Coppersmith and Winograd [CW90], and the fast rectangular matrix multiplication algorithm
of Coppersmith [Cop97]. It is interesting to note that a fast rectangular matrix multiplication algorithm for
dense matrices is used to obtain a fast matrix multiplication algorithm for sparse square matrices.
As mentioned above, matrix multiplication algorithms are used to obtain fast algorithms for many different
graph problems. We note (with some regret . . . ) that our improved sparse matrix multiplication algorithm
does not yield, automatically, improved algorithms for these problems on sparse graphs. These algorithms
may need to multiply dense matrices even if the input graph is sparse. Consider for example the computation
of the transitive closure of a graph by repeatedly squaring its adjacency matrix. The matrix obtained after the
first squaring may already be extremely dense. Still, we expect to find many situations in which the new
algorithm presented here could be useful.
In view of the above remark, we also consider the problem of computing the product A_1 A_2 ··· A_k of three or more sparse matrices. As the product of even very sparse matrices can be completely dense, the new algorithm for multiplying two matrices cannot be applied directly in this case. We show, however, that some improved bounds may nevertheless be obtained, though our results here are less impressive. For k = 3, we improve, for certain densities, on the performance of all existing algorithms. For k ≥ 4 we get no worst-case improvements at the moment, but such improvements will be obtained if bounds on certain matrix multiplication exponents are sufficiently improved.
The problem of computing the product A_1 A_2 ··· A_k of k ≥ 3 rectangular matrices, known as the chain matrix multiplication problem, was, of course, addressed before. The main concern, however, was finding an optimal way of parenthesizing the expression so that a minimal number of operations is performed when the naive algorithm is used to successively multiply pairs of intermediate results. Such an optimal placement of parentheses can be easily found in O(k^3) time using dynamic programming (see, e.g., Chapter 15 of [CLRS01]). A much more complicated algorithm of Hu and Shing [HS82, HS84] can do the same in O(k log k) time. An almost optimal solution can be found in O(k) time using a simple heuristic suggested by Chin [Chi78]. It is easy to modify the simple dynamic programming solution to the case in which a fast rectangular matrix multiplication algorithm is used instead of the naive matrix multiplication algorithm. It is not clear whether the techniques of Hu and Shing and of Chin can also be modified accordingly. Cohen [Coh99] suggests an interesting technique for predicting the non-zero structure of a product of two or more matrices. Using her technique it is possible to exploit the possible sparseness of the intermediate products.
All these techniques, however, reduce the computation of a product like A_1 A_2 A_3 to the computation of A_1 A_2 and then (A_1 A_2) A_3, or to the computation of A_2 A_3 and then A_1 (A_2 A_3). We show that, for certain densities, a faster way exists.
The rest of the paper is organized as follows. In the next section we review the existing matrix multiplication
algorithms. In Section 3 we present the main result of this paper, i.e., the improved sparse matrix multiplication algorithm. In Section 4 we use similar ideas to obtain an improved algorithm for the multiplication of
three or more sparse matrices. We end, in Section 5, with some concluding remarks and open problems.
2 Naive and fast matrix multiplication algorithms

In this short section we examine the worst-case behavior of the naive matrix multiplication algorithm and
state the performance of existing fast matrix multiplication algorithms.
2.1 The naive matrix multiplication algorithm
Let A and B be two n × n matrices. The product C = AB is defined as follows: c_{ij} = Σ_{k=1}^n a_{ik} b_{kj}, for 1 ≤ i, j ≤ n. The naive matrix multiplication algorithm uses this definition to compute the entries of C using n^3 multiplications and n^3 − n^2 additions. The number of operations can be reduced by avoiding the computation of products a_{ik} b_{kj} for which a_{ik} = 0 or b_{kj} = 0. In general, if we let ā_k be the number of non-zero elements in the k-th column of A, and b̄_k be the number of non-zero elements in the k-th row of B, then the number of multiplications that need to be performed is only Σ_{k=1}^n ā_k b̄_k. The number of additions required is always bounded by the required number of multiplications. This simple sparse matrix multiplication algorithm may be considered folklore. It can also be found in Gustavson [Gus78].
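In Python, this folklore algorithm can be sketched as follows (a minimal illustration of ours, using dense NumPy arrays rather than the compressed sparse representations of [Gus78]; the function name is hypothetical):

```python
import numpy as np

def naive_sparse_mult(A, B):
    """Compute AB as a sum of outer products A[:, k] * B[k, :], skipping
    every product a_ik * b_kj in which at least one factor is zero."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for k in range(n):
        rows = np.nonzero(A[:, k])[0]   # the a_k non-zeros of the k-th column of A
        cols = np.nonzero(B[k, :])[0]   # the b_k non-zeros of the k-th row of B
        for i in rows:                  # exactly a_k * b_k multiplications
            C[i, cols] += A[i, k] * B[k, cols]
    return C
```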
If A contains at most m non-zero entries, then Σ_{k=1}^n ā_k b̄_k ≤ (Σ_{k=1}^n ā_k) n ≤ mn. The same bound is obtained when B contains at most m non-zero entries. Can we get an improved bound on the worst-case number of products required when both A and B are sparse? Unfortunately, the answer is no. Assume that m ≥ n and consider the case ā_i = b̄_i = n, if i ≤ m/n, and ā_i = b̄_i = 0, otherwise. (In other words, all non-zero elements of A are concentrated in the first m/n columns of A, and all non-zero elements of B in the first m/n rows of B.) In this case Σ_{k=1}^n ā_k b̄_k = (m/n) · n^2 = mn. Thus, the naive algorithm may have to perform mn multiplications even if both matrices are sparse.
both matrices are sparse. It is instructive to note that the computation of AB in this worst-case example
can be reduced to the computation of a much smaller rectangular product. This illustrates the main idea
behind the new algorithm: When the naive algorithm has to perform many operations, rectangular matrix
multiplication can be used to speed up the computation.
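A small NumPy illustration of this reduction (our own example; np.dot merely stands in for a fast rectangular multiplication algorithm, which NumPy of course does not provide):

```python
import numpy as np

n, t = 64, 8                            # t = m/n non-zero columns/rows
A = np.zeros((n, n)); A[:, :t] = 1.0    # all non-zeros in the first t columns of A
B = np.zeros((n, n)); B[:t, :] = 1.0    # all non-zeros in the first t rows of B

# The naive algorithm performs t * n^2 = mn multiplications here, yet AB
# collapses to an n x t by t x n rectangular product.
C = A[:, :t] @ B[:t, :]
assert np.allclose(C, A @ B)
```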
To do justice to the naive matrix multiplication algorithm we should note that in many cases that appear in practice the matrices to be multiplied have a special structure, and the number of operations required may be much smaller than mn. For example, if the non-zero elements of A are evenly distributed among the columns of A, and the non-zero elements of B are evenly distributed among the rows of B, we have ā_k = b̄_k = m/n, for 1 ≤ k ≤ n, and the number of multiplications performed by the naive algorithm is then only Σ_{k=1}^n ā_k b̄_k = n · (m/n)^2 = m^2/n.
2.2 Fast matrix multiplication algorithms
Let M(a, b, c) be the minimal number of algebraic operations needed to multiply an a × b matrix by a b × c matrix over an arbitrary ring R. Let ω(r, s, t) be the minimal exponent ω for which M(n^r, n^s, n^t) = O(n^{ω+o(1)}). We are interested here mainly in ω = ω(1, 1, 1), the exponent of square matrix multiplication, and ω(1, r, 1), the exponent of rectangular matrix multiplication of a particular form. The best bounds available on ω(1, r, 1), for 0 ≤ r ≤ 1, are summarized in the following theorems:

Theorem 2.1 (Coppersmith and Winograd [CW90]) ω < 2.376.
Next, we define two more constants, α and β, related to rectangular matrix multiplication.

Definition 2.2 α = max{ 0 ≤ r ≤ 1 | ω(1, r, 1) = 2 } and β = (ω − 2)/(1 − α).

Theorem 2.3 (Coppersmith [Cop97]) α > 0.294.
It is not difficult to see that Theorems 2.1 and 2.3 imply the following theorem. A proof can be found, for example, in Huang and Pan [HP98].

Theorem 2.4 ω(1, r, 1) ≤ 2, if 0 ≤ r ≤ α, and ω(1, r, 1) ≤ 2 + β(r − α), otherwise.

Corollary 2.5 M(n, ℓ, n) = ℓ^β n^{2−αβ+o(1)} + n^{2+o(1)}, for every 1 ≤ ℓ ≤ n.

All the bounds in the rest of the paper will be expressed in terms of α and β. Note that with ω = 2.376 and α = 0.294 we get β ≈ 0.533. If ω = 2, as conjectured by many, then α = 1. (In this case β is not defined, but also not needed.)
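The bound of Theorem 2.4, and the value of β, are easy to evaluate numerically; a tiny sketch of ours with the current constants hard-coded:

```python
OMEGA, ALPHA = 2.376, 0.294
BETA = (OMEGA - 2) / (1 - ALPHA)        # approx 0.533

def omega_1_r_1(r):
    """Bound of Theorem 2.4 on the exponent of n x n^r by n^r x n products."""
    return 2.0 if r <= ALPHA else 2.0 + BETA * (r - ALPHA)

assert abs(omega_1_r_1(1.0) - OMEGA) < 1e-9   # r = 1 recovers omega itself
```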
3 The sparse matrix multiplication algorithm

Let A^k be the k-th column of A, and let B_k be the k-th row of B, for 1 ≤ k ≤ n. Clearly AB = Σ_k A^k B_k. (Note that A^k is a column vector, B_k a row vector, and A^k B_k is an n × n matrix.) Let a_k be the number of non-zero elements in A^k and let b_k be the number of non-zero elements in B_k. (For brevity, we omit the bars over a_k and b_k used in Section 2.1. No confusion will arise here.) As explained in Section 2.1, we can naively compute AB using O(Σ_k a_k b_k) operations. If A and B each contain m non-zero elements, then Σ_k a_k b_k may be as high as mn. (See the example in Section 2.1.)

For any subset I ⊆ [n] let A^I be the submatrix composed of the columns of A whose indices are in I, and let B_I be the submatrix composed of the rows of B whose indices are in I. If J = [n] − I, then we clearly have AB = A^I B_I + A^J B_J.
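This splitting identity is easily checked numerically; a short NumPy sanity check (our own illustration, with an arbitrary index set I):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.random((10, 10)), rng.random((10, 10))
I = np.array([0, 3, 7])                      # arbitrary index subset
J = np.setdiff1d(np.arange(10), I)           # its complement
assert np.allclose(A[:, I] @ B[I, :] + A[:, J] @ B[J, :], A @ B)
```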
Algorithm SMP(A, B)
Input: Two n × n matrices A and B.
Output: The product AB.

1. Compute a_k, the number of non-zero elements in A^k, the k-th column of A, for 1 ≤ k ≤ n.
2. Compute b_k, the number of non-zero elements in B_k, the k-th row of B, for 1 ≤ k ≤ n.
3. Find a permutation π for which a_{π(1)} b_{π(1)} ≥ a_{π(2)} b_{π(2)} ≥ ··· ≥ a_{π(n)} b_{π(n)}.
4. Find an ℓ, 0 ≤ ℓ ≤ n, that minimizes M(n, ℓ, n) + Σ_{k>ℓ} a_{π(k)} b_{π(k)}.
5. Let I = {π(1), ..., π(ℓ)} and J = {π(ℓ+1), ..., π(n)}.
6. Compute C_1 ← A^I B_I, using the fast dense rectangular matrix multiplication algorithm.
7. Compute C_2 ← A^J B_J, using the naive sparse matrix multiplication algorithm.
8. Output C_1 + C_2.
The new sparse matrix multiplication algorithm SMP(A, B) is given above. Its complexity is analyzed in the following theorem:

Theorem 3.1 Let A and B be two n × n matrices over a ring R, containing m_1 and m_2 non-zero elements, respectively. Algorithm SMP(A, B) computes the product AB using

O( min{ (m_1 m_2)^{β/(β+1)} n^{(2−αβ)/(β+1)+o(1)} + n^{2+o(1)}, m_1 n, m_2 n, n^{ω+o(1)} } )

ring operations.
It is easy to check that for m_1 = m_2 = m = O(n^{1+α/2}) the algorithm performs only n^{2+o(1)} operations, and that for m = O(n^{1.68−ε}), for any ε > 0, it performs only o(n^{ω}) operations. Using the currently best available bounds on ω, α and β, namely ω ≈ 2.376, α ≈ 0.294, and β ≈ 0.533, we get that the number of operations performed by the algorithm is at most O(m^{0.7} n^{1.2} + n^{2+o(1)}), justifying the claims made in the abstract and the introduction.
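The following Python sketch of ours mirrors the structure of algorithm SMP (np.dot plays the role of the fast rectangular multiplication M(n, ℓ, n), so only the partitioning scheme is illustrated, not the asymptotic speed-up):

```python
import numpy as np

def smp(A, B, alpha=0.294, beta=0.533):
    """Heavy column/row pairs -> one dense rectangular product;
    light pairs -> naive outer products, as in Section 2.1."""
    n = A.shape[0]
    m1, m2 = np.count_nonzero(A), np.count_nonzero(B)
    a = np.count_nonzero(A, axis=0)          # a_k: non-zeros in column k of A
    b = np.count_nonzero(B, axis=1)          # b_k: non-zeros in row k of B
    pi = np.argsort(-(a * b))                # sort pairs by a_k * b_k, descending
    # The balancing choice of l from the proof of Theorem 3.1.
    l = int((m1 * m2) ** (1 / (beta + 1)) / n ** ((2 - alpha * beta) / (beta + 1)))
    l = max(0, min(n, l))
    I, J = pi[:l], pi[l:]                    # heavy and light index sets
    C = A[:, I] @ B[I, :] if l > 0 else np.zeros((n, n))  # rectangular part
    for k in J:                              # naive part over light pairs
        rows, cols = np.nonzero(A[:, k])[0], np.nonzero(B[k, :])[0]
        for i in rows:
            C[i, cols] += A[i, k] * B[k, cols]
    return C

# Sanity check against direct multiplication:
rng = np.random.default_rng(1)
M = np.where(rng.random((50, 50)) < 0.05, 1.0, 0.0)
assert np.allclose(smp(M, M.T), M @ M.T)
```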
The proof of Theorem 3.1 relies on the following simple lemma:
Lemma 3.2 For any 1 ≤ ℓ < n we have Σ_{k>ℓ} a_{π(k)} b_{π(k)} ≤ m_1 m_2 / ℓ.

Proof: Assume, without loss of generality, that a_1 ≥ a_2 ≥ ··· ≥ a_n. Let 1 ≤ ℓ < n. We then clearly have Σ_{k>ℓ} a_{π(k)} b_{π(k)} ≤ Σ_{k>ℓ} a_k b_k. Also ℓ a_{ℓ+1} ≤ Σ_{k≤ℓ} a_k ≤ m_1. Thus a_{ℓ+1} ≤ m_1/ℓ. Putting this together we get

Σ_{k>ℓ} a_{π(k)} b_{π(k)} ≤ Σ_{k>ℓ} a_k b_k ≤ a_{ℓ+1} Σ_{k>ℓ} b_k ≤ (m_1/ℓ) · m_2 = m_1 m_2 / ℓ. □
Proof of Theorem 3.1: The correctness of algorithm SMP follows immediately from the identity AB = A^I B_I + A^J B_J. We bound the number of operations performed. The cost of computing C_1 = A^I B_I in step 6 is M(n, ℓ, n), which by Corollary 2.5 is ℓ^β n^{2−αβ+o(1)} + n^{2+o(1)}. By Lemma 3.2, the cost of computing C_2 = A^J B_J in step 7 is O(n + Σ_{k>ℓ} a_{π(k)} b_{π(k)}) = O(n + m_1 m_2 / ℓ). As ℓ is chosen in step 4 to minimize the total cost, it is enough to exhibit one choice of ℓ that yields the bound of the theorem. Taking ℓ = (m_1 m_2)^{1/(β+1)} / n^{(2−αβ)/(β+1)} balances the two costs, as then

ℓ^β n^{2−αβ} = m_1 m_2 / ℓ = (m_1 m_2)^{β/(β+1)} n^{(2−αβ)/(β+1)},

which gives the first term of the bound. (If this value of ℓ is smaller than 1 we may take ℓ = 0, in which case the naive cost is Σ_k a_k b_k ≤ min{m_1, m_2} n, giving the terms m_1 n and m_2 n; if it is larger than n we may take ℓ = n, in which case the cost is M(n, n, n) = n^{ω+o(1)}.) □
4 Multiplying three or more sparse matrices

In this section we extend the results of the previous section to the product of three or more matrices. Let A_1, A_2, ..., A_k be n × n matrices, and let m_r, for 1 ≤ r ≤ k, be the number of non-zero elements in A_r. Let B = A_1 A_2 ··· A_k be the product of the k matrices. As the product of two sparse matrices is not necessarily sparse, we cannot use the algorithm of the previous section directly to efficiently compute the product of more than two sparse matrices. Nevertheless, we show that the algorithm of the previous section can be generalized to efficiently handle the product of more than two matrices.
Let A_r = (a^{(r)}_{ij}), for 1 ≤ r ≤ k, and B = A_1 A_2 ··· A_k = (b_{ij}). It follows easily from the definition of matrix multiplication that

b_{ij} = Σ_{r_1, r_2, ..., r_{k−1}} a^{(1)}_{i,r_1} a^{(2)}_{r_1,r_2} ··· a^{(k−1)}_{r_{k−2},r_{k−1}} a^{(k)}_{r_{k−1},j} .   (1)
It is convenient to interpret the computation of b_{ij} as the summation over paths in a layered graph, as shown (for the case k = 4) in Figure 2. More precisely, the layered graph corresponding to the product A_1 A_2 ··· A_k is composed of k + 1 layers V_1, V_2, ..., V_{k+1}. Each layer V_r, where 1 ≤ r ≤ k + 1, is composed of n vertices v_{r,i}, for 1 ≤ i ≤ n. For each non-zero element a^{(r)}_{ij} in A_r, there is an edge v_{r,i} → v_{r+1,j} in the graph labelled by the element a^{(r)}_{ij}. The element b_{ij} of the product is then the sum, over all directed paths in the graph from v_{1,i} to v_{k+1,j}, of the product of the elements labelling the edges of the path.
[Figure 2: the layered graph corresponding to the product A_1 A_2 A_3 A_4; edges are labelled by non-zero entries such as a^{(1)}_{42}, a^{(2)}_{24}, a^{(3)}_{44} and a^{(4)}_{58}.]
Algorithm SCMP(A_1, A_2, ..., A_k), given below, works as follows. It first computes ℓ = (Π_{r=1}^k m_r)^{1/(k−1+β)} / n^{(2−αβ)/(k−1+β)}. (Note that when k = 2, we have ℓ = (m_1 m_2)^{1/(β+1)} / n^{(2−αβ)/(β+1)}, as in the proof of Theorem 3.1.) Next, the algorithm lets I_r be the set of indices of the ℓ rows of A_r with the largest number of non-zero elements, ties broken arbitrarily, for 2 ≤ r ≤ k. It also lets J_r = [n] − I_r be the set of indices of the n − ℓ rows of A_r with the smallest number of non-zero elements. The rows of A_r with indices in I_r are said to be the heavy rows of A_r, while the rows of A_r with indices in J_r are said to be light rows. The algorithm is then ready to do some calculations. For every 1 ≤ r ≤ k, it computes P_r ← (A_1)^{J_2} (A_2)_{J_2}^{J_3} ··· (A_r)_{J_r}, where subscripts select rows and superscripts select columns. This is done by enumerating all the corresponding paths in the layered graph corresponding to the product. The matrix P_r is an n × n matrix that gives the contribution of the light paths, i.e., paths that do not use elements from heavy rows of A_2, ..., A_r, to the prefix product A_1 A_2 ··· A_r. Next, the algorithm computes the suffix products S_r ← A_r ··· A_k, for 2 ≤ r ≤ k, using recursive calls to the algorithm. The cost of these recursive calls, as we shall see, will be overwhelmed by the other operations performed by the algorithm. The crucial step of the algorithm is the computation of B_r ← (P_{r−1})^{I_r} (S_r)_{I_r}, for 2 ≤ r ≤ k, using the fastest available rectangular matrix multiplication algorithm. The algorithm then computes and outputs the matrix (Σ_{r=2}^k B_r) + P_k.
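To make this structure concrete, here is a Python sketch of ours, specialized to k = 3 (all names are hypothetical; np.dot stands in both for the recursive suffix products and for the fast rectangular products, so only the heavy/light decomposition is illustrated):

```python
import numpy as np

def scmp3(A1, A2, A3, ell):
    """Product A1*A2*A3 split by the first heavy vertex on each path:
    light paths -> P3; first heavy in A2 -> B2; first heavy in A3 -> B3."""
    n = A1.shape[0]
    # Heavy index sets: the ell rows with the most non-zeros (ties arbitrary).
    I2 = np.argsort(-np.count_nonzero(A2, axis=1))[:ell]
    I3 = np.argsort(-np.count_nonzero(A3, axis=1))[:ell]
    J2 = np.setdiff1d(np.arange(n), I2)      # light rows of A2
    J3 = np.setdiff1d(np.arange(n), I3)      # light rows of A3
    # Light prefix products (in the paper: by enumerating light paths).
    P1 = A1
    P2 = A1[:, J2] @ A2[J2, :]
    P3 = P2[:, J3] @ A3[J3, :]
    # Suffix products (in the paper: recursive calls to SCMP).
    S2, S3 = A2 @ A3, A3
    # Rectangular products over the heavy rows.
    B2 = P1[:, I2] @ S2[I2, :]
    B3 = P2[:, I3] @ S3[I3, :]
    return B2 + B3 + P3

# Sanity check against direct multiplication:
rng = np.random.default_rng(0)
A = [np.where(rng.random((32, 32)) < 0.1, rng.random((32, 32)), 0.0)
     for _ in range(3)]
assert np.allclose(scmp3(*A, ell=4), A[0] @ A[1] @ A[2])
```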
Theorem 4.1 Let A_1, A_2, ..., A_k be n × n matrices with m_1, m_2, ..., m_k non-zero elements, respectively, where k ≥ 2 is a constant. Then, algorithm SCMP(A_1, A_2, ..., A_k) correctly computes the product A_1 A_2 ··· A_k using

O( (Π_{r=1}^k m_r)^{β/(k−1+β)} n^{(2−αβ)(k−1)/(k−1+β)+o(1)} + n^{2+o(1)} )

algebraic operations.
Proof: It is easy to see that the outdegree of a vertex v_{r,j} is the number of non-zero elements in the j-th row of A_r. We say that a vertex v_{r,j} is heavy if j ∈ I_r, and light otherwise. (Note that vertices of V_{k+1} are not classified as light or heavy. The classification of V_1 vertices is not used below.) A path in the layered graph is said to be light if all its intermediate vertices are light, and heavy if at least one of its intermediate vertices is heavy.
Let a^{(1)}_{i,s_1} a^{(2)}_{s_1,s_2} ··· a^{(k−1)}_{s_{k−2},s_{k−1}} a^{(k)}_{s_{k−1},j} be one of the terms appearing in the sum of b_{ij} given in (1). To prove the correctness of the algorithm we show that this term appears in exactly one of the matrices B_2, ..., B_k and P_k.
Algorithm SCMP(A_1, A_2, ..., A_k)
Input: n × n matrices A_1, A_2, ..., A_k.
Output: The product A_1 A_2 ··· A_k.

1. ℓ ← (Π_{r=1}^k m_r)^{1/(k−1+β)} / n^{(2−αβ)/(k−1+β)}.
2. For 2 ≤ r ≤ k: let I_r be the indices of the ℓ rows of A_r with the most non-zero elements, and J_r ← [n] − I_r.
3. Compute P_r ← (A_1)^{J_2} (A_2)_{J_2}^{J_3} ··· (A_r)_{J_r}, for 1 ≤ r ≤ k, by enumerating the light paths.
4. Compute S_r ← SCMP(A_r, ..., A_k), for 2 ≤ r ≤ k.
5. Compute B_r ← (P_{r−1})^{I_r} (S_r)_{I_r}, for 2 ≤ r ≤ k, using fast rectangular matrix multiplication.
6. Output (Σ_{r=2}^k B_r) + P_k.
Consider the path v_{1,i} → v_{2,s_1} → ··· → v_{k,s_{k−1}} → v_{k+1,j} corresponding to this term. If the path is light, the term appears in P_k, and in none of the matrices B_2, ..., B_k. Otherwise, let v_{r,s_{r−1}} be the first heavy vertex on the path. The prefix of the term, up to this vertex, appears in exactly one entry of (P_{r−1})^{I_r}, and the rest of the term in the corresponding entry of (S_r)_{I_r}; the term thus appears in B_r, and in no other matrix. This establishes the correctness of the algorithm.

Next, we bound the number of operations performed. Let d_r be the maximal number of non-zero elements in a light row of A_r. As each of the ℓ heavy rows of A_r contains at least d_r non-zero elements, we have ℓ d_r ≤ m_r, and hence d_r ≤ m_r / ℓ, for 2 ≤ r ≤ k.
The number of light paths in the graph is at most m_1 d_2 d_3 ··· d_k. Using the bounds we obtained on the d_r's, and the choice of ℓ, we get that the number of light paths is at most

m_1 d_2 d_3 ··· d_k ≤ (Π_{r=1}^k m_r) / ℓ^{k−1} = (Π_{r=1}^k m_r)^{β/(k−1+β)} n^{(2−αβ)(k−1)/(k−1+β)} .
Thus, the time taken to compute P_k is O((Π_{r=1}^k m_r)^{β/(k−1+β)} n^{(2−αβ)(k−1)/(k−1+β)}). (Computing the product of the elements along a path requires k operations, but we consider k to be a constant.)
As |I_r| = ℓ, for 2 ≤ r ≤ k, the product (P_{r−1})^{I_r} (S_r)_{I_r} is the product of an n × ℓ matrix by an ℓ × n matrix, whose cost is M(n, ℓ, n). Using Corollary 2.5 and the choice of ℓ made by the algorithm we get that

M(n, ℓ, n) = ℓ^β n^{2−αβ+o(1)} + n^{2+o(1)} = (Π_{r=1}^k m_r)^{β/(k−1+β)} n^{(2−αβ)(1 − β/(k−1+β))+o(1)} + n^{2+o(1)} = (Π_{r=1}^k m_r)^{β/(k−1+β)} n^{(2−αβ)(k−1)/(k−1+β)+o(1)} + n^{2+o(1)} .
Finally, it is easy to see that the cost of computing the suffix products S_r ← A_r ··· A_k, for 2 ≤ r ≤ k, using recursive calls to the algorithm, is dominated by the cost of the other operations performed by the algorithm. (Recall again that k is a constant.) This completes the proof of the theorem. □
There are two alternatives to the use of algorithm SCMP for computing the product A_1 A_2 ··· A_k. The first is to ignore the sparsity of the matrices and multiply them in O(n^{ω+o(1)}) time. The second is to multiply the matrices, one by one, using the naive algorithm. As the naive algorithm uses at most O(mn) operations when one of the matrices contains only m non-zero elements, the total number of operations in this case will be O((Σ_{r=1}^k m_r) n). For simplicity, let us consider the case in which each one of the matrices A_1, ..., A_k
contains m non-zero elements. A simple calculation then shows that SCMP is faster than the fast dense matrix multiplication algorithm for

m ≤ n^{(ω(k−1+β) − (2−αβ)(k−1)) / (kβ)} ,

and that it is faster than the naive matrix multiplication algorithm for

m ≥ max{ n^{((k−1)(1−αβ) − β) / ((k−1)(1−β))} , n^{1+o(1)} } .
For k = 2, these bounds coincide with the bounds obtained in Section 3. For k = 3, with the best available bounds on ω, α and β, we get that SCMP is the fastest algorithm when n^{1.24} ≤ m ≤ n^{1.45}. For smaller values of m the naive algorithm is the fastest, while for larger values of m the fast dense algorithm is the fastest. Sadly, for k ≥ 4, with the current values of ω, α and β, the new algorithm never improves on both the naive and the dense algorithms. But, this may change if improved bounds on ω, and especially on α, are obtained.
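The crossover densities quoted above follow from a direct calculation; a small script of ours that reproduces them from the formulas displayed above:

```python
OMEGA, ALPHA = 2.376, 0.294
BETA = (OMEGA - 2) / (1 - ALPHA)

def dense_threshold(k):
    """SCMP beats the O(n^omega) dense algorithm for m <= n^t."""
    return (OMEGA * (k - 1 + BETA) - (2 - ALPHA * BETA) * (k - 1)) / (k * BETA)

def naive_threshold(k):
    """SCMP beats repeated naive multiplication for m >= n^t (and m >= n^{1+o(1)})."""
    return ((k - 1) * (1 - ALPHA * BETA) - BETA) / ((k - 1) * (1 - BETA))

print(dense_threshold(2))   # ~1.69: the n^{1.68} of Section 3, up to rounding
print(dense_threshold(3))   # ~1.46: the n^{1.45} quoted above
print(naive_threshold(3))   # ~1.23: the n^{1.24} quoted above, up to rounding
```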
5 Concluding remarks and open problems

We obtained an improved algorithm for the multiplication of two sparse matrices. The algorithm does not rely on any specific structure of the matrices to be multiplied, just on the fact that they are sparse. The algorithm essentially partitions the matrices to be multiplied into a dense part and a sparse part and uses a fast algebraic algorithm to multiply the dense parts, and the naive algorithm to multiply the sparse parts. We also discussed the possibility of extending the ideas to the product of k ≥ 3 matrices. For k = 3 we obtained some improved results. The new algorithms were presented for square matrices. It is not difficult, however, to extend them to work on rectangular matrices.
The most interesting open problem is whether it is possible to speed up the running time of other operations on sparse matrices. In particular, is it possible to compute the transitive closure of a directed graph on n vertices with m edges in o(mn) time? More specifically, is there an O(m^{1−ε} n^{1+ε}) algorithm for the problem, for some ε > 0?
References
[AYZ95] N. Alon, R. Yuster, and U. Zwick. Color-coding. Journal of the ACM, 42:844–856, 1995.

[AYZ97] N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17:209–223, 1997.

[BCS97] P. Bürgisser, M. Clausen, and M.A. Shokrollahi. Algebraic complexity theory. Springer-Verlag, 1997.

[Cha02] T. Chan. Dynamic subgraph connectivity with geometric applications. In Proc. of 34th STOC, pages 7–13, 2002.

[Che97] J. Cheriyan. Randomized Õ(M(|V|)) algorithms for problems in matching theory. SIAM Journal on Computing, 26:1635–1655, 1997.

[Chi78] F.Y. Chin. An O(n) algorithm for determining a near-optimal computation order of matrix chain products. Communications of the ACM, 21(7):544–549, 1978.

[CLRS01] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to algorithms. The MIT Press, second edition, 2001.

[Coh99] E. Cohen. Fast algorithms for constructing t-spanners and paths with stretch t. SIAM Journal on Computing, 28:210–236, 1999.

[Cop97] D. Coppersmith. Rectangular matrix multiplication revisited. Journal of Complexity, 13:42–49, 1997.

[CU03] H. Cohn and C. Umans. A group-theoretic approach to fast matrix multiplication. In Proc. of 44th FOCS, pages 438–449, 2003.

[CW90] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9:251–280, 1990.

[DI00] C. Demetrescu and G.F. Italiano. Fully dynamic transitive closure: Breaking through the O(n^2) barrier. In Proc. of 41st FOCS, pages 381–389, 2000.

[EG03] F. Eisenbrand and F. Grandoni. Detecting directed 4-cycles still faster. Information Processing Letters, 87(1):13–15, 2003.

[Gus78] F.G. Gustavson. Two fast algorithms for sparse matrices: Multiplication and permuted transposition. ACM Transactions on Mathematical Software, 4(3):250–269, 1978.

[HP98] X. Huang and V.Y. Pan. Fast rectangular matrix multiplications and applications. Journal of Complexity, 14:257–299, 1998.

[HS82] T.C. Hu and M.T. Shing. Computation of matrix chain products I. SIAM Journal on Computing, 11(2):362–373, 1982.

[HS84] T.C. Hu and M.T. Shing. Computation of matrix chain products II. SIAM Journal on Computing, 13(2):228–251, 1984.

[KS03] D. Kratsch and J. Spinrad. Between O(nm) and O(n^α). In Proc. of 14th SODA, pages 709–716, 2003.

[MVV87] K. Mulmuley, U.V. Vazirani, and V.V. Vazirani. Matching is as easy as matrix inversion. Combinatorica, 7:105–113, 1987.

[NP85] J. Nešetřil and S. Poljak. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae, 26(2):415–419, 1985.

[Pan85] V. Pan. How to multiply matrices faster. Lecture Notes in Computer Science, volume 179. Springer-Verlag, 1985.

[Raz03] R. Raz. On the complexity of matrix product. SIAM Journal on Computing, 32:1356–1369, 2003.

[RV89] M.O. Rabin and V.V. Vazirani. Maximum matchings in general graphs through randomization. Journal of Algorithms, 10:557–567, 1989.

[RZ02] L. Roditty and U. Zwick. Improved dynamic reachability algorithms for directed graphs. In Proc. of 43rd FOCS, pages 679–688, 2002.

[Sei95] R. Seidel. On the all-pairs-shortest-path problem in unweighted undirected graphs. Journal of Computer and System Sciences, 51:400–403, 1995.

[Shp03] A. Shpilka. Lower bounds for matrix product. SIAM Journal on Computing, 32:1185–1200, 2003.

[Str69] V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13:354–356, 1969.

[SZ99] A. Shoshan and U. Zwick. All pairs shortest paths in undirected graphs with integer weights. In Proc. of 40th FOCS, pages 605–614, 1999.

[YZ04] R. Yuster and U. Zwick. Detecting short directed cycles using rectangular matrix multiplication and dynamic programming. In Proc. of 15th SODA, pages 247–253, 2004.

[Zwi02] U. Zwick. All-pairs shortest paths using bridging sets and rectangular matrix multiplication. Journal of the ACM, 49:289–317, 2002.