Abstract
A new parallel implementation of Strassen’s matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D, all-port torus interconnection networks. The proposed algorithm employs a special conflict-free routing pattern for better scalability and is able to yield a performance rate very close to the theoretical bound for many practical network and matrix sizes. It effectively scales up to very large networks typically containing hundreds-of-thousands processors where petaflop or exaflop processing rates are sought.
Similar content being viewed by others
References
Al Bdaiwi BF, Bose B (2003) On resource placements in 3D tori. J Parallel Distrib Comput 63(9):838–845
Baransel C, İmre K, Artuner H (2010) New parallel matrix multiplication algorithms for wormhole-routed al-port 2D/3D torus networks. In: Mathematical methods in engineering international symposium (MME ’10), Coimbra, Portugal, 21–24 October, 2010, ISBN 978-989-8331-11-3
Bini DA (2007) Chapter 47: Fast matrix multiplication. In: Hogben L (ed) Handbook of linear algebra. Chapman & Hall/CRC Press, London/New York
Bodrato M (2010) A Strassen-like matrix multiplication suited for squaring and higher power computation. In: Proceedings of the 2010 international symposium on symbolic and algebraic computation (ISSAC’10). ACM, New York, pp 273–280
Boyer B, Dumas J, Pernet C, Zhou W (2009) Memory efficient scheduling of Strassen–Winograd’s matrix multiplication algorithm. In: Proceedings of the 2009 international symposium on symbolic and algebraic computation (ISSAC’09). ACM, New York, pp 55–62
Chou C, Deng Y, Li G, Wang Y (1995) Parallelizing Strassen’s method for matrix multiplication on distributed-memory MIMD architectures. Comput Math Appl 30(2):49–69
Desprez F, Suter F (2002) Impact of mixed-parallelism on parallel implementations of Strassen and Winograd matrix multiplication algorithms, INRIA Research Report No 4482
Dimitrescu B, Roch JL, Trystram D (1994) Fast matrix multiplication algorithms on MIMD architectures. Parallel Algorithms Appl 4(2):53–70
Gramma A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Addison-Wesley, Reading
Gastinel N (1971) Sur le calcul des produits de matrices. Numer Math 17:222–229
Grayson B, Shah A, van de Geijn R (1995) A high performance parallel Strassen implementation. Dept. of Computer Sciences, The University of Texas, TR-95-24, June 1995
İmre KM, Baransel C, Artuner H (2011) Efficient and scalable routing algorithms for collective communication operations on 2D all-port torus networks. Int J Parallel Program 39(6):746–782
Hopcroft J, Kerr L (1971) On minimizing the number of multiplications necessary for matrix multiplication. SIAM J Appl Math 20(1):30–36
Hunold S, Rauber T, Rünger G (2008) Combining building blocks for parallel multi-level matrix multiplication. Parallel Comput 34:411–426
Kolen JF, Bruce P (2001) Evolutionary search for matrix multiplication algorithms. In: FLAIRS conference, pp. 161–165
Laderman JD (1976) A noncommutative algorithm for multiplying 3×3 matrices using 23 multiplications. Bull Am Math Soc 82(1):126–128
Luo Q, Drake JB (1995) A scalable parallel Strassen’s matrix multiplication algorithm for distributed-memory computers. In: Proceedings of the 1995 ACM symposium on applied computing. ACM Press, New York, pp 221–226
Ni LM, McKinley PK (1993) A survey of wormhole routing techniques in direct networks. Computer 26(2):62–76
Oh S, Moon B (2010) Automatic reproduction of a genius algorithm: Strassen’s algorithm revisited by genetic search. IEEE Trans Evol Comput 14(2):246–251
Ohtaki Y, Takahashi D, Boku T, Sato M (2004) Parallel implementation of Strassen’s matrix multiplication algorithm for heterogeneous clusters. In: Proceedings of the 18th IEEE international parallel and distributed processing symposium (IPDPS ’04)
Song F, Dongarra J, Moore S (2006) Experiments with Strassen’s algorithm: from sequential to parallel. In: Proceedings of the 18th IASTED international conference on parallel and distributed computing and systems (PDCS ’06)
Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13:354–356
Winograd S (1971) On multiplication of 2×2 matrices. Linear Algebra Appl 4:381–388
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Baransel, C., İmre, K.M. A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks. J Supercomput 62, 486–509 (2012). https://doi.org/10.1007/s11227-011-0730-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-011-0730-1