Abstract
This paper reports a study of sparse matrix vector multiplication on a parallel distributed memory machine called EARTH, which supports a fine-grain multithreaded program execution model on off-the-shelf processors. Such sparse computations, when parallelized without graph partitioning, have a high communication to computation ratio, and are well known to have limited scalability on traditional distributed memory machines. EARTH offers a number of features which should make it a promising architecture for this class of applications, including local synchronizations, low communication overheads, ability to overlap communication and computation, and low context-switching costs. On the NAS CG benchmark Class A inputs, we achieve linear speedups on the 20-node MANNA platform, and an absolute speedup of 79 on 120 nodes on a simulated extension. The speedup improves to 90 on 120 nodes for Class B. This is achieved without inspector/executor, graph partitioning, or any communication minimization phase, which means that similar results can be expected for adaptive problems.
Chapter PDF
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319–347, August 1996.
Kevin Bryan Theobald. EARTH: An Efficient Architecture for Running Threads. PhD thesis, McGill University, Montréal, Québec, May 1999.
S.T. Barnard and H. Simon. A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Technical Report RNR-92-033, NAS Systems Division, NASA Ames Research Center, November 1992.
Joel Saltz, Kathleen Crowley, Ravi Mirchandaney, and Harry Berryman. Run-time scheduling and execution of loops on message passing machines. Journal of Parallel and Distributed Computing, 8(4):303–312, April 1990.
Numerical Aerospace Simulation Facility. NAS parallel benchmarks, 1997. http://www.nas.nasa.gov/Software/NPB/
U. Bruening, W. K. Giloi, and W. Schroeder-Preikschat. Latency hiding in message-passing architectures. In Proceedings of the 8th International Parallel Processing Symposium, pages 704–709, Cancún, Mexico, April 1994. IEEE Computer Society.
Ian Stuart MacKenzie Walker. Towards a custom EARTH synchronization unit. Master’s thesis, University of Delaware, Newark, Delaware, July 1999.
Lynn Elliot Cannon. A Cellular Computer to Implement the Kalman Filter Algorithm. PhD thesis, Montana State University, 1969.
Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. Thulasiram, and Guang R. Gao. Developing a communication intensive application on the EARTH multithreaded architecture. CAPSL Technical Memo 38, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, March 2000. In ftp://ftp.capsl.udel.edu/pub/doc/memos.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Theobald, K.B., Kumar, R., Agrawal, G., Heber, G., Thulasiram, R.K., Gao, G.R. (2000). Developing a Communication Intensive Application on the EARTH Multithreaded Architecture. In: Bode, A., Ludwig, T., Karl, W., Wismüller, R. (eds) Euro-Par 2000 Parallel Processing. Euro-Par 2000. Lecture Notes in Computer Science, vol 1900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44520-X_88
Download citation
DOI: https://doi.org/10.1007/3-540-44520-X_88
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67956-1
Online ISBN: 978-3-540-44520-3
eBook Packages: Springer Book Archive