Abstract
As computational science applications grow more parallel, with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve the resulting linear systems, reducing the number of MPI tasks and increasing the parallel efficiency of the algorithm. This approach, however, requires efficient threaded numerical kernels on the multi-core nodes. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
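To make the kernel concrete, here is a minimal sketch of a sparse lower triangular solve organized by dependency levels, a standard way to expose parallelism in this operation. It is an illustration only, not necessarily the method used in the paper. The function names `level_sets` and `lower_solve`, the CSR arrays `indptr`, `indices`, and `data`, and the assumption that every row stores a nonzero diagonal entry are all hypothetical choices made for this example.

```python
def level_sets(indptr, indices):
    """Group the rows of a lower triangular CSR matrix into dependency levels.

    Row i depends on row j when L[i, j] != 0 and j < i. Rows placed in the
    same level do not depend on each other.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # x[i] needs x[j], so row i sits at least one level later
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lev in enumerate(level):
        sets[lev].append(i)
    return sets


def lower_solve(indptr, indices, data, b):
    """Solve L x = b by forward substitution, one dependency level at a time.

    Assumes (for this sketch) that each row stores a nonzero diagonal entry.
    """
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows within one level are independent. A threaded kernel could
        # split this inner loop across threads and synchronize between
        # levels; here it runs serially for clarity.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif j < i:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x
```

For the 3×3 matrix with rows (2, 0, 0), (0, 4, 0), (1, 1, 1), the first two rows have no dependencies and form one level, while the third row depends on both and forms a second level. The number of levels and the number of rows per level bound how much parallelism such a scheme can extract from a given matrix.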
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wolf, M.M., Heroux, M.A., Boman, E.G. (2011). Factors Impacting Performance of Multithreaded Sparse Triangular Solve. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_6
DOI: https://doi.org/10.1007/978-3-642-19328-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19327-9
Online ISBN: 978-3-642-19328-6
eBook Packages: Computer Science (R0)