ABSTRACT: Despite a significant decline in their popularity in the last decade, vector processors are still with us, and manufacturers such as Cray and NEC are bringing new products to market. We have carried out a performance comparison of three full-scale applications: SBLI, a Direct Numerical Simulation code from Computational Fluid Dynamics; DL_POLY, a molecular dynamics code; and POLCOMS, a coastal-ocean model. Comparing the performance of the Cray X1 vector system with two massively parallel (MPP) micro-processor-based systems, we find three rather different results. The SBLI PCHAN benchmark performs excellently on the Cray X1 with no code modification, showing 100% vectorisation and significantly outperforming the MPP systems. The performance of DL_POLY was initially poor, but we were able to make significant improvements through a few simple optimisations. The POLCOMS code has been substantially restructured for cache-based MPP systems and now does not vectorise at all well on the Cray X1, leading to poor performance. We conclude that both vector and MPP systems can deliver high performance levels but that, depending on the algorithm, careful software design may be necessary if the same code is to achieve high performance on different architectures.
An initial examination of the scaling of the optimised code shows that, as on HPCx, it scales quite well. Table 2 shows the timing results for the optimised code, the number of processors being the number of SSPs on the Cray.

SBLI/PCHAN
The direct solution of the equations of motion for a fluid remains a formidable task and simulations are only possible for flows with small to modest Reynolds numbers. Within the UK the Turbulence Consortium (UKTC) has been at the forefront of simulating turbulent flows by direct numerical simulation (DNS). UKTC has developed a parallel version of a code to solve problems associated with shock/boundary-layer interaction [10].

The code (SBLI) was originally developed for the Cray T3E and is a sophisticated DNS code that incorporates a number of advanced features, including high-order central differencing and a shock-preserving advection scheme from the total variation diminishing (TVD) family.

The most important communications structure within PCHAN is a halo-exchange between adjacent sub-domains. Provided the problem size is large enough to give a small surface area to volume ratio for each sub-domain, the communications costs are small relative to computation and do not constitute a bottleneck.
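To make the surface-to-volume argument concrete, the following sketch counts interior and halo points for one sub-domain of the 360x360x360 T3 grid used for the results below. The 3-D block decompositions and the two-point halo depth are illustrative assumptions, not details taken from the PCHAN source.

    # Rough surface-to-volume estimate for a halo-exchange decomposition.
    # Assumptions (not from the paper): a 3-D block decomposition of the
    # 360^3 T3 grid and a halo two points deep on each face.

    def halo_estimate(n=360, px=8, py=8, pz=8, halo=2):
        """Interior points, halo points and their ratio for one sub-domain
        of an n^3 grid split over a px x py x pz process grid."""
        nx, ny, nz = n // px, n // py, n // pz
        interior = nx * ny * nz
        # Six faces, each 'halo' points deep; edge and corner overlaps are
        # ignored, which is adequate for a rough estimate.
        halo_pts = 2 * halo * (nx * ny + ny * nz + nx * nz)
        return interior, halo_pts, halo_pts / interior

    if __name__ == "__main__":
        for procs, (px, py, pz) in [(128, (8, 4, 4)),
                                    (512, (8, 8, 8)),
                                    (1280, (8, 8, 20))]:
            interior, halo_pts, ratio = halo_estimate(px=px, py=py, pz=pz)
            print(f"{procs:5d} processors: {interior:7d} interior points, "
                  f"{halo_pts:6d} halo points, halo/interior = {ratio:.2f}")

Even on 1280 processors the halo accounts for well under half of the points each process touches, which is consistent with communication remaining subordinate to computation, although the ratio clearly grows as the sub-domains shrink.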
Figure 2 shows performance results for the T3 data case, a grid of 360x360x360, from the Cray X1, the IBM cluster and the SGI Altix, and shows ideal scaling on all systems. Hardware profiling studies of this code have shown that its performance is highly dependent on the cache utilisation and bandwidth to main memory [11].

It is clear that memory management for this code is taking place more efficiently on the Altix than on the p690+. The match to the streaming architecture of the Cray X1 is excellent. Timings for the Cray X1 and the IBM p690+ are shown in Table 3 together with the performance ratio. At 128 processors the ratio is 4.3, dropping to 2.5 at 1280 processors, possibly because the sub-domains become smaller and cache effects increasingly favour the p690+.

Number of    PCHAN execution time (s)    Ratio
processors   IBM p690+      Cray X1      Cray/IBM
 128             1245           290          4.3
 192              812           189          4.3
 256              576           147          3.9
 512              230            75          3.1
 768              146            52          2.8
1024              112            39          2.9
1280               81            32          2.5

Table 3. Execution times in seconds for the PCHAN T3 benchmark on the Cray X1 and IBM p690+, including the performance ratio between the Cray and the IBM.
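As a reading aid for Table 3, the short script below recomputes the Cray/IBM ratio and each machine's speedup relative to its own 128-processor run; it uses only the times listed in the table and is not part of the benchmark itself.

    # Ratios and relative speedups derived from the Table 3 times (seconds).
    # Speedups are normalised to each machine's own 128-processor run.

    times = {  # processors: (IBM p690+ time, Cray X1 time)
        128: (1245, 290),
        192: (812, 189),
        256: (576, 147),
        512: (230, 75),
        768: (146, 52),
        1024: (112, 39),
        1280: (81, 32),
    }

    base_ibm, base_cray = times[128]
    for procs in sorted(times):
        ibm, cray = times[procs]
        ratio = ibm / cray    # Cray X1 advantage over the IBM p690+
        ideal = procs / 128   # ideal speedup relative to the 128-processor run
        print(f"{procs:5d}  ratio {ratio:3.1f}   "
              f"IBM speedup {base_ibm / ibm:5.1f} (ideal {ideal:5.1f})   "
              f"Cray speedup {base_cray / cray:5.1f} (ideal {ideal:5.1f})")

Between 128 and 1280 processors the p690+ speeds up by a factor of roughly 15 against an ideal 10, consistent with the cache sensitivity noted above, while the Cray X1 stays close to ideal at roughly 9; that difference is one way to read the fall in the ratio from 4.3 to 2.5.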
POLCOMS
References

[1] The use of Vector Processors in Quantum Chemistry; Experience in the U.K., M.F. Guest and S. Wilson, Daresbury Laboratory Preprint DL/SCI/P290T; in 'Supercomputers in Chemistry', eds. P. Lykos and I. Shavitt, A.C.S. Symposium Series 173 (1981) 1.

[2] Application of the CRAY-1 for Quantum Chemistry Calculations, V.R. Saunders and M.F. Guest, Computer Physics Commun. 26 (1982) 389.

[3] The Study of Molecular Electronic Structure on Vector and Attached Processors: Correlation Effects in Transition Metal Complexes, in Supercomputer Simulations in Chemistry, ed. M. Dupuis, Lecture Notes in Chemistry 44 (1986) 98 (Springer Verlag).

[4] TOP500 supercomputer sites, http://www.top500.org/

[5] Parallel Processing in Environmental Modelling, M. Ashworth, in Parallel Supercomputing in Atmospheric Science: Proceedings of the Fifth Workshop on the Use of Parallel Processors in Meteorology, eds. G-R. Hoffmann and T. Kauranne, (1993) 1-25 (World Scientific).

[6] Cray X1 Evaluation Status Report, P.A. Agarwal et al. (29 authors), Oak Ridge National Laboratory, Oak Ridge, TN, USA, Technical Report ORNL/TM-2004/13.

[12] Eddy Resolved Ecosystem Modelling in the Irish Sea, J.T. Holt, R. Proctor, M. Ashworth, J.I. Allen and J.C. Blackford, in Realizing Teracomputing: Proceedings of the Tenth ECMWF Workshop on the Use of High Performance Computing in Meteorology, eds. W. Zwieflhofer and N. Kreitz, (2004) 268-278 (World Scientific).

[13] Optimization of the POLCOMS Hydrodynamic Code for Terascale High-Performance Computers, M. Ashworth, J.T. Holt and R. Proctor, HPCx Technical Report HPCxTR0415 (2004), http://www.hpcx.ac.uk/research/hpc/technical_reports/HPCxTR0415.pdf

Acknowledgments

The authors would like to thank Oak Ridge National Laboratory (ORNL) and the UK Engineering and Physical Sciences Research Council (EPSRC) for access to machines.

About the Authors

Mike Ashworth is Head of the Advanced Research Computing Group in the Computational Science & Engineering Department (CSED) at CCLRC Daresbury Laboratory and has special interests in the optimisation of coupled environmental models for high-performance systems. Ian Bush is a computational scientist at CSED with specialisation in high performance parallel algorithms for computational chemistry, molecular