[Diagram: hybrid execution model. The global solution is distributed across nodes N1–N4 (distributed parallel); within each node, work is shared-memory parallel using OpenMP; execution runs on CPU + GPU (GPUs G1–G4).]
CAE Priority for ISV Software Development on GPUs
LSTC / LS-DYNA
SIMULIA / Abaqus/Explicit
Altair / RADIOSS
ESI / PAM-CRASH
ANSYS / ANSYS Mechanical
ANSYS and NVIDIA Collaboration Roadmap
[Table: GPU feature roadmap by release for ANSYS Mechanical, ANSYS Fluent, and ANSYS EM.]
[Chart: ANSYS Mechanical simulation productivity in jobs/day, comparing CPU-only runs (2 or 8 CPU cores) against runs adding a Tesla K20 or Tesla K40 (2 CPU cores + GPU with an HPC license, 7 CPU cores + GPU with an HPC Pack). Values shown: 275, 275, 324, and 363 jobs/day; speedups of 2.1X (K20) and 2.2X (K40).]
Distributed ANSYS Mechanical 15.0 with Intel Xeon E5-2697 v2 2.7 GHz CPU; Tesla K20 GPU and Tesla K40 GPU with boost clocks.
Considerations for ANSYS Mechanical on GPUs
Problems with high solver workloads benefit the most from GPU
Characterized by both high DOF and high factorization requirements
Models with solid elements and more than 500K DOF see good speedups
Abaqus/Standard
SIMULIA and Abaqus GPU Release Progression
Abaqus 6.11, June 2011
Direct sparse solver is accelerated on the GPU
Single GPU support; Fermi GPUs (Tesla 20-series, Quadro 6000)
[Chart: Abaqus/Standard elapsed time in seconds for 8c, 8c + 1g, 8c + 2g, 16c, and 16c + 2g configurations; GPU speedups of 2.11x and 2.42x over the respective CPU-only runs.]
Server with 2x E5-2670, 2.6GHz CPUs, 128GB memory, 2x Tesla K20X, Linux RHEL 6.2, Abaqus/Standard 6.12-2
Rolls Royce: Abaqus Speedups on an HPC Cluster
• 4.71M DOF (equations); ~77 TFLOPs
• Nonlinear static (6 steps)
• Direct sparse solver, 100GB memory
Sandy Bridge + Tesla K20X, for 4x servers
[Chart: elapsed time in seconds for 24c, 24c + 4g, 36c, 36c + 6g, 48c, and 48c + 8g runs; GPU speedups of roughly 1.8x to 2.2x over the respective CPU-only runs.]
Servers with 2x E5-2670, 2.6GHz CPUs, 128GB memory, 2x Tesla K20X, Linux RHEL 6.2, Abaqus/Standard 6.12-2
Abaqus/Standard ~15% Gain from K20X to K40
2.1x – 4.8x
1.9x – 4.1x
15% av 15% av
1.7x – 2.9x
1.5x – 2.5x
Abaqus 6.13-DEV Scaling on Tesla GPU Cluster
PSG Cluster: Sandy Bridge CPUs with 2x E5-2670 (8-core), 2.6 GHz, 128GB memory, 2x Tesla K20X, Linux RHEL 6.2, QDR IB, CUDA 5
Abaqus Licensing in a node and across a cluster
Cores         Tokens   +1 GPU: Tokens   +2 GPUs: Tokens
1             5        6                7
2             6        7                8
3             7        8                9
4             8        9                10
5             9        10               11
6             10       11               12
7             11       12               12
8 (1 CPU)     12       12               13
9             12       13               13
10            13       13               14
11            13       14               14
12            14       14               15
13            14       15               15
14            15       15               16
15            15       16               16
16 (2 CPUs)   16       16               16

Across a cluster:
2 nodes (2x 16 cores + 2x 2 GPUs): 32 cores = 21 tokens; 32 cores + 4 GPUs = 22 tokens
3 nodes (3x 16 cores + 3x 2 GPUs): 48 cores = 25 tokens; 48 cores + 6 GPUs = 26 tokens
(A sketch of the token formula that reproduces these values follows below.)
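The token counts above are consistent with the commonly cited Abaqus analysis-token rule, tokens = floor(5 * N^0.422). A minimal Python sketch, assuming N counts CPU cores plus GPUs as compute devices (check your own license agreement for the authoritative rule):

```python
import math

def abaqus_tokens(cores: int, gpus: int = 0) -> int:
    """Commonly cited Abaqus analysis-token rule: floor(5 * N**0.422).

    Assumption: N counts CPU cores plus GPUs as compute devices, which is
    what reproduces the table above; consult your license terms for the
    authoritative rule.
    """
    devices = cores + gpus
    return math.floor(5 * devices ** 0.422)

# Spot checks against the table and the cluster examples above:
assert abaqus_tokens(1) == 5
assert abaqus_tokens(8) == 12          # 1 CPU (8 cores)
assert abaqus_tokens(16, 2) == 16      # 1 node: 16 cores + 2 GPUs
assert abaqus_tokens(32, 4) == 22      # 2 nodes: 32 cores + 4 GPUs
assert abaqus_tokens(48, 6) == 26      # 3 nodes: 48 cores + 6 GPUs
```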
Abaqus 6.12 Power consumption in a node
Computational Structural Mechanics
MSC Nastran
MSC Nastran 2013
Nastran direct equation solver is GPU accelerated
Sparse direct factorization (MSCLDL, MSCLU)
Real, complex, symmetric, and unsymmetric matrices
Handles very large fronts with minimal use of pinned host memory
Lowest-granularity GPU implementation of a sparse direct solver; handles sparse matrices of unlimited size
Impacts several solution sequences:
High impact (SOL101, SOL108), Mid (SOL103), Low (SOL111, SOL400)
[Chart: MSC Nastran 2013 speedups with one Tesla K20X versus the CPU-only baseline (1X) for SOL101 (2.4M rows, 42K front) and SOL103 (2.6M rows, 18K front); annotated speedups of 1.9X, 2.7X, and 2.8X.]
Lanczos solver (SOL 103) steps: sparse matrix factorization; iterate on a block of vectors (solve); orthogonalization of vectors. (A sketch of this pattern follows after the hardware note below.)
Server node: Sandy Bridge E5-2670 (2.6GHz), Tesla K20X GPU, 128 GB memory
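To illustrate the factorize / solve / orthogonalize pattern listed for the Lanczos solver above, here is a hedged Python sketch using SciPy's shift-invert Lanczos (ARPACK) on a toy stiffness-like matrix. This is not MSC Nastran's block Lanczos implementation; it only shows where the same three kernels appear.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Toy SPD "stiffness" matrix (1D Laplacian) standing in for K; in a real
# SOL 103 run this would be the shifted operator that gets factorized once.
n = 2000
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# Shift-invert Lanczos: (K - sigma*I) is factorized once (sparse direct
# factorization), then every Lanczos iteration performs a solve with that
# factor plus re-orthogonalization of the vectors, i.e. the same three
# kernels called out in the chart above.
eigenvalues, _ = eigsh(K, k=6, sigma=0.0, which="LM")
print(np.sort(eigenvalues))   # smallest eigenvalues of K
```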
MSC Nastran 2013
Coupled Structural-Acoustics simulation with SOL108
[Chart: elapsed time for serial, 1c + 1g, 4c (smp), 4c + 1g, 8c (dmp=2), and 8c + 2g (dmp=2) runs; up to 2.7X speedup with GPUs.]
Server node: Sandy Bridge 2.6GHz, 2x 8 core, Tesla 2x K20X GPU, 128GB memory
Computational Structural Mechanics
MSC MARC
MARC 2013
Computational Fluid Dynamics
ANSYS Fluent
ANSYS and NVIDIA Collaboration Roadmap
[Table: GPU feature roadmap by release for ANSYS Mechanical, ANSYS Fluent, and ANSYS EM.]
Cluster specification:
nprocs = Total number of fluent processes
M = Number of machines
ngpgpus = Number of GPUs per machine
Requirement 1
nprocs mod M = 0
Same number of solver processes on each machine
Requirement 2
(nprocs / M) mod ngpgpus = 0
The number of solver processes per machine must be an integer multiple of the number of GPUs per machine. (A sketch of a layout check follows below.)
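A minimal Python sketch of these two rules; the helper name and the example layouts are illustrative and not part of ANSYS Fluent:

```python
def check_fluent_gpu_layout(nprocs: int, machines: int, gpus_per_machine: int):
    """Check the two layout requirements above for a GPU-accelerated run.

    Requirement 1: nprocs mod M == 0
    Requirement 2: (nprocs / M) mod ngpgpus == 0
    Assumes gpus_per_machine >= 1.
    """
    if nprocs % machines != 0:
        return False, "nprocs must be an integer multiple of the number of machines"
    per_machine = nprocs // machines
    if per_machine % gpus_per_machine != 0:
        return False, ("solver processes per machine must be an integer "
                       "multiple of the GPUs per machine")
    return True, f"{per_machine} solver processes and {gpus_per_machine} GPUs per machine"

# Examples: a valid single-node layout and a multi-node one that breaks rule 2.
print(check_fluent_gpu_layout(nprocs=16, machines=1, gpus_per_machine=4))  # OK
print(check_fluent_gpu_layout(nprocs=30, machines=2, gpus_per_machine=4))  # fails
```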
Cluster Specification Examples
Single-node configurations:
[Diagram: example single-node mappings of MPI solver processes to GPUs (e.g. 16 MPI processes, 8 MPI processes, or 5 MPI processes per GPU) that satisfy the rules above.]
Multi-node configurations:
Note: The problem must fit in the GPU memory for the solution to proceed
Considerations for ANSYS Fluent on GPUs
GPUs accelerate the AMG solver of the CFD analysis
Fine meshes and low-dissipation problems have high %AMG
Coupled solution scheme spends 65% on average in AMG
In many cases, pressure-based coupled solvers offer faster convergence
compared to segregated solvers (problem-dependent)
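The AMG solver referred to above is part of Fluent itself; purely to illustrate the kind of work it does, here is a minimal Python sketch of a two-grid V-cycle on a 1D Poisson matrix (geometric coarsening stands in for the algebraic coarsening a real AMG performs).

```python
import numpy as np

def poisson_1d(n):
    """1D Poisson matrix with Dirichlet boundaries (n interior points)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def weighted_jacobi(A, x, b, sweeps, omega=2.0 / 3.0):
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x

def interpolation(n_fine):
    """Linear interpolation from the coarse grid to the fine grid (n_fine odd)."""
    n_coarse = (n_fine - 1) // 2
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1              # fine index of coarse point j
        P[i, j] = 1.0
        P[i - 1, j] = 0.5
        P[i + 1, j] = 0.5
    return P

def two_grid_vcycle(A, b, x, P):
    x = weighted_jacobi(A, x, b, sweeps=3)           # pre-smoothing
    r_coarse = P.T @ (b - A @ x)                     # restrict the residual
    A_coarse = P.T @ A @ P                           # Galerkin coarse operator
    e_coarse = np.linalg.solve(A_coarse, r_coarse)   # coarse-grid solve
    x = x + P @ e_coarse                             # prolong the correction
    return weighted_jacobi(A, x, b, sweeps=3)        # post-smoothing

n = 63
A, b = poisson_1d(n), np.ones(n)
x, P = np.zeros(n), interpolation(n)
for cycle in range(10):
    x = two_grid_vcycle(A, b, x, P)
    print(cycle, np.linalg.norm(b - A @ x))          # residual shrinks each cycle
```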
[Decision flow: pressure-based segregated solver in use? Is it a steady-state analysis? If yes, consider switching to the pressure-based coupled solver: the best fit for GPUs, giving better performance (faster convergence) and further speedups with GPUs.]
ANSYS Fluent GPU Performance for Large Cases
Better speed-ups on larger and harder-to-solve problems
[Chart: Truck Body Model, ANSYS Fluent time in seconds for CPU-only versus CPU + GPU runs at 36 CPU cores (+ 12 GPUs) and 144 CPU cores (+ 48 GPUs); roughly 2X speedup. Times cover the AMG solver time for 20 time steps and the overall solution time; reported values include 6391 and 775 seconds.]
ANSYS Fluent GPU Study on Productivity Gains
ANSYS Fluent 15.0 Preview 3 Performance – Results by NVIDIA, Sep 2013
Case: 14M mixed cells, steady, k-epsilon turbulence, coupled PBNS, double precision; CPU runs use the AMG F-cycle, GPU runs use FGMRES with an AMG preconditioner. Total solution times compared for 64 CPU cores versus 32 CPU cores + 8 GPUs.
• Same solution times
• Frees up 32 CPU cores and HPC licenses for additional job(s)
• Approximately 56% increase in overall productivity
OpenFOAM
NVIDIA GPU Strategy for OpenFOAM
Provide technical support for GPU solver developments
FluiDyna (implementation of NVIDIA’s AMG), Vratis, and PARALUTION
AMG development by ISP of the Russian Academy of Sciences (A. Monakov)
Cufflink development by WUSTL, now Engys North America (D. Combest)
GPU Acceleration of CFD in Industrial Applications using Culises and aeroFluidX GTC 2014
Summary: hybrid approach (GPU linear solver coupled to a CPU simulation tool of choice, e.g. OpenFOAM®)
Advantages:
• Universally applicable (couples to the simulation tool of choice)
• Full availability of existing flow models
• Easy/no validation needed
• Unsteady approaches suit the hybrid scheme better due to large linear solver times
Disadvantages:
• Hybrid CPU-GPU execution produces overhead
• If the solution of the linear system is not dominant, application speedup can be limited
aeroFluidX
An extension of the hybrid approach
• Porting the discretization of the equations to the GPU: aeroFluidX is a GPU implementation of the Finite Volume discretization module, running on the GPU alongside the CPU flow solver (e.g. OpenFOAM®)
• Possibility of direct coupling to Culises: zero overhead from CPU-GPU-CPU memory transfer and matrix format conversion; solving the momentum equations on the GPU is also beneficial
• OpenFOAM® environment supported: enables a plug-in solution for OpenFOAM® customers, but communication with other input/output file formats is possible
[Diagram: CPU flow solver pipeline (preprocessing, FV discretization module, linear solver, postprocessing) with the aeroFluidX FV module and the Culises linear solver running on the GPU.]
aeroFluidX
Cavity flow benchmark
• CPU: Intel E5-2650 (all 8 cores); GPU: NVIDIA K40
• 4M grid cells (unstructured)
• Running 100 SIMPLE steps with:
  – OpenFOAM® (OF): pressure GAMG, velocity Gauss-Seidel
  – OpenFOAM® + Culises (OFC): pressure Culises AMGPCG (2.4x), velocity Gauss-Seidel
  – aeroFluidX + Culises (AFXC): pressure Culises AMGPCG, velocity Culises Jacobi
• Total speedup: OF 1x, OFC 1.62x, AFXC 2.20x
[Chart: normalized computing time for OpenFOAM, OpenFOAM + Culises, and aeroFluidX + Culises, split into "all assembly" (assembly of all linear systems, pressure and velocity) and "all linear solve" (solution of all linear systems, pressure and velocity); annotated speedups of 2.1x and 2.22x for the linear solve and 1.96x for assembly.]
PARALUTION
C++ library providing a range of sparse iterative solvers and preconditioners. (A minimal sketch of such a solver follows below.)
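As a rough illustration of what solver libraries such as PARALUTION, Culises, or Cufflink provide, here is a minimal Python sketch of a preconditioned conjugate gradient with a simple Jacobi preconditioner; the real libraries offer much stronger preconditioners (e.g. AMG) and run the kernels on the GPU.

```python
import numpy as np
import scipy.sparse as sp

def pcg(A, b, apply_precond, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for a sparse SPD matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Toy SPD system: 1D Laplacian with a Jacobi (diagonal) preconditioner.
n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
inv_diag = 1.0 / A.diagonal()
x, iters = pcg(A, b, lambda r: inv_diag * r)
print(iters, np.linalg.norm(b - A @ x))   # iteration count and final residual
```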
CST STUDIO SUITE is an integrated solution for 3D EM simulations; it includes a parametric modeler, more than 20 solvers, and integrated post-processing. Currently, three solvers support GPU computing.
- The Quadro K6000/Tesla K40 card is about 30–35% faster than the K20 card.
- 12 GB of onboard RAM allows for larger model sizes.
[Chart: speedup versus number of GPUs (1 to 4 Tesla K40s) relative to the CPU-only run, for CST STUDIO SUITE 2013 and CST STUDIO SUITE 2014; speedup grows with the number of GPUs.]
Benchmark performed on system equipped with dual Xeon E5-2630 v2 (Ivy Bridge EP) processors, and four Tesla K40 cards. Model has 80 million mesh cells.
[Chart: speedup versus number of cluster nodes (1 to 4) for the GPU hardware under weak scaling.]
Benchmark model features: open boundaries; dispersive and lossy material.
Note: a GPU-accelerated cluster system requires a high-speed network in order to perform well!
Base model size is 80 million cells. Problem size is scaled up linearly with the number of cluster nodes (i.e., weak scaling). Hardware: dual Xeon E5-2650 processors, 128GB RAM per node (1600MHz), InfiniBand QDR interconnect (40Gb/s).
Axel Koehler
akoehler@nvidia.com
NVIDIA, the NVIDIA logo, GeForce, Quadro, Tegra, Tesla, GeForce Experience, GRID, GTX, Kepler, ShadowPlay, GameStream, SHIELD, and The Way It’s Meant To Be Played are trademarks and/or
registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.