Greg Corke gets hands on with Ansys Mechanical 17.0 to explore how firms can get
more out of their budget when specifying a workstation for Finite Element Analysis

Considering the savings that simulation can bring to design and manufacturing, it is almost criminal to think that some designers use underpowered workstation hardware for solving complex CAE problems.

With optimised workstation hardware, simulations don’t need to take hours. Engineers can do more in a much shorter period of time, exploring many different design options, rather than verifying one or two. True optimisation studies can become a reality.

The accuracy of simulations can also be increased by not having to limit the fidelity of models. Users can simulate whole assemblies, rather than just parts; multi-physics, rather than multi-model.

Investing in a fully loaded dual Intel Xeon workstation will no doubt solve some of these issues. But spending £10k to £20k on a single machine is simply not realistic for most firms and, in some cases, a waste of money, considering the small additional return you might get over a workstation half its price.

By building a balanced machine, optimised for custom simulation workflows, firms can make their workstation budget go a whole lot further. Even some minor workstation upgrades can have a dramatic impact on cutting solve times.

The aim of this article is to help users of simulation software gain a better understanding of how different workstation components can impact performance in Finite Element Analysis (FEA) solvers.

While all benchmarking was done with Ansys 17.0, many of the concepts will be valid for other FEA applications. However, Ansys does pride itself on just how well its solvers scale across multiple CPU cores, particularly with this latest release (see box out on page WS7 for more info).

STRIKING A BALANCE
Unlike CAD, which is mostly about having a high GHz CPU, simulation software can put huge stresses on all parts of a workstation. Finding the right balance between CPU, memory and storage is critical.

There is no point in having two 22 core Intel Xeon CPUs if data is fed into them through slow storage. Your workstation is only as fast as your slowest component.

HPC ON THE DESKTOP
High Performance Computing (HPC) used to mean a supercomputer or a cluster – a high speed network of servers or workstations. But, with the rise of multi-core CPUs, more recently the term can be applied to workstations as well.

Of course, workstations come in all shapes and sizes from mobile to desktop; single quad core CPU to dual 22 core CPU. While those serious about simulation will mostly use a

This ultra-high-end dual CPU workstation is for designers who take their simulation seriously. With up to two Intel Xeon E5-2600 v4 series CPUs, 1TB DDR4 ECC memory, 14 storage devices and three ultra high-end GPUs it has the capacity and flexibility to handle the most demanding FEA workflows.

Our test machine’s two Intel Xeon E5-2680 v4 CPUs offer a good balance of price / performance. Running at 2.4GHz (up to 3.3GHz) there’s a strong foundation for single threaded applications. With 14 cores apiece (28 in total) there was also plenty of power in reserve to run all of our Ansys 17.0 test simulations. Those on a budget may consider dropping down to the significantly cheaper Intel Xeon E5-2640 v4, which boasts the same GHz but only 10 cores.

128GB (8 x 16GB) of DDR4 ECC memory is a good starting point for simulation, but with eight DIMM slots free there’s still scope to ramp this up for more demanding workflows.

Storage is its forte. A single 512GB Samsung SM951 NVMe SSD is supplemented by a 2.5-inch 512GB SATA SSD and two 4TB 7,200RPM SATA HDDs for data.

The M.2 form factor NVMe drive is mounted on a FLEX Connector card, a custom small footprint PCIe add-in board that sits parallel to the motherboard. There’s room for a second M.2 NVMe drive, onto which we placed a retail Samsung SSD 950 Pro. The machine can also host a second FLEX Connector card, leading to a total of four NVMe SSDs. These SSDs can also be RAIDed to boost sustained read/write performance above the respective theoretical 2,150MB/sec and 1,550MB/sec of a single Samsung SM951.

With the obvious benefits that NVMe SSDs bring to simulation solvers (see box out on page WS8) we see little reason to specify a system with a 2.5-inch SATA SSD – unless you need a very high capacity SSD RAID array. However, the two 4TB SATA HDDs, which can be configured in RAID 0 (for performance) or RAID 1 (for redundancy), offer a cost effective way of storing huge simulation models and results data.

Graphics is handled by an Nvidia Quadro M2000 GPU, which is ideal for handling 3D models in most CAD and CAE software. For those interested in accelerating their simulation solvers with co-processors, be it an Nvidia Tesla GPU or Intel Xeon Phi, there’s also room for three double height add-in cards.

The capabilities of this hugely powerful workstation are undoubted, but the thing that really makes it stand out is the beautifully engineered chassis. Tool free access to virtually all the key components makes maintenance and customisation incredibly easy. Swapping drives, in particular, is a joy with both 2.5-inch and 3.5-inch drives clipping easily into the four FLEX Drive trays, then slotting into the bays to automatically mate with power and data.

» 2 x Intel Xeon E5-2680 v4 CPUs (14C) (2.4GHz up to 3.3GHz)
» 128GB DDR4 RAM
» Nvidia Quadro M2000
» 512GB NVMe SSD + 2 x 4TB SATA HDDs
» Microsoft Windows 10 Pro OS
£5,900 (ex VAT)
Lenovo.com



Ansys 17.0 engine block static structural analysis

ANSYS 17.0 PERFORMANCE
One of the biggest advancements in the recently released Ansys 17.0 simulation software suite is a significantly optimised HPC solver architecture. It is specifically designed to take advantage of new generation Intel processor technologies and large numbers of CPU cores.

The biggest benefits should be seen by those using Intel ‘Haswell’ Xeon E5-2600 v3 or Intel ‘Broadwell’ Xeon E5-2600 v4 CPU architectures. Both of these processor families feature new Intel AVX-2 compiler instructions and Intel Math Kernel Libraries that are supported in Ansys 17.0.

Ansys has also enhanced its Distributed Memory Parallel (DMP) processing capabilities, a technique that divides up a simulation into portions that can be computed on separate cores.

Here, the move to Intel Message Passing Interface (MPI) – the communications channel that lets each Ansys process exchange data with other processes involved in the DMP simulation – has helped Ansys make DMP the default standard for Ansys Workbench instead of Shared Memory Parallel (SMP) processing.

Ansys says DMP will deliver more efficient performance for simulations involving more than four compute cores running in parallel.

There have also been a number of new software code optimisations including a completely new algorithm that optimises the matrix factorisation stage of the sparse solver.

workstation with two Intel Xeon CPUs, there are still many users out there with much lower specced machines, including desktop PCs and mobile workstations.

Indeed, in a survey carried out by Intel and Ansys in late 2014, 35% of respondents said they used a machine with a single CPU and 18% used a consumer grade PC.

TEST CASES
For the scope of this article, we tested with four ‘typical usage’ mechanical simulation problems taken from the Ansys Mechanical Benchmark Suite. Problems range from 3.2 million degrees of freedom (DOF – number of equations) to 14.2 million DOF.

Two used the iterative PCG solver: a static structural analysis of a farm tractor rear axle assembly and a static structural analysis of an engine block.

Two used the sparse (direct) solver: a static nonlinear structural analysis of a turbine blade and a transient nonlinear structural analysis of an electronic ball grid array.

All four problems were solved in two modes – Shared Memory Parallel (SMP) and Distributed Memory Parallel (DMP).

DMP processing, now the default mode in Ansys 17.0, runs separate executables on individual CPU cores. It typically means better performance for simulations involving more than four compute cores running in parallel.

CENTRAL PROCESSING UNIT (CPU)
Modern CPUs are made up of multiple processors (called cores). As FEA solvers are multi-threaded (i.e. the processing load can be spread across multiple cores) there is a big benefit to using a workstation with two CPUs and lots of cores.

The CPU currently best suited to simulation is the ‘Broadwell-EP’ Intel Xeon E5-2600 v4 series, available in dual processor workstations such as our test machine, the Lenovo ThinkStation P910. This high-end chip not only has a large number of cores (models range from 4 to 22) but it can support huge amounts of ECC memory. Its high-bandwidth quad channel memory architecture also contributes to faster solve times (see later).

Another option is the ‘Broadwell-EP’ Intel Xeon E5-1600 v4 series, which is for single processor workstations.

If your budget is extremely limited, try the quad core Intel Xeon E3-1200 v5 and, for mobile workstations, the Intel Xeon E3-1500 v5 CPU. We would not recommend Intel Core i5 or Core i7 as these CPUs do not support ECC memory (see later).

Virtually all workstation-class CPUs feature Intel Hyper-threading (HT), a virtual core technology that turns each physical CPU core into two virtual cores. So a 12 core processor with HT actually has 24 virtual cores (or threads). HT is generally not recommended for simulation. It can be turned off in the BIOS or by using a workstation optimisation tool such as the Lenovo Performance Tuner.

Throwing a huge number of CPU cores at a simulation problem does not necessarily translate to faster solve times. There are usually diminishing returns, as demonstrated in our suite of Ansys 17.0 Mechanical Benchmarks which were run on our Lenovo ThinkStation P910 with two 14 core Intel Xeon E5-2680 v4 CPUs.

In our tests, we found 16 to be the optimum number of cores, though the


performance difference between 12 and 16 cores was not that big (see page WS11).

‘Scalability’ (or how well more cores contribute to quicker solve times) can vary quite significantly. It depends on the type of simulation and solver, and the size and complexity of the problem. We highly recommend you test out your own datasets to see where this sweet spot lies.

The clock speed of the CPU is also important – the higher the GHz, the more floating point operations it can perform per second. Having a high GHz CPU will also boost overall system performance and that of single threaded applications. If you intend to run CAD and simulation software on the same workstation this is an important consideration.

One final consideration is cache, a small store of memory located directly on the CPU. A bigger cache can mean the CPU can access frequently used data much quicker and can help boost performance.

Of course, engineers don’t tend to do one thing at a time. When choosing a CPU, consider that you may want to do other tasks concurrently. Multi-tasking could mean prepping a new study, or something more processor-intensive like mesh generation or running other simulations in parallel. With too few cores and too many of these processes running at the same time your workstation may grind to a halt.

In order to keep your workstation running smoothly, Ansys 17.0 allows you to limit the number of CPU cores used by the solver. Workstation optimisation utilities such as Lenovo Performance Tuner also allow you to assign specific cores to specific processes. Time spent tuning your workstation can reap huge rewards.

One other consideration when choosing a CPU for simulation is software licensing costs. Many simulation tools, including Ansys, charge more for running simulations across multiple cores.

MEMORY
Workstation memory is absolutely crucial for simulation. In an ideal world you should have enough to hold even your most complex simulation problems entirely in memory. This allows the CPU to access this cached data very quickly.

If you can’t hold the whole model in memory, data has to be stored in virtual memory or ‘swap space’ on your workstation’s Hard Disk Drive (HDD) or Solid State Drive (SSD), which are much slower.

Memory bandwidth, or the rate at which the CPU can read data from or write data to memory, can also have a big impact on solve times. This is governed by the number of memory channels – the more there are, the higher the memory bandwidth.

Intel Xeon E5-2600 v4 series CPUs, for example, feature quad channel DDR4 memory, which has a theoretical maximum bandwidth of 77GB/sec. Intel Xeon E3-1200 v5 series CPUs, on the other hand, only have dual channel DDR4 memory at 34GB/sec.

Another important consideration is the type of RAM you use. ECC (Error Correcting Code) memory is strongly recommended as it protects against crashes by detecting and rectifying errors. Such errors may happen once in a blue moon but, if it means you don’t crash in the middle of a simulation you have set to run overnight, it is well worth paying the small premium compared to non-ECC memory.

Kitting out your workstation with huge amounts of ECC RAM is not always possible. Simulation problems can run into hundreds of gigabytes so either the cost may be prohibitive or your workstation simply may not support that much (many single CPU desktop and mobile workstations only support a maximum of 32GB or 64GB).

Out of our four Ansys 17 simulation problems, the Turbine (V17sp-4) used the most memory. At 85GB this comfortably fitted within the 128GB available in our Lenovo ThinkStation P910 workstation. But there will be times when you don’t have enough (some simulation problems need 100s of GB). This increases the importance of high-performance storage.

STORAGE
Workstation storage typically comes in two forms: Hard Disk Drives (HDDs) and Solid State Drives (SSDs). HDDs offer a much better price per GB but performance is much slower. SSDs are more expensive but performance is significantly faster.

To understand why, it is important to appreciate how each storage technology works. With HDDs, data is stored on platters that spin at high speeds. In order to read or write data, the mechanical drive head has to physically move across the platter, much like a laser moving across a CD when skipping from track to track. With the huge volumes of data needed for simulation this can quickly become a bottleneck, especially when a large simulation problem cannot be held entirely in memory.

SSDs are different insofar as they contain no moving parts at all. Data is stored on an array of NAND flash memory, which is managed by a controller – a dedicated processor that provides the bridge to the workstation.

SSDs boast significantly better sustained read / write performance, which is important

THE IMPORTANCE OF FAST STORAGE
With dual Xeon workstations capable of supporting up to 1TB of memory, it is possible to solve some exceedingly complex simulation models entirely in memory. But if a) your budgets are tight, b) you only run large simulation studies on occasion, or c) your workstation simply cannot hold more memory, having fast storage becomes even more important.

In the chart below you can see how storage performance and memory impact solve times. When the simulation is run entirely in system memory (using 82GB out of an available 128GB) the impact of the different storage technologies is small. However, when the workstation system memory is reduced to 32GB (forcing the simulation to use 50GB of swap space) solve times vary dramatically.

The most interesting take away from this study was not how slow the HDD was but just how fast the Samsung 950 Pro NVMe SSD was. It was not only 37% faster than the Micron M600 SATA SSD, but only 10% slower than when running the simulation entirely in system memory.

Learn more about the Samsung 950 Pro SSD at tinyurl.com/950-PRO-SSD

Chart showing comparative solve times with different storage technologies. When loaded, the V17sp-4.dat dataset takes up 82GB of system memory, so reducing the system memory to 32GB forces part of the simulation into swap space.
1,996 secs HGST SATA HDD + 32GB RAM
480 secs Micron M600 SATA SSD + 32GB RAM
299 secs Samsung 950 Pro NVMe (PCIe) SSD + 32GB RAM
352 secs HGST SATA HDD + 128GB RAM
279 secs Micron M600 SATA SSD + 128GB RAM
271 secs Samsung 950 Pro NVMe (PCIe) SSD + 128GB RAM
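The percentages quoted in this box can be checked directly from the six solve times in the chart. The short Python sketch below is only an illustration of that arithmetic; the dictionary layout and the function names are our own and are not part of the Ansys benchmark suite.

```python
# Solve times in seconds for the V17sp-4 turbine model, copied from the chart.
# Keys are (storage device, installed RAM in GB).
solve_times = {
    ("HGST SATA HDD", 32): 1996,
    ("Micron M600 SATA SSD", 32): 480,
    ("Samsung 950 Pro NVMe SSD", 32): 299,
    ("HGST SATA HDD", 128): 352,
    ("Micron M600 SATA SSD", 128): 279,
    ("Samsung 950 Pro NVMe SSD", 128): 271,
}


def percent_less_time(slow, fast):
    """Percentage reduction in solve time when moving from `slow` to `fast`."""
    return 100.0 * (slow - fast) / slow


def percent_more_time(base, other):
    """Percentage increase in solve time of `other` relative to `base`."""
    return 100.0 * (other - base) / base


# NVMe versus SATA SSD when the job spills into swap space (32GB RAM).
nvme_vs_sata = percent_less_time(
    solve_times[("Micron M600 SATA SSD", 32)],
    solve_times[("Samsung 950 Pro NVMe SSD", 32)],
)

# Cost of swapping to NVMe compared with running fully in memory (128GB RAM).
swap_penalty = percent_more_time(
    solve_times[("Samsung 950 Pro NVMe SSD", 128)],
    solve_times[("Samsung 950 Pro NVMe SSD", 32)],
)

# How many times longer the HDD takes once it has to swap.
hdd_slowdown = solve_times[("HGST SATA HDD", 32)] / solve_times[("HGST SATA HDD", 128)]

print(f"NVMe vs SATA SSD with 32GB RAM: {nvme_vs_sata:.1f}% less solve time")
print(f"NVMe swap penalty vs in-memory: {swap_penalty:.1f}% more solve time")
print(f"HDD slowdown when swapping: {hdd_slowdown:.1f}x")
```

The same few lines can be reused with your own stopwatch figures to see whether a storage upgrade or a memory upgrade gives the better return for a particular dataset.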


Charts showing solve times with different numbers of CPU cores.
Test machine: Lenovo ThinkStation P910, 2 x Intel Xeon E5-2680 v4 CPUs (14 cores), 128GB DDR4-2400 memory

TRACTOR REAR AXLE (V17cg-2) (PCG solver)
Static structural analysis of a farm tractor rear axle assembly
1c 2,671 secs
2c 1,930 secs
4c 1,061 secs
8c 591 secs
12c 431 secs
16c 478 secs
20c 500 secs
24c 523 secs
28c 491 secs

ENGINE BLOCK (V17cg-3) (PCG solver)
Static structural analysis of an engine block without the internal components
1c 4,196 secs
2c 2,281 secs
4c 1,363 secs
8c 658 secs
12c 630 secs
16c 491 secs
20c 443 secs
24c 463 secs
28c 438 secs

TURBINE (V17sp-4) (Sparse solver)
Static nonlinear structural analysis of a turbine blade as found in aircraft engines
1c 1,326 secs
2c 768 secs
4c 496 secs
8c 320 secs
12c 268 secs
16c 217 secs
20c 198 secs
24c 207 secs
28c 192 secs

BGA (V17sp-5) (Sparse solver)
Transient nonlinear structural analysis of an electronic ball grid array
1c 1,765 secs
2c 1,016 secs
4c 626 secs
8c 461 secs
12c 323 secs
16c 265 secs
20c 261 secs
24c 246 secs
28c 327 secs
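One way to read these charts is to convert each solve time into a speedup over the single core run and a parallel efficiency (speedup divided by core count). The Python sketch below does this with the published figures. It is a minimal illustration rather than anything supplied by Ansys: the function names are our own, and the single core times for the Turbine and BGA models are taken to be the first, unlabelled value in each chart.

```python
# Core counts tested and the solve times (seconds) read from the four charts.
CORES = [1, 2, 4, 8, 12, 16, 20, 24, 28]
SOLVE_TIMES = {
    "Tractor rear axle (V17cg-2)": [2671, 1930, 1061, 591, 431, 478, 500, 523, 491],
    "Engine block (V17cg-3)": [4196, 2281, 1363, 658, 630, 491, 443, 463, 438],
    "Turbine (V17sp-4)": [1326, 768, 496, 320, 268, 217, 198, 207, 192],
    "BGA (V17sp-5)": [1765, 1016, 626, 461, 323, 265, 261, 246, 327],
}


def scaling_table(times, cores=CORES):
    """Return (cores, seconds, speedup, efficiency) for each run.

    Speedup is the single core time divided by the n core time;
    efficiency is speedup divided by the number of cores used.
    """
    single_core = times[0]
    rows = []
    for n, seconds in zip(cores, times):
        speedup = single_core / seconds
        rows.append((n, seconds, speedup, speedup / n))
    return rows


def fastest_core_count(times, cores=CORES):
    """Core count that gave the shortest solve time."""
    return min(zip(times, cores))[1]


for name, times in SOLVE_TIMES.items():
    print(name)
    for n, seconds, speedup, efficiency in scaling_table(times):
        print(f"  {n:>2} cores: {seconds:>5} s  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
    print(f"  fastest run used {fastest_core_count(times)} cores")
```

The shortest solve time and the best value core count are not the same thing. Efficiency falls steadily as cores are added, which is why it pays to weigh the last few seconds saved against the extra hardware and per-core licensing cost.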

for large continuous datasets. They also offer superior response times (latency) and better random read / write performance.

There are two main types of SSDs: SATA and NVMe (PCIe). SATA SSDs come in the familiar 2.5-inch form factor. NVMe SSDs mostly come in the M.2 2280 form factor (22mm x 80mm) which is similar in size to a stick of memory, but also as an add-in PCIe board, which is about the same size as a small graphics card.

The main difference between the SATA and NVMe SSDs is their sustained read/write performance. SATA SSDs can usually read/write data at around 500MB/sec but with NVMe this ranges from 1,500MB/sec to 2,500MB/sec.

Multiple HDDs and SSDs can also be combined in a RAID array to boost read / write performance. The HP Z Turbo Drive Quad Pro, for example, integrates up to four NVMe modules on a PCIe x16 card to deliver sequential performance up to 9.0GB/s. This kind of bandwidth could be useful for particularly complex simulation problems.

Fast storage becomes critical when you can’t hold the whole simulation job in memory and



Chart showing relative performance of Shared Memory Parallel (SMP) and Distributed Memory Parallel (DMP) modes in Ansys 17.0 using the BGA (V17sp-5) (Sparse solver) model (bigger is better). Results show that the benefit of DMP increases with core count.
Test machine: Lenovo ThinkStation P910, 2 x Intel Xeon E5-2680 v4 CPUs, 128GB DDR4-2400 memory
1 core: SMP 1.00, DMP 1.10
2 cores: SMP 1.00, DMP 1.15
4 cores: SMP 1.00, DMP 1.26
8 cores: SMP 1.00, DMP 1.31
12 cores: SMP 1.00, DMP 1.68
16 cores: SMP 1.00, DMP 1.94

(Below) Interior of Lenovo ThinkStation P910 workstation

data has to be moved in and out of swap space.

To simulate this scenario, we reduced the system memory inside our ThinkStation P910 from 128GB to 32GB then tested with three different storage technologies: a 4TB 7,200RPM HDD (HGST), a 512GB SATA SSD (Micron M600) and a 512GB NVMe SSD (Samsung SSD 950 Pro). The results were quite astounding.

We weren’t surprised to see the system grind to a halt when using the HDD (no one should be using a standard HDD for simulation). But we were quite bowled over by the performance of the Samsung 950 Pro NVMe SSD (see box out on page WS8 for detailed information).

In short, an NVMe SSD should be considered essential for simulation. In fact, the only reason we can see to specify a SATA SSD would be if you need more storage capacity. The Samsung 950 Pro NVMe SSD only comes in 256GB and 512GB models, whereas the Samsung 850 Pro SATA SSD goes up to 2TB. (N.B. Samsung recently announced a 1TB Samsung SM961, an OEM targeted NVMe SSD).

Even if you are not on the look-out for a new machine, upgrading your workstation to an NVMe SSD could be one of the most significant investments you can make. With the 256GB and 512GB Samsung 950 Pro only costing £125 and £220 respectively this is not a lot of money – particularly when you consider that 128GB (4 x 32GB) DDR4 ECC memory will set you back around £1,000.

It should be noted that NVMe drives are only supported natively on the latest generation desktop and mobile workstations. However, it is still possible to upgrade older generation desktop workstations by buying a low cost PCIe add-in card, which hosts a single NVMe SSD.

Finally, it’s important to write a few words about SSD endurance. SSDs are typically rated by terabytes written (TBW) – the amount of data that can be written over its lifetime or warranty period. Don’t be tempted to save money on consumer-focused SSDs as these have lower endurance ratings and, on paper, will fail before professional SSDs. Investing in a drive with good endurance is particularly important for disk-intensive simulation workflows with lots of read/write operations.

So what about HDDs? There is still a place for mechanical hard drives in simulation for storing legacy projects or results data. With a 4TB 7,200RPM drive available for under £100, and an 8TB model for £250, SSDs simply can’t compete on price per GB and capacity.

GPU (COMPUTE)
Although we did not test any Graphics Processing Units (GPUs) in the scope of this article, it is worth dedicating a few sentences to this interesting technology.

There was a time when GPUs were used solely for graphics, but in recent years they have also transformed into co-processors that can be used for compute functions. Their highly parallel architecture (1,000s of cores, rather than 10s) makes them well suited to solving complex simulation problems. Models must support double precision floating point operations.

The first ‘double precision’ GPUs had relatively small memory footprints. This meant there were limitations in the size of simulation problems that they could solve. Modern cards, including Nvidia Quadro, Nvidia Tesla, AMD FirePro and AMD Radeon Pro, now have anywhere up to 32GB so this is less of an issue. The AMD Radeon Pro SSG, a completely new type of GPU, even has an on-board 1TB NVMe Solid State Drive (SSD) to give fast access to giant datasets.

Most high-end simulation software developers offer some level of support for GPU compute but this is usually limited to certain solvers. Support is either through OpenCL, an open standard championed by AMD and Intel, or CUDA, a bespoke technology from Nvidia.

Intel also has a co-processor, the Intel Xeon Phi. While this is not a GPU (it comprises tens of x86 cores), it is another add-in board that can be used to accelerate simulation software.

CONCLUSION
Money spent on high-end workstations pales into insignificance compared to the savings that can be made through design optimisation. But even with this huge incentive, many firms still have to stick to tight budgets when it comes to specifying machines for CAE.

The price differential between a workstation for CAD and one for engineering simulation can be huge but with a careful choice of hardware components, matched to your firm’s workflows and datasets, this gap can be narrowed.

In summary, don’t get seduced by the top-end CPUs. Large numbers of cores come at a big premium and more doesn’t always mean faster. Choose enough RAM to handle day to day jobs, but ask yourself if you really need so much just to handle the exceptionally large studies you do once a month. Always invest in fast storage – NVMe SSDs are a must. Make sure your machine is balanced – a workstation tuning tool like Lenovo Performance Tuner can track resources over time, helping to identify where bottlenecks are. Finally, grab your stopwatch, load up your Excel spreadsheet, and test, test, and test again.
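As a footnote to the memory bandwidth figures quoted in the MEMORY section, the theoretical peak can be estimated as channels x transfer rate x bytes per transfer. The Python sketch below is a rough check under stated assumptions, not a vendor formula: it assumes each DDR4 channel is 64 bits (8 bytes) wide, that the Xeon E5-2600 v4 runs DDR4-2400 (as in our test machine) and that the Xeon E3-1200 v5 runs DDR4-2133. The function name is our own.

```python
# Assumption: one DDR4 channel is 64 bits wide, so it moves 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gb_s(channels, megatransfers_per_sec):
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 1,000 MB)."""
    return channels * megatransfers_per_sec * BYTES_PER_TRANSFER / 1000.0


# Quad channel DDR4-2400, as fitted to the dual Xeon E5-2680 v4 test machine.
quad_channel = peak_bandwidth_gb_s(channels=4, megatransfers_per_sec=2400)

# Dual channel DDR4-2133, assumed here for the Xeon E3-1200 v5 series.
dual_channel = peak_bandwidth_gb_s(channels=2, megatransfers_per_sec=2133)

print(f"Quad channel DDR4-2400: {quad_channel:.1f} GB/sec")
print(f"Dual channel DDR4-2133: {dual_channel:.1f} GB/sec")
```

Under these assumptions the results round to the 77GB/sec and 34GB/sec quoted in the article. Real-world throughput will be lower than the theoretical peak, and the peak is only reached when every memory channel is populated.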


