Commit ba1a313

Docs @ c78933f
MFC Action committed
1 parent eeba609 commit ba1a313

2 files changed: +14 -30 lines changed

documentation/md_expectedPerformance.html

Lines changed: 9 additions & 25 deletions
@@ -137,40 +137,24 @@
 <div class="textblock"><p><a class="anchor" id="autotoc_md53"></a> MFC has been benchmarked on several CPUs and GPU devices. This page shows a summary of these results.</p>
 <h1><a class="anchor" id="autotoc_md54"></a>
 Expected time-steps/hour</h1>
-<p>The following table outlines expected performance in terms of the number of time steps per hour, rounded to the nearest hundred (higher is better). A 3D inviscid, 6-equation problem is solved for various problem sizes (grid cells) and hardware. A 3rd order (3-stage) Runge-Kutta time-stepper is used. CPU results utilize an entire processor die.</p>
+<p>The following table outlines observed performance as nanoseconds per grid point (ns/GP) per right-hand side evaluation (lower is better). We solve an example 3D, inviscid, 5-equation model problem with two advected species (a total of 8 PDEs). The numerics are WENO5 and the HLLC approximate Riemann solver. We report results for various numbers of grid points per CPU die (or GPU device) and hardware.</p>
 <table class="markdownTable">
 <tr class="markdownTableHead">
-<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"># Cores </th><th class="markdownTableHeadCenter">Steps/Hr (1M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (4M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (8M pts) </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr>
+<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"></th><th class="markdownTableHeadCenter">1M GPs </th><th class="markdownTableHeadCenter">4M GPs </th><th class="markdownTableHeadCenter">8M GPs </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">88.5k </td><td class="markdownTableBodyCenter">18.7k </td><td class="markdownTableBodyCenter">N/A </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
+<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">78.8k </td><td class="markdownTableBodyCenter">18.8k </td><td class="markdownTableBodyCenter">N/A </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr>
+<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">101 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">114.4k </td><td class="markdownTableBodyCenter">34.6k </td><td class="markdownTableBodyCenter">16.5k </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr>
+<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">71 </td><td class="markdownTableBodyCenter">56 </td><td class="markdownTableBodyCenter">59 </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 (GCD) </td><td class="markdownTableBodyCenter">77.5k </td><td class="markdownTableBodyCenter">22.3k </td><td class="markdownTableBodyCenter">11.2k </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr>
+<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 GCD </td><td class="markdownTableBodyCenter">108 </td><td class="markdownTableBodyCenter">90 </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 (cores) </td><td class="markdownTableBodyCenter">2.5k </td><td class="markdownTableBodyCenter">0.7k </td><td class="markdownTableBodyCenter">0.4k </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
+<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 cores </td><td class="markdownTableBodyCenter">1963 </td><td class="markdownTableBodyCenter">1688 </td><td class="markdownTableBodyCenter">1686 </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">Apple Silicon M2 </td><td class="markdownTableBodyCenter">6 (cores) </td><td class="markdownTableBodyCenter">2.8k </td><td class="markdownTableBodyCenter">0.6k </td><td class="markdownTableBodyCenter">0.2k </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
-</table>
-<p>We also show the expected performance of MFC for the same problem as above, except for the 5-equation model used, in the table below. It is presented in the same manner as the one above.</p>
-<table class="markdownTable">
-<tr class="markdownTableHead">
-<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"># Cores </th><th class="markdownTableHeadCenter">Steps/Hr (1M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (4M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (8M pts) </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr>
-<tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">113.4k </td><td class="markdownTableBodyCenter">26.2k </td><td class="markdownTableBodyCenter">13.0k </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
-<tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">107.7k </td><td class="markdownTableBodyCenter">26.3k </td><td class="markdownTableBodyCenter">13.1k </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr>
-<tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">153.5k </td><td class="markdownTableBodyCenter">48.0k </td><td class="markdownTableBodyCenter">22.5k </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr>
-<tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 (GCD) </td><td class="markdownTableBodyCenter">104.2k </td><td class="markdownTableBodyCenter">31.0k </td><td class="markdownTableBodyCenter">14.8k </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr>
-<tr class="markdownTableRowOdd">
-<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 (cores) </td><td class="markdownTableBodyCenter">5.4k </td><td class="markdownTableBodyCenter">1.6k </td><td class="markdownTableBodyCenter">0.8k </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr>
-<tr class="markdownTableRowEven">
-<td class="markdownTableBodyRight">Apple Silicon M2 </td><td class="markdownTableBodyCenter">6 (cores) </td><td class="markdownTableBodyCenter">3.7k </td><td class="markdownTableBodyCenter">11.0k </td><td class="markdownTableBodyCenter">0.3k </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
+<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyCenter">6 cores </td><td class="markdownTableBodyCenter">2919 </td><td class="markdownTableBodyCenter">245 </td><td class="markdownTableBodyCenter">4500 </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr>
 </table>
+<p><b>All results are in nanoseconds (ns) per grid point (gp) per right-hand side (rhs) evaluation. Lower is better.</b></p>
 <h1><a class="anchor" id="autotoc_md55"></a>
 Weak scaling</h1>
 <p>Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.</p>
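
Note on the metric: the updated table reports wall-clock nanoseconds per grid point per right-hand-side (RHS) evaluation. As a minimal sketch of how such a number relates to a measured run, the JavaScript below defines two hypothetical helpers, `nsPerGpPerRhs` and `estimateWallSeconds`; neither is part of MFC. The example assumes that "8M GPs" means 8e6 grid points and that one time step costs 3 RHS evaluations (as with a 3-stage Runge-Kutta scheme). The updated text states neither, so treat both as assumptions.

```javascript
// Hypothetical helpers (not part of MFC) relating wall time to the
// tabulated metric: ns per grid point per right-hand-side evaluation.

// wallSeconds: measured wall-clock time of the run, in seconds
// gridPoints:  grid points per GPU device (or CPU die)
// timeSteps:   number of time steps taken
// rhsPerStep:  RHS evaluations per time step (assumed, e.g. 3 for 3-stage RK)
function nsPerGpPerRhs(wallSeconds, gridPoints, timeSteps, rhsPerStep) {
  const rhsEvals = timeSteps * rhsPerStep;
  return (wallSeconds * 1e9) / (gridPoints * rhsEvals);
}

// Inverse relation: estimate wall-clock seconds from a tabulated value.
function estimateWallSeconds(nsPerGp, gridPoints, timeSteps, rhsPerStep) {
  return (nsPerGp * gridPoints * timeSteps * rhsPerStep) / 1e9;
}

// A100 entry at 8M grid points (59 ns/GP/RHS), with an assumed
// 1000 time steps and 3 RHS evaluations per step.
const seconds = estimateWallSeconds(59, 8e6, 1000, 3);
console.log(seconds.toFixed(0) + " s"); // prints "1416 s"
```

Because the metric is normalized by both grid points and RHS evaluations, it can be compared across problem sizes and time steppers, which the earlier steps-per-hour figures did not allow directly.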

index.html

Lines changed: 5 additions & 5 deletions
@@ -72,12 +72,12 @@
 };
 
 const sims = [
-new FS("Shedding water droplet - Vorticity", "res/simulations/a.png", "Summit", "960 V100s", "4h", "https://doi.org/10.48550/arXiv.2305.09163"),
-new FS("Inviscid flow over an airfoil - Vorticity", "res/simulations/g.png", "Delta", "128 A100s", "19h", "https://doi.org/10.48550/arXiv.2305.09163"),
-new FS("Kidney stone near a collapsing bubble cloud - Maximum principal stresses", "res/simulations/d.png", "Summit", "576 V100s", "30 min", "https://doi.org/10.48550/arXiv.2305.09163"),
+new FS("Shedding water droplet", "res/simulations/a.png", "Summit", "960 V100s", "4h", "https://doi.org/10.48550/arXiv.2305.09163"),
+new FS("Flow over an airfoil (vorticity", "res/simulations/g.png", "Delta", "128 A100s", "19h", "https://vimeo.com/917305340/c05fd414c8?share=copy"),
+new FS("Cavitating bubbles fragment kidney stone", "res/simulations/d.png", "Summit", "576 V100s", "30 min", "https://doi.org/10.48550/arXiv.2305.09163"),
 new FS("Breakup of vibrated interface", "res/simulations/f.png", "Summit", "128 V100s", "4h","https://youtu.be/qQV2ZRDpf2M"),
-new FS("Cavitating bubble cloud - Wall pressure", "res/simulations/b.png", "Summit", "216 V100s", "3h", "https://doi.org/10.48550/arXiv.2305.09163"),
-new FS("Cavitating bubble cloud - Streamlines", "res/simulations/c.png", "Summit", "216 V100s", "3h", "https://doi.org/10.48550/arXiv.2305.09163"),
+new FS("Collapsing bubbles (pressure)", "res/simulations/b.png", "Summit", "216 V100s", "3h", "https://doi.org/10.48550/arXiv.2305.09163"),
+new FS("Collapsing bubbles (streamlines)", "res/simulations/c.png", "Summit", "216 V100s", "3h", "https://doi.org/10.48550/arXiv.2305.09163"),
 ];
 
 /*
