|
137 | 137 | <div class="textblock"><p><a class="anchor" id="autotoc_md53"></a> MFC has been benchmarked on several CPU and GPU devices. This page summarizes the results.</p>
|
138 | 138 | <h1><a class="anchor" id="autotoc_md54"></a>
|
139 | 139 | Expected time-steps/hour</h1>
|
140 |
| -<p>The following table outlines expected performance in terms of the number of time steps per hour, rounded to the nearest hundred (higher is better). A 3D inviscid, 6-equation problem is solved for various problem sizes (grid cells) and hardware. A 3rd order (3-stage) Runge-Kutta time-stepper is used. CPU results utilize an entire processor die.</p> |
| 140 | +<p>The following table reports observed performance in nanoseconds per grid point (ns/GP) per right-hand side (RHS) evaluation; lower is better. The test case is a 3D, inviscid, 5-equation model problem with two advected species (8 PDEs in total), discretized with WENO5 reconstruction and the HLLC approximate Riemann solver. Results are reported for various numbers of grid points per CPU die (or GPU device) on each hardware platform.</p> |
141 | 141 | <table class="markdownTable">
|
142 | 142 | <tr class="markdownTableHead">
|
143 |
| -<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"># Cores </th><th class="markdownTableHeadCenter">Steps/Hr (1M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (4M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (8M pts) </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr> |
| 143 | +<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"></th><th class="markdownTableHeadCenter">1M GPs </th><th class="markdownTableHeadCenter">4M GPs </th><th class="markdownTableHeadCenter">8M GPs </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr> |
144 | 144 | <tr class="markdownTableRowOdd">
|
145 |
| -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">88.5k </td><td class="markdownTableBodyCenter">18.7k </td><td class="markdownTableBodyCenter">N/A </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
| 145 | +<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
146 | 146 | <tr class="markdownTableRowEven">
|
147 |
| -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">78.8k </td><td class="markdownTableBodyCenter">18.8k </td><td class="markdownTableBodyCenter">N/A </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr> |
| 147 | +<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">101 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr> |
148 | 148 | <tr class="markdownTableRowOdd">
|
149 |
| -<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">114.4k </td><td class="markdownTableBodyCenter">34.6k </td><td class="markdownTableBodyCenter">16.5k </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr> |
| 149 | +<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">71 </td><td class="markdownTableBodyCenter">56 </td><td class="markdownTableBodyCenter">59 </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr> |
150 | 150 | <tr class="markdownTableRowEven">
|
151 |
| -<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 (GCD) </td><td class="markdownTableBodyCenter">77.5k </td><td class="markdownTableBodyCenter">22.3k </td><td class="markdownTableBodyCenter">11.2k </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr> |
| 151 | +<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 GCD </td><td class="markdownTableBodyCenter">108 </td><td class="markdownTableBodyCenter">90 </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr> |
152 | 152 | <tr class="markdownTableRowOdd">
|
153 |
| -<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 (cores) </td><td class="markdownTableBodyCenter">2.5k </td><td class="markdownTableBodyCenter">0.7k </td><td class="markdownTableBodyCenter">0.4k </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
| 153 | +<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 cores </td><td class="markdownTableBodyCenter">1963 </td><td class="markdownTableBodyCenter">1688 </td><td class="markdownTableBodyCenter">1686 </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
154 | 154 | <tr class="markdownTableRowEven">
|
155 |
| -<td class="markdownTableBodyRight">Apple Silicon M2 </td><td class="markdownTableBodyCenter">6 (cores) </td><td class="markdownTableBodyCenter">2.8k </td><td class="markdownTableBodyCenter">0.6k </td><td class="markdownTableBodyCenter">0.2k </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr> |
156 |
| -</table> |
157 |
| -<p>We also show the expected performance of MFC for the same problem as above, except for the 5-equation model used, in the table below. It is presented in the same manner as the one above.</p> |
158 |
| -<table class="markdownTable"> |
159 |
| -<tr class="markdownTableHead"> |
160 |
| -<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"># Cores </th><th class="markdownTableHeadCenter">Steps/Hr (1M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (4M pts) </th><th class="markdownTableHeadCenter">Steps/Hr (8M pts) </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr> |
161 |
| -<tr class="markdownTableRowOdd"> |
162 |
| -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">113.4k </td><td class="markdownTableBodyCenter">26.2k </td><td class="markdownTableBodyCenter">13.0k </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
163 |
| -<tr class="markdownTableRowEven"> |
164 |
| -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">107.7k </td><td class="markdownTableBodyCenter">26.3k </td><td class="markdownTableBodyCenter">13.1k </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr> |
165 |
| -<tr class="markdownTableRowOdd"> |
166 |
| -<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 (device) </td><td class="markdownTableBodyCenter">153.5k </td><td class="markdownTableBodyCenter">48.0k </td><td class="markdownTableBodyCenter">22.5k </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr> |
167 |
| -<tr class="markdownTableRowEven"> |
168 |
| -<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 (GCD) </td><td class="markdownTableBodyCenter">104.2k </td><td class="markdownTableBodyCenter">31.0k </td><td class="markdownTableBodyCenter">14.8k </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr> |
169 |
| -<tr class="markdownTableRowOdd"> |
170 |
| -<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 (cores) </td><td class="markdownTableBodyCenter">5.4k </td><td class="markdownTableBodyCenter">1.6k </td><td class="markdownTableBodyCenter">0.8k </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
171 |
| -<tr class="markdownTableRowEven"> |
172 |
| -<td class="markdownTableBodyRight">Apple Silicon M2 </td><td class="markdownTableBodyCenter">6 (cores) </td><td class="markdownTableBodyCenter">3.7k </td><td class="markdownTableBodyCenter">11.0k </td><td class="markdownTableBodyCenter">0.3k </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr> |
| 155 | +<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyCenter">6 cores </td><td class="markdownTableBodyCenter">2919 </td><td class="markdownTableBodyCenter">245 </td><td class="markdownTableBodyCenter">4500 </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr> |
173 | 156 | </table>
|
| 157 | +<p><b>All results are in nanoseconds (ns) per grid point (GP) per right-hand side (RHS) evaluation. Lower is better.</b></p> |
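As a rough illustration of how the ns/GP/RHS metric relates to wall-clock time, the sketch below computes it from a timed run and converts it back to time steps per hour. The function names are hypothetical and not part of MFC. The value of 3 RHS evaluations per time step is an assumption based on the 3-stage Runge-Kutta stepper mentioned in the previous version of this page, and the 28.8 s timing is made up so that the result matches the 96 ns/GP/RHS reported for the V100 at 1M grid points.

```python
def ns_per_gp_per_rhs(wall_seconds, grid_points, time_steps, rhs_per_step):
    """Nanoseconds per grid point per right-hand side (RHS) evaluation."""
    rhs_evaluations = time_steps * rhs_per_step
    return wall_seconds * 1e9 / (grid_points * rhs_evaluations)


def steps_per_hour(grind_ns, grid_points, rhs_per_step):
    """Time steps per hour implied by a given ns/GP/RHS value."""
    seconds_per_step = grind_ns * 1e-9 * grid_points * rhs_per_step
    return 3600.0 / seconds_per_step


# Assumed: 3 RHS evaluations per step (3-stage Runge-Kutta).
RHS_PER_STEP = 3

# Illustrative timing: 100 steps on 1M grid points taking 28.8 s of wall time.
grind = ns_per_gp_per_rhs(28.8, 1_000_000, 100, RHS_PER_STEP)
rate = steps_per_hour(96.0, 1_000_000, RHS_PER_STEP)
print(f"{grind:.1f} ns/GP/RHS, {rate:.0f} steps/hour")
```

Because the metric is normalized by grid points and RHS evaluations, it stays roughly constant across problem sizes when the hardware is saturated, which is why most rows in the table vary little between the 1M, 4M, and 8M columns.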
174 | 158 | <h1><a class="anchor" id="autotoc_md55"></a>
|
175 | 159 | Weak scaling</h1>
|
176 | 160 | <p>Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.</p>
|
|