HPL.dat Calculator

HPL Configuration Parameters

Select the type of HPL.dat file to generate:

  • Default CPU: Standard HPL.dat suitable for general CPU-based runs.
  • AMD Zen precompiled xhpl: Optimized for the pre-built AMD Zen HPL (xhpl) binaries from AMD Zen Software Studio. Includes the AMD-specific MXSWP setting.
  • NVIDIA GPU HPC-Benchmarks: For use with the NVIDIA HPC-Benchmarks container from NVIDIA NGC. Uses GPU-suitable parameters and may suggest larger NB values for optimal performance.

Specify the total number of physical compute nodes that will be used for the HPL benchmark. This value determines the process grid (P x Q) and the aggregate memory used to size the problem (N).

Enter the total number of physical CPU cores on each node. For GPU runs, enter the total number of GPUs per node instead.

If the checkbox "Use 1 MPI rank per node" is checked, HPL's process grid (P x Q) will be calculated assuming one MPI rank per node. If unchecked, P and Q will be based on all cores entered here being used by MPI ranks. This also affects `slots` in generated Machine Files and the ranks per node in Rankfiles.
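
The rank counting and near-square grid selection described above can be sketched in Python (a hypothetical helper; the tool's actual heuristic may differ):

```python
import math

def pq_grid(nodes, cores_per_node, one_rank_per_node=False):
    """Choose an HPL process grid P x Q (P <= Q) as close to square as possible."""
    ranks = nodes if one_rank_per_node else nodes * cores_per_node
    p = math.isqrt(ranks)
    while ranks % p:            # walk down to the nearest divisor of ranks
        p -= 1
    return p, ranks // p        # P <= Q by construction

# 4 nodes x 32 cores -> 128 ranks -> 8 x 16 grid
print(pq_grid(4, 32))           # (8, 16)
```

HPL generally performs best with P <= Q and a grid as close to square as the rank count allows, which is why the sketch walks down from the integer square root.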

Amount of free RAM available on each compute node, specified in Gigabytes (GB).

For GPU HPL, enter the total combined VRAM of all GPUs per node.

The block size (NB) is a critical HPL tuning parameter.

Recommendation: CPU HPL: 128 to 384. GPU HPL (NVIDIA template): 1024+. Experimentation is key.

Percentage of total aggregate memory to use for HPL's main N x N matrix.

Recommendation: 80% to 95%. Default: 92.5%.
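
The problem sizing these two parameters control can be sketched as follows (a hypothetical `problem_size` helper, assuming 8 bytes per double-precision matrix element and rounding N down to a multiple of NB):

```python
import math

def problem_size(nodes, ram_gb_per_node, mem_pct=92.5, nb=256):
    """Largest N such that the N x N double-precision matrix fits in
    mem_pct percent of aggregate memory, rounded down to a multiple of NB."""
    total_bytes = nodes * ram_gb_per_node * 1024**3 * (mem_pct / 100.0)
    n = math.isqrt(int(total_bytes // 8))   # 8 bytes per double
    return (n // nb) * nb

# 4 nodes x 256 GB at the default 92.5% and NB = 256
print(problem_size(4, 256))
```

Rounding N to a multiple of NB keeps the matrix evenly divisible into blocks, which is what the generated HPL.dat assumes.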

Hardware Specifications (for Rpeak Calculation)

Clock frequency of CPU cores in Gigahertz (e.g., 2.5, 3.0 GHz).

Recommendation: Use base or sustained all-core boost frequency.

The number of double-precision floating-point operations (FLOPs) a single CPU core can execute per clock cycle.

Recommendation: Consult CPU specs. Examples:

Intel Xeon:

  • Xeon E5 v3/v4 (Haswell and Broadwell): 16 (AVX2)
  • Xeon Scalable 1st-2nd Gen (Skylake-SP and Cascade Lake-SP): 16 or 32, depending on SKU.
  • Xeon Scalable 3rd-6th Gen (Ice Lake-SP, Sapphire Rapids-SP, Emerald Rapids-SP and Granite Rapids): 32 (AVX-512)

AMD EPYC:

  • EPYC 1st Gen Zen 1 (Naples): 8 (AVX2)
  • EPYC 2nd-3rd Gen Zen 2 / Zen 3 (Rome and Milan): 16 (AVX2)
  • EPYC 4th Gen Zen 4 (Genoa): 16 (AVX2 w/ AVX-512 Support)
  • EPYC 5th Gen Zen 5 (Turin): 32 (AVX-512)
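
Rpeak follows directly from the hardware figures above: node count, cores per node, clock frequency, and DP FLOPs per cycle. A minimal sketch (function name is illustrative):

```python
def rpeak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Theoretical peak performance in GFLOPS:
    nodes x cores x clock (GHz) x DP FLOPs per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle

# e.g. 2 nodes of 64-core EPYC Zen 3 (Milan) at 2.45 GHz, 16 FLOPs/cycle
print(rpeak_gflops(2, 64, 2.45, 16))   # 5017.6 GFLOPS
```

The measured HPL result (Rmax) is then typically reported as a percentage of this Rpeak.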

Optional: Auxiliary File Generation

Machine File (Hostfile)

Generates a hostfile listing nodes in the specified range. Slots per node are determined by "Total Physical/Relevant Cores per Node" and the "Use 1 MPI rank per node" checkbox. This hostname range is also required if you want to generate a Rankfile below with specific node names.

Enter the full hostnames for the start and end of your node range (e.g., First: node001, Last: node010).

The tool parses each hostname into a prefix, numerical part, and suffix; the prefix and suffix must match between the first and last hostname.
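
The prefix/number/suffix expansion and the resulting Open MPI-style `host slots=N` lines can be sketched like this (helper names are illustrative; the tool's parsing rules may differ in detail):

```python
import re

def host_range(first, last):
    """Expand e.g. node001..node003 into a list, preserving zero-padding.
    Prefix and suffix must match between the two endpoints."""
    pat = re.compile(r"^(\D*)(\d+)(.*)$")
    m1, m2 = pat.match(first), pat.match(last)
    if not (m1 and m2 and m1.group(1) == m2.group(1)
            and m1.group(3) == m2.group(3)):
        raise ValueError("prefix/suffix mismatch or no numeric part")
    width = len(m1.group(2))    # keep zero-padding width of the first host
    return [f"{m1.group(1)}{i:0{width}d}{m1.group(3)}"
            for i in range(int(m1.group(2)), int(m2.group(2)) + 1)]

def hostfile_lines(first, last, slots):
    """One Open MPI-style hostfile line per node."""
    return [f"{h} slots={slots}" for h in host_range(first, last)]

print(hostfile_lines("node001", "node003", 32))
```

With "Use 1 MPI rank per node" checked, `slots` would be 1 instead of the per-node core count.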

MPI Rankfile

A rankfile explicitly maps each MPI process to a specific CPU core. Requires "First/Last Hostname in Range" above to be filled for node names.

Specify the logical core ID (usually 0) from which MPI ranks should start mapping on each node. For example, if ranks per node is 4 and start ID is 0, ranks will map to cores 0, 1, 2, 3 on each node.

Specify the exact number of MPI ranks per node for the rankfile.

If left blank, ranks per node for the rankfile will be derived from "Total Physical/Relevant Cores per Node" and the "Use 1 MPI rank per node" HPL setting.