Compiling high-performance software from source is a foundational skill in HPC. It allows for machine-specific optimisations that can't be achieved with pre-built binaries. This post uses the High-Performance Linpack (HPL) benchmark as a practical example to explore the compilation process, and then dives into some interesting concepts like ldd and LD_PRELOAD.
Part 1: Compiling HPL from Source
When you need a piece of software, you have two main options: use a pre-built binary or compile it from source. While pre-built is easier, compiling from source is usually better in HPC because the compiler can make machine-specific optimisations for the exact hardware you're running on. For HPL, this is key to squeezing out every last drop of performance.
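To see what "machine-specific" means in practice, you can ask GCC which target options -march=native would enable on the current node. This probe is purely illustrative; the output varies by CPU:
# Show the target-specific options GCC would pick for this exact CPU
gcc -march=native -Q --help=target | grep -E 'march=|mtune='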
Setting up the Environment
First, after logging into the cluster, it's good practice to start with a clean slate by purging any loaded modules, and then loading the specific compiler we need. In this case, a modern GCC.
module purge
module avail gcc
module load gcc/15.2
# Set up a build directory
export HPL_BUILD_DIR=$(pwd)/hpl_build
mkdir -p $HPL_BUILD_DIR
cd $HPL_BUILD_DIR
Building the Dependencies (Open MPI & OpenBLAS)
HPL has a few key dependencies: an MPI implementation and a BLAS (Basic Linear Algebra Subprograms) library. We'll compile both Open MPI and OpenBLAS from source inside our build directory.
Let's start with Open MPI. We'll download the source, create a separate build directory (a good practice to keep the source tree clean), and configure it.
# Download and extract Open MPI
wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.8.tar.gz
tar -xvf openmpi-5.0.8.tar.gz
cd openmpi-5.0.8
# Create build and install directories
mkdir build
cd build
mkdir -p $HPL_BUILD_DIR/installs
# Configure and build
../configure --prefix=$HPL_BUILD_DIR/installs CC=gcc CXX=g++ FC=gfortran
make -j 8
make install
The --prefix option tells the build system where to install the final files; using our environment variable keeps things tidy. make -j 8 runs up to 8 compile jobs in parallel to speed up the compilation.
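If you'd rather not hard-code the job count, nproc reports the number of available cores. This is just a convenience, not something the build requires:
# Use one make job per available core
make -j $(nproc)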
While that's running, we can get started on OpenBLAS. It's important to navigate back to our main build directory first.
# IMPORTANT: Go back to the main build directory
cd $HPL_BUILD_DIR
# Clone and build OpenBLAS
git clone https://github.com/OpenMathLib/OpenBLAS.git
cd OpenBLAS
make -j 8
make PREFIX=$HPL_BUILD_DIR/installs install
It's worth noting that while we're using Open MPI and OpenBLAS here, many other implementations are available (e.g., MPICH, Intel MPI, BLIS, ATLAS). In HPC competitions, trying different implementations and tuning their compilation can be a significant part of optimizing performance and is highly encouraged as an exercise!
Compiling HPL
Now that our dependencies are built and installed, we can compile HPL. First, we need to make sure we're using our newly compiled MPI and that our OpenBLAS libraries are where we expect them.
# Check that our custom-built mpicc is available
$HPL_BUILD_DIR/installs/bin/mpicc --version
# Check for our OpenBLAS libraries
ls $HPL_BUILD_DIR/installs/lib
With that confirmed, we configure HPL. This part is crucial. We tell the configure script where to find our custom MPI compiler and the OpenBLAS library.
# Go back to the main build directory and get HPL
cd $HPL_BUILD_DIR
wget http://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
tar -xvf hpl-2.3.tar.gz
cd hpl-2.3
# Configure with our custom paths
./configure CC=$HPL_BUILD_DIR/installs/bin/mpicc \
LDFLAGS="-L$HPL_BUILD_DIR/installs/lib" \
LIBS="-lopenblas"
# Finally, build HPL
make -j 8
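A quick way to confirm the build actually produced the benchmark binary:
# The configure-based build places xhpl in the testing directory
ls -lh $HPL_BUILD_DIR/hpl-2.3/testing/xhpl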
After a successful build, you're ready to run the benchmark! This usually involves creating an HPL.dat input file and preparing a SLURM script to submit the job. While everything worked smoothly here for the demo, in the real world you'll often hit issues at almost every step!
Truly understanding the input options in HPL.dat takes time; luckily for us, there are many resources online to help generate a starting input. This website is a good start! It won't give optimal performance, as it only tunes some of the parameters, but it should get pretty close.
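For orientation, the lines that matter most in HPL.dat are the problem size N, the block size NB, and the P x Q process grid. Here is an illustrative excerpt with placeholder values, not tuned for any particular machine:
1            # of problems sizes (N)
40000        Ns
1            # of NBs
192          NBs
...
1            # of process grids (P x Q)
2            Ps
2            Qs
A common rule of thumb is to choose N so the matrix fills roughly 80% of total RAM (N ≈ sqrt(0.8 × mem_bytes / 8), since each double-precision element takes 8 bytes), pick NB somewhere in the 96-256 range, and keep the grid as square as possible with P ≤ Q. P × Q must equal the number of MPI ranks; 2 × 2 here matches the 4 tasks in the SLURM script below.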
Once you have the input file ready, place it in the $HPL_BUILD_DIR/hpl-2.3/testing/ directory, which is where the xhpl binary is located. For testing purposes, you can copy the example HPL.dat from $HPL_BUILD_DIR/hpl-2.3/testing/ptest/HPL.dat. You should now be able to submit a job to the scheduler. Here is an example SLURM script:
#!/bin/bash
#SBATCH --job-name=hpl_benchmark
#SBATCH --output=hpl_benchmark_%j.out
#SBATCH --error=hpl_benchmark_%j.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:30:00
module purge
module load gcc/15.2
# HPL_BUILD_DIR is inherited from the submission shell
# (sbatch exports your environment by default).
# Make sure our custom-built libraries can be found at runtime:
export LD_LIBRARY_PATH=$HPL_BUILD_DIR/installs/lib:$LD_LIBRARY_PATH
cd $HPL_BUILD_DIR/hpl-2.3/testing/
$HPL_BUILD_DIR/installs/bin/mpirun ./xhpl
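Save the script under any name you like (run_hpl.sh below is just a placeholder) and hand it to the scheduler:
sbatch run_hpl.sh
# Progress and results appear in hpl_benchmark_<jobid>.out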
Part 2: A Little Library Mischief
That compilation process is a great practical example of managing libraries. But what if we want to investigate or even manipulate which libraries a program uses?
Checking Dynamic Libraries
The easiest way to see what shared libraries an executable needs is ldd:
ldd $HPL_BUILD_DIR/hpl-2.3/testing/xhpl
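If any entry shows up as "not found" (most likely libopenblas.so, since it lives in our non-standard prefix), the dynamic linker doesn't know where to look; pointing LD_LIBRARY_PATH at the install directory is the usual fix:
export LD_LIBRARY_PATH=$HPL_BUILD_DIR/installs/lib:$LD_LIBRARY_PATH
ldd $HPL_BUILD_DIR/hpl-2.3/testing/xhpl | grep openblas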
For a hardcore check, you can actually look at the memory maps of a running process. Find the process ID (e.g., with pidof xhpl) and then check its maps:
cat /proc/$(pidof xhpl)/maps | grep '\.so'
This will show you the exact path of every shared object (.so) file loaded into that process's memory space.
A Thought Experiment with LD_PRELOAD
This brings us to a fascinating and powerful environment variable: LD_PRELOAD. It allows you to tell the dynamic linker to load your own library before any other library, including standard ones like `libc`. This means if your library contains a function with the same name as a standard function (e.g., rand()), your version will be used instead. It's a powerful tool for debugging, but also for... other things.
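To make this concrete, here's a minimal sketch. The file names are arbitrary, and the constant 4 is a wink at the answer below:
# Write an impostor rand() that is not very random
cat > fake_rand.c << 'EOF'
/* Same signature as the libc rand(); ours always returns 4. */
int rand(void) { return 4; }
EOF
gcc -shared -fPIC -o libfakerand.so fake_rand.c
# A tiny victim program that calls the real rand()
cat > victim.c << 'EOF'
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    for (int i = 0; i < 3; i++)
        printf("%d\n", rand());
    return 0;
}
EOF
gcc -o victim victim.c
./victim                                   # three pseudo-random numbers
LD_PRELOAD=$PWD/libfakerand.so ./victim    # prints 4 three times
Because the preloaded library is searched before libc, the victim's call to rand() resolves to our version without recompiling or even touching the victim binary.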
If you could LD_PRELOAD a single function for all the servers of any company to cause the most chaos, which company, what function, and why?
My answer: target Cloudflare and preload their random number generator. If you could make their crypto-quality random numbers not-so-random (e.g., always return 4), you could silently break the cryptographic guarantees that underpin a huge chunk of the internet's security. It would be subtle yet catastrophic!