CIQ

HPC Triangle

January 19, 2024

Generally, an HPC cluster requires three basic resources: fast compute, fast storage, and fast networking. The fast compute is provided by CPUs, GPUs, FPGAs, and other accelerator cards; this is the part of the cluster that performs the actual calculations in a given computational job. The fast storage is provided by parallel filesystems, storage tiering, and fast underlying storage hardware such as SSDs; this is the part of the cluster that gives a computational job a fast filesystem to read data from and write data to. The fast networking is provided by an HPC interconnect; this is the part of the cluster that links the fast compute together so that a job's calculation data can be communicated efficiently across the cluster as needed, and it frequently also connects the fast storage to the fast compute. These three components (compute, storage, and networking) can each very easily bottleneck the others in an HPC cluster, which makes designing and planning a supercomputer deployment a complex task. Together, they form an “HPC triangle,” analogous to the “cheap, fast, good” triangle.
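To make the bottleneck idea concrete, here is a minimal sketch of a toy model, not a real sizing tool. It assumes a job's compute, storage, and network phases overlap perfectly, so the slowest of the three sets the runtime. The `Cluster` and `Job` classes, their fields, and all the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Aggregate throughput of each corner of the triangle (hypothetical units)."""
    compute_tflops: float  # sustained floating-point rate, TFLOP/s
    storage_gbps: float    # parallel filesystem bandwidth, GB/s
    network_gbps: float    # interconnect bandwidth, GB/s


@dataclass
class Job:
    """Work a job demands from each resource (hypothetical units)."""
    tflop: float    # total floating-point work, TFLOP
    io_gb: float    # data read from and written to storage, GB
    comm_gb: float  # data exchanged between nodes, GB


def stage_times(cluster: Cluster, job: Job) -> dict:
    """Seconds each resource would need if it were the only constraint."""
    return {
        "compute": job.tflop / cluster.compute_tflops,
        "storage": job.io_gb / cluster.storage_gbps,
        "network": job.comm_gb / cluster.network_gbps,
    }


def bottleneck(cluster: Cluster, job: Job) -> tuple:
    """Return (resource name, seconds) for the slowest resource.

    Assumption: the three phases overlap perfectly, so the slowest
    one determines the job's runtime.
    """
    times = stage_times(cluster, job)
    name = max(times, key=times.get)
    return name, times[name]


# Illustrative numbers only.
job = Job(tflop=10_000, io_gb=20_000, comm_gb=8_000)
baseline = Cluster(compute_tflops=100, storage_gbps=50, network_gbps=200)
more_compute = Cluster(compute_tflops=200, storage_gbps=50, network_gbps=200)

print(bottleneck(baseline, job))      # storage limits the job
print(bottleneck(more_compute, job))  # doubling compute does not help
```

In this toy case, doubling the compute leaves the runtime unchanged because storage is still the slowest corner. That is the sense in which one corner of the triangle can bottleneck the others.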