AI Training/Inference

January 19, 2024

AI training and inference typically need HPC-class resources to run efficiently, even though these workloads are often associated with enterprise computing, depending on the context in which they are run.

As mentioned above under the “GPU” definition, effectively training AI frequently requires specialized accelerator cards such as GPUs, and the most complex models can require dozens, or even thousands, of GPUs coordinated on a single computational job. This is because AI training ultimately involves a massive network of neurons performing simple calculations simultaneously and more or less independently. These networks can be so large that many GPUs must be pooled together to provide both the memory space needed to hold the entire model as it is being trained and the compute power needed to perform all of those calculations in parallel.
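A rough back-of-the-envelope sketch illustrates why training state alone can force model pooling across many GPUs. The numbers here are assumptions for illustration only: a hypothetical 70-billion-parameter model, 80 GB of memory per accelerator, and a common mixed-precision accounting of per-parameter training state (fp16 weights and gradients plus fp32 master weights and two optimizer moments).

```python
import math

def training_memory_gb(n_params: float) -> float:
    """Rough training-state footprint per parameter: fp16 weights (2 B) +
    fp16 gradients (2 B) + fp32 master weights and two optimizer moments
    (3 * 4 B). This is a common mixed-precision estimate, not an exact figure."""
    bytes_per_param = 2 + 2 + 3 * 4  # = 16 bytes per parameter
    return n_params * bytes_per_param / 1e9

def gpus_needed(n_params: float, gpu_memory_gb: float = 80) -> int:
    """Minimum accelerators just to hold the training state
    (ignores activations, which add substantially more)."""
    return math.ceil(training_memory_gb(n_params) / gpu_memory_gb)

print(training_memory_gb(70e9))  # -> 1120.0 GB of training state
print(gpus_needed(70e9))         # -> 14 GPUs just to fit the state
```

Even this deliberately optimistic estimate, which ignores activation memory entirely, already exceeds a dozen high-end accelerators for the weights and optimizer state alone.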

AI inference can typically run on smaller GPU resources, because the finished model needs less data held in memory and computed at once: it no longer has to track the large number of parameters that are each being tuned simultaneously during training, along with the bookkeeping that tuning requires.
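The same kind of rough estimate shows how much smaller the inference footprint is. Again, the specific numbers are illustrative assumptions: the frozen weights of a hypothetical 70-billion-parameter model stored in fp16, on accelerators with 80 GB of memory each.

```python
import math

def inference_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Inference only needs the frozen weights (fp16 here, 2 B/param);
    no gradients or optimizer state are kept, so the footprint shrinks."""
    return n_params * bytes_per_param / 1e9

def gpus_needed_inference(n_params: float, gpu_memory_gb: float = 80) -> int:
    """Minimum accelerators to hold the weights (ignores activation
    and cache memory used while serving requests)."""
    return math.ceil(inference_memory_gb(n_params) / gpu_memory_gb)

print(inference_memory_gb(70e9))    # -> 140.0 GB of weights
print(gpus_needed_inference(70e9))  # -> 2 GPUs
```

Under these assumptions the weights fit on a couple of accelerators rather than a dozen or more, which is why a model trained on a large GPU cluster can often be served from a much smaller one.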

As the computational demands of AI training and inference continue to grow, numerous specialized machine learning accelerators are being developed to move beyond the capabilities of GPUs.