CIQ

GPU Computing with Apptainer!

February 23, 2023

Apptainer has been good friends with GPUs for many years. In this installment of the container education series we will explore GPU container integration in detail. We’ll talk about how it works, how to build containers that use GPUs, common pitfalls to avoid, and we’ll solve the mystery of the missing GPU. Join us for a fun, informal, and (we hope) educational discussion.

Webinar Synopsis:

Speakers:

  • Zane Hamilton, Vice President - Sales Engineering, CIQ

  • Dave Godlove, Solutions Architect, CIQ

  • Brian Phan, Solutions Architect, CIQ


Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Full Webinar Transcript:

Zane Hamilton:

Hello, and welcome to another installment of the Container Education Series here at CIQ. I'm Zane Hamilton. We at CIQ are dedicated to driving the future of software infrastructure utilizing cloud, hyperscale, and HPC technologies. We cater to a wide range of enterprise customers all the way through research, and we do it in an unparalleled way for Rocky Linux, Warewulf, and Apptainer. We tailor our services to meet unique needs of all of our customers, and we deliver them in the collaborative spirit of open source. So today we're going to dive back into containerization, and we're going to be talking about containers and GPUs. And we have Dave, Brian, Forrest, I hope. There we go. Brian and Dave, welcome back.

Brian Phan:

Good to be back.

Dave Godlove:

Hi Zane. Excellent.

Zane Hamilton:

So I'll let you guys introduce yourselves and I'm going to let you guys dive into it. I am very interested in this topic because, I know Dave, you've been doing a lot of work on this and being able to add stuff and do things with a container and a GPU and the way that you're going to describe it is very interesting to me. So, Brian, introduce yourself.

Brian Phan's HPC and Apptainer Experience [6:13]

Brian Phan:

Hey everyone. My name is Brian Phan. I'm a Solutions Architect here at CIQ. My background is in HPC, in the industry verticals of automotive, aerospace, and genomics. Good to be back.

Zane Hamilton:

Thank you, Brian. Dr. David Godlove

Dave Godlove and the National Institutes of Health [6:29]

Dave Godlove:

I'm Dave Godlove. Yeah. So I'm also a Solutions Architect here at CIQ. And my background is that I started off as a scientist, a research scientist, working at the National Institutes of Health. While I was there, I became interested in high performance computing, and I joined the NIH Biowulf team. So I helped to administer the Biowulf cluster at the NIH as a staff scientist. And that background is going to become important today, because I hope to take a little trip down memory lane and talk a little bit about the origins of GPU container support within Apptainer. I have been with the Apptainer community for some time now and have been helping to develop Singularity/Apptainer off and on for a number of years.

What Are Users Doing With GPUs in HPC? [7:26]

Zane Hamilton:

Thank you, Dave. So also, before we start and get into containerization on GPUs, I think it's important that we level set on what people are using GPUs for in general in HPC. From my perspective, I have my views and guesses from the enterprise side, but I would like to hear from you guys what people are doing with them. What are you running across people doing with GPUs in HPC today? Start with you, Brian.

Brian Phan:

So from what I've seen, I know GPUs are used heavily in AI/ML. From my experience though in the automotive and aerospace industries I have seen GPUs used to accelerate simulations. But this is very dependent on the specific algorithms that are used because certain algorithms work well on GPUs while others don't. So yeah, that's been my experience.

Zane Hamilton:

Thank you, Brian. Dave?

Dave Godlove:

Yeah, I think the whole AI, Deep Learning revolution that we've seen happening since maybe 2015, 2016 is really what launched GPUs into the forefront of HPC. I was a spectator to this, and at the time that this really started, I wasn't in computer science, I was in neuroscience, so I can only speculate. But the GPU architecture is fundamentally different from the normal CPU architecture, right? GPU architecture is massively parallel, lots and lots of relatively simple computations all happening at the same time across tons and tons of processors, right? This is a very naive understanding based on just what I think.

But I think that what was happening was that GPU people who were developing GPUs were using them for graphics like massively computational things like ray tracing and stuff where you need to just calculate a whole bunch of relatively simple things all simultaneously. And somebody had the bright idea and said, well, I mean, there's lots of different problems that don't include graphics that we could use these things for. And I'm oversimplifying and once again presenting my outsider's view. But once people came up with that, and then also I think at the same time or in a similar time, these Deep Learning models started to come out, and these Deep Learning models really could benefit from lots and lots of pretty small computations all happening in parallel because the way in which the models were put together, right? It's a bunch of linear algebra. It's these matrix calculations which are happening. And so everything worked together and just sort of launched GPU processing really into the HPC mainstream.

How Did Dave Godlove Use GPUs Before Getting Into Computer Science? [10:40]

Zane Hamilton:

So from your research perspective, Dave, before you got into computer science, were you using GPUs? Was that something that came before, after, during?

Dave Godlove:

Yeah, that's a good question. So I remember way back, like 2012, 2013 I was actually not using high performance computing at that point in time, but I was in a lab in which a bunch of my lab mates were doing a lot of simulations. And at that point in time, I remember that it was at Vanderbilt University, and I remember Vanderbilt started to invest in GPU technology at their HPC center. And some of my lab mates were like, it's weird. You can do these operations on graphics processing units instead of CPUs, but they're really investing heavily in this. It's supposed to be something that's happening in the future. And so then around I guess around 2000, well, I guess around 2013 actually, so I might have got that timeline mixed up.

That was probably 2010, 2011, around 2013. So I think it was when I went to the NIH as a scientist, and shortly thereafter, that Deep Learning really started to make a big impact. And some of the models that people were putting together began to solve problems that had been huge problems in computer science for many decades. And so at that point in time, the perspective that I had on it, so I was investigating the neural basis of vision, and wow, a lot of AI research is really focused on vision. And so at that point in time, from my perspective, one of the big things that a lot of my colleagues were wondering about, and talking about and writing about and arguing about, is to what extent do these Deep Learning models... So you call them neural networks, right?

And there's a metaphor of them being neurons, but are they neurons? And I mean, do they operate in the same way that real biological neurons do? And to what extent do they approximate the actual real workings of the brain? And like, there's people probably still on the side, like all the way at one end of the spectrum just like these things are completely different from the way that the brain behaves. There's no way we can use these as research tools to try to understand the brain, and there's people all the way at the other end of the spectrum. They're like basically we should build these models of vision and we should study them, and we should extrapolate from them how the brain works. And then there's people all on the spectrum right in the middle. So it's a pretty interesting topic from that point of view too, which is not really something you hear too much of from computer science. I don't think so.

How Does The GPU Driver Work? [13:47]

Zane Hamilton:

That's great. Thanks, Dave. Now, from a background perspective, I guess the next question is how does the GPU driver work?

Brian Phan:

I guess from my perspective, my knowledge of how this works is this: the GPU driver is just the interface for the OS to actually communicate with the GPU hardware. Under the hood, though, I'm not too familiar with how that works, but if Dave would like to shed some more light on that, I'm all ears.

Dave Godlove:

So I can give you, once again, I'm going to give you just my perspective, which I feel like there might be like actual engineers who really work on this stuff, who are going to watch this later, and they're going to be like, wow, what is this guy talking about? He's got no idea. But I'm going to give you what I know. And what I know is that the NVIDIA GPU driver, which I should specify, we're going to be sticking to NVIDIA today. And that's in no way an endorsement of NVIDIA or anything like that, but it's just like those are the most widespread GPUs that are currently installed. And most of the clusters that I've worked with, and most of the nodes that I've worked with and stuff.

So I don't have a lot of experience outside of NVIDIA GPUs. So that's what I'm going to be sticking to and talking about today. But from my perspective, the NVIDIA driver for GPUs, when you install it, it installs basically two major components. And the first of those two major components is a Kernel module, which it compiles and then adds to your Kernel. And then the second major component is a collection of libraries that get installed in the user space that are used to interact with that Kernel module. And the reason that that's important is because that makes it difficult, from a container perspective, to pass the GPU with its driver through into the container and to use it. So that's basically at a high level.
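To make that concrete, here is one way you might check both halves of the driver on a host. This is a rough sketch; command output and exact library names will vary from system to system.

```bash
# The kernel-space half: the loaded nvidia kernel modules and the driver
# version they report.
lsmod | grep -i nvidia
cat /proc/driver/nvidia/version

# The user-space half: the driver libraries registered in the ld cache.
ldconfig -p | grep -E 'libcuda\.so|libnvidia'
```

The kernel module and those user-space libraries have to match, which is exactly what makes driver handling inside containers tricky, as Dave explains next.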

Zane Hamilton:

So, from Dave in the comments: neuromorphic computing is definitely still a thing, yet abstract models of the mind still beat them to date. Thanks, Dave. Thanks for joining.

Dave Godlove:

Yeah, good to see you.

The History of GPU Support For Apptainer and Singularity [16:19]

Zane Hamilton:

All right. So Mr. Godlove, give us a little bit of a history of GPU support. And we're specifically talking about Apptainer and Singularity is what we're focused on, right?

Dave Godlove:

This is a cool topic. I think, and it's personal to me. So if we go all the way back and we take a trip down memory lane around 2016 or so, I believe it was, 2017, I believe. No, it was 2016. That's when Greg first committed the first V1.0 of Singularity to GitHub, and there was a community coalescing around Singularity, and there was a lot of excitement within the community. It was really new technology, and it was a really nice inclusive thing. And one of the things that we really wanted to do and was really important to us was to be able to utilize GPUs from within the containers.

But I told you a little bit about how the GPU driver is installed to set up this problem. So, basically, one of the key differences, or maybe the key difference, between containers and virtual machines, and one of the things that makes containers so fast, is that when you use a container, you're sharing the Kernel with the host system underneath. And that's a key thing to understand about why the driver becomes such a prickly thing to use inside the container. So I already said that, as far as NVIDIA's drivers go, the driver installs a Kernel module, and then you also have to install a set of libraries to interact with that Kernel module.

So what ends up happening is that when you run a container in which you want to use the GPU, well, when you run that container, you get the Kernel along with its modules in your container, and so you've got this specific version of the driver, right? But then you don't get the libraries installed. You just get the Kernel module, and so then it's up to you to try to match exactly the same version of all the libraries with the Kernel module. And if you don't match them exactly, you're just going to get errors where it says that it just can't locate the GPU or whatever. So it becomes an issue. So we had a lot of users who were interested in using the GPUs at the NIH, on the cluster, and were also interested in Singularity containers at the time, which ultimately became Apptainer containers.

So we were trying to figure out this problem: how do we instruct the users and figure out a way for them to be able to install the right versions of the libraries in their containers to use our GPUs? And the very first thing that we did, and I was helped by a few other members of the community in doing this, but the very first thing I did was really a pretty stupid thing. We created this script, and the script was just a helper script, and basically it's just a shell script. And we said we're going to provide this for you up in our storage space at the NIH; you can download this script as you're building your container.

And it's still actually up on GitHub. It's called gpu4singularity. And what it does is you basically just give it, as input, a version of the GPU driver that you're interested in. And to try to make sure that we really matched these up, I actually grabbed it, so the GPU drivers from NVIDIA are redistributable according to their license, at least the Linux versions are. So I actually just grabbed the one that we were currently using, and I put that on the FTP server at the NIH, or one of the FTP servers. And so then you could either use that one, or, in the script, you could download it directly from NVIDIA, depending on the options that you passed, so that people outside the NIH could also benefit from this. So this helped a little bit.

I mean, it's a little bit better than having to go figure out what the current version of the driver is and then go manually download it from NVIDIA, or download it in your container, and then install it. But the problem is, okay, that container now works at the NIH with the current version of the driver that we're running, but sometimes we have to go and update the drivers, right? Then everybody's containers break. Or sometimes these people might want to take these containers and run them somewhere else. So I tried to make those scripts so that you could update your containers too, so that without rebuilding your containers, you could just pretty easily rerun the script in the same container and then basically rebuild that into another. At the time, I think they were ext3 images instead of the SIF file format. But it's still super clunky and it just didn't work very well at all.

And so that was the beginning. And then I started to say, okay, well, I mean, I can see what gets installed inside the container when you do this. I can see what we need in order to make the container work. So why don't I start going through and trying to find manually all these libraries where they exist on the host, and maybe I can just start to bind mount these libraries, you know? Because I have a list here, so maybe I can just go through and start to bind mount, manually, all these libraries into the container. And so that took a while and there was some trial and error involved in doing that. But I finally found a pretty reliable method to be able to bind all the libraries from the host system into the container.
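A rough sketch of what that manual approach might look like follows; the library names, versions, and paths below are illustrative only, they differ from distro to distro, and the --nv option discussed later automates this discovery.

```bash
# Manually bind individual NVIDIA driver libraries and tools from the host
# into the container. Paths and versions here are hypothetical examples.
HOSTLIBS=/usr/lib64

apptainer exec \
  --bind ${HOSTLIBS}/libcuda.so.1:/usr/lib64/libcuda.so.1 \
  --bind ${HOSTLIBS}/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1 \
  --bind /usr/bin/nvidia-smi:/usr/bin/nvidia-smi \
  mycontainer.sif nvidia-smi
```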

And so this is better, right? Because now, if you can just find them and bind them in, you don't have to rebuild your container every time the administrator comes by and updates the driver. And you also don't have to rebuild your container. If you want to move your container from one system to another. You just plop the container on that system, and then you bind mount all the libraries in. But manually going through and finding all those libraries and figuring out how to bind mount them is a little tricky. It's not super easy. And it's still pretty clunky. So I got really excited about it though. I went and I pinged Greg, so we're talking in Slack, and I'm like, Hey I mean, I've been working on this problem.

And we can bind mount all these libraries from the underlying host into the container. And this shows you how I'm not a computer scientist by background. I've just picked stuff up along the way, and he was like, that is crazy. And it's probably never going to work. If you got it to work, then you're lucky. Maybe your container was the same operating system as the underlying host or something. But like, I mean, libraries are going to be compiled for specific operating systems and they're like, this is not going to work in a very generally flexible way. And I was like, well, I think it does. I mean, it seems like it does.

And so we started talking to some folks at NVIDIA. So this is a big problem in containers generally, right? This was not just an issue that we were having with Apptainer. People couldn't really do this with Docker either, which was the other prevalent solution at the time. The problem was so bad that NVIDIA had created a different version of Docker called NVIDIA Docker, in which they had figured out how to make GPUs work within containers. And their advice at the time was that you were supposed to grab a different version of Docker. You were supposed to grab an NVIDIA Docker and use that if you wanted to use GPUs inside your containers. So we started talking to some developers in NVIDIA, and it turned out that that's basically what NVIDIA Docker was doing.

It was finding the libraries on the host system, and it wasn't bind mounting them, it was writing them into the container, but it was basically doing the same thing. And it turns out that it works because of the way in which NVIDIA compiles and then distributes their libraries. They basically just compile them in a very general way so that they're expected to work on a wide range of different systems. And so because of that, you can just grab them from one operating system and put them into another one and they'll go ahead and work. And so ultimately, even though it's gross and it's weird that it works and it's surprising and all that, we ended up creating the experimental --nv option, which has been experimental for the past six years, and that's how it works. It goes and it finds the libraries on the host system, bind mounts them into your container in a specific directory, and then it just sets up your LD_LIBRARY_PATH with all those libraries on the path and away you go.

How Does NVIDIA Select Drivers For Specific Operating System Distributions? [26:07]

Zane Hamilton:

So, Dave, whenever you say it finds the libraries, I'm assuming it's different per distro? So if you're on a Debian-based system like Ubuntu, it could put libraries in a little bit of a different path than it would on Enterprise Linux. So from a developer's standpoint, are you having to tell it all of those versions of where to go look? Because, I mean, Ubuntu 18 may be different than 20, different than 22. Or does it just go look?

Dave Godlove:

So this is really dead-bang simple. So when you install... you know what, maybe I should go ahead and show you. Maybe the best thing to do is to show you. All right, let me see if I can figure out how to present. I've never had any trouble with presenting in the past, right?

Zane Hamilton:

No, I mean, we've never actually had to watch your physical monitor or anything. Not today, right? Oh, look at that.

Live TensorFlow CUDA Container Demo [27:03]

Dave Godlove:

Wow. Look at that. Okay, cool. Magic. All right. Yeah. So I came prepared to show how this works. I've got this example here. I've got a little TensorFlow container that I could show. But it doesn't really matter what the container is. I mean, this one has CUDA and it's got a Deep Learning framework and stuff installed in it, so I could actually do real GPU work with it. But the same basic principle applies anywhere. So you asked, Zane, how does Apptainer figure out, regardless of what your distribution is, what libraries to get and where to get them and stuff. And it turns out that when you install, and I've got this, this is really gross too, but I've got this installed in /opt/. When you install Apptainer, there's a configuration file within /etc/. There's this configuration file called nvliblist.conf.

Okay? And the reason that we did it this way is because, when you install Apptainer, it installs for you this configuration file, and this configuration file has a bunch of libraries that are a pretty good first guess at what you're going to need in order to run CUDA, or run a 3D graphical representation or something like that, from within your container. And we did it like this specifically. I should also highlight that there are libraries that it's going to find, and then it's also going to try to find some binaries on your host, based on the fact that this is the way NVIDIA drivers are usually installed, and it's going to try to bind mount those into your container also. Now, the reason we did it this way is because we didn't want to provide, in the code, an authoritative list of all the libraries that you absolutely have to find.

We wanted this to be user configurable because, you know what? You might have some weird system. You might have some weird requirements. You might be doing something that we haven't anticipated and tested. So we want you to be able to figure out what this list should be on your own. And so this list is editable by the administrator. It's been a while since I've played with it, but there are probably even ways for a user to edit the list themselves and change it. But in any case, that list usually provides a pretty good starting point for being able to utilize the GPU inside the container, it turns out. And then what happens is it's actually pattern matching. And so what it does is Apptainer looks in the ld cache, which is where all the libraries for the entire system are. There's a cache basically saying, here are all the libraries to look for when you need to link stuff.

And it looks in that cache and it finds not only libcuda.so, but libcuda.so.1, for instance, or libcuda.so.whatever it is. It pattern matches and it finds all those libraries on the system, and then it bind mounts them into one particular place within the container. So if I do an apptainer shell --nv, and then I do that on tf.sif, then it basically just says that it had to create a bunch of different bind mounts to get that nvidia-smi in there. But now if I run nvidia-smi, oops, you can see the GPU from within the container. Sorry, it's really big and gross there. But the way that it actually works, so if you go to the top level directory of the container, there's a hidden directory called .singularity.d, and that's like the guts of the container.

That's where all the metadata and stuff go. There's a subdirectory there called libs. And if you look in libs, if I use a command that works to look in libs, these are all the same libraries, or many of the same libraries, that I just showed you out of that configuration file. It's gone and found these libraries and just jammed them all flatly into this one directory. And then within the container, if I do echo $LD_LIBRARY_PATH, on the end of that I can see /.singularity.d/libs, which is set by default with Apptainer. That LD_LIBRARY_PATH entry, I believe, is set no matter whether you use the --nv option or not. But in this case, there's some other stuff. So this container, I should say too, was grabbed directly from NVIDIA's container registry, the NGC.

And so they've prepended some other stuff that is going to be important for this container to work properly. But this, in a nutshell, is how this works. And to your point, Zane, your host could be an Ubuntu host, it could be Debian, CentOS, it doesn't really matter. This is pretty much the way that NVIDIA installs the driver on every different distribution that it supports. And so this pretty much works no matter where you go.
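For reference, here is an illustrative excerpt of what an nvliblist.conf can look like. Entries vary by Apptainer version and site, so check the copy under your own installation's etc/apptainer directory rather than treating this as authoritative.

```
# Illustrative excerpt only -- not a complete list.

# NVIDIA tools to search for on the host and bind into the container
nvidia-smi
nvidia-debugdump

# NVIDIA driver libraries to pattern-match against the host's ld cache
libcuda.so
libnvidia-ml.so
libEGL.so
libvdpau_nvidia.so
```

And the demo itself boils down to roughly this sequence (the container name is taken from the demo; output will differ on other systems):

```bash
# Shell into the container with the host's GPU driver bound in.
apptainer shell --nv tf.sif

# Inside the container:
nvidia-smi                # the host GPU is visible
ls /.singularity.d/libs   # the driver libraries that --nv bound in
echo $LD_LIBRARY_PATH     # includes /.singularity.d/libs
```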

Can You Add New Data to Container Files? [33:17]

Zane Hamilton:

So you said that you could add things to this file, or there's the capability of adding things to that file. So does it read every time you execute a container?

Dave Godlove:

No, it reads it every time you execute this container. So you could, and it sometimes is the case that you find, okay, this list is not perfect, and an administrator or somebody finds, wait a minute, there's something in here which conflicts with my users' ability to do X, Y, and Z and it's not working properly. And so you can just comment stuff out, or you can add different things. For a while, this was like an undocumented and unintended feature. With earlier versions of Apptainer, or actually earlier versions of Singularity, it would create an environment variable on the host to set the bind path, and then it would use that when it ran. And so you could actually create that environment variable yourself, and you could use that to copy libraries into that /.singularity.d/libs directory. And people, we found out, were using that, because it was like an undocumented side effect. It wasn't really expected that users were going to use that environment variable. And then they were, and I think we tried to change it, and then users were kind of like, wait a minute, stuff doesn't work anymore. We were using that.

And then just to show, I mean, I could show really quick. Can I show you that this does actually work? I've got a Jupyter Notebook that I installed in this container. So I can do something like this and start up a Jupyter Notebook. And if I go over here, this is just a basic GPU demo. Okay? So I'm going to watch the GPU over here. Notice I have X.Org running. This is why I can share right now, by the way. So last time I tried to share, my computer updated and it swapped me over to Wayland and Wayland was unhappy with trying to share through Streamyard.

Zane Hamilton:

You fixed it.

Jupyter Notebook Demo using MNIST Data [36:06]

Dave Godlove:

Yep. All right. So I'm going to go ahead and connect off screen to this. All right, so I've connected to this Jupyter notebook I've got running that I started with this command. I've got this MNIST example. So MNIST is a bunch of handwritten digits. It's the Modified National Institute of Standards and Technology handwritten digit dataset. And it's used really widely for stuff like this. Just silly little examples. You can see I've run it once before and it gives me a bunch of warnings. But anyhow, I'm going to go ahead and run it again. It's going to give me, hopefully if it works, some warnings. Are you not running?

Oh, it started to run and then I stopped it. All right. But in any case I see Python installed here, or there's a Python process running on my GPU here. And it's training. So basically all this is doing, this is a very simple example, and I hit it twice, and that's why it's not giving me output here.

Yeah, it's giving me a bunch of warnings back there, but in any case, you can see it's running here. Oh, here we go. So what happened was this is importing TensorFlow. It's downloading this MNIST dataset or loading it. It's creating a very simple model. It's compiling the model a certain way, and then it's fitting the model to the data. And so I did 10 iterations here. That's more than enough to get this model to recognize these handwritten digits. And so now my accuracy is like 98%, so I'm only wrong 2% of the time. Because that's a really simple little example.
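A minimal sketch of the kind of notebook Dave is describing, assuming the standard tf.keras MNIST workflow rather than his exact notebook, looks something like this:

```python
# Minimal MNIST example in the spirit of the demo (not the exact notebook).
# TensorFlow will use the GPU automatically if the container can see it.
import tensorflow as tf

# Confirm the GPU is visible from inside the container.
print(tf.config.list_physical_devices("GPU"))

# Load the MNIST handwritten digits and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A very simple fully connected classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Ten epochs is plenty for this toy problem; test accuracy typically lands
# around 98%.
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)
```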

How Are CUDA Libraries and NVIDIA Drivers Different? [38:02]

Zane Hamilton:

Thank you for sharing that, Dave. Absolutely. We've talked about it, and I know they've put it in here a couple of different times, talking about compiling or building CUDA Kernels. I think that was something else we were going to talk about: CUDA.

Dave Godlove:

Yeah, so I wanted to talk about CUDA a little bit, because this is something that I find there's a lot of confusion on, and it sometimes causes errors and issues and problems. And so I wanted to just talk through all of this, all the way through. Sometimes people refer to CUDA as the CUDA driver, or they conflate the CUDA libraries with the NVIDIA driver. And I think that this is something which could be made a little bit more explicit by NVIDIA, truthfully. So CUDA is different from the driver, right? CUDA is a collection of libraries that you can use in order to do parallel computing on GPUs.

You can install the library or I'm sorry. You can install the driver and you don't have to install CUDA. And you could in theory install CUDA as well, but you're not going to get very far with it without the driver. The two are separate. The problem is that, it's confusing that the two of those are like conflated together. So one of the ways that you can get CUDA is by downloading a self extracting binary, a .run file that NVIDIA provides for you. And that's a pretty standard way to get and install CUDA. And when you do that, if you actually extract a binary and you look at it, you'll see that it extracts a bunch of libraries and stuff and extracts an installer and everything.

But it also extracts another self extracting binary, another .run file, which is packaged with it, which is the drivers. And so everything's packaged up together. And if you just go ahead and just run that .run file and install it, it's going to try to install a new driver on your host system as well. This makes sense. If you're installing CUDA on bare metal, on a node, because you're probably installing the latest version of CUDA, and if you're doing that, whatever driver you've got is probably not going to be up to speed to be able to run the latest version of CUDA. So you probably have to update your driver too. So that makes sense. But when you grab that .run file and you try to install CUDA inside your container, you don't want all those driver libraries inside your container.

They're probably not going to match whatever host system you're trying to run on. And if they actually get seen, if they don't get masked by the LD_LIBRARY_PATH, they're basically going to break your container and make sure that it's not able to run on the GPU. So this is where the waters are muddied and it becomes problematic for people, right? It's really important to recognize that there's a difference between the driver and CUDA, and it's really important to recognize that the driver is on your host; that's where it comes from and that's what it's tied to. And CUDA is in your container, and that's what it's tied to. That's because if you're running TensorFlow or PyTorch or whatever, you're running some CUDA-enabled program, it's been compiled with a specific version of the CUDA libraries. And so you can't do the same trick that we did with the driver and bind mount CUDA from the host system into the container at runtime. That's probably not going to work. You need to have CUDA in your container, and you need to have the driver on your host. And the two need to be separate.
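One hedged way to put that into practice is a definition file that starts from one of NVIDIA's CUDA images and never touches the driver. The image tag below is just an example, so pick whichever CUDA version your application was built against.

```
Bootstrap: docker
# Example tag only -- match the CUDA version your application needs.
From: nvidia/cuda:12.2.0-runtime-ubuntu22.04

%post
    # Install your CUDA-enabled application here (TensorFlow, PyTorch, a
    # compiled solver, and so on). Do NOT install the NVIDIA driver or run
    # the driver portion of the CUDA .run installer inside the container;
    # the driver comes from the host at runtime via --nv.
    echo "install your application here"
```

Built with something like `apptainer build app.sif app.def` and run with `apptainer exec --nv app.sif ...`, the CUDA toolkit lives in the container while the driver stays on the host, which is exactly the separation Dave is describing.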

Are The Container CUDA Drivers Dependent on The Host CUDA Drivers? [42:19]

Zane Hamilton:

Is one dependent on the other from a versioning standpoint? So if you had CUDA in your container, could you possibly break it on a host?

Dave Godlove:

You could. But it's a little bit more forgiving. So when it comes to the drivers, remember you've got a Kernel module and you've got a bunch of libraries which interact with that Kernel module. Those have to match exactly, and that's why we get them from the host and put them into the container at runtime. With CUDA, it's a little bit looser; you basically just need a minimum driver version to run a given version of CUDA. The drivers are backward compatible and can run older versions of CUDA. So you could install the very latest CUDA into your container and then try to use that with a driver which is older, and it would break. But as long as you've got a relatively up-to-date driver on your host system, and as long as your CUDA is not bleeding edge within your container, you're going to be okay. You should be able to run it.

Before we get off that topic, there's one other thing. In that configuration file I showed you a minute ago, the very first library that I was bind mounting from the host into the container was called libcuda.so, and that is not CUDA. That is the library, part of the NVIDIA driver, which allows the CUDA libraries to talk to the NVIDIA driver. It's actually part of the driver. It's not the CUDA libraries. This is why I'm saying that I think NVIDIA could make this a little clearer; it's complicated keeping the two separate, and it's confusing.
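A quick, hedged way to sanity-check that pairing on a given system (the container name here is a placeholder):

```bash
# On the host: nvidia-smi's header shows the driver version and the highest
# CUDA version that driver supports.
nvidia-smi

# Inside the container: the CUDA toolkit version the application was built
# against (nvcc is only present in toolkit/devel-style images).
apptainer exec --nv app.sif nvcc --version

# Rule of thumb: the container's CUDA version should be no newer than the
# "CUDA Version" reported by the host's nvidia-smi.
```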

Can You Cache Shared Objects to Avoid Dynamic Node Loading? [44:09]

Zane Hamilton:

All right. Mr. Debonis had another statement about shared objects. Yeah, here we go. So, shared objects in a large HPC system with a distributed file system have historically caused IO storms and bottlenecks. Is there a way to cache these SOs to avoid dynamic loading to the node? That's an interesting question.

Dave Godlove:

Well, so it depends. In this context you're probably not going to have your drivers, in fact, I think it would be really weird if you had the NVIDIA driver libraries installed on shared storage. That'd be very atypical, although, I don't know, it might work, I guess, but it would be a very atypical installation. Normally when you're installing drivers, you go through node by node and you install the drivers on the nodes, and that's stuff that's on the nodes themselves. Yeah, I don't know. So what do you think, Brian? You've got a little bit more HPC experience.

Brian Phan:

I actually agree with that, I've never seen these typically installed on a shared file system, but if it were me, I think this should just go on the node itself just to avoid that specific bottleneck. Yeah.

Dave Godlove:

I mean, you could get into a situation if you installed it. So once again, part of the driver is a Kernel module and that's not going to be installed on a shared file system. So you could get into a situation while you're in the middle of an update. And in fact, I've been in this situation at the NIH before, where you're in the middle of an update and you're doing a rolling update. And so some of your compute nodes have one version of the driver and others of your compute nodes have another version of the driver, as you're going through and trying to update them without taking them all down at the same time and messing your users up. If you had the driver libraries installed on the shared file system, it would break on whatever nodes where those libraries didn't match the Kernel module.

So you'd want to avoid that. Now, as far as the CUDA libraries go, those are going to be installed within your .sif, within your container. And so even if that's sitting on shared storage, the way in which the image is mounted onto the file system for your node is going to avoid all the little metadata lookups which can cause those IO storms that you're talking about. So just the fact that you're using containers, and specifically that you're using Apptainer containers in which the container is a single image, that's going to actually help with that problem and make sure that you avoid it. Because you could see a situation, in fact, it's a normal situation, to have CUDA installed on a shared file system, right? So you could have CUDA installed just like any other application on a shared file system and have that controlled with your modules system, so that your users load whatever version of CUDA they need for their particular application to run. And so you could totally have IO bottlenecks there, but using the containerized approach is going to help with that.

Zane Hamilton:

Thank you for the question, Dave. I think Greg was agreeing that he liked the question as well. I was hoping it was going to be one of those, you asked him if he could do it, he told you it was not a good idea, and then you go do it and prove that it was.

Dave Godlove:

Well, I mean, I wasn't proving anything. I was just sort of stumbling around in the dark and it turned out that it was a good, good direction to go in.

Zane Hamilton:

Very good. All right guys, if we have any questions, now is the time, because I think we're at the end of what we wanted to talk about. There are some other comments in here about libraries. If there's anything else, give it a minute.

Other Possible Problems With Unusual Workloads [48:45]

Dave Godlove:

Well, so there are some other things that we could talk about. As I said, the way in which the --nv option works and bind mounts these libraries in is dead-bang simple and even a little stupid. And there are times when it'll break, when you've got an unusual workflow or something. So I've seen that, and once every couple of months or so, somebody who's doing something unusual will raise an issue in GitHub and say this is a problem. And a lot of times the solution to that is to edit that file, the nvliblist.conf file.
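As a small illustration, an administrator might append a missing entry like this; the library name and configuration path are placeholders, so adjust them for your system and installation prefix.

```bash
# Add one more driver library for --nv to pattern-match and bind in.
echo "libnvidia-encode.so" | sudo tee -a /etc/apptainer/nvliblist.conf
```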

Using NVIDIA Container CLI to Solve Query Issues [49:37]

So there is another way. NVIDIA, actually, probably around the time or shortly after we began working on the --nv option, began to develop a product that's called libnvidia-container on GitHub, but it's also alternately referred to as the NVIDIA Container CLI or nvccli. What it does is, it's a separate program that you can install in a standalone way on your system, and you can query it to do things like figure out what libraries are on the host system, and not only figure out what libraries are there, but figure out what libraries matter for a specific use case. Like, I want to use visualization, for instance. So what libraries exist on my host system that are important for me to be able to use 3D visualization?

And so because that exists, if you have it installed, there's another command line argument that you can pass to Apptainer, which is called --nvccli. It's a mouthful, but if you pass that, instead of just going and looking at this dumb configuration file, which has a list of libraries which are probably right for most applications, it'll go and query NVIDIA's tool and say, tell me what I need to have passed into this container. Now, it works a little bit differently. It doesn't want to just bind mount these libraries into your container; instead it wants to actually write them into your container. So typically, if you don't specify anything, what it'll do is set up a writable temp file system in memory and overlay that onto your container, as though it were like an overlay in Apptainer, and then it'll write to that.

But that's ephemeral. So it'll go away when the container is done running. So there's that consideration. And the other consideration is the places that it's going to write to are places where a normal user doesn't have access to write. And so you need to have fakeroot access enabled and you need to run it as fakeroot typically for this to work properly. So there's a few little gotchas and little caveats, but there are circumstances in which that option is going to be a better fit for what you're trying to do and it's going to be a little bit more robust than just relying on the --nv option.
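In practice that looks something like the following, assuming nvidia-container-cli (libnvidia-container) is already installed on the host; depending on your configuration you may also need the fakeroot setup Dave mentions.

```bash
# Use NVIDIA's container CLI to decide what to inject, instead of the
# nvliblist.conf-driven --nv discovery.
apptainer exec --nv --nvccli --fakeroot app.sif nvidia-smi
```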

Zane Hamilton:

Thank you Dave. So I do have one more question from Greg. Greg wants Dave to remind him who won the basketball shoot out at the arcade.

Dave Godlove:

Yeah. That would be Kenneth every time. He is absolutely awesome at all things I think video game or carnival related. He's a pretty amazing guy, so we're taking this trip down memory lane.

Zane Hamilton:

Thanks for the question, Greg. We appreciate it. So you guys don't have any more questions? Go ahead Dave. Go ahead.

Dave Godlove:

Yeah, I can just, you know me man, I can just keep talking forever.

Zane Hamilton:

Absolutely keep talking.

Solving GPU Driver Issues with NVSMI [53:10]

Dave Godlove:

So the reason that we got back into this recently, and I started going through this and looking at all of it, was that we started having some trouble with some cloud instances, which prompted me to go through my memories and dig up all the information that I had and revisit everything GPU driver related. So I'm writing a blog post about this now, which hopefully will be published pretty soon, because I think it's interesting and it might help some people. We use a cloud provider here at CIQ pretty extensively called Vultr, just for single spot instances and testing things and doing stuff like that. And Vultr has some GPU instances, like many other cloud providers, that you can spin up.

And so, me and several of the other solutions architects, I think Brian included, have been spinning up some GPU enabled Vultr nodes and then trying to run GPU enabled workflows on them. And we were having this weird problem where we would try to run the workflow. If you just did nvidia-smi and you just looked at the graphics card from within the container, you could see the graphics card and you could see what was or was not running on it or whatever. So that was working, but then when you tried to run the workflow, it would just complain that it couldn't find any CUDA device and it was like, well the CUDA device is right there and you can see it within nvidia-smi, why are you having this much trouble?

And so I struggled with this for a little while. I think everybody struggled with it and then just did some workarounds and went and did other work for a while. And I struggled with it for a little while because I was trying to put a demo together that used the GPU on a cloud instance. And I finally said, I don't know what's going on here. I'm going to just use my local machine. I'm just going to use my local laptop to do this demo, which has a GPU in it. Although at the time I didn't have a driver installed for it. So I went and I installed the driver on my laptop, I rebooted my laptop and then I went and tried to run the workflow and I had the same error.

GPU Driver Obstacles at the National Institutes of Health [55:34]

So, this was not because of the cloud. And I started to think, and I started to say, wow, five or six years ago, I think I had a similar error at the NIH. And it turns out, this is complicated, but it's neat. It turns out that when you reboot a node, there are some device files in /dev that are needed in order to run the GPU, and those are not created by default. So when you reboot the node, they're not all guaranteed to be there. What happens is, when you run a program which is compiled using CUDA, or when you start the X server, or you do anything else which is going to ping the graphics card and start using it, it calls a program called nvidia-modprobe, which is installed as part of the driver, and which goes and looks in /dev and says, do I have all the device files I need?

And if it doesn't, it creates them. Here's the kicker. A normal user can't create files in /dev, right? So this is an SUID bit program. It escalates your privileges behind the scenes, and it allows you to run one specific operation as root: let me create the right device files that I need to talk to the GPU card. Okay, well, that works on bare metal just fine, but it doesn't work in an Apptainer container. You can't escalate your privileges like that and use an SUID bit program within an Apptainer container. So the upshot is, if you restart a node and then the first CUDA-enabled workflow you try to run is containerized, it's just going to say, I can't find the CUDA device. If you restart a node and you run a CUDA-enabled workflow on bare metal, it'll create the devices for you, and then subsequently you can run it within an Apptainer container.

So this is a really weird, hard to diagnose, strange problem. And I remembered we had this problem a long time ago at the NIH, and the solution, ultimately, when we figured it out, was to just create a little startup script that goes through and creates the proper devices, and make sure that that runs on the GPU nodes at startup. And so once I figured that out, I googled around a little bit and I found that NVIDIA actually provides a startup script for this particular weird situation in which you don't have nvidia-modprobe installed as setuid, or for whatever reason you can't use it, like we can't use it within an Apptainer container. And so I just grabbed that startup script, threw that onto Vultr so that it runs when the nodes boot up, and then all the device files are created and everything works for you. So I started looking at this and started talking to the other Solutions Architects, and then it came up: well, why don't we just do a webinar where we just talk about GPU stuff with Apptainer?
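The startup script Dave describes is along these lines. This is a sketch modeled on the device-node snippet in NVIDIA's documentation, so verify it against their current docs before deploying it, and run it as root at boot on the GPU nodes.

```bash
#!/bin/bash
# Create the NVIDIA device nodes that nvidia-modprobe would normally create
# on first use, so that containerized CUDA jobs find them after a reboot.

/sbin/modprobe nvidia
if [ $? -eq 0 ]; then
    # One /dev/nvidiaN node per GPU (character major 195), plus nvidiactl.
    N3D=$(lspci | grep -i nvidia | grep -ci "3d controller")
    NVGA=$(lspci | grep -i nvidia | grep -ci "vga compatible")
    N=$((N3D + NVGA - 1))
    for i in $(seq 0 "$N"); do
        mknod -m 666 /dev/nvidia"$i" c 195 "$i"
    done
    mknod -m 666 /dev/nvidiactl c 195 255
fi

/sbin/modprobe nvidia-uvm
if [ $? -eq 0 ]; then
    # The nvidia-uvm major number is assigned dynamically at module load.
    D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
    mknod -m 666 /dev/nvidia-uvm c "$D" 0
fi
```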

Zane Hamilton:

It's fantastic and I'm also really impressed that you remembered an error you got five years ago.

Dave Godlove:

But it took me days. It took me days to remember that.

Zane Hamilton:

Hey, you got there. That's impressive. Impressive. So, very good. Well, now we are actually up on time, so there's no more time to tell stories, Dave. I'm sorry.

I appreciate it. I appreciate everything. It's very interesting and certainly something that a lot of people are interested in obviously, so we really appreciate it. Brian, thank you for joining us as well. Guys, go like and subscribe and join us next week for another round table, I believe. Thank you and see you next week.