Apptainer in Bio Sciences
In this webinar, our Apptainer (formerly Singularity) experts will discuss Apptainer use-cases and features that directly help life sciences.
Webinar Synopsis:
- What is Apptainer?
- Linux Foundation Requirements to Sponsor a Project
- Are There Uses for Apptainer Outside of High Performance Computing?
- How does one transmogrify a Docker container into Apptainer?
- How are people using Apptainer today?
- Compliance and Means of Reproducibility With Containers
- Envisioning a Standard DOI for Containers
- Cryptographic Signing and Apptainer
- Work During COVID-19
- Containers vs. Packages on Conda Environments
- Container Reproducibility With Different Kernel Versions
- Kernels, GPUs and Ending Up With a Different Answer
- Developer Reproducibility and Scientific Reproducibility
- Real Life Use Cases in Life Sciences
Speakers:
- Zane Hamilton, Vice President of Sales Engineering, CIQ
- Gregory Kurtzer, CEO, CIQ
- Dave Godlove, Community/Release Manager for Apptainer, CIQ
- Glen Otero, Director of Scientific Computing, Genomics, AI & ML, CIQ
Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.
Full Webinar Transcript:
Zane Hamilton:
Good morning. Good afternoon. Good evening. Wherever you are, we appreciate you spending time with us today. Welcome to another CIQ webcast. Today we are going to talk more about Apptainer. We are bringing in exciting and interesting people who actually have done the work, used the product, and are involved in the community. It is nice to hear real-world examples from actual scientists who are also part of the CIQ team. Today, we have, as always Gregory Kurtzer, and we’ve got Dave and Glen. Dave and Glen are both new to CIQ; Glen is extremely new. So why don’t you guys introduce yourselves? Greg, I’ll save you for last because I think most people should know who you are by now.
Dave Godlove:
Hey everybody! I’m Dave Godlove. I worked at the NIH for two years. Before that, I helped Greg found Sylabs, and I was the community/release manager for Singularity, now Apptainer. Prior to that, I was at the NIH for another two years, working on the internal cluster, Biowulf.
Zane Hamilton:
Very nice, thank you. Glen, welcome!
Glen Otero:
Hi - Glen Otero. Like Dave, I was trained as a scientist, long, long ago. I have been working in bioinformatics and HPC for a little over 20 years, about the same amount of time I’ve known Greg. I worked with a lot of people, when I was at Dell, TGen, and other places, to help design clusters for genomics and bioinformatics workflows. And that’s what I do best: I make things go fast. And you know I’m a real hacker because I have a Monty Python mug.
Zane Hamilton:
And, of course, Greg.
Gregory Kurtzer:
Hey, everybody. So I too am a scientist, first and foremost, in my career, but unlike these two guys, who are smarter than me, and Zane, of course, as well, they have PhDs, and I didn’t quite finish my PhD, so I’m going to hide here in the corner and let these guys talk for the majority of this.
What is Apptainer? [02:31]
Zane Hamilton:
You can’t hide for too long, Greg. I’d like to level set and take a few minutes for those who haven’t done this before on what Singularity is, and what is Apptainer? Where did Apptainer come from? How is it different from Podman, Docker, etc.?
Gregory Kurtzer:
For historical relevance and how we got here, I will talk about why Apptainer was developed. Five or six years ago, customers, researchers, and scientists started getting wind of this new idea called containers. I got it a lot from UC, especially. I was on a joint appointment from Berkeley Lab, DOE, as well as UCOP, specifically on appointment with UC Berkeley. I constantly had people coming at science from the long tail, fields like library science, political science, and many others that you never thought would be High Performance Computing consumers, asking us: “Can we go run this operating system, or can we do this differently? And if you just supported containers, all of this would be so easy.”
So we started looking into how to incorporate containers into High Performance Computing (HPC). The first thing that we found is that the existing container ecosystem was really designed to solve a problem around services, and root-running services in particular, and that did not fit well with the architecture that we were using for HPC, which is the traditional Beowulf cluster, where you can have any number of users on your system at any given point. And if you give any of them access to a root-running service, technically that is considered a security breach.
We had to figure out how to solve that, and that’s actually what caused me originally to prototype Singularity. Within six to eight months, Singularity was used on the majority of High Performance Computing (HPC) systems worldwide. It took off massively fast. As Dave mentioned, we created a company called Sylabs focused on that. When Dave and I left Sylabs, we took the open source Singularity project out of the company and brought it with us to ensure that it would stay in the community. We decided that the best way to make sure this project stays open source, has a very long life ahead of it, and cross-pollinates with other parts of the community, specifically around OCI (the Open Container Initiative) and the Cloud Native Computing Foundation, was to move it into the Linux Foundation.
The Linux Foundation and the open source community were excited about moving Singularity to the Linux Foundation. The one request that the Linux Foundation had was “please rename it.” Singularity is used too often as a search term, and it’s all over the place. Since that term is used very frequently, it would have been too difficult to protect the brand of the project and protect the project itself, so they asked us to rename it to something unique. We put the renaming to a vote among 50 community members that wanted to be a part of the renaming process. Apptainer was the name chosen.
Linux Foundation Requirements to Sponsor a Project [06:30]
Zane Hamilton:
What are the requirements the Linux Foundation has before they will sponsor a project?
Gregory Kurtzer:
There wasn’t a huge amount, aside from the fact that it has to be truly a community project rather than corporate controlled. If it is corporate controlled, it must move out of the company and into the Linux Foundation. We basically assigned all aspects of the project to the Linux Foundation, so the Linux Foundation is the owner of the project and now owns and maintains all the trademarks, the brand, and all that sort of stuff. There is a charter involved in doing that, which defines the organization of the project and how decision-making is handled by the technical steering committee, which acts very much like the board of the project.
The Linux Foundation has the ability to do monetization on projects. We chose not to monetize Apptainer because there was not a need to do so for the project’s benefit. This might change at some point in the future. You have to assign everything over to them. They come up with a charter. You agree to the charter, and you assign the first technical steering committee, which by the definition of the charter is the committers of the project: those who have commit access or are responsible for merging PRs. Contributors, of course, are the people generating PRs; committers are the people responsible for merging, and they are the people on the technical steering committee. And who is the technical steering committee? There’s myself, Ian Kaneshiro, Cédric Clerget, Dave Dykstra from Fermilab, and Krishna Muriki from KLA.
Are There Uses for Apptainer Outside of High-Performance Computing? [08:51]
Zane Hamilton:
It's always good to know who is involved. One of the other questions that I get quite often is: will Apptainer continue to be HPC-focused or are there outside-of-HPC use cases that people are looking at?
Gregory Kurtzer:
I would say yes, Apptainer is going to stay focused on the High Performance Computing side, but that doesn’t mean those are the only use cases it is going to be used for. One that we are already familiar with people using it for is high-security environments.
Apptainer has some very unique ways that it manages its containers in the image format. That image format allows us to do things like cryptographic signing, verification, and layering cryptographic signatures, as well as encrypting that image so only people who have the passphrase or the keys can unlock the container. So there’s a lot of high-security and high-trust use cases around Apptainer.
The last thing I want to say is that the name actually came from “application container.” We see a big difference in terms of usage between service-focused containers and application-focused containers. There are good use cases for both of those. One thing that I personally believe in is always using the right tool for the job. If you are bringing up High Performance Computing applications, MPI applications, X or GUI applications and whatnot, you can do it in a service-focused container, but it’s quite a bit more difficult; it’s not designed specifically for that, whereas Apptainer absolutely is designed exactly for that, so it works very well.
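To make the signing and encryption features Greg describes concrete, here is a minimal sketch of the commands involved; the file names are illustrative, and exact flags may vary slightly between Apptainer releases:

```bash
# Create a PGP keypair used for signing (interactive prompts)
apptainer key newpair

# Sign a SIF image and later verify the embedded signature
apptainer sign container.sif
apptainer verify container.sif

# Build an encrypted image; only holders of the passphrase can run it
apptainer build --passphrase encrypted.sif container.def
apptainer run --passphrase encrypted.sif
```

Because the signature travels inside the SIF file itself, a signed image can be verified after it has been copied to another system, with no registry in the loop.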
How does one transmogrify a Docker container into Apptainer? [11:10]
Zane Hamilton:
Glen Otero recently introduced me to the word transmogrify. Glen, would you explain how to transmogrify a Docker container into Apptainer?
Glen Otero:
Docker is the predominant container format for many bioinformatics tools. We simply import it into Apptainer. At my last job, at the end of last year, we switched from Singularity to Apptainer. It was a flawless transition. We continued to use the same commands. We were able to import Docker containers into Apptainer easily. We also started using Podman for Docker containers to get a daemonless process running. Transmogrify might be a little heavy-handed because it is a pretty easy process to get Docker containers running in Apptainer. We ended up doing this a lot. It’s not that hard! It’s actually pretty easy.
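To illustrate how little “transmogrifying” is actually involved, the whole conversion is typically a single pull; the ubuntu image below is just a stand-in for whatever bioinformatics image you actually need from Docker Hub or another OCI registry:

```bash
# Pull a Docker/OCI image and convert it to a single SIF file
apptainer pull mytool.sif docker://ubuntu:22.04

# Run a command inside the resulting container
apptainer exec mytool.sif cat /etc/os-release
```

Running apptainer build mytool.sif docker://ubuntu:22.04 achieves the same result if you prefer the build command.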
How are people using Apptainer today? [12:51]
Zane Hamilton:
Let’s start talking about how people are using Apptainer today. Dave and Glen, how are life sciences leveraging containers and Apptainer?
Dave Godlove:
I would like to talk about Apptainer from a historical perspective. I recently had a bit of an epiphany, as I was writing some documentation, that the use cases have shifted around a bit and are different from what they were originally. One of the things I always try to stress is that there is a differentiation in the type of users, between producers and consumers, when it comes to containers. When Greg first started developing Singularity, which is now Apptainer, and started getting the community together, there really wasn’t a good way to ingest Docker containers. So if you were going to be a consumer of containers, you had to be a producer first. The way in which we wrote the documentation was to start off with: how do you build a container? That’s square one.
Recently, in going through some of the old documentation that we put together and updating it, I had the epiphany that we had the entire documentation backwards. We needed to flip it all around and write an entire section on how to use containers first, and then, almost as a footnote at the end, “by the way, if you also want to build your own container, there’s a way to do that too, and here’s how.” That’s really a change, and it shows how much the community has grown and how containers have been adopted, to the point where if you want to use a container, 9 times out of 10, you don’t need to start writing a definition file, or a Dockerfile, or something. The first thing you do is go to Docker Hub or another registry, look around, find the container that you want to use, and download it and use it.
Zane Hamilton:
Glen, do you have anything to add to that?
Glen Otero:
Having tried to build a lot of bioinformatics tools over the years, and having scars from it, I am encouraged to see a lot more researchers now using GitHub and posting their code, so it’s easier to get, instead of having to email somebody who left their postdoc to become a professor and cannot be bothered to send you their code. So I’m encouraged by that, and even more so now that they’re actually posting them in containers. Some of the more enlightened researchers will post containers in both Apptainer and Docker formats. It makes sharing containers, code, and scientific reproducibility much, much easier than it was before, because even compiling on different platforms, we know, can come up with different results, as small as they may be. So I am really encouraged by that, both for the people publishing code and for the users getting other people’s code.
It goes both ways: sharing it from the original creator of the software, and then people who want to reproduce it and pass it on. It’s also easier for admins. Researchers at my last organization would come in and say, “we found this code we want to run,” and instead of the admins having to spend a week trying to build it, they can spin up a container and test it really quickly for them. Or they could just run it on a laptop rather than the HPC cluster, and so that turns around a lot quicker, too. I see the pace of reproducible research picking up really quickly.
Gregory Kurtzer:
You just brought up a couple of really good points. First off, the mobility and reproducibility of these containers. Something I have said a lot in the past, and I know, since we’re talking bio, there are certain pharmaceutical companies, bio-focused organizations, and research groups where, whenever workloads or software are being used clinically, there are certain mandates from the FDA that require that the software stack be managed very similarly to a medical device. Dave and Glen, have either of you had similar experiences with regards to compliance, maybe not specifically FDA, but compliance and legal means of reproducibility?
Compliance and Means of Reproducibility With Containers [18:02]
Dave Godlove:
No, not me in particular, and I think that’s probably because the NIH is a bit more on the research side versus the actual applied medical side. If you got more into another agency or more like a hospital setting, I think that you might get into that a little bit. That’s just my experience. I don’t know, Glen, if you’ve seen those sorts of things in your experience.
Glen Otero:
At TGen, we had several startups incubate and spin out. They were all clinical bioinformatics organizations, so diagnostic tests and other clinical sequencing. And the code there was locked down but it wasn't in containers. They had an auditable process, and this should come as no shock: it’s actually the data that is more scrutinized in regards to compliance. How long are you going to keep it? Is it encrypted? Where is it going to be for the next 7 to 10 years or infinity? How can users contact you and have it deleted? All those processes have to be sorted out. The software was kind of the easiest thing to do. However, I would recommend – if someone had asked me, because a lot of these orgs started before I got there – why don’t we put it in containers, so we could more easily pass an audit, for example, and sign things as opposed to just trusting that the BWA version you downloaded two years ago doesn't contain any bad code or is a crummy version.
It's a big deal. It's going to be an even bigger deal as we try to cover the last mile of software integrity. There are companies out there that want to help researchers create encrypted enclaves in computer RAM, so that their code, data, and models cannot be altered even in DRAM. I think using containers there will be a lot easier as well to help manage those encrypted enclaves.
Gregory Kurtzer:
You also brought up the build infrastructure, well, not directly, but the build process. What I really like to see, and this was part of the reason why I was so excited about moving Singularity/Apptainer into the Linux Foundation, is the amount of innovation and capability growth that has occurred around the rest of the container ecosystem; it is massive. The fact that you can have Git operations feed into a CI/CD pipeline and automatically generate containers on a Git event such as a push or a tag is truly remarkable. And the fact that you can then automate that entire pipeline and automatically integrate it into your research is fantastic. I’m really excited about that.
Dave, to your point about flipping the documentation upside down, I totally agree, and I’d love to see more emphasis on how we make use of CI/CD for container creation, and how we can make it so that anybody who is a researcher, scientist, or creator/producer of these applications can automatically be creating these containers, and the format doesn’t matter. I like OCI a lot because it works pretty much everywhere. Apptainer and Singularity both can leverage OCI directly. But you can also create SIF files and then even cryptographically sign those SIF files.
The second point I want to make really quick is, as we’re talking about reproducibility and portability of these containers, knowing where you got these containers is actually really important. Now, for the most part, we’ve all just been kind of ignoring this and not paying much attention, like: there’s a container up on Docker Hub; I'm just going to go ahead and pull it, use it, and see how it works.
There is risk associated with that if you don’t have some validation of it. We have already seen certain container viruses and malware out there. There was one that I thought was hilarious: every time somebody ran a particular container, it started doing crypto mining on your own resources but putting the proceeds into someone else's coin account. That was brilliant, but it was a malicious container, right? You didn’t expect that to happen. We need to have some sort of management of the provenance of these containerized workloads and trust in them. I do think there has been a lot of forward progress coming out of OCI and CNCF around exactly this, so I’m really looking forward to that, and I’m looking forward to integrating it into Apptainer and making this even easier and better for users.
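As a rough sketch of what such an automated, provenance-aware pipeline step might run on a tagged push, assuming a definition file checked into the repository and an ORAS-capable registry (the names and paths are placeholders, and the CI wiring itself depends on your platform):

```bash
# Build the container from the definition file in the repository
apptainer build analysis.sif analysis.def

# Sign the image with the project's release key so consumers can verify provenance
apptainer sign analysis.sif

# Push the SIF to an OCI registry as an ORAS artifact
apptainer push analysis.sif oras://registry.example.org/lab/analysis:v1.0.0
```

A consumer can then pull that same reference with apptainer pull oras://registry.example.org/lab/analysis:v1.0.0 and run apptainer verify on it before using it.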
Envisioning a Standard DOI for Containers [23:41]
Zane Hamilton:
Thank you, Greg. So we do have one question: does anyone on the panel envision a standard DOI for containers? I.e., if a published paper uses codes/systems for computation, to share the code and those systems via DOI for a container?
Gregory Kurtzer:
People use both, but the most common way is a DOI. That’s a great question. This was one of the initial goals and motivations for a project called Singularity Hub, which was started early on through Singularity, and Vanessa created this project. And that was the goal behind that, basically to have a place where you can post your container and have a UUID associated with that container, such that anybody in a publication or wherever can reach out to there and touch that. Our previous company, Sylabs, that David, myself, Ian, Cédric, and others were all in, also had something very similar to that called "the library."
So there are solutions for this. And in the OCI world, I see people now using Docker Hub as a DOI-like reference for their research papers. I think we need to do more of that. I absolutely think we should be doing more of that. In every research paper, if there is software associated, there should be a link not only to the source code and the recipe for how they created their environment, but to the actual containers that they used to generate that analysis. And again, reproducibility is so critical here. There are multiple ways of registering and obtaining DOIs for various papers and software releases. I don't know if anyone is using the official DOI infrastructure for a container, but I’m going to ask Glen and Dave if they have any more insight on that than I do.
Glen Otero:
I don’t know if anyone does, but I was just trying to quickly research. The journal GigaScience is one of the leaders in providing DOIs for data sets. With GigaScience, when you submit your paper, you also have to submit your data and make it available. I believe it gets a DOI as well, separate from the article, but they’re associated. I would like to see that type of registry for containers as well. Because, to Greg’s point and as I mentioned before, it is one thing to have GitHub have your code and a link to your container, but will your GitHub account be around forever?
Will you always have access to it? What happens if you move from place to place? What happens if you want to take the software and make a company out of it? Shut some of it down and keep some of it open. I think for that split second in time, that has to be fossilized in time, so that people can always go back to that point in time and not have to use whatever comes upstream of that. So a Singularity Hub kind of makes sense. You could have a separate Hub, but then more journals should start requiring DOIs for containers. Maybe we have to get a build process that’s approved for journals as well.
If I wanted, I could put a container up there that starts bitcoin mining on your computers. I make that joke, but I just realized the number of illegitimate journals that are out there now, requesting papers. I don’t know if you guys have seen some of this stuff, but some people publish complete nonsense in these “online journals” that make themselves sound respectable. If someone falls for that and puts their container up there, or the journal starts putting up a bunch of malware-filled containers, and then someone who does not realize it is not a legitimate journal downloads one, then they’re infected or they’re part of the bitcoin mining.
We might start needing a process that’s certifiable when we build containers before they are posted. Maybe this is something the NIH or NSF could be a part of? I don't know, but obviously, it has to be a global agreement. So I don't know if the Sanger, the Broad, and GA4GH (The Global Alliance for Genomics and Health) – they are big into API development – maybe it’s something that they could jump ahead on and start thinking about.
Cryptographic Signing and Apptainer [29:35]
Zane Hamilton:
I know, Dave, we talked about this a little bit yesterday, about having some sort of process behind it, and even how the beauty of Apptainer is being able to do cryptographic signing so you can ensure what's in there. Do you want to share some thoughts on that?
Dave Godlove:
I’ve got a couple of points I’d like to make; you sparked a couple of thoughts in my head when you were talking there, Glen. One of them goes back to Mystic Knight's question: do we envision any standard for a DOI system? Well, the tools are all out there. In fact, there are already multiple different ways to do this. There are lots and lots of tools out there: you can use SHA sums; you can sign and verify your containers; you can do things to make sure that you are getting bit-for-bit reproductions of the original container that the author intended.
I think what we really need is more community agreement on what kinds of tools we are going to use and how we’re going to implement those, and that’s what’s missing right now. There are lots of tools, but it is like the Wild West as far as how people decide to implement them or whether they decide to implement them. Like you were saying, Glen, we need to get together and try to come up with some standards around this tooling, if this is going to succeed, if we’re going to have this kind of infrastructure in place.
My second point that you sparked in my mind, I’ve said this before in some pretty public forums, so people have probably heard me talking about this in the past, but I’m going to get up on my soapbox for a second. I think people understand this now. It used to be that very few people understood this, but if you are interested in reproducibility and making sure you get exactly the same code base in your container every time, you cannot rely on a definition file or a Dockerfile. You have to have that container. You have to have those bits. I really got surprised one night; I have an anecdote that I often share. One night I was doing some testing on a TensorFlow container, and I was building the container and testing it, building the container and testing it, and as I was doing this iteration, all of a sudden the container build stopped working. I couldn’t understand why. It was because on Docker Hub, the tag that I was using had been removed. The internet is subject to change, and it's changing all the time. The same definition file that you created five minutes ago might not work again. If you are really interested in reproducibility, you have to save the container. Don’t save the file that created the container.
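Following that advice can be as simple as pulling once, keeping the SIF as the artifact of record, and recording a checksum so you can confirm later that you still have the same bits; the image and tag below are only examples:

```bash
# Pull once and keep this exact file alongside the project
apptainer pull tensorflow.sif docker://tensorflow/tensorflow:2.14.0

# Record a checksum of the artifact
sha256sum tensorflow.sif > tensorflow.sif.sha256

# Later, or on another machine, confirm the bits are unchanged
sha256sum -c tensorflow.sif.sha256
```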
Zane Hamilton:
I think what you’re saying, Dave, is always use latest, no matter what?
Dave Godlove:
Yes.
Work During COVID-19 [32:50]
Zane Hamilton:
Here’s a good question. What about work during COVID-19? How did that play out? Any thoughts on that, Glen?
Glen Otero:
At my prior employer, we sequenced 70% of the COVID tests in Arizona. Our pipeline was set up very well, but it could have been improved. I saw people all over the internet posting containers specifically for viral genomics, and then they would tweak them specifically for COVID; they would probably contain COVID reference genomes or something like that. Some projects became more prominent than others. ARTIC was one project that was putting out lots of pipelines for COVID sequencing and analysis. They were publishing them through the workflow engine Nextflow. I was a big fan of Nextflow. Years ago, when Nextflow started, they adopted Singularity a year after Docker.
They still support both Singularity and Docker very, very well. The ARTIC network was actually the group of labs that was publishing a lot of those pipelines. They put all that code in containers because obviously reproducibility was a big deal during a pandemic, when you have people using pipelines created halfway around the world to do viral genome analysis so we can do genomic epidemiology. For containers there, there wasn’t really any question that that’s what people had to do. The only code I saw that wasn’t in a container was from a publication, someone who had figured out a new mutation or something like that; they had their code, and they figured it out. But for people who were spreading pipelines for other people to use, all the code was in containers.
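For readers unfamiliar with how tightly Nextflow and Apptainer fit together, running a published pipeline with Singularity/Apptainer support turned on is typically a one-liner; nf-core/viralrecon is one of the community viral-genomics pipelines, and its test profile supplies bundled example data (exact parameters depend on the pipeline version):

```bash
# Run a published pipeline, letting Nextflow pull its containers as SIF images
nextflow run nf-core/viralrecon -profile test,singularity --outdir results

# For a pipeline of your own, point Nextflow at a specific image instead
nextflow run main.nf -with-singularity /path/to/pipeline.sif
```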
Gregory Kurtzer:
So I have heard of various research organizations doing this kind of work, such as national labs doing research on it as well. I've also heard anecdotally that a lot of that was happening through Singularity and Apptainer. It's really cool to hear Glen's specific experience in working through that.
Dave Godlove:
I mean, one of the things that is so great about container technology is that it really is enabling for researchers. At the NIH, a lot of the researchers build or use containers and do so without the explicit help of any administrators. Because of that, I am sure that there was a lot of enablement happening behind the scenes that administrators did not even have to deal with, because users were able to use the containers on their own.
Glen Otero:
One more thing that you reminded me about, too: with viral genomics, the genomes are so small compared to human, other mammalian, or even lots of plant genomes that a lot of these analyses can actually be done on a laptop, if you are only going to do one analysis. Maybe not my laptop, because it’s a couple of years old. With containers, you could actually go pull a viral recon container or something from ARTIC and run a workflow on your laptop. That allowed people who did not have large HPC clusters to actually participate in the global effort.
Containers vs. Packages on Conda Environments [37:23]
Zane Hamilton:
That’s fantastic. We did have one question that just popped in: could you comment on the pros/cons of container vs. packaging a conda environment (e.g., conda create)?
Dave Godlove:
A definite pro is that you can replace the conda environment with a container. Conda has become pretty unwieldy. Previously at the NIH, there was a huge conda environment that had lots of standard tools installed in it. It has gotten to the point where we actually need a cluster to run the dependency resolution step, because the memory requirements are such that they will crash your nodes; they’re just too much. It had become completely unwieldy. So now, to be able to take those packages out of this big uber conda environment and just stick them in containers instead has been a really huge win for us in HPC.
Glen Otero:
I second Dave’s comments. I have felt the pain of taking the YAML environment file from conda and building everything; it takes hours to complete. I’m like, “You know what? Delete.” And rather than try to update it, just create a whole new one. I was a huge fan of conda. Working and talking with Peter back in the day, in the Continuum days and the Enthought days before Continuum, it was a huge leap for Python and for me as a Python developer. But then I started getting worried when it all of a sudden became a package manager for anything. You could install anything with conda. You could install R; you could install Unix utilities.
I’m like, this is going to sprawl all over everywhere and then get out of control, and that is how I feel it happened. With our users, we’ve had this battle: yes, you could have this environment; yes, you have a list of things to do; but making an incremental update to that takes hours and becomes unwieldy. We were trying to promote, “Look, if you just put a single app in a container, your pipeline will work.” If you need a development environment, we tried to limit it, like you said, to keep the dependencies down, because conda, again, can pull in everything. Let's just limit it to NumPy, SciPy, and the Python packages you need, and stop pulling in R and other things that make it a real mess to deal with.
Dave Godlove:
One thing, too: I kind of feel bad because I just sort of trashed conda and said that replacing it with containers is the way to go. One thing I will add is that conda is a decent package manager when it is used for what it is meant to be used for. One of the problems, I think, is that a lot of HPC sites tried to make a big uber Python environment based on conda, and that is maybe not really what it was meant for. Conda within containers is actually great. There are many, many applications I have installed where the developers have either instructions or the files to package all their dependencies with conda, up on GitHub.
If I grab that and stick it in a nice clean container environment with miniconda, and use that to install everything into the container, that works great a lot of the time. Within the container, you figure out where the conda environment is and put it on the appropriate path, or even just source conda and activate it within the environment of the container. Those things usually work just fine. It is usually pretty quick to use conda to install something inside of the container. If you do that, then you modularize it, so that you do not have a big conda environment with a zillion different dependencies conflicting with each other. And how are you going to resolve them all? I think that the two can work together quite nicely, if you use conda more in the way that it is intended.
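A rough sketch of that pattern, assuming the developer ships an environment.yml on GitHub, is to base the container on a miniconda image and create the environment at build time; the file names and paths here are illustrative:

```bash
# Write a minimal Apptainer definition that bakes the conda environment into the image
cat > tool.def <<'EOF'
Bootstrap: docker
From: continuumio/miniconda3:latest

%files
    environment.yml /opt/environment.yml

%post
    /opt/conda/bin/conda env create -f /opt/environment.yml -p /opt/env
    /opt/conda/bin/conda clean -afy

%environment
    export PATH=/opt/env/bin:$PATH
EOF

# Build the image and run a tool from the environment baked inside it
apptainer build tool.sif tool.def
apptainer exec tool.sif python --version
```

Because the environment is resolved once at build time and frozen into the SIF, the hours-long dependency resolution described above happens once, not on every node or every update.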
Gregory Kurtzer:
I am going to respond from a more general perspective. One of the really big benefits of containers is being able to isolate environments and move things around; it’s incredibly important. But with the packaging format that we have created, we basically took packaging up a layer. Instead of packaging certain applications, services, or files within an operating system, we are now actually using the package manager to help create that custom operating system, and then we are packaging the operating system and moving that whole operating system around. So for me, containers are a cool runtime; there are a lot of cool things you can do with a separate isolated environment. But it is also a packaging system. And that packaging system is incredibly flexible because it is the user space of the operating system, so it has every single thing you need, and it is portable between whatever Linux distributions you want, as long as your architecture is binary compatible. You get an entire application set in a package, in a manner of speaking.
Container Reproducibility With Different Kernel Versions [43:48]
Zane Hamilton:
I think you just pre-empted Jonathon’s question, too: have you seen any issues with reproducibility when running a container on a different kernel version? You just talked about having a container with different packages in it. What about kernels?
Gregory Kurtzer:
Yes. Not only kernels but hardware. The idea of saying that a container is reproducible is not exactly true, depending on what perspective of reproducibility you are talking about. If you are talking about reproducibility of outcome, it is not guaranteed. Kernel bugs, driver bugs, firmware bugs, and firmware changes can impact reproducibility of outcome. For example, we saw something not too long ago where a GPU firmware bug was actually causing wrong answers. These things do happen. When we talk about reproducibility, though, from a container perspective, what we are talking about is reproducibility of the software stack. 100% guaranteed reproducibility of that user space software stack. Some attention still has to be given to the underlying resource if you want to guarantee the entire reproducibility of the outcome.
Generally speaking, though, most of us consider reproducibility of the software stack to be “good enough.” Most of us are not thinking about kernel bugs, driver bugs, firmware bugs, and hardware bugs; we are assuming that all of that is going to work properly. The kernel specifically has an incredibly stable, backwards-compatible API and ABI that it presents to the user space portion of the operating system. Generally speaking, it is something we can strongly count on. But again, things happen, depending on which direction you are going. Usually, if you are going backwards, if you’ve got a newer kernel and you want to run an older container, you can actually go really far back; that API is so stable.
Zane, you know the story I could bring up from Berkeley. We had a system that was like 17 years old, and the hard drive died. We took the hard drive out, we made a container out of it, and that software continued to run in that 17-year-old operating system. But you can’t go the other way. If you have got an old kernel, the obvious extreme case is easy: a 17-year-old kernel is not going to support namespaces, much less anything else. But even if you have a kernel that is only two or three major operating system revisions old, if you wanted to run a new Rocky 8, Ubuntu, or Fedora user space on CentOS 6, it is not going to work. Glibc is actually going to check, and you are going to get a glibc error. The direction of compatibility needs to be considered. But generally speaking, knock on wood, we say everything just kind of works.
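A quick way to see the direction-of-compatibility question on your own system is to compare the host kernel with the user space inside a container; the image name is illustrative:

```bash
# Host side: the kernel every container on this machine will run on
uname -r

# Container side: the distribution and glibc the image was built against
apptainer exec rocky9.sif cat /etc/os-release
apptainer exec rocky9.sif ldd --version
```

A much newer user space on a very old host kernel is exactly where the glibc error discussed below tends to show up.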
Kernels, GPUs and Ending Up With a Different Answer [47:02]
Dave Godlove:
I have a quick question. I have seen that glibc error; a lot of people have seen the glibc error when you have some new version of the operating system and you try to run it within your container and your kernel is old. I have seen that. I know in theory you could run a container on a different kernel and end up not with an error, but with a different answer. And I know in theory that can also happen with different hardware like a GPU, but I’ve never actually seen it. Has anybody ever actually seen that? I mean, it’s obvious if you get the glibc error; it's great because you know it’s not going to run on this host. But it’s a little scary if you have a math problem and you come up with a slightly different answer. I know that is possible in theory, but has anybody ever actually seen that occur?
Gregory Kurtzer:
I've never seen it in person. I just know of people who know people who have seen it. An example is, again, the GPU issue where people were getting wrong numbers. That came from another national lab, which was the first to see it. But you’re exactly right: that’s the one that scares me. When it errors out and you know it’s just not going to work, that’s actually a good thing. When the error doesn’t happen and it just silently gives you the wrong outcome, that is actually scary. It is also very difficult to debug. I have seen that happen a number of times in software with various math libraries that were not giving the right answer, or were optimized differently, or loops were unrolled for optimization in a way that yielded a wrong (not completely wrong, just slightly wrong) answer. That is almost as bad as when you get the call from a scientist who says, “My application is running 5% slower than it did yesterday. What is going on?” Debugging that one is tricky too, but you are exactly right: I much prefer an error to a wrong answer.
Developer Reproducibility and Scientific Reproducibility [49:20]
Zane Hamilton:
Absolutely. John made an interesting statement: “It would be nice for the language around reproducibility to make the distinction between developer reproducibility and scientific reproducibility.” I think we all agree.
Gregory Kurtzer:
100%.
Dave Godlove:
Yeah, we should almost use different words when you’re talking about, like, reproducing an experiment versus reproducing a software stack.
Real Life Use Cases in Life Sciences [49:47]
Zane Hamilton:
One of the things I wanted Dave and Glen to dive into before we wrap up is common real-world use cases in life sciences that you have seen or been a part of. Just tell us what you were a part of and what you saw; it would help me a lot because I like seeing real-world examples.
Dave Godlove:
At the NIH, there were different domains of science that picked up and are picking up containers at different speeds. But there are two different kinds of flavors of end users: the producers and the consumers. And what it all boils down to is: containers are enabling researchers to get their stuff done. It gives them more power and more convenience. That’s the standard use case: researchers using containers to either have stuff prepackaged or to be able to use apt or yum to install things that they would not otherwise be able to install in their HPC environment.
We’re also seeing a lot of people using containers because they are using workflow languages. Glen already talked a little bit about Nextflow, but there are also workflow engines like WDL, Cromwell among others that are using containers. People are starting to pick up on that more and use containers more for orchestration within workflows, even within the sciences. This is from the user perspective but also from the administrative perspective.
At the NIH, it is a unique HPC environment because people talk about the long tail of HPC, and the NIH is entirely long tail. There are a thousand applications and growing installed on the cluster, all maintained by staff, which is just a staggering number. The staff members have really picked up on using containers, and it has now become the default: the vast majority of new applications installed on the system are containerized. There are a few different ways to do this, and there are different levels of automation. But we put together a pretty easy way for staff to install a container and write a little wrapper script, in fact, in many cases, just copy a boilerplate wrapper script into the same directory, and then make symlinks to all the binaries that they want to expose inside the container. After that is all done, it can be put under the module system just like any other application. The end result is that staff can easily install these applications in containerized ways, and users come along and use them as though they were installed directly on the system, without ever even knowing what a container is. That is a really big use case that we are seeing as well.
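A boilerplate wrapper in the spirit of what Dave describes can be as small as the script below, with one symlink per binary you want to expose; the paths are hypothetical, and the actual Biowulf tooling may differ:

```bash
#!/bin/bash
# wrapper.sh: invoked through symlinks named after tools inside the image,
# e.g.  ln -s wrapper.sh samtools   so that running "samtools" execs it in the container.
IMAGE=/usr/local/apps/mytool/1.0/mytool.sif
exec apptainer exec "$IMAGE" "$(basename "$0")" "$@"
```

A module file then only has to prepend the directory holding those symlinks to the user's PATH.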
Gregory Kurtzer:
That’s interesting. I’m just going to touch on that before Glen jumps in. When we were first developing Singularity’s image format, we decided that we were going to make it a loop-based file system, an actual file system, and squash all of the contents into that file system. And it was a single file. I forget how the idea entered our heads, but at some point somebody said: "It’s a single file; can’t we just execute that?" We talked about how to actually run and make that single-file file system executable. When we first figured out how to do it, we realized it was actually much easier than we thought it was going to be, as we’re using a “crude hack” of the shell to do it. But with that being said, you can literally build a Singularity/Apptainer container, define a run script for that container, put it in your path, and then that becomes your application. Dave, you talked through that at a real high level, but the magic that actually creates and what it does is super, super, super cool.
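The "single file you can just execute" behavior still works the same way today. A minimal sketch, with an intentionally trivial run script (the image name and base are illustrative):

```bash
# Define a run script so the SIF behaves like an application
cat > pyapp.def <<'EOF'
Bootstrap: docker
From: python:3.12-slim

%runscript
    exec python3 "$@"
EOF

apptainer build pyapp.sif pyapp.def

# The resulting file is directly executable, like any other program on the PATH
./pyapp.sif -c 'print("hello from inside the container")'
```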
Zane Hamilton:
That’s always a fun use case, just because Python virtual environments can get kind of ugly. So being able to do that, it’s magic.
Glen Otero:
I second Dave’s experience. Like at the NIH, we were struggling with modules and then trying to bring containers into them. I thought it was this ugly hack, but then I realized, like what Greg was saying, it’s really brilliant: when users load the module, it just loads a container instead of the binary that we used to build for them. It felt like an ugly hack at first, and then, oh yeah, there is the inertia of using modules; we can’t get people just to change their scripts to use containers. So that is actually how we are sliding a lot of these containers onto our cluster.
There’s another thing that we thought containers were going to be required for. There were folks I was working with who wanted the future of HPC to be cloud native. That did not necessarily mean you were going to run in a public cloud, but you needed to be able to, as one of my colleagues would say, "run my stuff anywhere, anytime." The only way you can do that is with containers. When we were experimenting with a lot of our pipelines, trying to run them in the cloud, we thought that containers were the only way we were going to be able to reproduce anything. It was the only way we were going to be able to be multi-cloud and cloud-agnostic, all the vendor-neutral terms. I stopped using the word “cloud” and started using the word “serverless” because I got tired of what cloud was coming to mean; it could mean VMware, GCP, or lots of other things.
I think containers are going to be the key to cloud-native computing. We predicted this and we have started seeing this. The new hires at TGen, the faculty members, kind of expect, “Like where’s your AWS account? How come I can’t just spin up my instance there? How do I get this chart?” We predicted that as users are using containers and being more cloud-native because they could be the long tail of HPC and not run on big HPC clusters, they just expect to be able to spin this up. The researchers are going to eventually start demanding it when they go to be recruited by an institute: “So how am I going to be able to compute here? Is it a traditional HPC or is everything in the cloud? My petabyte of data is out there. How am I going to keep doing my analysis that I got hired for? I’m not downloading it here. How do you help me get my stuff out there under whatever organization's account? If this organization does not have an account with this cloud provider, how do I make it happen?” I think that is something that we have just started to see the tip of the iceberg on. We expect it to happen because of container usage, conda usage, and what I like to call container or serverless-based computing.
Gregory Kurtzer:
I think that was a hundred percent accurate. The only thing I would add is: I think containers are really the necessary next step for any form of more modern High Performance Computing as we look forward, whether you just want to run in the cloud or want to be able to move your workflows effectively. Instead of rebuilding your whole software stack up in the cloud, it is so much easier and more convenient to containerize that software stack and be able to run it portably anywhere.
Zane Hamilton:
Absolutely. As I’m hearing Glen talk, it reminds me very much of what Fuzzball was intended for. I started thinking about Fuzzball immediately. Thank you for joining us today. We are at the end of our time. This was very informative. Please like and subscribe and follow us every week. We look forward to talking to you again.