Announcement From CIQ: Fuzzball Has Been Released!
Join us for a webinar on the revolution in High Performance Computing (HPC) with Fuzzball. This advanced solution streamlines complex computing tasks in the converging worlds of enterprise and HPC. Fuzzball is a turnkey, Kubernetes-based hybrid computing infrastructure stack that empowers users with more control, administrators with more reach, and organizations with more productivity. With Fuzzball, you can combine the best of cloud, hyperscale, and enterprise with traditional HPC, resulting in a lower barrier to entry and increased security, supply chain confidence, and scale. In this webinar, we’ll explore how Fuzzball can benefit both traditional HPC and compute- and data-driven enterprise computing, enabling you to easily tackle your most demanding computing challenges. Take advantage of this opportunity to learn about the future of High Performance Computing with Fuzzball.
Webinar Synopsis:
- What is Fuzzball
- Origins of Fuzzball and CIQ
- Fuzzball Demo
- Q&A
Speakers:
- Zane Hamilton, Sr. Vice President - Sales, CIQ
- Rose Stein, Solutions Engineer, CIQ
- Gregory Kurtzer, Founder of Rocky Linux, Singularity/Apptainer, Warewulf, CentOS, and CEO of CIQ
- Forrest Burt, High Performance Computing Systems Engineer, CIQ
Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.
Full Webinar Transcript:
Narrator:
Good morning, good afternoon, and good evening wherever you are. Thank you for joining. At CIQ we're focused on powering the next generation of software infrastructure, leveraging the capabilities of cloud, hyperscale, and HPC. From research to the enterprise, our customers rely on us for the ultimate Rocky Linux, Warewulf, and Apptainer support escalation. We provide deep development capabilities and solutions, all delivered in the collaborative spirit of open source.
Zane Hamilton:
Welcome everyone back to another webinar. How are you, Rose?
Rose Stein:
I am amazing. I am so excited for today, Zane. This has been... I feel like we've been holding our breath for months, years now, and now it's just ready to release.
Zane Hamilton:
It is very exciting. It's been very hectic the last few days, getting everything ready and making sure that we're prepared. I'm excited to have everybody in to actually finally talk about the real release of Fuzzball.
Rose Stein:
I know, I know, but I have a little surprise and I want to show you now first before we let anybody else in the room. So hold on a second here, Zane. Hold on. Stay with me. Stay with me. It's going to be totally worth it. Here. Can you see me?
Zane Hamilton:
I see it.
Rose Stein:
Hey Fuzzball!
Zane Hamilton:
There you go. Oh, that's awesome. Very cool.
Rose Stein:
I'm ready. I'm ready. Release me.
Zane Hamilton:
Does your Fuzzball have a name?
Rose Stein:
What? No, this is Fuzzball. I am the Fuzzball. Queen Fuzzball? There you go. Queen Fuzzball.
Zane Hamilton:
We'll take it. We'll take it. So, I know we have Forrest waiting in the back if we want to bring Forrest in.
Rose Stein:
Forrest!
Forrest Burt:
Hello everyone. Can you all hear me all right?
Zane Hamilton:
We can.
Forrest Burt:
Fantastic.
Zane Hamilton:
How's it going Forrest?
Forrest Burt:
I'm great, Zane. How are you today?
Zane Hamilton:
Doing well. It's been an exciting last few days for sure. I know you've been busy getting some stuff ready to show us, so I'm excited to see that here in a minute. But real quick, just because I feel like we've done this quite a few times, but if you would just give us a quick overview, what is Fuzzball?
What is Fuzzball [2:18]
Forrest Burt:
Absolutely. Fuzzball is CIQ's platform for the latest in high performance, performance intensive, and enterprise computing. Fuzzball is our unification of a lot of different spheres of the computing world right now into one platform that works for everyone's use cases around AI and machine learning, simulation, data analytics, and more enterprise-type workloads. Fuzzball is the next generation of computing, what we sometimes call HPC 2.0. So, it's very exciting.
Zane Hamilton:
That is very exciting. We do call it HPC 2.0, and I know we've said that a lot. We talk about it a lot, but what actually makes it the next generation? So why is it different?
Forrest Burt:
So, at its core, Fuzzball does away with some of the legacy paradigms and architectures that we've used in high performance computing for a long time and replaces those with a much more modern architecture based around the best practices that have been established by places like cloud, hyperscale, and enterprise. Fuzzball essentially integrates, for example, tools like Kubernetes and CI/CD platforms into a platform that is easy to deploy both on-prem and in the cloud, and it allows your users to write up workflows so they can codify the work that they do in computing and run that on the reconfigurable resources available through the Fuzzball platform. That means AI, machine learning, data analytics, simulation, and CFD or finite element analysis. Fuzzball is for all use cases and, as I said, is the integration between Kubernetes and the world of high performance computing that the industry has been looking for for a long time.
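As a rough illustration of that idea, a Fuzzball workflow is a single YAML document that codifies each job's container image, command, and resources. The sketch below reconstructs field names from the demo later in this webinar; the published schema may differ:

```yaml
# Illustrative sketch only; the exact Fuzzball workflow schema may differ.
version: v1
jobs:
  hello:
    image:
      uri: docker://docker.io/library/alpine:latest  # any public or private registry
    command: ["echo", "hello from Fuzzball"]
    resources:
      cores: 1
      memory: 1GB
```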
Zane Hamilton:
And that's something I want to go back to in just a second. I want to talk about the Kubernetes specific piece, but I think we have Greg waiting in the back as well. Okay. There he is.
Gregory Kurtzer:
Hi everybody.
Zane Hamilton:
Good afternoon, Greg. Welcome.
Gregory Kurtzer:
It's still morning here, Zane.
Zane Hamilton:
That's true. It is still morning. It's not for me, but it's for you. So Forrest was just telling us high level what Fuzzball is, but I want to take a step back and I want to ask you, Greg, why did you start building Fuzzball? What brought this to light? What made you want to go do this?
Origins of Fuzzball and CIQ [4:54]
Gregory Kurtzer:
So I spent a very large portion of my career doing high performance computing, and people that have done that know, I mean, we've been doing the same thing for the last 30-ish, almost 30 years now. Which isn't necessarily a bad thing. I mean, when something works, right, don't fix it if it ain't broke. But what we started to see, I think really ever since the advent of containerization in high performance computing, we started seeing a new way of doing things. And we started seeing more and different types of science and research and computing needs being thrown at these high performance computing resources. And as a result of that, this architecture that we've been using started to show its age a little bit; it became a little long in the tooth. So how do we start moving this towards a more modern infrastructure?
So that was one side of it. The other side of it was we started seeing more people in enterprise being interested in something called performance intensive computing. And when they're looking at performance intensive computing workloads, they're looking at things like AI, ML, and compute- and data-driven analytics and so on and so forth. And they're looking at this in terms of, well, what kind of an infrastructure do we need to run this on? And they immediately went to, well okay, let's look at the cloud infrastructure. Let's look at Kubernetes, let's look at everything that's being offered out there today. But historically, enterprise hasn't had this sort of a need before. Most of the time they're doing services, they're doing infrastructure type requirements, where the application requirements of that infrastructure are actually very low. This is why virtualization took off so big in enterprise, because you can have hundreds of VMs running simultaneously on a single server. When you're talking about performance intensive computing, you can't do that. You really need to be focused on how you most efficiently run your applications. And if you're trying to run a hundred at a time, you're not going to run them efficiently. So how do you start thinking about this in a different way? Now, I went to countless meetings at various enterprise organizations about four or five years ago, who basically outlined to me that they've got an HPC problem to solve. And as I looked at what they were trying to solve, the answer was clear to me: go build a Beowulf. This is what you need to build. And their response was, oh my gosh, I remember those architectures back when I was a student in the late eighties. Certainly there's a newer way of doing this that's more in alignment with how we're... What, you want us to use SSH? We're trying to stop using SSH.
Now you want our users running SSH and logging in, getting actual shells? They don't know how to use Linux. I mean, this was the kind of response we would get. And so enterprise really started going to the tools that they know. These are things like Kubernetes, and Kubernetes is great for microservices. It's a fantastic microservice platform. I have yet to see an extremely compelling HPC use case using Kubernetes. It'll come, I'm sure. It'll get there. I don't believe it's there yet. And the expression "putting a saddle on a cow" is what it reminds me of. And I found some cool pictures, and I actually asked some AIs to draw me some pictures of saddles on cows. It was pretty cool.
Rose Stein:
Where Greg, where is this?
Gregory Kurtzer:
Forrest, that could be one of our next webinars, maybe a demo. We did this in the past a while ago.
Forrest Burt:
We'll see what we can put together.
Gregory Kurtzer:
But to me that's what doing high performance computing on Kubernetes is; it's just not designed for it. It's fantastic at what it's designed for, it's just that this is not what it's designed for. So could you do it? Maybe... I don't think it's been done yet though. So when we started looking at how do we modernize high performance computing, we really took the best of all the different industries, the best capabilities, the best technologies and ideologies, and put them together into a unified solution that we call Fuzzball. And Rose, I love the hat. That is so cool. Do you remember those in the eighties, the ones that go on pencils that you can spin real fast?
Zane Hamilton:
Absolutely!
Gregory Kurtzer:
Can you do that?
Rose Stein:
I actually have a turning chair, but I don't know if it's real fast.
Gregory Kurtzer:
That's awesome. That is awesome. So, we started pulling the best of all these different sectors, these different technologies within the ecosystems. And that's where we ended up with Fuzzball. It is a cloud native on-prem hybrid computing solution. The infrastructure of it sits on top of Kubernetes because that is a microservice platform, and Fuzzball Orchestrate is a microservice solution. But when we do the compute, we do it outside of Kubernetes because we wanted to make sure that we're getting the most efficient use of that hardware. And Kubernetes hasn't proven to be extremely lightweight at this point, so it would typically get in the way when really running a lot of performance intensive computing. And so we can build this whole thing up and literally create a service for performance intensive computing. And this isn't our idea. I've now seen Gartner and other analysts actually talking about performance intensive computing as a service, or PICaaS for short. And that's just awesome.
I'm sorry, whoever came up with that acronym really...
Forrest Burt:
I was wondering about that one guys.
Gregory Kurtzer:
PICaaS, and I was like, you gotta be kidding me.
Rose Stein:
It's more like a PICaaS, like a Picasso.
Gregory Kurtzer:
Oh, Picasso.
Rose Stein:
In my mind. That's where I was going with it.
Gregory Kurtzer:
It just sounds like you're saying PICaaS really, really elegantly.
Forrest Burt:
I just like how the number of letters that you're allowed to put in front of the "As A Service" there is gradually growing as the years go by. It used to be just one for software; now we've got a full three characters there. What's next? Very cool.
Gregory Kurtzer:
So that's what Fuzzball really is, and that's one of the reasons why I started thinking about this. And the last thing I'll just mention on this thread, and then I promise I'll shut up for at least a minute or two, is the topper on the cake. Some people who may have seen some of my other talks and whatnot know this story. I was hearing enterprises say they need something more modern, something that can do more than just traditional HPC. And honestly, I still had my blinders on: this is the right solution, you all are just too cool for this. But then a big, huge social media company called me in and said, we're going to build the biggest HPC system in the world.
Here's how we want to do it. And they described an architecture to me that I'd literally never heard of and never considered. And it was so massive, so big, and honestly so cool that it really just put my brain into overdrive. And I joke around about going completely OCD thinking about how to actually solve this problem. So, I started thinking about this social media company. We even talked about me working there and helping them build this, but I actually thought it would be much more interesting to go build this as a company and offer it back to the world as a solution. And that's what we did. And thus, CIQ was born.
Rose Stein:
Hmm. Thank you, Greg. Thank you for that. I think all of us working at CIQ are happy that it exists and it is happening, so appreciate that.
Zane Hamilton:
Absolutely. And it looks like we have somebody from Mexico City watching, Linear. Appreciate it. Thank you very much for watching and welcome. You both have mentioned Kubernetes in this, and I think that was an important distinction. Thank you for diving into that, Greg, and calling out where Kubernetes sits in this: it's not running the actual compute, it's running the orchestration piece of this. And once you go off to do the compute, it's different. So thank you for clarifying that. Forrest, I know you brought some things for us to look at. I would love for you to show us. I feel like we've done some of these over time, but the catalog just keeps getting bigger and bigger of the stuff that you have to show.
Forrest Burt:
It does. Give me just a moment.
Gregory Kurtzer:
So while Forrest is bringing this up, and Forrest, just talk over me as soon as you're ready, I want to stress: this is not just yet another Beowulf. We have completely rearchitected the structure of a high performance computing system and created what we jokingly call HPC 2.0. This really is an entirely new generation of how to think about workload management, workflows, and orchestration, potentially even orchestration at a much wider scale than what we've been thinking about. So I'm going to be quiet now that Forrest is ready.
Fuzzball Demo [14:55]
Forrest Burt:
All right everyone. So this is the main Fuzzball graphical user interface. We're just going to go through a couple of quick demos of what some workflows look like in this. One of the big concepts in Fuzzball, as we've discussed, is the ability of users to write workflows that codify their computational pipelines end to end, which they can then run through Fuzzball on whatever heterogeneous architectures they happen to have as their underlying compute resources. So this is the main Fuzzball GUI. This is what a user in Fuzzball would see when they hop onto the system. I've gone through the standard Google SSO login flow before this, so just logging in with my standard company email and everything. We have a couple of different things in this that are worth showing off real quick.
These definitions over here are compute definitions that we have available for the Fuzzball cluster. What we're looking at here is a deployment of Fuzzball running up on AWS EKS. Obviously most cloud providers have some type of dedicated Kubernetes service; that one's AWS's. When we deploy Fuzzball out onto that, we can leverage those engines that are out there on those public clouds. So this is based on AWS EKS. The instances that we're going to be provisioning in this demo are different types of AWS instances that we've set up. So for example, we can use some large CPU-only instances here, these c5n.9xlarge types. We have EFA enabled, placement groups, stuff like that, for optimal MPI on this type of thing.
We can also do GPU-based instances. So in this case, we've got a p3.2xlarge and so on and so forth. Over here we can define secrets and users. I'm going to get to those more on the workflow side of things, but these are secrets that you can define within the cluster that essentially allow you to template out, on the server side, different credentials, things like that, so you don't have to keep those in your workflows. So, it adds security. And then users is just essentially the list of users on the cluster: members, owners, that type of thing. Let's see here. So I'm going to go ahead and run a couple workflows. This is what they look like in the Fuzzball system. You can see I have my whole list of workflows that I previously ran here.
But we're going to start here in the workflow editor and just go ahead and open a few different use cases, run them, and we'll see how it goes. The first one that we're going to look at is a pretty simple, pretty basic one. This one is GROMACS: computational molecular dynamics, done for pharmaceutical drug discovery, all kinds of different things. In Fuzzball, as I said, users can codify their workflows from end to end. So data movements, image pulls, things like that. Because Fuzzball is based on Kubernetes, everything is done out of containers. So each one of these jobs is a separate container image. And Fuzzball will end-to-end orchestrate the execution of this workflow on those resources that we saw defined over here, based on what I've told this workflow that it needs to go out and run on.
So first off, we can define the concept of a data volume in Fuzzball. So in this case, we have the volume here that's going to reach out to this link on the internet and pull down this file, drop it into our data volume at the top level directory with that name. And jobs will then be able to mount that volume. You can see the volume name at that path so that, for example, in this first job, we can untar that tarball that we pulled down over here in the ingress. That's got our benchmark data in it. So that's data movement in. You can see we have all these jobs that do different things. This one's pretty simple, just an untar job. You can see that we're pulling from a container registry here. This is one that we use for different testing, different kinds of workflows, that type of thing.
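As a sketch, the ingress-plus-untar pattern being described might look something like the following in workflow YAML. The field names are illustrative reconstructions from this demo, not the exact schema, and the URL and registry are placeholders:

```yaml
# Illustrative sketch; field names reconstructed from the demo, not exact.
volumes:
  data:
    ingress:
      - url: https://example.com/benchmark-data.tar.gz  # placeholder source URL
        path: /benchmark-data.tar.gz                    # lands at the volume's top level
jobs:
  untar:
    image:
      uri: docker://registry.example.com/gromacs:container-stable  # placeholder image
    mounts:
      data: /data    # the data volume mounted into the job at /data
    command: ["tar", "xf", "/data/benchmark-data.tar.gz", "-C", "/data"]
    resources:
      cores: 1
      memory: 1GB
```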
So in this case, we have this GROMACS container with this tag, container-stable, on it. We're going to reach out to it and use essentially service account type tooling to get into it. One big thing about Fuzzball is how well it integrates with CI/CD systems. So you can see that we can use secrets and credentials, service accounts, that type of thing, to make it very seamless for, in this case, me to just utilize the secret and be able to pull that container into all these workflow jobs as I need. Over here we have resources. So one core, one GB of memory; this one's a pretty basic job. If we want to look at one that's a little bit more complex, this one right here is doing about the same thing as the previous one, mounting that volume, pulling the same container, but we're going to use two cores, 14 GB of memory, and one NVIDIA GPU.
So, this is going to map to a g4dn.xlarge instance on AWS. A simple GPU instance, but we get the idea that we're running on a GPU. Fuzzball also supports common paradigms in HPC like multi-node workflows through MPI, and task arrays, also called embarrassingly parallel workflows. In this case, we're running an MPI workflow through Open MPI. We're going to use two nodes. And this is the actual run-benchmark step of this. So, we're going to pull basically the same resources that we pulled for this, but we're going to get two nodes, each with these resources on it. Fuzzball is going to wire those together up in the cloud and then allow this compute work to proceed on those instances.
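Continuing the same illustrative schema, the GPU and multi-node MPI job just described might be sketched roughly like this (again, the field names are assumptions, not the published format):

```yaml
# Illustrative continuation of the earlier sketch; not the exact schema.
  run-benchmark:
    requires: [untar]             # only starts after the untar job succeeds
    image:
      uri: docker://registry.example.com/gromacs:container-stable  # placeholder
    mounts:
      data: /data
    multinode:
      implementation: openmpi     # Fuzzball wires the nodes together for MPI
      nodes: 2                    # two instances, each with the resources below
    resources:
      cores: 2
      memory: 14GB
      gpus: 1                     # maps to a GPU instance type, e.g. g4dn.xlarge
```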
As noted, the Fuzzball Orchestrate stack runs as microservices through Kubernetes. So, we're not actually doing the batch compute work inside of Kubernetes pods here, which has significant performance benefits and is one of the big best practices that Fuzzball leverages in how it uses Kubernetes. And then this final job is just catting the logs. So we'll go ahead and run this. We can add a name if we want: demo. Go ahead and start this workflow. And you'll see that we have a success: workflow started successfully. We can go to the status of that. And this is what the actual workflow interface looks like as a workflow is running. So you can see we have all the steps. This is our definition, what that looked like, in YAML. Fuzzball workflows can be written in YAML as well, if you prefer to just type them out as opposed to using the interactive editor.
So, everything that we just saw in that graphical interface is encoded within this YAML definition here as well. And you can see that we're executing on the different parts of that workflow. We're creating the volume, we're pulling the image, and we're starting on a file transfer from the internet. So here in a little bit these will start to run. We're actually going to let this untar one run really quickly so we can see that go, and we can talk in the background if we need to while these run, but I'm going to kick off a few more and we're going to just check back into them throughout the webinar here. So you can see the untar started. This has essentially reached out to AWS with those resource requirements that I gave it, that one core and one GB of memory.
It's spun up a very minimal instance for that and has landed the container on that, landed the command, all that on it, and run it. There are no logs from tar, but we now have that data in there, untarred. In just a moment, on one of these longer running jobs, I'll grab a terminal into one of these and we'll be able to actually look around, see that data, see what's up. But that untar one finishes very quickly. So once that comes up, we'll take a look in there. But in the meantime, we're going to go ahead and kick off a couple more of these really quickly. Let us take a look at a little bit of sequencing. So I said Fuzzball works for a lot of different workflows. This is some genomic sequencing using a common HTSeq/STAR/SAMtools-type pipeline in genomics.
So, this is doing a lot of the similar stuff that we just saw in that last one, but this introduces the concept of S3 ingress in Fuzzball. So Fuzzball can reach out to any S3 API compliant object storage and be able to pull down data from that. So in this case, you can see I've got my key and my key ID here that I've generated from AWS's authentication type panel. I've got my region, and then, this by default targets AWS S3, but if you wanted to use a different S3-compliant object store endpoint, you could put it in there. There's a couple of different services, different things that offer that. So it's compatible there. And you can see we're doing two ingresses.
One is a set of FASTQ input files, so short genomic reads, base pair data, and then this is a genome that we're pulling down to compare those to. You'll notice we also have an egress here. An egress essentially allows us to move data back out of this job once it's done. So we have two egresses here. One is going to reach into this data volume, pull out the file at results/output, and upload it to our bucket, into this bucket name and those kinds of subdirectories in it. We have another one that we're also moving, this htseq-count.txt, and then htseq-count-demo1.txt. So we've got obviously a couple of different egresses going on there, a couple of different results files we want to preserve from this workflow.
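A hedged sketch of the S3 ingress and egress being described, using the same illustrative field names and a placeholder bucket:

```yaml
# Illustrative sketch of S3 ingress/egress; not the exact schema.
volumes:
  data:
    ingress:
      - uri: s3://example-bucket/inputs/reads.fastq.gz   # placeholder object
        path: /inputs/reads.fastq.gz
        secret: s3-credentials    # server-side secret holding key ID and access key
    egress:
      - path: /results/output    # file inside the volume once the jobs finish
        uri: s3://example-bucket/results/output
      - path: /results/htseq-count-demo1.txt
        uri: s3://example-bucket/results/htseq-count-demo1.txt
```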
We're pulling down, once again, another container from our registry. It's worth noting that Fuzzball supports not only our registry but any public registry and, for that matter, any private registry. So you can pull down from Docker Hub, you can pull down from the NVIDIA NGC, any number of given private registry solutions, et cetera, et cetera. I'll show you that in a bit. The rest of this workflow is what we've seen before, different commands. In this case, we're running some Python scripts inside of this to do a little bit of multi-processing. And at the end, we're going to wrap up our results into a table. So go ahead and start this. Let's see here. We'll check back in on GROMACS really quickly. Oh, and we can see that our second job here, prepare benchmark data, is finished.
In Fuzzball, you can set up, as you saw here, this directed acyclic graph of job dependencies. A later one shows off a little bit more complex of a graph here. But in Fuzzball you can set up, for example, this requires field. So this job will only run after this one is done, and you can set up pipelines or workflows of execution there. If we go into this run benchmark, we can see that this is printing us out some logs. But furthermore, we can get a terminal directly into the instance running this. So if we go ahead and do, say, nvidia-smi, because we're using a GPU, you can see that we've got our nvidia-smi output here. We've got 34% GPU usage, 343 MiB of memory.
We don't see the exact process name because of how namespacing works between containers and the host in this case, but it is indeed running. So we've got nvidia-smi we can look at; we can check top. So for example, we've got that gmx_mpi running on the cores of this instance that we're on. So you can see gmx_mpi going there. We can also do things like ls /data, because we attach that data volume to this workflow job, run benchmark, at /data. So we can do ls /data, and you'll see that tarball and the unpacked tarball that we set up in some of our previous jobs. And so as that runs, we can view our logs; that's interesting.
You can see those are running there. We can go back in here and look at this going. Fuzzball has support for not only multi-node paradigms in MPI, but also PGAS networks through GASNet. So for example, you can run the Chapel programming language on Fuzzball and do workflows with that type of thing. Just to point out, if you want to, for example, edit the YAML directly in your workflow screen, you can do so and directly type into that if you want to edit something there. What was that I heard earlier about cows on a saddle, or a saddle on a cow, something like that?
Zane Hamilton:
Saddle on a cow.
Forrest Burt:
Let's see here. Fuzzball is obviously for all use cases, including the latest in generative AI, data analytics, all that type of thing. So let's see here...
A saddle on a cow oil painting... Somebody was right. We have done this similarly before, but we did it with Dall-E Mini back in the day. So this is genuine Stable Diffusion running on Fuzzball. Here in just a little bit, we'll be able to see a little bit more about this. But for example, we're pulling down our checkpoint file here for our weights, test image, and then once we're done with this, we're going to upload our generated images back to our S3 bucket that we've been working with before. And once our results are in, we'll be able to go look at those through that. So, first of all, it's obviously for all different types of use cases.
It's great for CFD, that type of thing. This is a little bit more complex of a workflow, but I believe we just put a video out on our YouTube channel today that shows this workflow running in action and how the results and stuff come out of that. I'll go ahead and fire it off here and we'll see if it gets done by the end, and we can see it upload live. But we've got, like I said, a great video on our YouTube of this workflow specifically going right now. We see that our Chapel hello world is finished because it's pretty quick. And we have all of our logs from our four locales, because I believe this is a four node workflow.
So you can see we have all of our logs, for example, there.
As I said, it also supports embarrassingly parallel workflows. I am not a Dask programmer, so this is a little bit of a rough script, but we should get results and everything here as expected. We can obviously do task arrays and things like that, embarrassingly parallel workflows, in Fuzzball. So if you've got 10,000 input files and you want to be able to process them all at once, we can set this up. So we have 12 jobs that are going to be done here on three instances at once. We can tell Fuzzball to spin up three instances matching these resource specifications here, and then we'll start iterating through those. This script uses an FB task ID here, sort of like what you might expect with swarm or something similar, to map these tasks around. But in this case we'll spin up three instances matching these resource requirements and then we'll start executing those tasks on them. So we'll go ahead and start this up.
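The task array pattern just described might be sketched like this; the array fields and the FB_TASK_ID spelling are assumptions based on what is said in the demo:

```yaml
# Illustrative sketch of a task array; not the exact schema.
  average-files:
    image:
      uri: docker://docker.io/library/python:3.11   # placeholder image
    command: ["python3", "/data/average.py"]
    array:
      tasks: 12         # twelve tasks total
      concurrency: 3    # three instances spun up and reused across tasks
    resources:
      cores: 4
      memory: 8GB
# Inside the script, a task-ID variable (spelled here as FB_TASK_ID, an
# assumption) selects which input file this particular task processes.
```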
So, these are going and we are just waiting at this point for these to finish. Some of these will take a little bit, but actually I do have one more. As I mentioned, in Fuzzball you can do different types of workflows that maybe are a little bit more complex than just these straight line ones that we've seen. On a technical level, workflows in Fuzzball are directed acyclic graphs. So they're essentially a graph that has no cycles and has only one way forward; you can't go backwards, basically. And so with that, you can set up, for example, in this case, this first job, create directories, will run. Once that's done, you'll notice that this retrieve query sequence and retrieve database sequences both depend upon this create directories one.
So both of those jobs will be spawned once this one is done, and will start running concurrently. Once both of those are done, make blast database will start, and you'll see that it depends on both of those. And then run blast, at the very end of this, just depends on that. So you can imagine this being a little bit more complex. I don't think I have one much more complex on hand, unfortunately, but you can imagine this being much more complex. You can extend this out ad infinitum to create arbitrarily complex workflow graphs of whatever complex data processing with tons of different steps that you're having to do normally on your HPC resources.
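That BLAST pipeline's dependency graph, expressed with the requires field, might look roughly like the sketch below (job bodies elided; the job names follow the demo, and the field names remain assumptions):

```yaml
# Illustrative dependency graph only; job bodies elided.
jobs:
  create-directories: {}            # runs first
  retrieve-query-sequence:
    requires: [create-directories]
  retrieve-database-sequences:
    requires: [create-directories]  # runs concurrently with the job above
  make-blast-database:
    requires: [retrieve-query-sequence, retrieve-database-sequences]
  run-blast:
    requires: [make-blast-database]
```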
So, we can see if this GROMACS is finished here. So if we go back and look through this, we can see this has all finished execution. We've got our logs here. I always get a little nervous because GROMACS shares some cheeky little quote; I always get concerned it's going to be a little bit too much, but I don't want to get rid of it because it seems the GROMACS community values seeing the quote. So I keep it around. So you can see we have our GROMACS results. We've got our quote, we've got our nanoseconds per day performance. If we cat our full log file, you can see that we've got everything here showing SIMD instructions, GPU support as I scroll through here, running on two nodes with a total of four cores, four processing units, two compatible GPUs.
So just our standard GROMACS results that we would expect. We then get down here to the bottom, our megaflops accounting, load balancing, and then obviously there are our results. So if we wanted to preserve this file, we could upload it out to S3. I don't in this case just because it's not terribly useful. But these are running and we can go back and view previous workflows. You'll notice I've been running these for a little bit, making sure things are in order. But if we go look at, for example, any one of these past ones, actually let me scroll through these really quickly. Where's that one? Let's see. You can see what the interface looks like for finished, failed, or canceled workflows. A couple of days ago I was working on getting the HPL LINPACK benchmark working on Fuzzball.
So, for example I can go back and retrieve this workflow that I ran a couple of days ago. This was 5/16, so two days ago. I can go retrieve my logs from it. I can do whatever I want here. I can go ahead and rerun this directly from that. And if we go back to our workflows thing, you can see we now have that workflow running here again. So, that's Fuzzball. Oh, there is one more thing we can look at. Or another thing here, Fuzzball isn't just for batch computing, it's also for interactive computing. So in this case we're going to do a simple Jupyter Notebook type thing. So basically bringing up a Jupyter Notebook, being able to connect to it from our terminal here. Normally as we've discussed, you'd have to SSH into a cluster. This is a little bit of a headache of a process to get connected to one of these.
Connecting to Jupyter Notebook
You'll see how in Fuzzball I'm able to port forward this Jupyter Notebook server running out on cloud resources directly back to my user terminal here, and how I'm able to just interact with that through my own web browser without having to directly SSH to the cluster. So in Fuzzball, to support that, we have the concept of an isolated network namespace. I can tick this back and forth and it'll create essentially a network for this container that I can connect to. And then we're just going to port forward this port back to my local. We're running this on a GPU. We're going to see a few AI use cases in this. So we'll go ahead and run that. And just to point out, I don't think I made it explicit: this BLAST one, which we can now see has finished, and we can see the logs from that one. Which one has logs?
There we go. So we can see the logs from our BLAST workflow here. This is obviously NCBI BLAST doing genomic mapping. And just to point out, the container we're using in this case is something pulled directly from Docker Hub. So this is just NCBI BLAST directly from there. We can also do things like this one, LAMMPS. I believe it was pulled directly from the NVIDIA NGC as well. We can go ahead and probably run this; I might hold off on this one for a moment just until these finish. So, our STAR sequencing is done. We'll go look at that in a moment. I'm going to fire this one off.
So, you can see that we're pulling, like I said, in this case a Docker image down from the NVIDIA NGC. Then we're going to land this just on a simple CPU node because we're just running this on CPU. Our GROMACS one was on GPU, but this one is just to show you some CPU-only stuff. As I noted, just to point this out, we have this STAR sequencing that's done. Just lost my windows. There we go. And as I mentioned, this one has some output that comes from it. Let's see here. Hold on one moment while I pull up a different bucket; it shouldn't take too long.
Gregory Kurtzer:
Now that we're on camera again, I was reminded to sit up straight and look professional.
Forrest Burt:
Let's see here.
Rose Stein:
Surprise, you're live.
Forrest Burt:
My apologies on that transition. Here we go. Let's go back to this.
Gregory Kurtzer:
That has gotta be the next conference, like swag or something we give out.
Forrest Burt:
I like that idea.
Rose Stein:
I agree. I don't think Robert would agree, but I agree.
Gregory Kurtzer:
I would just, I just want to see Robert wearing that.
Rose Stein:
Oh, me too. That would be great. I'll send him one.
Forrest Burt:
Awesome guys. So here on the interface, like I said, I had to open up an additional directory in the same bucket that we're in here. You can see that we've got, for example, this htseq-count-demo.txt and star-results-demo.txt here that have been uploaded here today. I don't think there's anything particularly interesting in this file, but you can see these are our HTSeq results here. So you can see where this came from: the STAR sequencing here, this egress that we had, that's represented here, that we defined on this volume right here. We've gone ahead and uploaded those couple of files, the htseq-count demo one and then the star results demo one, to this CIQ support demo output bucket. So looking at these, we can see these are all starting to finish out.
This task array one, I believe I missed the tasks themselves because I was off explaining other things. But in this case, this is a FinTech type thing. This is running some wrapper scripts that are just doing some number munging. It spins up, like I said, three different instances. It lands a Python multiprocessing library on them called Dask, and then starts running these different tasks that use this Python script that utilizes Dask to do some multi-processing on that node. So we've got our three nodes. We might just rerun this one in the background so I can show this sequentially coming out here in a moment. But you can see we have all these tasks that resulted from this, 12 of them. Each one of these has its own set of logs and everything like that.
We'll grab one of these, and you can see that each one of these is just grabbing a file with 20,000 numbers in it and doing some averaging. But you can imagine some FinTech Black-Scholes numbers, stuff like that, analyzing say a hundred, two hundred stocks for backtesting, all in embarrassingly parallel format. Or you've got 20,000 input files from your space telescope and you need to process them all simultaneously. This workflow right here is a fairly small task array, so it's only got three nodes basically that are going to run it. But you can scale this up to a hundred, a thousand, 10,000, and if you've got a million files, you can have 10,000 tasks running at once. It's fairly extensible and, as I said, built for scale.
So, we can see that HPL is finished. So we've got our linear algebra system results here. And obviously, as I mentioned, I ran this workflow for the first time a couple of days ago and I was able to just rerun it, and now we've got it completed again today. Here, LAMMPS CPU NGC, this is completed. We pull down an input file that's just a common, standard hello world input file for LAMMPS. And you can see we've got, let's do all of our logs here. I'm not running on a GPU-enabled node, so it complains a little bit about not finding CUDA, but you can see we have all of our results: CPU usage, et cetera, MPI tasks. This is just running on some smaller instances, not EFA-enabled ones.
So, we see that we've got just 32 cores there. But if we wanted to scale this up, we could run this out on EFA-enabled instances. We can run this out with essentially the best practices that there are for MPI on the cloud and AWS. This right here we should jump into really quickly. This is the Jupyter Notebook workflow that we've been working on. So you can see this has started and we've got this link here, but if I go to my logs here and I try to open this, I get this problem loading page. If I stop sharing my screen for a second, I will give you all a quick view of what the other way you can use Fuzzball looks like. So one of the things about Fuzzball is that it's entirely API driven. Obviously this gives us the great benefits around not SSH-ing, that type of thing. But it also being API driven means that we can pretty easily wrap most of the actions that a user can take against a cluster in Fuzzball. We can wrap those in essentially whatever wrapper we want. So we've got our graphical user interface, but we also have this CLI that allows you to do all of the same things that you just saw me do in our GUI, but through a CLI-based format. So I go ahead and do fuzzball context login. This allows me to log into the cluster context. This is going to open... Sorry, I'm trying to find my StreamYard really quickly. There it is. So this opens; when we control-click that, I get this right here. Where's that?
Zane Hamilton:
So this also goes back to a question that George asked earlier saying that the GUI looks amazing, but do I have to use it? And I think it's what you're alluding to now Forrest.
Forrest Burt:
Oh, I see what's going on here. My StreamYard is on the same window. Sorry, here we go.
Zane Hamilton:
Just don't hit back. To answer your question, George, the GUI is amazing and it's been fun to watch. People who typically don't use GUIs actually start using the GUI, but you do not have to use the GUI to do this. You can actually do this all via command line like Forrest was talking about and he alluded to very shortly there. You can do everything via command line if you'd like.
Forrest Burt:
Where did this window go? I don't think StreamYard is giving me the option to open the window that I'm expecting it to give me to open here. So I think I'm just going to move on. I'm not sure why that's doing that. But here in just a moment I'll go ahead and do this SSO flow, and this will complete. There we go. Okay. So like I said, that opens up; you control-click this, it opens up just the standard Google SSO, select what Google account you want to log in with. Once we're in here, we can do fuzzball workflow list, for example, tail 10. And here are all the workflows that we've just been running as a part of this webinar. So there's STAR sequencing, Chapel hello world, the Stable Diffusion one, ooh, which is finished, OpenFOAM ParaView, there's everything we've been running. We can go ahead and do a fuzzball, let's see, workflow port-forward, grab the ID of the workflow from here, and then the name of the job in the workflow we want to connect to, and the ports. So you can see we've now got this listening on our remote. I'll go ahead and stop the screen here, open up this.
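Pulled together, the CLI steps just walked through look roughly like this session; the subcommand names come from the demo, but the exact flags and argument order are assumptions:

```shell
# Log in; this opens the standard SSO flow in a browser.
fuzzball context login

# List recent workflows (STAR sequencing, Chapel, Stable Diffusion, ...).
fuzzball workflow list | tail -10

# Forward a port from a job in a running workflow back to localhost,
# e.g. to reach a Jupyter Notebook server. IDs and names are placeholders.
fuzzball workflow port-forward <workflow-id> <job-name> 8888:8888
```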
And where were we at? So, back in this window here, you can see we had a problem loading page. But once I, yes, I am trying the right window. Cool. Once I go ahead and refresh, you can see that we're connected in here as we would expect. So this is, as I said, a Jupyter Notebook instance running out on some cloud resources on AWS. We've got a workspace that we can go into here with some notebooks. So, for example, this is just AI training with PyTorch. We can go ahead and run through these. We're going to train a simple model on the CIFAR-10 dataset. So we've got our dataset downloaded and extracted. Go ahead and show a couple of the images from that dataset; so you've got a plane, bird, dog, bird. Go ahead and set up the code for the model itself and then set up a little bit more code. And then this right here actually starts training this model. So, if we go over here, as we might want to in our Jupyter Notebook, we can do a quick nvidia-smi and we can see 28% GPU usage, 947 MB of memory. Again, because of namespacing, we don't see the exact process, but we can do top and see python3 running there. We can go back over here. We can see our loss values starting to be printed from this training. If we go over here and refresh this.
Actually here, this is what I want. We can also access that same information through here. So you can see we've got that python3 there and nvidia-smi. We can ls, let's see, /workspace, and there are the notebooks and everything, as I just showed in the other window. This will finish training here in a little bit, and you can see we've got our loss values and stuff like that coming here from the output of this. So right there, some of these different bits of information are being captured in the logs. And once this is trained, we can pretty easily just go right over here and download this directly to our laptop. Once we save the CIFAR network .pth file, we'll be able to really easily, let's see. You might be able to see this, but if I go back over here, you can see we have cifar_net.pth and can go ahead and download. And I get my cifar_net file. I'm not sure if you can see that pop up, but I can go ahead and download it. That goes right to my local.
And this will stay alive for as long as we set the timeout value right here too. So this will stay alive for 30 minutes before it'll close. We can go back over here and check our, oh, this one right here. So Dask, oh, I missed it again, my apologies. Well, you can see our last task here running on this Dask one. So normally this would spin up and you could see them in action, but I've missed that. Anyway, you can see our last task running there. You can see we've also got, for example, Stable Diffusion is finished. We've gone ahead and uploaded our images. If we go into, I believe, this bucket right here, I should be able to refresh, and then we have our BLAST results from that BLAST workflow that we ran earlier. And we've got our Stable Diffusion images. So I'll go ahead and download these. And you can see the only thing that's still running is the AlexNet notebook, because that is set to run indefinitely for about a half hour, but everything else we ran here has finished correctly. Dask is once again finished. LAMMPS is finished. HPL, BLAST. Oh, OpenFOAM ParaView also finished. Awesome. So this is a CFD one. We're doing a whole bunch of different CFD tasks. This, for example, has been uploaded to this S3 bucket, our final photo result.
I can just open it. There we go. So there's, for example, our crash test image result; you can see a video of this as well on our YouTube channel. So there's that. I believe our Stable Diffusion images have downloaded. So let me pre-screen those really quickly. I don't see many saddles, but we do indeed have our results as expected. So some cows there, oil painting like. Okay, cool. That's about what I have for the demo here today. That is a fairly comprehensive view of what you can do with the Fuzzball system. We've got a lot of different stuff that we're working on. But this is Fuzzball, so excellent. Thank you all for watching. My apologies if that was a little long-winded.
Zane Hamilton:
Oh, it's good. It's good.
Forrest Burt:
Those are the facets.
Gregory Kurtzer:
It's the fastest full training of Fuzzball I've ever seen.
Forrest Burt:
Thank you. That's the long and the short of it right there everyone.
Zane Hamilton:
Excellent. Thank you Forrest. We do have several questions to get to. Throwing those up. So Nicholas asked: if Kubernetes doesn't do the compute, why have the overhead of Kubernetes at all instead of just traditional RPM/deb-installed services? It's a great question. Thank you Nicholas, for that.
Q&A [51:27]
Gregory Kurtzer:
So Fuzzball is built out of, right now, two pieces, and there's a third one coming. The base piece is called Substrate. This is what runs on the compute cluster itself. Fuzzball Orchestrate is what orchestrates all of the different Substrate instances to do all the different things that you saw Forrest just demonstrate. Everything from ingress of data to managing the workflows, and when I say the workflows, it's much more than just what you have in your YAML file. There's a lot of context in a workflow that Fuzzball has to manage. Again, all the ingress, the egress, the volume setup, setting up containers, downloading containers, building the images for those containers, persisting and storing those containers, just to name a few; there's a lot of things that it has to do. Fuzzball Orchestrate itself runs as a microservice platform.
So there's a lot of different pieces to it, and they all fit together via microservices. So being that Fuzzball Orchestrate is a microservice, or cloud native, platform, it wants to run on some sort of microservice solution. And Kubernetes is just the one that most people are very comfortable with and very familiar with at this point. Also, many of the clouds will offer Kubernetes services, so it's very easy to stand up Fuzzball Orchestrate on top of these various services. So once you have this Fuzzball service running on top of Kubernetes, you can now start running workloads. One other piece as well: I believe, Forrest, you ran all of these different workflows in AWS. In AWS, you can think of it as scaling as needed, right?
So every time Forrest would run jobs, it would actually spin up the appropriate number of instances, run those on the right type of instances, and then tear them down when it's all done. If you're running this on-prem, you have a finite number of compute resources, so it will appropriately schedule. And we didn't talk about the scheduling features, but it will be able to schedule all of those workflows appropriately for running on an on-prem solution. Which, again, the scheduling parameters of that are very different than scheduling in the cloud. Hopefully I answered that. Fuzzball Substrate is a traditional RPM/deb-installed service. Fuzzball Orchestrate is just a lot more complicated.
Rose Stein:
I like that. It's just a lot more complicated. We'll get into that later.
Gregory Kurtzer:
It depends.
Rose Stein:
True. James, thank you for posting a question here. And he says, are you showing this in the cloud today? And you mentioned hybrid cloud. Will this be a true hybrid cloud platform?
Gregory Kurtzer:
So I just mentioned that there's two pieces of Fuzzball that we're showing off today, and there's a third. So there's Substrate at the bottom; Fuzzball Orchestrate, which orchestrates all the different Substrates; and then there's Fuzzball Federate, which sits on top of Orchestrate. Because, as Forrest mentioned, there's no SSH involved in this and the entire thing is API driven, it makes it very easy now for us to extend Fuzzball clusters and unite them. So to the question, yes, absolutely: you can run a Fuzzball Orchestrate instance in AWS in one availability zone, a Fuzzball Orchestrate instance in a second availability zone, a Fuzzball Orchestrate instance in a different cloud and on-prem, and then federate them all together. And when you do that level of federation, Fuzzball Federate is going to be making meta-orchestration decisions based on things like resource availability and cost: how much does it cost to run?
So obviously, if you've got on-prem resources, it may be much cheaper to run there than having to go spin up cloud instances. And then lastly, data: where's that data? And when you look at all of these together, you actually get some really interesting possibilities in terms of scheduling. So as an example, well, it's cheaper to run on-prem, but my data's up in S3, right? Now you just have to start making decisions on when and how that job is going to run. And those decisions are tuned via policies by the organization. So the organization will control that. Today, we are releasing Fuzzball Substrate and Fuzzball Orchestrate. Fuzzball Federate, keep an eye out. It's coming soon.
Rose Stein:
I like that. That's a good tease. Oh, another question from James. Awesome. Does Fuzzball know what resources you have available or are you guessing? Or what if you say, I want this resource and that one, like you were doing Forrest but then that resource actually doesn't exist or it's being used?
Gregory Kurtzer:
Forrest, you want to grab that one?
Forrest Burt:
So Fuzzball does know what resources you have available in the cloud. As I believe I showed at the very start of that, there are definitions that you can set that map to, for example, different instance types out on the cloud provider. So you can set up different compute node definitions, for example, that map to different AWS instance types, like their p3 GPU series or their c5n CPU series. You can, like I said, essentially map those to different instance types. On-prem, Fuzzball becomes aware of what resources it has as a part of the bootstrapping and setup process with Orchestrate and Substrate there. So, it is able to programmatically determine what it's aware of, what's out there, what resources are on those nodes, et cetera.
If you say, I want this, but that resource doesn't exist in the cloud, you will essentially just get a workflow failure that says no sufficient provision definitions exist, and you'll have to go to your sysadmin and get a definition added for that. So that's just if your system doesn't have a matching resource defined in it. If it does have a matching resource defined, it's, as Greg mentioned, going to just reach right out to the cloud provider, provision that instance type live, and then route it back to the workflow to use the compute resources for that. On-prem, it essentially does the same thing: it takes some of the compute resources that are available and assigns them to the workflow. But Fuzzball is aware of what you have available. You can define that, and it discovers some of that on its own. And then if you say, I want this, and it doesn't exist but it knows how to find it, it'll go spin it up or get some of it. But if it doesn't exist and you don't have it defined, you just get an error, as you would expect, and go get some different nodes working.
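As a rough sketch of the admin-side idea being described, a set of compute definitions mapping names to cloud instance types might look like this; it is entirely illustrative, and the real configuration format may differ:

```yaml
# Entirely illustrative mapping of node definitions to instance types.
definitions:
  cpu-mpi:
    provider: aws
    instance-type: c5n.9xlarge
    efa: true                   # EFA plus placement groups for optimal MPI
  gpu-small:
    provider: aws
    instance-type: g4dn.xlarge  # a single-GPU instance
```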
Zane Hamilton:
Thank you, Forrest. So I think that leads into the next question from Sylvie. What is the overhead on this, since you're actually putting something on each compute node? What is the overhead like?
Gregory Kurtzer:
So this goes back to, oh, I guess Forrest and I are tag teaming. I'll take this one; jump in if there's anything else you want to add. It goes back to one of the earlier questions about why use Kubernetes versus not. So a lot of people that are trying to solve these problems are actually trying to go the other direction, where they're basically saying, we know what Kubernetes is, we know how that operates, let's just go run everything in Kubernetes. A lot of the feedback we got is that Kubernetes is just way too much overhead, and you're not getting a lot of the performance out of the underlying resource due to Kubernetes honestly just getting in the way. It's just too big. So we spent a lot of effort here, and this is why Fuzzball Orchestrate sits on Kubernetes while Fuzzball Substrate does not. Fuzzball Substrate takes a lot of the experience that we gleaned from open source applications like Singularity and Apptainer; it knows how to run these containers and how to run this infrastructure in a way that gets out of the way and gives you direct access to that underlying resource in the most efficient way absolutely possible. We have benchmarked Fuzzball in a variety of different ways at this point, and we are always getting bare metal speed.
Forrest Burt:
Really quickly, I know we're very close to time, but just to elaborate and put some numbers on performance and overhead: we've run benchmarks on some Fuzzball clusters, like HPL-AI. We've found that we get the numbers out of those that we've been told by different manufacturers we should get out of that benchmark. So we've seen that Fuzzball provably doesn't add overhead to your HPC. We've also run benchmarks up in the cloud around GROMACS, comparing to some of the numbers AWS has published, and we are dead on to what's out there. So this, as we've noted, adds no performance overhead like a more standard Kubernetes installation does. And we've got numbers to back that up; our overhead is minimal.
Rose Stein:
Appreciate that. Thank you Forrest. Oh, Ian, what's up Ian, thanks for watching. Okay. So I know that we are at time, where we usually do about an hour, but this is really exciting. This is our release that we've been working on for years, and so we're going to stay on and answer a few more questions. Okay. Ian: for those who are familiar with workflow solutions for the life sciences, for example Cromwell, and I'm not going to read all of that there, is there any possibility of a brief comparison or contrast to better appreciate Fuzzball other than just the name? Thank you Ian, it is a cool name.
Gregory Kurtzer:
Forrest. You want to take it or you want me to?
Forrest Burt:
My understanding around Cromwell and Nextflow, things like that, is that those are mostly workflow description languages that give you a very robust way to codify a workflow. But Fuzzball includes a massive software platform backend; it doesn't just provide a workflow engine, which is one of the microservices that's a part of the Fuzzball Orchestrate stack. There is a workflow engine that takes in users' workflows, parses them out, and sends the information to the different microservices it needs to go to. So, overall, I would say the biggest difference between Fuzzball and something like Nextflow is that Fuzzball is not just a workflow description language; it's an entire Kubernetes-based platform that you deploy itself as a cluster, not just something that takes workflows and runs them against existing resources. Fuzzball is its own entire computing platform, of which a workflow engine like that is just one component.
Gregory Kurtzer:
The other thing I would just mention as well is that there's nothing stopping you, as a matter of fact, it's encouraged, from using those workflow managers inside of Fuzzball. So, you can absolutely do that. You can provision out your resources with Fuzzball and then leverage all of that same tooling that you're already familiar with.
Forrest Burt:
I've done things like, for example, taking existing Nextflow workflows and converting them fairly simply over into Fuzzball's workflow format as well. So, it's all fairly easily translatable.
Gregory Kurtzer:
We can also, in theory... Nextflow is a little special as well, because it can actually spin up cloud resources and interact with clouds. So we've actually brainstormed: can we leverage Nextflow as a Fuzzball client? That may also come in the future.
Zane Hamilton:
Very interesting.
Gregory Kurtzer:
In theory. We haven't actually done it yet, so theoretically it should work. By the way, hey, Ann, great to see you.
Zane Hamilton:
So here's a good question that Nicholas asks: will Fuzzball be free open source with the option for paid support, like Slurm? We get this one a lot. I'll let Greg answer that one.
Gregory Kurtzer:
So, it is our intention to open source Fuzzball, but we're not going to open source it on day one. We put a lot of research and a lot of work into this. And to be blunt, some of the feedback that we got from some of the other companies in HPC is: we're looking forward to you open sourcing this so we can go run it and sell it to our customers, without including you. So we are not open sourcing it yet, simply because we want to encourage good behavior and good partnerships with our wonderful partners. That's our initial goal. For purchasers, we've got no issue whatsoever with providing source code. If that's something that you're interested in, we'd be happy to provide source code licenses.
Zane Hamilton:
Thank you, Greg.
Rose Stein:
Ian's back. Okay. Enough of that. Thank you. He says: I'm getting a great sense from the excellent UI, yes it is, but I am gapping on the execution and overall architecture, other than the use of containers from any repo.
Gregory Kurtzer:
There's a lot in terms of the amount of architecture that we built into this. So again, it starts at Substrate, working up from there. Substrate is a container runtime. Even though we took and leveraged a lot of the knowledge we had from Singularity and Apptainer, we didn't use any of that code. It is actually a completely new container runtime, built specifically around APIs to control it completely. So it is an API-based container runtime, and that API uses something around the idea of leasing. You can make a request to Substrate: I'm interested in eight GPUs, 24 cores, this much memory, this architecture, and so on and so forth. And Substrate will say: I have that available, here's a lease I can provide, and so on.
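To make the lease idea concrete, here is a minimal sketch, with the caveat that the class names, fields, and granting logic are all invented for illustration and are not Fuzzball's actual API: a node-local agent checks a resource request against its own free capacity and either grants a lease or declines.

```python
# Hypothetical sketch of lease-based resource negotiation -- invented
# names and fields, not Fuzzball's real API.
import uuid
from dataclasses import dataclass

@dataclass
class ResourceRequest:
    gpus: int
    cores: int
    memory_gb: int
    arch: str = "x86_64"

class NodeAgent:
    """Stand-in for a per-node agent that owns its local resources."""
    def __init__(self, gpus, cores, memory_gb, arch="x86_64"):
        self.free = {"gpus": gpus, "cores": cores, "memory_gb": memory_gb}
        self.arch = arch
        self.leases = {}

    def request_lease(self, req: ResourceRequest):
        # The agent is the source of truth: it checks its own capacity
        # and either grants a lease or returns None.
        fits = (req.arch == self.arch
                and req.gpus <= self.free["gpus"]
                and req.cores <= self.free["cores"]
                and req.memory_gb <= self.free["memory_gb"])
        if not fits:
            return None
        self.free["gpus"] -= req.gpus
        self.free["cores"] -= req.cores
        self.free["memory_gb"] -= req.memory_gb
        lease_id = str(uuid.uuid4())
        self.leases[lease_id] = req
        return lease_id

agent = NodeAgent(gpus=8, cores=64, memory_gb=512)
lease = agent.request_lease(ResourceRequest(gpus=8, cores=24, memory_gb=128))
print("lease granted:", lease)
```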
Now, Orchestrate will manage all of the Substrate instances for you. So as workflows come in, Orchestrate will know what each one of the Substrate instances is doing. But at the end of the day, each Substrate instance itself is the source of truth. So it will confirm and check, and then Orchestrate builds a graph of all of the different places these different workflows can run. It breaks apart the workflows and then runs everything on that platform. It also does data management: it'll do data pulling and volume management, and it runs on parallel storage as well as straight NFS, as well as local volumes, different types of volumes we can manage and build. All of that is embedded into the platform. So it is an incredibly robust compute and orchestration platform, specifically for performance-intensive computing as a service.
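As a loose illustration of that graph-building step, again with invented job and node names rather than Fuzzball internals: the orchestration layer matches each job's requirements against the capacity each node agent reports, producing a graph of feasible placements before anything is dispatched.

```python
# Hypothetical sketch of the placement step -- invented names, not
# Fuzzball internals: build a graph of (job -> nodes it could run on).
jobs = {
    "preprocess": {"cores": 4,  "gpus": 0},
    "simulate":   {"cores": 24, "gpus": 8},
    "analyze":    {"cores": 8,  "gpus": 1},
}
agents = {
    "node-a": {"cores": 64, "gpus": 8},
    "node-b": {"cores": 32, "gpus": 0},
}

def feasible(job, agent):
    # A job fits if every resource it asks for is within the node's capacity.
    return all(job[r] <= agent[r] for r in job)

placement_graph = {
    job: [name for name, cap in agents.items() if feasible(req, cap)]
    for job, req in jobs.items()
}
print(placement_graph)
# {'preprocess': ['node-a', 'node-b'], 'simulate': ['node-a'], 'analyze': ['node-a']}
```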
Zane Hamilton:
Thank you. We have more, keep going. So Nicholas asks again: what kind of metrics are available to admins? He's looking for something like XDMoD for Slurm. What do we have in Fuzzball?
Gregory Kurtzer:
Forrest, you want that one, or do you want me to?
Forrest Burt:
The metrics backend side of it is... the latest status of that is a little bit more of an engineering question. I know that we have, for example, wired up our Fuzzball installs to Grafana and common logging tools like that, and we've been able to go in and see how a cluster is running on EKS, all that stuff. So the best thing I can say here is that it works with logging platforms like Grafana to ingest the logs from a Fuzzball cluster and put those into a searchable format. Greg, I don't know if you have anything to add there?
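For a sense of what "ingesting logs into a searchable format" can look like in that kind of Grafana setup, here is a minimal sketch using Grafana Loki's HTTP push API. The push endpoint path and payload shape are Loki's documented ones, but the hostname, labels, and log line are invented, and this is one generic way to ship a log line, not Fuzzball's actual logging pipeline.

```python
# Hypothetical sketch: push one log line into Grafana Loki so it becomes
# searchable in Grafana. Host, labels, and log content are invented.
import json
import time
import urllib.request

line = "workflow 42 finished: status=success runtime=183s"
payload = {
    "streams": [{
        "stream": {"cluster": "fuzzball-demo", "component": "orchestrate"},
        "values": [[str(time.time_ns()), line]],  # nanosecond timestamp, as Loki expects
    }]
}
req = urllib.request.Request(
    "http://loki.example.internal:3100/loki/api/v1/push",  # placeholder host
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # Loki returns 204 No Content on success
```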
Gregory Kurtzer:
I would just say as well, from the administrative side, it's using Kubernetes, a much more traditional enterprise stack, on the Orchestrate side. So all of the typical tooling that you have for monitoring a Kubernetes cluster, you can use on the Orchestrate side. On the Substrate side, we're as close to bare metal as absolutely possible, and there you could just use what Fuzzball itself provides, which, to be clear, is not a huge amount today. It's enough that you can get by and figure out what's happening, but you can't do high-level profiling or real-time metrics or analysis on what jobs are doing through that. If you want to do that, there are a variety of solutions you can use, as you mentioned. Really it's then a matter of tallying that data back up and deciding how you want to present it back to the users. But Fuzzball is very, very flexible with regards to that.
Rose Stein:
Okay. Hey, are you guys going to, how do you pronounce this? Isaac? Isaac? ISC, what is that?
Gregory Kurtzer:
So, the International Supercomputing Conference. It's the Europe version of Supercomputing, in a nutshell. Fantastic conference. I'm not planning on going, but I think we are sending some people; we're still figuring that out. There will be at least several of us over there. We're a pretty big company at this point, well, big, right? I guess it's all relative. For me, it's a pretty big company, and we're at about 85 people right now. We are international, so we do have a number of people in Europe, so we'll probably send some people from the United States over as well as have some of our European contingent join us there. So, yes, but not me.
Rose Stein:
I have maybe a silly question. Greg, does Fuzzball really take away the need to have a head node? Did we just get rid of that? Is that gone now?
Gregory Kurtzer:
Yep, it is gone in the traditional sense. What Rose is asking is: in a traditional Beowulf architecture, we typically see things like a control node sitting up on top, and then you've got all of these compute nodes sitting below. That control node acts as an interactive server where people will SSH in, do all of their major interactive work, and then interact with the scheduler that runs the jobs on all of those compute nodes. Fuzzball does not have the same architecture, and this is what I was referring to earlier. Fuzzball is a new way of looking at and thinking about high performance computing systems. There is no kind of control or interactive system as we've had in the past.
Now what we have is the notion of APIs. There are a number of different interfaces and tools you can use against those APIs, and it turns an HPC system more into a computing appliance, really, just a massively scaled computing appliance. So you're now interacting with this giant computing appliance via these APIs. Again, users don't usually interact with APIs directly; they're interacting with the clients that are communicating over these APIs. So you can use a Fuzzball cluster, no matter where it is in the world, via your laptop or via your workstation, just by connecting and leveraging these APIs.
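To give a flavor of what a thin client talking to such an API from a laptop might look like, here is a minimal sketch. Everything in it, the hostname, path, token handling, and response shape, is hypothetical and is not Fuzzball's real client protocol; it only illustrates the general pattern of submitting work over an authenticated HTTP API instead of SSHing into a head node.

```python
# Hypothetical sketch of a thin API client -- endpoint, paths, and token
# handling are invented, not Fuzzball's real protocol.
import json
import urllib.request

API = "https://cluster.example.org/api/v1"  # placeholder cluster address
TOKEN = "..."                               # placeholder credential

def submit_workflow(path):
    """POST a workflow file to the (hypothetical) cluster API."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        f"{API}/workflows", data=body, method="POST",
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/yaml"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]        # assume the server returns an id

print(submit_workflow("gromacs-run.yaml"))
```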
Rose Stein:
So that brings up a question about security, which I imagine, just because I've known you for a little while, has been top of mind. I imagine that Fuzzball is more secure, but can you touch on that?
Gregory Kurtzer:
So, one of the biggest difficulties or challenges with securing a traditional HPC system is the fact that you are allowing people, over SSH, a secure shell, to have full access to that interactive resource up on top, and then to the compute resources when their jobs are running. And it's very hard to validate that a user coming in is in fact the right user, right? To start off, think about passwords as an example. A password is something that only you know, well, that's the hope, right? Only you know it, and you can use that password to get access to the system. But what people found is that it's pretty easy to either brute force or hack passwords and get in. So now all of a sudden people are getting in with your private password because they've hacked it, right?
So then we start using one-time passwords or additional tokens to get in, multi-factor tokens, MFA, multi-factor authentication, to get into these systems. But again, that just solved the problem for a little while. Giving users full access to do pretty much whatever they want is a very difficult thing to secure. With Fuzzball, because everything is going through those DSLs that Forrest was showing, those workflows, everything is defined. You can reproduce, you can replay, and you can audit every single action that's happened. As a matter of fact, all of the data ingress and egress can be completely locked down, to the point where we're actually getting authentication or authorization for any sort of IO going in or out of the system via another source. So if a lab or classified facility has limitations in terms of where they can pull data from, or who can pull that data, or what applications are allowed to be used against that data, we can integrate that via the ingress and egress. We can also lock it down so that when the job is running, we take it off the network.
So they have to do all of their ingress and egress via that section of the DSL. We can now start securing these workflows in a way that, honestly, we just haven't been able to do before. And it gives a lot of capability in terms of management, right? This is the type of thing that a CISO, or somebody who's really focused on the security side, is going to be very interested in: making sure that everything is auditable, reproducible, and secure. Not just secure from a hacker perspective, but secure from a supply chain perspective. Keep in mind, everything that we're running now is also coming out of something that can be verified and validated; each one of those containers can be validated. So we can very easily have complete transparency into everything that's happening, and then trust in everything that's happening. So, security is definitely top of mind.
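The gatekeeping idea here, authorizing and auditing every data movement against a policy, can be sketched in a few lines. The policy source, allowlist, and audit record shape below are invented for illustration only and are not Fuzzball's actual mechanism.

```python
# Hypothetical sketch of an ingress/egress policy check -- invented
# policy and audit shape, illustrating the general idea only.
from datetime import datetime, timezone

ALLOWED_SOURCES = {"s3://approved-bucket", "https://data.example.gov"}
AUDIT_LOG = []

def authorize_transfer(user, direction, uri):
    """Allow a data movement only if its endpoint is approved, and record
    every decision so the workflow stays fully auditable."""
    allowed = any(uri.startswith(prefix) for prefix in ALLOWED_SOURCES)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user, "direction": direction,
        "uri": uri, "allowed": allowed,
    })
    return allowed

print(authorize_transfer("rose", "ingress", "s3://approved-bucket/genomes/"))  # True
print(authorize_transfer("rose", "egress", "https://untrusted.example.com/"))  # False
```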
Rose Stein:
Thank you for that, Greg. I know, I feel like we've kept you a little bit longer. I think you might have a couple other meetings to go to, but thank you so much for coming on and sharing with us the story and building this amazing company and releasing Fuzzball and making it fun and awesome. Really appreciate your time being here. And I don't know, Zane, if you have something else to say, but you guys, we are open, we are open for business. We are ready for your comments, your questions. You can go to our website, you can schedule with us and dive a little bit deeper into how Fuzzball can really benefit your environment and how we can work together. And we'd love to chat with you. So leave a comment, leave a like, say hello, share with your friends, schedule a meeting. We're here for you.
Zane Hamilton:
Absolutely. Thank you all for watching, and I want to thank the engineering team. I know you guys have put a lot of time and a lot of effort into this, so we really appreciate it. Forrest, appreciate all the work that you've done. I know you've spent a lot of time working on these workflows to be able to show people and show off Fuzzball and it's been fun watching you, over the last almost two years now, play with this thing and watch it grow. So, really appreciate everybody out there. Like Rose said, go like, subscribe. If you want to talk to us more about this, go to the Fuzzball page and sign up and give us some time. Appreciate it. Good to see you guys.