Apptainer: Latest Enhancements, GA Release & More

In case you missed it, Singularity has been moved into the Linux Foundation and renamed to Apptainer! We are pleased to announce the first full release of Apptainer v1.0.0. In this live webinar, we will present a live demonstration of new features and bug fixes for v1.0.0.

Pull Request #109 Experimental: Enable instance checkpointing with DMTCP: https://github.com/apptainer/apptaine...

Webinar Synopsis:

Speakers:

  • Zane Hamilton, Vice President of Sales Engineering, CIQ
  • Forrest Burt, High Performance Computing Systems Engineer, CIQ
  • Robert Adolph, Entrepreneur, Strategy, Technology & Trading, CIQ

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Full Webinar Transcript:

Zane Hamilton:

Good morning, good afternoon, and good evening, wherever you are. Welcome back to another CIQ webcast. We appreciate you joining. Don't forget to like and subscribe so we can stay in touch with you. And if you don't mind, drop a comment in, let us know who you are and where you are. We like to stay in touch with you if you can. So today we're going to be talking about Apptainer. I know we talked about Apptainer in the past, but I think there are some things that we would like to update you on. And I have Forrest and Robert with me today. 

Forrest Burt:

Good morning, everyone.

Zane Hamilton:

Morning, Forrest. Welcome back, Robert.

Robert Adolph:

Good morning.

Zane Hamilton:

Thanks for joining. So we talked about Apptainer once in the past. We had Ian and Forrest on to talk about Apptainer, but I think it's important for us to go back and dive into it a little bit, and start from the beginning. So can you guys describe to me what is Apptainer?

What is Apptainer? [00:52]

Forrest Burt:

So Apptainer is essentially a container runtime that is built to execute and build containers on your architecture, specifically with High Performance Computing and security use cases in mind. It is similar to Docker, Podman, or something like that. It's essentially a container runtime in that same family of software.

Zane Hamilton:

All right. So it's called Apptainer now. I know previously it was called Singularity, so I think it's important to be distinct. Are they the same? Are they different? What is that relationship?

Difference Between Apptainer and Singularity [01:29]

Forrest Burt:

So we'll back up a little bit and start from the beginning. Singularity is a containerization technology and software platform whose development was started at LBNL by Greg Kurtzer and his team there in about the mid-2010s or so. Going back even further, High Performance Computing is essentially an industry where there has traditionally been a focus on monolithic architectures and the usage of software stacks that are well defined in that sphere. So in about the mid-nineties, the Beowulf model came out for High Performance Computing clusters. That is essentially a bunch of commodity computers, all linked together with one primary computer, the head node, they call it, managing all the rest of those systems. And with that came concepts like MPI, which allows you to spread software runs over multiple different servers, or compute nodes as we call them, at once.

And so HPC has traditionally used a lot of these same technologies, the same job schedulers, that kind of stuff. There has not been a lot of ability to bring some of the latest and greatest of different innovations that the rest of maybe enterprise is working into HPC. And a big example of that is containers. Containerization didn't really have a place in HPC before Singularity. It was something that was more used in enterprise, but there wasn't really a good implementation of it in the High Performance Computing architecture. And so as mentioned, this need was recognized because containers are a fantastic piece of technology for High Performance Computing. Because of the portability, reproducibility, security, and other convenience aspects of them that make them very useful for a lot of different things in that sphere.

Singularity was built to fill that container gap in HPC and provide a meaningful way to allow containers to be built and deployed across High Performance Computing architecture with all of those modern benefits like the ability to use MPI and stuff like that. Singularity was born out of a need to bring containers out of enterprise and into HPC as well. It was built and it was deployed at many different sites over a few years. It started to gain a very large user base across a lot of different places. There were over 25,000 installs of Singularity at different sites around the world. Singularity continued to develop until about the middle of last year or so.

Singularity is a very mature technology. It has been in development for a number of years. It has a number of different features that are very useful. And then there was a question that was put out to the Singularity community at that point. It dealt with what was next. Where should Singularity go? And this question revolved around moving Singularity into the Linux Foundation and allowing it to be in that sphere in a similar way that other projects like Kubernetes and such have ended up under the purview of different organizations like the Cloud Native Computing Foundation. The question went out as to whether or not Singularity wanted to be moved into the Linux Foundation. The Singularity community unanimously approved this. About 50 or so community members were sent the question: should Singularity join the Linux Foundation?

The response from the Singularity community was unanimous from those 50 people that yes, this was a good idea. And so inroads started to be made there. As a part of that process, it was also demonstrated that there was strong support from industry users of Singularity, so places like Intel and AMD. People also wrote in support of this move into the Linux Foundation. And so with all that set, with both the community and the enterprise or industry side of the Singularity community in approval of it, things went forward. Singularity ended up joining the Linux Foundation. Because of legalities surrounding trademarks and a commercial version of Singularity that's maintained by a third party, the name Singularity couldn't continue to be used.

The project had to be renamed as something. Apptainer was a suggestion that came up and it ended up winning out among the community. And so now we are here, with the initial release of Apptainer, separate from the previous Singularity. The development of the open source Singularity has ceased, and it's now moving under the purview of Apptainer. And so that is how we are here with Apptainer.

Moving Apptainer to Enterprise [06:48]

Zane Hamilton:

Thanks, Forrest. I appreciate that. Robert, I'll throw this one over to you. I know most of this was built for HPC, and I know that that's where everything's been focused. But what things are changing or what has been done now in the Linux Foundation to drive this to the enterprise as well?

Robert Adolph:

In a lot of cases what we are seeing is a need for immutability, a need for cryptographic signing of applications, and a need for software supply chain security. The need to run traditional HPC types of workloads inside of enterprise AI, machine learning, and other workflows is really driving the need for containerization of those applications. We're seeing a big growth in that capability and this is a format that folks are appreciating.

Has the Linux Foundation improved capabilities in Enterprise and HPC? [07:48]

Zane Hamilton:

And then moving this into the Linux Foundation, has that helped with adoption? Is it helping grow capabilities in the enterprise and across HPC? What does that look like and what is that doing today?

Robert Adolph:

Yeah, Zane. That was one of the biggest reasons to move it there: to get tighter with the Cloud Native Computing Foundation, OpenHPC, and the entire Linux Foundation's family of projects. It was an extremely important step to expand the reach of what it can do.

Forrest Burt:

Yeah. Just to jump in there. You look at some of the other projects that are currently under the purview of the Cloud Native Computing Foundation or the Open Container Initiative. You see things that have a huge effect on the enterprise and increasingly HPC. We see that these organizations are already heavily involved with technologies like Kubernetes, containerd, and runC, and Helm is another good example. This whole area of cloud computing as it relates to containers is really organizing around these organizations. And as HPC increasingly moves into the cloud, it matters to have something like Apptainer, with the model that makes it good for HPC: for example, its integration over isolation approach. With Docker, you will find that the natural way the container runtime wants to run is in an isolated way, so that the containers are as isolated from the host they are running on as possible.

Apptainer is far more focused on integration because there are a lot of specialized hardware and software pieces that are a part of a High Performance Computing cluster. We definitely want those to still be able to be used within Apptainer. So it has very robust support for GPUs and MPI: these things that we need to make available inside of containers so they can be effectively used at scale in HPC. It also has that single runtime that builds its containers as single flat files that you can move between systems very easily. You can then take and cryptographically sign those containers. You can encrypt them as well, the underlying file system that they are made up from. So Apptainer is obviously something that is hugely focused on HPC.

As HPC begins to move into the cloud, we see more and more use of public cloud or even private cloud for people wanting to run HPC. We definitely want to be coupling Apptainer as tightly as possible to these organizations that are driving that innovation and driving that development of those vital technologies out there like Kubernetes and that kind of thing. So it is definitely a very good thing for Apptainer, being essentially the premier container runtime in HPC, to be a part of those organizations, and to be able to make sure that Apptainer is included when it comes to making sure things work with it and such like that. There's a lot of cross-pollination that can go on there.

There's also, as part of the Linux Foundation, OpenHPC, which is another big project that is traditionally focused around open source. If you go and look on there, they will talk about Warewulf and Rocky Linux and stuff like that at OpenHPC. So that's something we're very excited to be closer to as well, being Apptainer and HPC, as I've said a few different times. We're definitely very excited to move closer to High Performance Computing-focused projects that are part of the Linux Foundation like that.

Zane Hamilton:

Great. Thanks, Forrest. So Mike, appreciate you jumping in and being the first one to post that you are listening and where you are from, Greg obviously joined in too. So we have a good question from Stefano and he's wanting to compare and contrast: what are the pros and cons of Apptainer compared to Docker for an HPC workload? I think it's an important question we should probably distinguish.

Forrest Burt:

Absolutely. So I've touched on it a little bit, but I'll go a little bit deeper. Essentially, the security model and the running model that Docker uses in order to run its containers don't work well with a multi-tenant HPC system. For example, with an enterprise server, we are maybe thinking of something like a web server, where if you don't count the thousands of people that might be accessing your website, the total number of people that are actually going to interact with that server is probably very low. It is you and the rest of your system administrators. So half a dozen people or more, depending upon your organization.

So when we are in an HPC system, we are not in an environment where we can count on just having a select number of very access controlled users being on that system at any one given time. A traditional HPC system could have hundreds of accounts for different users on it to be able to log into it. Typically what we see is sites that will set up a login node, which is a node that is just for users to be able to log into things on. This is done for certain reasons in HPC like avoiding things being run on a head node and stuff like that. When you have hundreds of people that are on this node or need to be able to use these resources, Docker’s model of needing to be privileged in order to interact with it falls apart.

When you have hundreds of different people, you can't possibly give all those people the level of access with regard to security that Docker requires in order to be able to run it. And so Docker in general does not work very well in a traditional High Performance Computing cluster environment. It also doesn't particularly scale over nodes as easily as Apptainer can with its integration for MPI. And so those are a couple of big things. As I touched on, a big focus of Apptainer is integration over isolation. So whereas in Docker, you would have a container that's trying to isolate from the host and present its own file system, its own interface, with Apptainer, when I get into an Apptainer container on a compute node or HPC architecture, I'm going to be the same user inside the container as I was outside of it.

You can immediately see how that would be useful. I can spin up custom environments that have AI or ML tooling in them, and then be able to apply that environment to what I'm doing on that HPC cluster. Maybe if I'm developing on a node, I can interact with the files that I own on that cluster directly from inside of an Apptainer container. The security model makes it very easy to integrate and do HPC work. Another big thing about Apptainer that I wanna make sure I mention is it can lead to much greater efficiency when you are deploying High Performance Computing workloads because of certain aspects of how it does file input and output. Imagine trying to run an MPI job across a thousand nodes at once, maybe running a molecular dynamics simulation, or maybe something that's using Python and doing a lot of small file IO imports.

If you were doing this on a normal file system, the operating system would be having to seek and look through that file system, traverse directories to find those files. If you're using an Apptainer container instead, the internal file system, all of the different components that are represented in that container – the AI or ML tooling, anything, system stuff – that's all represented inside of a compressed SquashFS file. When you're reading from this SquashFS and doing this small file import, the speed of that will be much faster than if you were pulling it from a standard directory. Because instead of doing directory traversals, the SquashFS compressed file system underneath it allows you to just read offsets into that SquashFS.

It becomes much faster. The target of your IO becomes a single file instead of the entire directory structure. You can get that same thing if you're writing a lot of data by adding an overlay to a container. An overlay in Apptainer is an ext3 image that you can add onto a container to give it a writeable layer. And if you target that writeable layer with outputs of small file IO, because you are, once again, just writing data at offsets in an image file, it has that same type of speed advantage over trying to traverse all over the directory and write stuff. So Apptainer has a lot of different benefits over Docker in an HPC environment. Docker is really meant for running services.
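
Note: The offset-based reads Forrest describes can be pictured with a small sketch. This is a toy model only, not SquashFS and not how Apptainer is implemented: the `pack` and `read_member` helpers and the file names are hypothetical, chosen to show why one image file plus an index of offsets avoids a directory lookup per small file.

```python
import io


def pack(files):
    """Concatenate file contents into one blob and record (offset, length) per name."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_member(image, index, name):
    """Read one member by seeking to its recorded offset inside the single image."""
    offset, length = index[name]
    image.seek(offset)
    return image.read(length)


# Hypothetical small Python package, the kind of thing a job imports many times.
files = {
    "pkg/__init__.py": b"",
    "pkg/a.py": b"A = 1\n",
    "pkg/b.py": b"B = 2\n",
}
blob, index = pack(files)
image = io.BytesIO(blob)  # stands in for the one file that is opened on disk
print(read_member(image, index, "pkg/b.py"))
```

Every read goes to the same open file, so the cost of finding a member is one index lookup and one seek, regardless of how deep the original directory tree was.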

This is where the concept of microservices, one container for one small task, came out of. Essentially, the difference boils down to the fact that Apptainer has a fundamentally different model than Docker with how it works. It also features a single runtime, whereas in Docker you're going to be, and this touches on the security model as I discussed, needing people interacting with Docker to be privileged. Docker's daemon-based system requires you to manage starting and stopping that with systemctl. In Apptainer, the container runtime is a single executable platform that you just feed these container images that Apptainer produces, called SIFs, short for Singularity Image Format.

Zane Hamilton:

That's great Forrest. I appreciate it. You're good. Go ahead.

Forrest Burt:

Sorry. I did want to finish my thought there. If I can remember, oh yeah, the runtimes, sorry. There was a knock at my door a couple minutes ago and I'm not sure what it was and it threw me off a little bit. Like I said, Docker is a daemon-based system. You have to manage it with systemctl. Apptainer is a single execution runtime that operates over these single flat files. In Docker, you're going to be managing images through something like docker image ls. You will be able to see the Docker images that you've built on your system. It's something that's inherently tied to Docker. With Apptainer, these SIFs can be treated like any other executable file, especially if you use some of the other features that Apptainer has that allow you to use them like an executable and specify what they do when you run them that way. You can treat Apptainer containers as executables and they can be moved between systems, for example in research, where you are moving a file around. And because of the security features that we've discussed, you can also cryptographically sign those containers and encrypt them in transit, so that you have guarantees that the container is what it is and hasn't been tampered with at all. So there are a lot of differences. Apptainer is very different from Docker.

Security and Executing Encrypted Containers [19:57]

Zane Hamilton:

Thank you, Forrest. So on the security side, Robert, I know you like to talk about this quite a bit. Even though it’s great that you can encrypt the file so that in transit you'll have it, you also like to talk about the fact that you can actually execute that thing and it stays encrypted. So at runtime, you can verify it before you execute it. You can execute it, run it encrypted. Am I off?

Robert Adolph:

No. And with the software supply chain being such a crucial aspect to everything now, having multiple signatures to identify that container, similar to what Forrest was just describing as far as sharing, but also now as far as placing that onto resources, right? So when you place those onto an endpoint or resource, you can then run it in an encrypted state. It never decrypts out of memory. It gives you the capacity to deliver a solution that can fulfill an entire life cycle of an AI workflow, all the way from training, to inferencing, to delivering those algorithms to an edge device, to now perform. It just gives people a way to systematically maintain that software supply chain throughout their entire solution.

Forrest Burt:

You can imagine, building on that, how Apptainer could be integrated into, for example, your CI/CD practices to do automatic verification, encryption, and security. There are a lot of different ways that you can look at those security abilities, as Robert talks about, use those features to ensure Apptainer's security across the software supply chain. So very useful features there that are useful in both more human-focused and also more automated settings as well.

Vendor-specific Container Hubs [21:55]

Zane Hamilton:

Thanks, Forrest. Misha has a question and it's talking about vendor-specific container hubs. NVIDIA has theirs, AMD has theirs, and then Docker obviously has theirs. Is there any plan to get them to use Apptainer instead of Docker? I think I know where the answer's going to go, but I'll let you answer that for us.

Forrest Burt:

I doubt it. I can't really speak to any plans, but I can also tell you that that's not particularly necessary. One big thing Apptainer is also built with is Open Container Initiative, or OCI, compatibility. So the standards that they're putting out for things like Docker, Apptainer is fully compatible with. Just to explain how that happens: in Docker, when you're pulling a container off of the NGC, we might do a docker pull and then provide the pull tag. With Apptainer, it's the same thing: apptainer pull and then provide the pull tag and the URL to the container.

In Docker, there is a concept of layers. For example, if you're building a Dockerfile, every RUN, COPY, or other command introduces a new layer into the container. And that has other effects. For example, your attack surface increases with the number of layers you have in an image. So that's the way those work. But if we're using Apptainer and we do that apptainer pull, there's no concept of layers in Apptainer. All of those layers will be compressed down into that single compressed SquashFS that makes up the core file system of an Apptainer container (aka the SIF). So, in essence, there isn't a particular need for them to specifically build it with Apptainer. That would be great, especially as this is HPC-focused technology. And I know that if you go to the NGC and look around, I believe that they give you instructions on how to use both Docker and Singularity (Apptainer) to deploy those containers as well.

I believe I've seen tutorials over there. It's just included in their instructions. So there is already a concept of support for some of them out there. But the most important part of this is Apptainer has full support for OCI images and pulling those from a container registry is an operation of taking all the layers that make up that Docker container, compressing them down to the SquashFS and then delivering you the Apptainer. So it works just fine with Docker and containers built and deployed on those registries.
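
Note: The idea of collapsing image layers into one flat file system can be sketched as follows. This is a simplified model under stated assumptions, not Apptainer's actual conversion code: the `flatten` helper and the example paths are hypothetical, and the only OCI detail borrowed is the convention that a file named `.wh.<name>` in a later layer marks `<name>` as deleted.

```python
def flatten(layers):
    """Apply layers in order so later layers override or delete earlier entries."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            directory, _, name = path.rpartition("/")
            if name.startswith(".wh."):
                # Whiteout marker: remove the named file contributed by lower layers.
                removed = name[len(".wh."):]
                target = f"{directory}/{removed}" if directory else removed
                merged.pop(target, None)
            else:
                merged[path] = content
    return merged


# Three hypothetical layers, e.g. a base image plus two build steps.
layers = [
    {"etc/os-release": b"base", "tmp/build.log": b"noise"},
    {"opt/app/run.py": b"print('v1')"},
    {"opt/app/run.py": b"print('v2')", "tmp/.wh.build.log": b""},
]
flat = flatten(layers)
print(sorted(flat))
```

The result is a single view of the file system with no layer boundaries left, which is the shape of data that can then be written out as one compressed image.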

Zane Hamilton:

Excellent. That's where I figured this was going to go. I just wanted to let you guys talk about it. So again, Misha, great question. We appreciate it.

Forrest Burt:

Absolutely.

Zane Hamilton:

The proliferation of Docker there, it was built that way for a reason with Singularity and Apptainer, to make sure that you could pull from those. So I know that Apptainer 1 has been released now. Right?

Forrest Burt:

Yes, and I think 1.0.1 was also released yesterday as well, the next one after.

New features in Apptainer 1.0.1 [25:15]

Zane Hamilton:

Excellent. Can you tell us some of the features that are in that?

Forrest Burt:

Yeah. I have a demo here in a little bit as well and one of the bigger ones that came out of it. So just to go through our changelog here. What I will demo here in a bit is instance checkpointing using DMTCP – that's Distributed MultiThreaded Checkpointing. That is an HPC-focused technology that allows you to checkpoint the state of an application while it's running. Why would you wanna do this? If you're running something in HPC, it's entirely possible that this one single run of an executable could last days or even potentially weeks. If we are sitting there and we have a simulation that's running for a very long period of time, it would not be good to have something go wrong.

I don't know, perhaps there is an error in the code that becomes apparent after a long period of time; a cosmic ray hits a bit and flips it and causes the whole thing to crash. Then suddenly you have just lost those days or weeks of computational time that you have been waiting on. DMTCP allows you to do checkpointing of applications and this allows you to save off the state of an application while it's running, and then be able to restore the application to that state at a later period. We have implemented this on container instances in Apptainer. A container in Apptainer is essentially equivalent to what would be called an image in Docker. A container instance in Apptainer is equivalent to what would be called a “Docker container.”

This is the concept of the immutable base copy of an image versus one-off copies of that image made to do something specific. So you have a specific service container and you're spinning up copies of that to run a service. You can now use DMTCP to checkpoint those container instances in their state in Apptainer. You can start up an instance, attach a checkpoint to it, checkpoint that application as it's running in that instance, and then be able to restart that instance at any point from that checkpoint and have the state of the application inside of it be the same. So that is very useful. That's a great HPC-focused part of it. That will be very useful for checkpointing workloads running out of Apptainer containers on High Performance Computing architecture. Just to run through a few other things here that are a little bit smaller, but still interesting.

There is an option to set up a writeable tmpfs when you're working with a container. This container is immutable. It's a read-only file system. While you can interact with your host file system inside of it because of how the Apptainer security model works, you cannot edit the container itself. The container is immutable. This is the basic concept of a container. You can use a writeable tmpfs in Apptainer that gives you a temporary file system in RAM to be able to write things to. That's useful for certain use cases. 

Nowadays, one of the new features relates to the test section of an Apptainer container build. The test section allows you to specify commands that will be run after the container is built to test it for functionality. This could be checking for the version string of different applications or even something more complex. You can now use writeable tmpfs as a build option to generate a writeable space in RAM for the build process to use with that test functionality, so that your tests have a writeable, temporary file system once a container build is done. So that's very useful for some use cases.

There is a shorthand flag that implies a few different options that people typically want to use with OCI or Docker containers. So that shorthand makes that easier. There was some improvement around the way that GPU libraries are brought into a container. As I’ve mentioned, one of the big things about Apptainer is that it has support for GPUs and is able to give what is inside the container a GPU to work with.

Normally, the libraries that need to be bound into the container from the host in order to make this work are discovered via a configuration file that comes with Apptainer, or they are discovered via one of NVIDIA's command line tools, called NVIDIA Docker. Nowadays, there is some new tooling that has come out of NVIDIA for managing NVIDIA-based containers and GPU-based containers. So there is some new support in Apptainer to be able to use their NVIDIA container CLI instead of this pre-made list or this NVIDIA Docker tool, if you specify that you wanna use that container CLI tool. That is a new way to make the GPU libraries available in containers. So that is very useful for deploying AI, ML, and anything that uses a GPU in a High Performance Computing environment. There are also some differences surrounding how cgroups are done. It now supports the second version of cgroups and that hierarchical setup, as opposed to just the first version of cgroups.
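
Note: As a rough sketch of how the two GPU paths might be selected on the command line, the helper below builds an `apptainer exec` invocation. The flag names `--nv` and `--nvccli` are given here as recalled from the Apptainer documentation and are an assumption to check against `apptainer exec --help` on your install, as is passing both together. The `gpu_exec_command` helper and the image name are hypothetical.

```python
import shlex


def gpu_exec_command(image, command, use_nvccli=False):
    """Build an argv list for running a command in a container with GPU support."""
    argv = ["apptainer", "exec", "--nv"]  # assumed flag: bind NVIDIA libraries from the host
    if use_nvccli:
        # Assumed flag: let NVIDIA's container CLI handle the GPU setup instead.
        argv.append("--nvccli")
    argv.append(image)
    argv.extend(command)
    return argv


# Hypothetical image name; nvidia-smi is a common check that the GPU is visible.
print(shlex.join(gpu_exec_command("tensorflow.sif", ["nvidia-smi"])))
print(shlex.join(gpu_exec_command("tensorflow.sif", ["nvidia-smi"], use_nvccli=True)))
```

The sketch only prints the commands. Running them requires a host with NVIDIA drivers and, for the second form, the NVIDIA container CLI installed.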

In general, there are a lot of other smaller changes that came out with it. Those are some of the bigger, new features that have come out with it. There were some changes to how some of the internal security options that Apptainer provides work. For example, you can do things like set flags on whether or not you want unencrypted SIFs to be able to be run within the Apptainer runtime. There are also other things that give better support for Docker and OCI images. In general, there are lots of internal changes to how libraries are found, not just for GPUs, and to other aspects of binding those into the container.

In general, a lot of different changes. A lot of these changes came out in the initial release a couple of weeks ago. We had the next release yesterday. A lot of the changes that were in the initial release were focused on providing backwards compatibility for Singularity installs. If you are curious about that, maybe you haven't made the transition yet to Apptainer, we do have a previous webinar that went over that migration, how that works, and what that looks like. So feel free to refer to that if you are looking for a tutorial on what that migration looks like. But in general, the first release was focused around the different things that need to be done in order to provide that backwards compatibility, improving support with Docker and OCI images, and improving the ways that libraries are found and presented to the container. And then added a few new features: most significantly, the DMTCP checkpointing.

Zane Hamilton:

And I believe that is what you are going to show us, right?

DMTCP Checkpointing [33:22]

Forrest Burt:

Absolutely. Give me a moment to pull up my VM and share my screen. I'm here on a Rocky 8.5 box. I have Apptainer installed and I have also got DMTCP installed. We will make sure to provide a link to the pull request on Apptainer's repo that shows the basic tutorial I’m about to go through here of how DMTCP works. It has some of the information about how you should install DMTCP. You want to install it from source, but there is also an option that you want to make sure you use that enables it to use static C libraries or something along those lines. I would have to go and look at the thing again. We will make sure that link is out there that tells you how you should build DMTCP and the options it should be built with to provide support in this manner. And then I installed Apptainer as an RPM from the repo there. We will get started on this demo here. I have got my commands here to make sure that I don't lose them. So the first thing that we will do is: apptainer checkpoint create. Then we will provide a name for that checkpoint. So we will call this example-checkpoint.

Go ahead and create that. This creates a file in apptainer/checkpoint. You can see it goes down like this. There is nothing in there right now because we have not actually done anything with this checkpoint yet. You can see that is where the data's put out. So we haven't done anything with this checkpoint, but let's go ahead and do something. I have a simple HTTP server container recipe that I have built, that we're going to use with this demo. You can see that this is pretty simple. You can also get this from the pull request that we’ll link. If you have any problems with ‘which’ and for whatever reason DMTCP not liking the function that ‘which’ is occasionally implemented as, this is a workaround. Basically just setting an alias there inside of this specific folder. So a little bit of an odd deal there but if you run into any problems with that, that is a workaround.

We will go ahead with the SIF that I've built from this definition. We will go ahead and start a container instance of it. We have the base SIF that is our container in this case. And then we are going to start an instance of that container. That is a one-off copy of that that is comparable to a Docker container running a service based on a Docker image. So I'll go ahead and do: apptainer instance start. I will provide the DMTCP launch flag to let this know that we want DMTCP to be set up inside of this container to be used as well. We will provide the checkpoint that we want to link to it, which is the one that we just added or the one we just created: example-checkpoint.

Then we will go ahead and provide the path to the container SIF that we want to make this instance from. We do server.sif because it is just here in the same home directory that I'm in. And then we'll go ahead and provide a name that we want this instance to have. You will notice down here we have a start script. This is what this container is going to run when an instance of it is started. In this case, you can see that we have python3 and we're just going to run this code right here. Then we are also going to pick up an option from the start of the instance. In this case, that option will come from right here. So we will provide the port that we want this to connect to.

Here on the command line, you can see that the instance started successfully. We should at this point be able to curl localhost 8888 and get a response out of it. Pretty simple one, but still a response. You can see we get a 0 printed out of this. This is a very basic example, but you can see how if we can checkpoint applications like this, it would be very useful across HPC. So this is currently serving us a 0. With this next curl command, we will send a POST so that we can change that to a 1. Then we will go ahead and curl that again and you will see that that's now changed.
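For readers following along without the recipe in front of them, here is a minimal Python sketch of a server that behaves the way the demo describes: a GET returns the current value and a POST replaces it. It is not the code from the pull request. The names `StateHandler`, `start_server`, `get`, and `post`, and the convention that the new value is the last segment of the POST path, are assumptions made for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StateHandler(BaseHTTPRequestHandler):
    """Serve one in-memory value: GET returns it, POST /<value> replaces it."""

    # The value lives only in process memory, so it is lost on a normal restart.
    state = "0"

    def do_GET(self):
        body = (StateHandler.state + "\n").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Assumed convention: the new value is the last path segment, e.g. POST /1.
        new_value = self.path.rstrip("/").rsplit("/", 1)[-1]
        if new_value:
            StateHandler.state = new_value
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Start the server in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), StateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(port):
    """Fetch the current value, like `curl localhost:<port>`."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
        return resp.read().decode().strip()


def post(port, value):
    """Replace the value with an empty-bodied POST to /<value>."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/{value}", data=b"", method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `start_server(8888)` and then `get(8888)`, `post(8888, "1")`, `get(8888)` walks through the same read, change, read sequence as the curl commands in the demo. Because the value is held only in memory, it is exactly the kind of state that a plain restart discards and a checkpoint is meant to preserve.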

We are getting a 1 out of it. At this point, we have changed something about the way that this instance is running, so we can actually checkpoint it. And so we will do: apptainer checkpoint instance server. The name of this instance is ‘server.’ So we are just referring to that instance here and telling it to checkpoint it with the checkpoint that we set up, which is: example-checkpoint. So go ahead and do the apptainer checkpoint instance server. We get info telling us what checkpoint we are using. And then we will go ahead and stop the server.

Normally, this would wipe away the state of the server. If we restart this off that same image, per the concept, it will create a new unedited copy of that base SIF container. But because we are using DMTCP checkpointing, we will be able to restart this container instance based on that checkpoint once I put the command in. Once that data is loaded, we will be able to see that 1 being printed out again, as opposed to the 0 that it was before, even though we are creating this container instance from the same base container that we were using before.

We would expect it to be 0 if we were not using checkpointing. I will go ahead and put the next command that we need in here. We do: apptainer instance start --dmtcp-restart, which in this case tells it that we are looking to restart a previous checkpoint with this instance that we are spinning up. Then we provide the same name of the checkpoint that we used before. First, though, I will show you: you can see now that there is data inside of that apptainer/checkpoint/dmtcp/example-checkpoint directory. That is how you can have some visibility into what the checkpoint looks like. For example, you can see the restart script there that is about to restart this checkpointed instance. I will go ahead and copy this.

Go ahead and provide: apptainer instance start --dmtcp-restart example-checkpoint. Then we will call this restarted. Actually, hold on: first we give it server.sif, and then we will name this something else. We will call it restarted-server. Then we provide the same port. When we do this, this instance will restart. You will see that instance start successfully. And then here in just a second, once DMTCP loads all that memory and that application, we should be able to curl this again and get the 1 that we previously saved. So we'll go ahead and do this and you can see we get that 1 out of there. And we have checkpointed this instance. That is the DMTCP checkpointing.

Just to reiterate what I did there: I created an Apptainer container instance of this server.sif container that I have here. I changed some data in it. I checkpointed it with apptainer checkpoint instance server. I shut that server down. If we were to start another server off of that SIF without this checkpointing, a curl would return 0, because we would be making a copy of what is already there without the modifications that we used curl to post to it. Because those changes were made and checkpointed, we were able to restart that instance and have the state of the application inside of it saved and redeployed with that instance. So you can see how that would be useful for a wide variety of different use cases, especially things that are long running and are using this instance framework. That is DMTCP checkpointing in Apptainer.
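As a recap in one place, the sketch below assembles the command sequence from the demo as Python argument lists and, by default, only prints them. A few details are assumptions rather than things stated in the demo: the launch flag is only called "the DMTCP launch flag" in the walkthrough and is spelled `--dmtcp-launch` here by analogy with `--dmtcp-restart`; the stop step is written as `apptainer instance stop`; and the POST URL shape (`/1`) is hypothetical. The helper names `demo_commands` and `run_demo` are illustrative. Check the linked pull request for the exact commands before running anything.

```python
import shlex
import subprocess


def demo_commands(checkpoint="example-checkpoint", sif="server.sif",
                  instance="server", restarted="restarted-server", port=8888):
    """Return the demo's commands, in order, as argument lists."""
    url = f"localhost:{port}"
    return [
        # 1. Create a named, initially empty checkpoint.
        ["apptainer", "checkpoint", "create", checkpoint],
        # 2. Start an instance linked to that checkpoint (flag spelling assumed).
        ["apptainer", "instance", "start", "--dmtcp-launch", checkpoint,
         sif, instance, str(port)],
        # 3. Read the state, change it, read it again.
        ["curl", url],
        ["curl", "-X", "POST", f"{url}/1"],  # hypothetical URL shape
        ["curl", url],
        # 4. Checkpoint the running instance, then stop it.
        ["apptainer", "checkpoint", "instance", instance],
        ["apptainer", "instance", "stop", instance],
        # 5. Start a new instance from the same SIF, restored from the checkpoint.
        ["apptainer", "instance", "start", "--dmtcp-restart", checkpoint,
         sif, restarted, str(port)],
        # 6. Read the state again; the demo shows the saved value here.
        ["curl", url],
    ]


def run_demo(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in demo_commands():
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run_demo()` prints the nine commands without touching the system. With `dry_run=False` the final curl may run before DMTCP has finished loading the application, so a short wait or retry before step 6 would probably be needed in practice.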

Zane Hamilton:

Thanks, Forrest. That is great. Very interesting and cool to see in progress.

Forrest Burt:

Very neat.

What is CIQ's role in Apptainer? [43:52]

Zane Hamilton:

So as we come to the end of this, Robert, what is CIQ's role in Apptainer?

Robert Adolph:

We definitely stand behind it as one of the members of the Linux Foundation and have multiple folks that contribute to the solution. We do have a full commercial offering around helping people utilize it and get the most out of the solution. CIQ is definitely standing behind it in all ways. Additionally, I would also say we are actively growing our team extremely fast, hiring folks that can take solutions like this and help customers execute. So this is something that CIQ is committed to doing.

CIQ Support Model [44:49]

Zane Hamilton:

Excellent. Robert, want to touch on the support model that we have and the way that we approach that?

Robert Adolph:

Yeah, definitely. So our core ethos is to support people and empower people. So our model is based on that. It is based on the number of people that need support and help in your organization. We do have other ways to structure that, that might be more beneficial for you. But the best way we have seen yet is to work with the individuals that are using it, in order to make them successful, which then makes their organization successful. We found that to be the most direct way to do it. And we are in this to help people and empower people. So at the end of the day, we thought our support structure and our services model should represent that.

Zane Hamilton:

Yeah, it is definitely something that I haven't seen in the industry before. It's very interesting. And it seems to be getting a lot of interest instead of having to count everything and keep track of everything. So it is a great model. Thanks, Robert.

I know Patrick Roberts posted in there that they did some smoke testing after a build. So appreciate that comment in there, Patrick. So we are at the end of time, guys. If you don't have any more questions, as Robert said, we are growing very fast. There are a lot of openings out there on our website, go check them out. I think links just got posted for the jobs that are out there. Looking forward to hearing from you guys. Don't forget to like and subscribe so that you can stay in touch with us and we can keep track of you. I appreciate your time today, guys. We will hang out for another minute. See if we have any other questions. Maybe? I like the new background, Forrest.

Forrest Burt:

Thank you.

Zane Hamilton:

Very nice. Robert got a haircut. All right. We appreciate the time today folks. Join us again probably next week for the next topic. Thank you very much. Talk to you later, guys. Thank you.