CIQ at RMACC 2026

May 18, 2026

What Jonathon Anderson presented at the 2026 RMACC HPC Symposium: Fuzzball version 3 brought Slurm integration and multi-cloud support to production. Version 4 closes storage gaps for on-premises HPC teams.

The CIQ team attended the 2026 RMACC HPC Symposium in Boise, Idaho, on May 12–14. RMACC, the Rocky Mountain Advanced Computing Consortium, is a regional collaboration of academic and research institutions across the intermountain states, organized around advancing the effective use of high-performance computing. The annual symposium brings together the research computing community: system administrators, HPC engineers, and the researchers who depend on that infrastructure to do their work. It is one of the more focused HPC gatherings on the calendar, with an audience that builds and runs these environments professionally.

Jonathon Anderson, HPC Product Engineer at CIQ, presented a session titled "Fuzzball: New Features and What's Next." He opened with a pointed observation about how HPC infrastructure has historically worked: every cluster treated as its own bespoke environment, configured differently, accessed differently, and effectively a serial number of one. Fuzzball exists to change that. The platform is designed to make workflows run consistently regardless of what sits underneath them, and version 3 and version 4 are where that design becomes something research computing teams can deploy against their actual infrastructure.

What Fuzzball is

Fuzzball is a high-performance computing orchestration platform that makes workflows portable across on-premises clusters and cloud infrastructure. It sits in the orchestration layer of the CIQ software ecosystem, and its goal is straightforward: researchers and engineers define what their pipeline needs, and Fuzzball handles the logistics of getting it there, regardless of where "there" is.

The platform has two core runtime components. Orchestrate is the control plane, responsible for scheduling, provisioning, storage management, and secrets. Substrate is the compute agent that runs on each node, acting as the container runtime and keeping job execution as close to the hardware as possible. Users reach the system through a web UI, a CLI, or an SDK, with full feature parity across all three interfaces. The underlying API is exposed via gRPC and OpenAPI, so programmatic integration requires no special connectors or drivers.

Every job runs in a container. Fuzzball supports both Apptainer and Docker image formats, so the software stack travels with the workflow rather than depending on what is installed at the target site. Data moves automatically to wherever the job runs. Storage provenance and access are tracked throughout execution, which matters for research environments with compliance or reproducibility requirements.

Multiple Fuzzball instances connect through Fuzzball Federate. Federate evaluates every available environment and routes each job to its optimal destination based on cost, performance, and data locality across AWS, Google Cloud, Microsoft Azure, Oracle Cloud, CoreWeave, and on-premises clusters. A training job can land on cloud GPUs while sensitive inference stays on-premises. The workflow definition does not change between environments.
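To make the idea of routing on cost, performance, and data locality concrete, here is a minimal sketch in Python. It is not Federate's algorithm, which the talk did not describe in detail. The `Environment` fields, the weights, and the linear scoring formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of one place a job could run."""
    name: str
    cost_per_hour: float   # illustrative price per node-hour
    relative_speed: float  # higher is faster; 1.0 is an arbitrary baseline
    data_is_local: bool    # True if the job's input data already lives here


def score(env, w_cost=1.0, w_speed=1.0, w_locality=1.0):
    """Assumed linear score: reward speed and locality, penalize cost."""
    locality_bonus = 1.0 if env.data_is_local else 0.0
    return (w_speed * env.relative_speed
            + w_locality * locality_bonus
            - w_cost * env.cost_per_hour)


def pick_environment(envs, **weights):
    """Return the highest-scoring environment for a job."""
    if not envs:
        raise ValueError("no candidate environments")
    return max(envs, key=lambda env: score(env, **weights))


# Made-up candidates, for illustration only.
candidates = [
    Environment("on-prem", cost_per_hour=0.5, relative_speed=1.0, data_is_local=True),
    Environment("cloud-gpu", cost_per_hour=3.0, relative_speed=4.0, data_is_local=False),
]

default_choice = pick_environment(candidates)
ignore_locality = pick_environment(candidates, w_locality=0.0)
```

With these made-up numbers the default weights favor the on-premises cluster because the data is already there, while setting `w_locality=0.0` tips the choice to the cloud GPUs. A real placement engine would weigh many more signals; the point is only that the workflow definition stays the same while the destination changes.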

Meet your infrastructure where it is (version 3)

Fuzzball version 3 focused on deployment flexibility and operational scale, with specific additions that address where research computing environments actually are today.

The cloud target list expanded substantially. Version 3 added production support for Google Cloud and Oracle Cloud, with CoreWeave available in preview. These join AWS, which was already supported. For institutions running on-premises schedulers, version 3 also added provisioners for existing PBS and Slurm clusters: Fuzzball layers on top of the existing resource manager rather than replacing it. Teams get modern HPC orchestration without a full migration and without partitioning resources between two systems. The same Fuzzball orchestration experience is then available on any supported cloud, on natively provisioned Fuzzball Substrate compute nodes, or through a legacy resource management system.

A local Docker Compose deployment option reduces the barrier to getting started. A single command, fuzzball cluster docker-compose deploy --up, configures and brings up the full stack locally. Teams and individual researchers can validate workflows and test configurations before committing to a production cluster or cloud environment, which lowers the cost of evaluation considerably.

Version 3 also introduced fuzzball run, a command that drops an engineer directly into a shell on a Fuzzball-managed node, or executes a single job, without writing a workflow file. Before version 3, every interaction with Fuzzball compute went through the full workflow machinery. Sometimes, though, a researcher needs to test a container environment, run a quick GPU job, or debug interactively. fuzzball run handles all three cases with one command.

Service endpoints extend Fuzzball beyond batch processing. A workflow can now natively serve a Jupyter notebook, a virtual desktop, or an inference API as a persistent service, accessible directly in the browser or across the network with no SSH tunneling required. A researcher can submit a batch simulation and then explore the results in a Jupyter notebook served by Fuzzball, all in the same session.

The workflow catalog, also introduced in version 3, provides pre-built templates for common HPC and AI workloads. Current entries include AlphaFold2 for protein structure prediction, Ansys Fluent and LS-DYNA for simulation, BLAST for sequence alignment, and a growing set of additional templates across AI/ML and scientific computing. Each template is containerized, parameterized, and ready to run with user-supplied inputs and storage.

Workflow observability received a significant expansion. Fuzzball has always reported stage and job status as execution proceeds, but provisioning and image management operations take long enough that additional visibility into their internal activity is useful. The event system now surfaces that activity throughout the workflow lifecycle, accessible from the CLI, the API, and the web UI, with filtering by stage and event kind, and structured JSON output for programmatic consumption.
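As a rough illustration of what consuming structured JSON events programmatically could look like, the sketch below filters a stream of events by stage and event kind. The field names (`stage`, `kind`, `message`), the sample values, and the one-object-per-line layout are assumptions made for this example, not Fuzzball's documented event schema.

```python
import json


def filter_events(lines, stage=None, kind=None):
    """Parse JSON-lines events, keeping those that match the given filters.

    A filter left as None matches every event.
    """
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if stage is not None and event.get("stage") != stage:
            continue
        if kind is not None and event.get("kind") != kind:
            continue
        selected.append(event)
    return selected


# Hypothetical event stream; every value here is invented for illustration.
sample = """
{"stage": "provision", "kind": "info", "message": "requesting node"}
{"stage": "image", "kind": "info", "message": "pulling container image"}
{"stage": "image", "kind": "warning", "message": "pull retried"}
{"stage": "job", "kind": "info", "message": "job started"}
""".splitlines()

image_events = filter_events(sample, stage="image")
image_warnings = filter_events(sample, stage="image", kind="warning")
```

Here `image_events` holds the two image-stage events and `image_warnings` holds only the retried pull. The same pattern applies to whatever fields the real event output exposes.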

Closing gaps for on-premises HPC (version 4)

Fuzzball version 4 targets three areas: storage architecture, AI integration, and the user experience.

Storage is the most consequential change. Orchestrate gains an integrated object cache that removes the dependency on external S3 buckets for data ingress and egress. Today, moving data into and out of Fuzzball workflows requires external object storage. The integrated cache brings that capability inside the platform boundary, which simplifies deployment, reduces external dependencies, and gives Fuzzball direct visibility into data movement and provenance. The new object browser in the updated UI reflects this: users see and manage their data directly within Fuzzball rather than navigating between the orchestration platform and a separate storage system.

The integrated storage system can also serve as a container registry. Images can be pushed directly to Fuzzball, or built entirely within it, without depending on an external public or private container registry.

For sites with parallel file systems, version 4 adds support for external and host path volumes. This improves integration with Lustre and GPFS and adds support for existing home directories and project storage, so Fuzzball can work with existing site storage directly rather than requiring data to route through an intermediate object store.

On AI integration, the workflow catalog expands with additional AI and ML templates aimed at a turnkey sovereign AI experience, whether a team uses AI primarily for content generation, agentic collaboration, or code development. Fuzzball will also become accessible through an MCP (Model Context Protocol) interface, allowing a self-hosted language model to execute computationally intensive work through Fuzzball, observe the result, and then refine further runs or decide what to run next. For teams building pipelines that incorporate LLMs as active participants in the computation, this reduces the custom integration work that currently sits outside the platform.

The version 4 web interface is a full redesign. The updated UI unifies the runtime view and the workflow editor into a single surface, with real-time resource monitoring and access to input and output files. An updated CLI ships alongside it, streamlining the traditional command-line experience as well.

Finally, version 4 rounds out Fuzzball's coverage of hyperscale cloud providers by adding support for Azure.

Why it resonated

RMACC draws the people responsible for the infrastructure these problems live in. The Slurm integration and the version 4 storage roadmap were consistent topics in conversations at the event, which reflects where a large portion of research computing teams are right now. Existing scheduler infrastructure represents years of operational investment. A platform that layers on top of it rather than asking teams to replace it meets those environments where they are. The Lustre and GPFS integration addresses the same reality from the storage side.

If you missed the session and want to see where things stand, request a demo or reach out to the team.