Nvidia documents a known issue with its Mellanox OFED: CIFS and SMB shares cannot be mounted on Enterprise Linux (including Rocky Linux) 8.4 and later once the OFED is installed.

Description: OFED installation caused CIFS to break in RHEL8.4 and RHEL8.5. A dummy module was added so that CIFS will be disabled after OFED installation in RHEL8.4 and RHEL8.5.
Workaround: N/A
Keywords: Installation, RHEL8.4, RHEL8.5, CIFS.
Discovered in Release: 5.4-

At first this seems an arbitrary restriction, but Enterprise Linux 8.4 added support for SMB-Direct, which accelerates SMB performance over RDMA. This introduced a dependency between the cifs kernel module (which handles mounting of CIFS and SMB shares) and the ib_core and rdma_cm kernel modules (which handle RDMA). The Mellanox OFED replaces both ib_core and rdma_cm with its own versions. However, the Mellanox versions are not compatible with the OS-provided modules they replace and, as a result, cannot be used by the OS-provided cifs module.
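
One way to see this dependency is to inspect the dependency list that modinfo reports for the cifs module. The helper below is a minimal sketch, not part of the original procedure: the function name depends_on_rdma is hypothetical, and it assumes the comma-separated format printed by `modinfo -F depends`.

```shell
# Return success if a comma-separated module dependency list (in the format
# printed by `modinfo -F depends <module>`) names ib_core or rdma_cm.
# The function name is hypothetical, chosen for this illustration.
depends_on_rdma() {
    case ",$1," in
        *,ib_core,*|*,rdma_cm,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (requires the cifs module to be installed):
#   depends_on_rdma "$(modinfo -F depends cifs)" \
#       && echo "cifs depends on the RDMA modules"
```

On a stock Enterprise Linux 8.4 or later kernel we would expect the check to succeed; with a cifs module rebuilt without SMB-Direct it should fail.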

We consider three potential ways to address this incompatibility:

  • Nvidia could ship an updated cifs module along with Mellanox OFED that links against their rdma_cm and ib_core modules. (For now, Mellanox has stated that "CIFS is not a part of the NVIDIA product portfolio.")
  • We could rebuild the cifs module against the Mellanox kernel modules. This is what Nvidia has suggested.
  • We could rebuild the cifs module without SMB-Direct support, avoiding the dependency conflict.

The third option, while perhaps less than desirable, is the most straightforward path to restoring CIFS / SMB functionality as it existed prior to Rocky Linux 8.4, removing the conflict with the Mellanox OFED.

To demonstrate this process, we developed an OCI Containerfile which incorporates a rebuilt cifs module without SMB-Direct support. The resultant container can be used as a Warewulf node image.

FROM ghcr.io/hpcng/warewulf-rockylinux:8.7

COPY MLNX_OFED_LINUX-5.9- /var/tmp/
COPY rpmbuild/RPMS/x86_64/kernel-modules-4.18.0-425.19.2.el8_7.x86_64.rpm /var/tmp/

RUN dnf -y install \
        libnl3 \
        numactl-libs \
        tcsh \
        elfutils-libelf-devel \
        gcc \
        gcc-gfortran \
        gdb-headless \
        kernel-rpm-macros \
        kernel-{core,devel,headers,modules,modules-extra}-4.18.0-425.19.2.el8_7.x86_64 \
        libtool \
        lsof \
        make \
        patch \
        perl \
        python36-devel \
        rpm-build \
        tk \
    && dnf -y reinstall /var/tmp/kernel-modules-4.18.0-425.19.2.el8_7.x86_64.rpm \
    && dnf -y remove $(dnf repoquery --installonly --latest-limit=-1 -q) \
    && tar -C /var/tmp -xf /var/tmp/MLNX_OFED_LINUX-5.9- \
    && ( cd /var/tmp/MLNX_OFED_LINUX-5.9- \
       && ./mlnxofedinstall \
           --skip-repo \
           --kernel $(rpm -q kernel-core --qf '%{version}-%{release}.%{arch}\n' | tail -n 1) \
           --add-kernel-support) \
    && rm -rf /lib/modules/4.18.0-425.19.2.el8_7.x86_64/extra/mlnx-ofa_kernel/fs/cifs \
    && /usr/sbin/depmod 4.18.0-425.19.2.el8_7.x86_64 \
    && rm -rf /var/tmp/MLNX_OFED_LINUX* /var/tmp/*.rpm \
    && dnf clean all

COPY smb.conf /etc/samba/

RUN dnf -y install cifs-utils samba \
    && dnf clean all \
    && mkdir -p /srv/samba/shared \
    && chmod -R 0755 /srv/samba/shared \
    && chown -R nobody:nobody /srv/samba/shared \
    && systemctl enable smb

This Containerfile has three primary dependencies and concerns: the Mellanox OFED itself, a modified kernel-modules package, and (optionally) Samba for testing.

The Mellanox OFED

You can obtain a copy of the Mellanox OFED from the Nvidia website. You may need to modify this example Containerfile for the version of the OFED that you use; but for this example we used MLNX_OFED_LINUX-5.9-

Much of the Containerfile is concerned with installing the Mellanox OFED. This includes a list of packages that are dependencies of the installation process.

The Containerfile also removes the dummy mlnx-ofa_kernel/fs/cifs module. If this module is left in place, it takes precedence over the OS-provided cifs module.

Custom kernel modules

The main payload of the Containerfile is the addition of a custom cifs kernel module. To keep this as simple as possible, we rebuild the kernel package entirely, and then install the modified kernel-modules package.

dnf builddep
rpm -i 
echo "%dist .el8_7" >>~/.rpmmacros
rpmbuild -ba rpmbuild/SPECS/kernel.spec

This procedure installs the dependencies for building the kernel, then installs the kernel source package. Installing a source package unpacks it into $HOME/rpmbuild in preparation for building. From there, we update the packaged kernel config file to disable SMB-Direct, configure rpm-build to tag the new package as an el8_7 package (matching the upstream binary package of the same version), and build a new set of kernel packages.
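
The config change itself can be sketched as follows. This is an illustration under assumptions rather than the exact command used for this article: the function name disable_smb_direct is hypothetical, and it assumes SMB-Direct is controlled by the CONFIG_CIFS_SMB_DIRECT option and is enabled as `=y` in the packaged config.

```shell
# Rewrite a kernel config file in place so that SMB-Direct is disabled.
# Assumes the option appears as "CONFIG_CIFS_SMB_DIRECT=y"; other lines are
# left untouched. The function name is hypothetical.
disable_smb_direct() {
    # $1: path to a kernel config file
    sed -i 's/^CONFIG_CIFS_SMB_DIRECT=y$/# CONFIG_CIFS_SMB_DIRECT is not set/' "$1"
}

# Usage (the file name under rpmbuild/SOURCES is an assumption; check the
# config files shipped in your kernel source package):
#   disable_smb_direct ~/rpmbuild/SOURCES/kernel-x86_64.config
```

Run this after installing the source package and before invoking rpmbuild, so that the rebuilt cifs module no longer references the RDMA modules.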

The provided Containerfile copies the new kernel-modules package from the default build location, but you may need to update the Containerfile for the correct path on your system.

After installing the updated kernel-modules package and removing the dummy OFED cifs module, the Containerfile also manually runs depmod. This removes the now-missing dummy module from the module dependency tree and ensures that the cifs module from the new kernel-modules package is included.
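
To confirm what depmod recorded, you can look up the cifs entry in the generated modules.dep file. The sketch below is an assumption-laden illustration: the function name registered_cifs_module is hypothetical, and it assumes the module file is named cifs.ko, optionally with a compression suffix such as .xz.

```shell
# Print the path that a modules.dep file registers for the cifs module.
# Paths in modules.dep are relative to /lib/modules/<kernel>/.
# The function name is hypothetical.
registered_cifs_module() {
    # $1: path to a modules.dep file
    awk -F: '$1 ~ /(^|\/)cifs\.ko(\.[a-z0-9]+)?$/ { print $1 }' "$1"
}

# Usage inside the image:
#   registered_cifs_module /lib/modules/4.18.0-425.19.2.el8_7.x86_64/modules.dep
```

If the printed path begins with extra/mlnx-ofa_kernel/, the dummy module is still registered; a path under kernel/fs/cifs/ indicates the module from the kernel-modules package.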

Testing it all with Samba

We included Samba (and cifs-utils) in the image to allow us to demonstrate the capability to mount an SMB share locally. To do this, we also included a minimal Samba configuration.

[global]
workgroup = WORKGROUP
server string = Samba Server %v
netbios name = rocky-8
security = user
map to guest = bad user
dns proxy = no

[Anonymous]
path = /srv/samba/shared
browsable = yes
writable = yes
guest ok = yes
read only = no

With this image booted on a compute node in a Warewulf cluster, we were able to mount the local Samba share, demonstrating that the functionality had been restored.

# mount -t cifs -o guest //localhost/Anonymous /mnt
# findmnt /mnt | cat -
/mnt   //localhost/Anonymous cifs   rw,relatime,vers=3.1.1,sec=none,cache=strict,uid=0,noforceuid,gid=0,noforcegid,addr=0000:0000:0000:0000:0000:0000:0000:0001,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1
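
The mount options in this output can also be checked from a script, for example to confirm the negotiated protocol version. This is a small sketch rather than part of the original test: the function name mount_option is hypothetical, and it assumes a comma-separated options string such as the one findmnt prints.

```shell
# Print the value of a key=value option from a comma-separated mount
# options string. The function name is hypothetical.
mount_option() {
    # $1: option name, $2: comma-separated options string
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Usage on the booted node:
#   mount_option vers "$(findmnt -n -o OPTIONS /mnt)"
```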


This new conflict between the Mellanox OFED and the upstream kernel highlights one of the potential risks of any out-of-tree kernel module: a lack of ABI compatibility. To avoid such incompatibilities we encourage Rocky Linux users to use native kernel modules whenever possible. However, if you need to use the Mellanox OFED, for hardware compatibility or performance reasons, this workaround should allow you to restore the ability to mount CIFS / SMB shares, albeit while missing out on the performance improvements available via SMB-Direct.

Jonathon Anderson