In the world of High Performance Computing (HPC), every second counts. For institutions like Texas Tech University’s High Performance Computing Center (HPCC), maximizing research productivity is paramount, which is why the recent transformation of their HPC infrastructure, powered by CIQ, is such a significant milestone.
The challenge: increase uptime, minimize staff time
HPCC is always looking for opportunities to save staff time by deploying technologies that increase reliability and decrease the time needed to deliver its services. HPCC’s job is to deliver HPC infrastructure support in ways that empower researchers to do their best work. However, the challenge is achieving this without consuming excessive staff time – a precious resource in academic and laboratory settings. In adopting any suite of support products, HPCC needs to determine whether the staff time saved across its aggregate workload will outweigh what the products cost in money.
Like many in the HPC run-your-own-cluster field, HPCC’s immediate goal was to find a reliable replacement for CentOS, which is approaching end of life, but the team also wanted a solution that would extend further, evolving all their technologies with each new version release.
In addition, HPCC sought a support team that would help them resolve any issues rapidly, so they could achieve their ultimate goal: to maximize the research productivity of the university.
CIQ's solution: a modernized HPC software stack
To meet these needs, HPCC chose the CIQ HPC software stack, which includes:
- Rocky Linux: a seamless, stable, and secure successor to CentOS
- Apptainer (formerly Singularity): containers for HPC with full software supply chain security
- Warewulf: a highly scalable cluster management and provisioning toolkit
With CIQ’s simplified and supported turnkey HPC stack, HPCC could harness the full power of computational resources and easily and efficiently execute critically important performance-intensive workloads.
HPCC also engaged CIQ’s escalation support, customization, optimization, integration, and other professional services.
The outcome: exceeding expectations
To measure the success of its engagement with CIQ, HPCC set qualitative goals that were greatly exceeded. For example, based on a previous experience of upgrading operating systems on head nodes, HPCC set aside four days for the process in a planned shutdown schedule. Instead, with the active involvement of the CIQ team, the upgrade was accomplished in a little more than a morning, saving significant time and money. HPCC increased uptime and minimized staff time, successfully achieving its mission to maximize the university’s research productivity, measured in dollars, sophistication of technology, papers, and students taught.
Alan Sill, managing director of HPCC, said, “It’s been a good investment of money to spend on the service contracts we have with CIQ. It has accomplished my goals of not just saving staff time, but saving staff time in a way that lets them be more productive on other things. My experience with my staff has been that they can quickly become dismissive of support that they don’t consider to be expert. If they’re calling someone up, and they’re getting an answer that they knew already, they will quickly tell me how much of a waste of time that was. That hasn’t happened with the CIQ folks. Every time we’ve come to them with a problem, they’ve delivered a solution. That’s what I was looking for: people who know more than I do.”
With CIQ’s assistance, Texas Tech has not only demonstrably saved staff time and costs but has also delivered reliable and scalable HPC infrastructure to help its researchers do what they do best: science.