NVIDIA just called the inference era. We built the OS for it.

This week at GTC 2026, Jensen Huang stood in front of 30,000 developers in San Jose and put into words what many infrastructure teams have already been experiencing in the trenches: training is no longer the hard part. Inference is.
Huang was direct: inference has overtaken LLM training as the dominant AI workload. Every token, response, and automated decision an agentic AI produces is an inference call. At scale, that demand compounds fast. The result is a fundamental shift in where AI compute costs actually live, from building models to running them. That shift is massive and permanent.
We are now at an inflection point, and it changes what the infrastructure underneath AI workloads actually needs to be.
For the better part of the past decade, enterprises have built their AI infrastructure on whatever was available. Linux runs beneath roughly 88% of machine learning workloads, but not all Linux is equal. About 34% of that footprint runs on distributions built for research, not production inference pipelines: optimized for experimentation, with a permissive package ecosystem that prioritizes flexibility over throughput. The remainder is dominated by enterprise distributions built for stability but slow to adopt the kernel and driver updates that AI performance depends on.
Meanwhile, generative AI is no longer a research project. Enterprises are making it a core business function, and with that shift comes a migration away from turnkey cloud stacks toward specialized, on-premises infrastructure they own and control. That migration is what is driving inference demand.
To get performance, the OS for GPU workloads has had to be assembled by hand, updated with caution, and maintained by engineers who spend meaningful time managing the OS instead of the models running on top of it.
That's not only inefficient, but costly.
When inference was a secondary workload, that was tolerable. When inference is the primary workload, the OS is no longer background infrastructure; it is part of the performance equation.
RLC Pro AI, CIQ's new AI-centric OS, solves that problem, and GTC confirmed what it was built for.
Every component in RLC Pro AI was chosen to deliver more AI output per dollar of infrastructure investment. For example, the NVIDIA CUDA and DOCA-OFED stack is pre-validated and ships ready to run; no assembly required. With RLC Pro AI, nothing needs to be patched together for compatibility: the kernel parameters, PyTorch configurations, and networking stack were all selected and validated together as a cohesive system, optimized for inference workloads at production scale.
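To make "ships ready to run" concrete, here is the kind of sanity check an integrated GPU stack should pass out of the box. This is an illustrative sketch assuming a standard PyTorch-on-CUDA install, not a command sequence from RLC Pro AI's documentation:

```python
import torch

# Confirm the driver, CUDA runtime, and PyTorch build all see each other.
assert torch.cuda.is_available(), "CUDA driver/runtime not visible to PyTorch"
print("PyTorch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))

# One small matmul on the GPU confirms kernels actually launch and complete.
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()
print("Kernel launch OK:", y.shape)
```

On a hand-assembled stack, this is the step where version mismatches between kernel, driver, and CUDA runtime typically surface; on a pre-validated one, it should simply pass.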
This transition from AI experimentation to production AI is not primarily a model problem because the models already exist. The challenge is the infrastructure underneath them: how stable it is, how efficiently it uses the hardware it runs on, and how much engineering time goes into keeping it running versus pushing more work through it.
At the OS layer, that is exactly the problem RLC Pro AI was built to solve.
Inference at scale is fundamentally hardware-constrained; demand for compute regularly outpaces GPU availability. The organizations that will lead the charge are not the ones buying the most GPUs but the ones extracting the most from the ones they have.
In benchmarks on identical hardware, RLC Pro AI delivered up to 32% faster throughput on vision and segmentation workloads and up to 10% faster LLM inference, without changing the GPU, the model, or the application. That translates directly to more tokens per GPU-hour, lower cost per inference request, and infrastructure economics that improve with scale rather than working against it.
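To put rough numbers on what a throughput gain means economically, here is a back-of-the-envelope sketch of the cost math. The GPU-hour price and baseline throughput are assumptions chosen for illustration, not CIQ benchmark figures:

```python
# Hypothetical illustration of tokens-per-GPU-hour and cost-per-token math.
gpu_hour_cost = 2.50   # assumed $/GPU-hour, not a quoted price
baseline_tps = 1_000   # assumed baseline tokens/sec on one GPU
speedup = 1.10         # the up-to-10% LLM inference gain cited above

for label, tps in [("baseline", baseline_tps), ("+10%", baseline_tps * speedup)]:
    tokens_per_gpu_hour = tps * 3600
    cost_per_million = gpu_hour_cost / tokens_per_gpu_hour * 1_000_000
    print(f"{label:>8}: {tokens_per_gpu_hour:,.0f} tokens/GPU-hour, "
          f"${cost_per_million:.3f} per 1M tokens")
```

Under these assumed rates, the same GPU moves from roughly 3.6M to 3.96M tokens per hour, and the cost per million tokens drops accordingly; at fleet scale, that difference compounds across every GPU-hour purchased.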
The inference era is here, and RLC Pro AI is the Enterprise Linux built for it.
Learn more about RLC Pro AI and get started → https://ciq.com/products/rocky-linux/pro/ai/
Inference: When a trained AI model applies its learned knowledge to new data to produce predictions, decisions, or outputs.
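For readers newer to the term, a minimal sketch of what an inference call looks like in PyTorch, with a toy model standing in for a trained one:

```python
import torch

# A trained model mapping new input to an output, with gradients disabled.
model = torch.nn.Linear(4, 2)  # toy stand-in for a trained model
model.eval()

with torch.no_grad():  # inference only: no gradient bookkeeping
    new_data = torch.randn(1, 4)
    prediction = model(new_data)
print(prediction)
```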
