The AI engineer era has an infrastructure problem. Here is what solves it.

In 2023, a newsletter called Latent Space published a piece that named something real. The post was called "The Rise of the AI Engineer," and its central argument was that a new class of engineer had arrived: software engineers who could take AI advancements and ship them into production without needing an ML research background. Companies like OpenAI had made AI accessible through a simple API: no research team required, no specialized infrastructure, just plug in and build. A whole new class of engineer emerged to take advantage of it.
Andrej Karpathy, one of the architects of modern AI, predicted that engineers who build with AI would far outnumber the researchers who build AI itself. He was right. The tools got simple enough that you did not need a research background to ship with them.
What no one fully saw is what comes next. Easy access and safe ownership are not the same thing. Every call an engineer makes to a commercial AI service sends data to infrastructure the organization does not control and cannot audit. For a startup building a demo, that tradeoff is acceptable. For a regulated enterprise with unreleased IP, legal obligations, and compliance requirements, it is a structural risk. The unaddressed problem is infrastructure, not engineering, and most organizations have not built the infrastructure layer that production AI actually requires.
The API is the beginning, not a foundation
I was talking with a company recently. Their engineers use AI tools every day to write code, summarize documents, debug issues, and plan roadmaps. All the things you want your team doing with AI. So I asked: who has access to this data? Source code, internal architecture plans, unreleased product details, customer lists, revenue projections, financial models. The answer was essentially: we're looking the other way for now.
Most organizations are in that position. And most of them know they cannot stay there.
Here is the thing most people miss: most AI infrastructure is not infrastructure at all. It is a service. And that service provider needs your data to deliver value back to you. They store it. They keep it. Over time, they learn more about your business than you may realize.
Enterprises are now rebuilding their entire technology operating model around AI, and the scale of that rebuild is real. Deloitte's 2026 Tech Spending Outlook found that 71 percent of organizations are actively modernizing core infrastructure to support AI implementation, and 66 percent are piloting AI-enhanced enterprise architecture designed for modularity and observability. Two-thirds plan to increase AI investment over the next two years.
The AI Engineers doing that reconstruction are not building demo applications. They are building production AI systems at regulated enterprises. They are deploying agentic workflows inside organizations where data governance is a legal requirement. They are shipping products on top of code that is unreleased intellectual property. They are operating in environments where every prompt sent to a commercial AI platform is data the organization does not get back.
RTX chief digital officer Vince Campisi said it directly: as AI becomes more agentic, organizations need governance that builds in explainability and auditability so humans can verify and trust the results. That governance starts at the infrastructure layer.
You cannot audit what you do not own.
AI is revolutionary. It is a genuine game changer. It is also, right now, a compliance risk. The commercial API is not a foundation for organizations that need to control what gets trained, what gets stored, and what leaves their environment. It is a liability.
Here is what the infrastructure layer is actually missing
The AI Engineer era requires the infrastructure layer to do something it has never had to do before: run production AI workloads at enterprise scale, on hardware organizations control, with the security, compliance, and governance posture those organizations require.
Most of the infrastructure stack was not built for this. Three gaps define the problem.
The operating system layer was built for general-purpose enterprise compute. GPU drivers, AI/ML frameworks, and hardware-specific optimization were never first-class concerns. Compliance was a configuration step, not a design principle. The result is that most Enterprise Linux distributions cannot carry a production AI compute environment and a validated compliance posture at the same time.
The compute layer was built for general-purpose workloads. GPU provisioning, resource allocation, and workload scheduling for AI are specialized problems. Organizations that run AI workloads on general-purpose infrastructure discover that the overhead is not a one-time engineering cost. It is operational drag that compounds at scale.
The orchestration layer is where most self-hosted AI initiatives stall. Provisioning compute, allocating GPUs, managing model serving, connecting inference backends to application frontends, configuring persistent storage, handling authentication, keeping all of it running when something breaks. That operational overhead is why most organizations stay on the API side of the line even when they understand the risk.
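To make the orchestration overhead concrete, consider the smallest possible piece of it: requesting a single GPU on general-purpose infrastructure such as Kubernetes. A hedged sketch follows; the pod name and image are hypothetical, and the `nvidia.com/gpu` resource name assumes NVIDIA's device plugin has already been deployed on the cluster.

```yaml
# Minimal Kubernetes pod spec requesting one GPU. Everything around it --
# device plugin installation, node drivers, scheduling policy, storage,
# model serving, and authentication -- is left to the operator, which is
# the operational drag described above.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker            # hypothetical name, for illustration
spec:
  restartPolicy: Never
  containers:
    - name: model-server
      image: registry.example.com/model-server:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1         # requires NVIDIA's device plugin on the cluster
```

Even this fragment presumes a working cluster with GPU-enabled nodes; multiplying it across provisioning, serving, storage, and authentication is the overhead that keeps organizations on the API side of the line.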
The infrastructure that the AI Engineer era requires does not exist in most organizations today. The great rebuild Deloitte describes cannot succeed without it.
Here is what the next three years look like. The organizations that close this gap build a compounding advantage: they own their models, they own their data, they own their audit trail, and they ship AI products their competitors cannot replicate because their competitors are running on infrastructure someone else controls. The organizations that do not close it hit a governance wall: not an engineering problem, not a budget problem, but a structural one. They will have built production AI systems on a foundation they do not own, at the moment regulators and customers start asking them to prove they do. The AI Engineer era does not end at the API line. It ends at the infrastructure layer, and the organizations that understand that now are the ones that define what enterprise AI looks like in 2028.
What the full stack actually looks like
The answer is not a single product. It is a coherent stack from the operating system up, and every layer has to be production-ready.
Rocky Linux is the foundation: the same open source Enterprise Linux that powers research universities, national laboratories, financial institutions, and defense contractors. No commercial licensing agreement. No vendor tax on the software the infrastructure depends on. Millions of active deployments.
RLC Pro is the enterprise tier built on that foundation. Guaranteed stability. Long-term support. Full lifecycle ownership. Compliance begins at the OS layer, and regulated environments need an OS that carries its own validated compliance posture rather than requiring organizations to bolt one on. RLC Pro does that, and it is the only Enterprise Linux distribution with active FIPS 140-3 certificates that include post-quantum cryptography readiness. Federal contractors running Enterprise Linux should know: the September 2026 NIST deadlines are not theoretical.
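Compliance posture at the OS layer is also something teams can verify directly. As a minimal sketch, assuming a RHEL-family host (such as Rocky Linux) where the kernel exposes its FIPS state through procfs:

```shell
# Sketch: check whether the kernel is running in FIPS mode on a
# RHEL-family host. /proc/sys/crypto/fips_enabled reads "1" when FIPS
# mode is active; it reads "0" or is absent otherwise.
if [ -r /proc/sys/crypto/fips_enabled ]; then
    fips_state=$(cat /proc/sys/crypto/fips_enabled)
else
    fips_state=0
fi

if [ "$fips_state" = "1" ]; then
    echo "FIPS mode: enabled"
else
    echo "FIPS mode: not enabled"
fi
```

On RHEL-family systems, `fips-mode-setup --check` and `update-crypto-policies --show` report the same posture through the distribution's own tooling; running a check like this in CI is one way to treat compliance as a design constraint rather than a configuration step.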
RLC Pro Hardened extends that foundation for security-critical environments with kernel hardening, reduced attack surface, and STIG compliance, maintained as a supported product. Not a custom build every organization has to own and maintain.
RLC Pro AI is the operating system layer built specifically for AI and ML workloads. GPU drivers and AI/ML frameworks are built in, not bolted on. Hardware acceleration support for both NVIDIA and AMD. AI workload optimization at the OS level, not applied after the fact. Other Enterprise Linux distributions do not treat AI workloads as first-class. RLC Pro AI does.
Fuzzball sits above the OS layer and solves the orchestration problem. A complete sovereign AI deployment (inference backend, model serving, chat interface, document knowledge base, authentication, and access control) runs through a single workflow submission. AI Engineers pick a model, fill out a form, and run. Fuzzball handles provisioning, service connections, dependency sequencing, persistent storage, and GPU allocation. It runs identically on-premises, in the cloud, or across hybrid environments.
Fuzzball is not a simplified version of the infrastructure problem. It is the infrastructure problem solved as a reusable, portable workflow.
The decision the rebuild requires
Deloitte identifies six markers of the AI-native technology organization. Every one of them carries an infrastructure dependency. AI as a core collaborator requires standardized, secure, scalable foundations. Human-agent teams at scale require infrastructure organizations can audit. Embedded governance requires explainability built into the systems that run the agents. The CIO as AI orchestrator requires infrastructure the organization controls, not infrastructure rented from a vendor whose pricing and roadmap it cannot influence.
The organizations doing the great rebuild have one foundational decision to make: does the infrastructure underneath this model belong to us, or to our vendors?
The answer is straightforward. You run the models where your data already lives. You decide what gets trained, what gets stored, and what leaves your environment. The AI Engineers building the next generation of production AI systems, the ones inside regulated enterprises, working with unreleased IP, building agents that need to be audited, are moving into the infrastructure layer because the API is not a foundation they can build on.
The full stack exists. The OS is hardened, validated, and AI-ready. The orchestration platform turns self-hosted AI from a prohibitive infrastructure project into a workflow. The only question is whether the organizations doing the rebuild claim that infrastructure now or spend another year on someone else's foundation.
- See Fuzzball's sovereign AI stack in action: ciq.com/solutions/sovereign-ai
- To evaluate Fuzzball for your AI infrastructure: ciq.com/products/fuzzball
- Learn more about RLC Pro, RLC Pro Hardened, and RLC Pro AI: ciq.com/products/rocky-linux/pro
Built for scale. Chosen by the world's best.
- 2.75M+ Rocky Linux instances in use worldwide
- 90% of Fortune 100 companies use CIQ-supported technologies
- 250k average monthly Rocky Linux downloads