CIQ

Turnkey HPC: File Systems and Storage Architecture

January 26, 2023

video_id: OQ24ER0pZyg

Our Research Computing Roundtable will be discussing File Systems and storage architecture in the HPC industry. Our panelists bring a wealth of knowledge and are happy to answer your questions during the live stream. To learn more, email us at info@ciq.co. https://ciq.com/

Webinar Synopsis:

  • Traditional Storage

  • Changes in Storage

  • Parallel File System

  • NAS File System

  • Object

  • ZFS

  • Different Types of File System

  • Specialized Storage People

  • User Input

  • Fitting the Workload

  • Changing the Future of Storage

  • Opening Up Data

  • Storage and AI

Speakers:

  • Zane Hamilton, Vice President of Solutions Engineering, CIQ

  • John White, Lawrence Berkeley National Laboratory

  • Jonathan Anderson, Solutions Architect, CIQ

  • Fernanda Foertter, Voltron Data

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Full Webinar Transcript:

Zane Hamilton:

Good morning, good afternoon, and good evening, wherever you are. Thank you again for joining another CIQ webinar. My name is Zane Hamilton and I'm the Vice President of Solutions Engineering here at CIQ. At CIQ, we're focused on powering the next generation of software infrastructure, leveraging the capabilities of cloud, hyperscale and HPC. From research to the enterprise, our customers rely on us for the ultimate Rocky Linux, Apptainer, and Warewulf support escalation. We provide deep development capabilities and solutions, all delivered in the collaborative spirit of Open Source.

In today's webinar, we're actually going to be talking about file systems and storage architectures for HPC, so we'll bring in our panel. Jonathan, John, welcome.

John White:

Hi.

Johnathan Anderson:

Hello.

Zane Hamilton:

Thank you for joining. I'm going to let you introduce yourselves. John, I think it's been a while, but I'm going to start with you.

John White:

I'm John White. I work at Lawrence Berkeley National Lab, where I am the group infrastructure architect. I focus on storage, and I also have an appointment over at the UC Berkeley Research IT group.

Zane Hamilton:

Thank you, John. Jonathan?

Johnathan Anderson:

My name is Jonathan and I work with CIQ on the Solutions Architect team. In many past lives I've been an HPC sysadmin. I've never been specifically tasked with storage, but I have done a fair bit with it, from parallel file systems down to the NAS level, like ZFS and NFS stuff.

Traditional Storage [6:37]

Zane Hamilton:

That's great, thank you. So this one to me is an interesting topic because, coming from enterprise, I think we look at storage in different ways for different reasons. And I know some of that will play into how you look at storage and the way that you design storage for HPC. But I think maybe we should start off in talking about just traditional storage. I mean, we've seen a lot of advancements in the last 10 years. Storage has come a long way from spinning disk to SSD, NVMe. Let's talk about that a little bit just to set the basics. Storage in the last 10 years has changed. I'll open that up. I'll let you go first, Jonathan.

Johnathan Anderson:

The reality is that the last time I was doing storage was right at the cusp of that. I haven't run an NVMe system; I've traditionally been in higher ed, which didn't have access to the cutting edge of the NVMe stuff. So we were parallelizing a bunch of disks for scratch and then SSDs for metadata. But with PCI-attached NVMe, at least as far as I've seen, the story has changed somewhat, in that a lot of the technologies you tend to use, like putting things behind RAID, end up being bottlenecks for the storage itself rather than something you can use to parallelize and get more performance out of it. So it's an interesting and exciting time to be doing it, and I'm interested to hear what more recent experience John might have in this arena.

Zane Hamilton:

Absolutely.

John White:

I've been in this space for about 20 years. I jumped right out of college into parallel file systems, starting with GPFS back on POWER5 and POWER6 machines. The advancements then were shocking, to say the least. We were coming into the maturity of the Fibre Channel age, and spinning disks were growing rapidly toward their current pace of capacities doubling every so many years. When that was happening, the big advancement was really just literal parallelization of the I/O itself, working it into workloads like MPI workloads. The industry kind of adopted that stuff, but it never really got its foothold.

NAS has been the king in the enterprise space for a very long time. But at this point it's all about breaking down all those bottlenecks. Like Jonathan said, RAID used to be the bottleneck; moving up the stack, you would just try to be as efficient as you could to take care of that one massive inefficiency. Over the last 10 years, it's largely been about removing all of those lower-end bottlenecks, and NVMe is just talking directly to the processor. That removed a huge chunk of the stack from the situation. And when we're talking about NVMe, it was initially positioned by the industry as just a new interconnect that was supposed to speed things up.

But in the back of their minds, it was entirely about this new media, this NAND flash, just taking over the industry. Compared to SATA, that's another layer of complexity removed from the system: you don't need sectors, you don't need cylinders and stuff, you just talk directly to the NAND, and everything's happier for it. So the last 10 years have largely been about removing bottlenecks. And when we talk about enterprise storage now, it's folding in all of those lessons we learned from the parallel years: adding RDMA interconnects, removing all these bottlenecks, aligning all of your I/Os from the top of the stack to the bottom. It's been fascinating to see how that's worked its way through, and it's not what a lot of us thought, which was that everything was going to be a parallel file system. That was the assumption. But no, it's about collapsing all those things back into this turnkey idea of an enterprise storage solution.

Zane Hamilton:

That's great. I want to come back to that, John, because I want you to talk a little bit and explain what a parallel file system actually means, but we have one more person joining us: Fernanda. Welcome, Fernanda, how are you?

Introductions [11:09]

Fernanda Foertter:

Hey. Hi everybody.

Zane Hamilton:

Good to see you again. Want to introduce yourself very quickly?

Fernanda Foertter:

So, I'm Fernanda Foertter. I currently work at Voltron Data. It's a startup. I can't talk much about what we're doing. But I have a background in high-performance computing. I spent some time at Oak Ridge National Lab, spent some time at NVIDIA, spent some time at some other startup, and now I'm here.

Changes in Storage [11:31]

Zane Hamilton:

That's great. I don't know how much you heard of what Jonathan and John said about where storage has come in just the last 10 years. We could go back further for even more massive change, but just in the last 10 years there's been quite a bit of change in storage. I was getting their thoughts on that, and I'd like to get yours as well.

Fernanda Foertter:

So I'll tell you where I started, right out of college, in grad school in 2008. We had a typical closet HPC system that we built based on, was it Rocks? The cluster distro? Yes. I managed that closet system for two years while I was in grad school. We had a typical NAS connected to the head node, and that's how people did their stuff. Then I went to my first job, and that wasn't possible. We had people all over the globe and we needed data centralized, because it was a lot of data. And we ended up using a product from what was then a new startup that had just come out.

That was the first time I had heard of anything that was distributed, or at least, in this case, a parallel file system. Not distributed, though it could have been; we were building that too. And it was very different. It sort of looked the same, but it was very different. At one point we had to get a driver custom-made for us because of what we were trying to do. And then when I ended up at ORNL later on, things changed from there. ORNL has gone through, what, three iterations of file systems now since I joined in 2012. Things are just changing every day. And now we have these distributed formats, but we're still holding on to things that look like POSIX. I mean, object storage is really trying to carve out a space, but at the interface it still needs to look like POSIX.

Parallel File System [13:27]

Zane Hamilton:

Thank you. And we'll definitely come back to object store, because I think that's another interesting topic to talk about. But John, talk to me about parallel file system. When you say that, what are we talking about?

John White:

The classic way I describe it is largely just shared file access. When you have a number of writers, you either need to be able to intelligently tell the writers where to write their data within a single file, or you need some gigantic locking mechanism where you can request a lock, get directed to where you're supposed to write, release your lock, and keep going. With a traditional POSIX shared file system, NFS, et cetera, that can be a very challenging workflow to take care of. With parallel file systems, the dream is that all of that simplifies the whole equation.
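
To make the locking mechanism John describes concrete, here is a minimal sketch of byte-range locking on a shared POSIX file, the kind of coordination a parallel file system is meant to simplify. The file path, record size, and writer rank are hypothetical, and a real MPI code would typically use MPI-IO rather than raw fcntl calls.

```python
import fcntl
import os

RECORD_SIZE = 4096
SHARED_FILE = "/scratch/shared_output.dat"  # hypothetical path on a shared file system

def write_record(rank, payload):
    """Lock only this writer's byte range, write the record, then release the lock."""
    offset = rank * RECORD_SIZE
    fd = os.open(SHARED_FILE, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Request an exclusive lock on just this writer's region of the file.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset)
        os.pwrite(fd, payload.ljust(RECORD_SIZE, b"\0"), offset)
        # Release the lock so other writers directed to this region can proceed.
        fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset)
    finally:
        os.close(fd)

if __name__ == "__main__":
    write_record(rank=0, payload=b"result from writer 0")
```

The point is only to show the request-lock, write, release-lock cycle he mentions; with a parallel file system and a sensible striping layout, writers can land on separate servers without contending for a single lock.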

NAS File System [14:20]

Zane Hamilton:

Thank you. So looking at where things have come from, I know you mentioned NAS earlier, and looking at HPC, is that really something HPC is still using today? Is that where people just are, or is there something different out there? You talked about different types of file systems, different types of architectures, but is the traditional NAS still where it's at?

John White:

So where I operate is solidly in the mid-range space. Lawrence Berkeley Lab has NERSC as a leadership-class facility, but I work on the other side, where we take care of all the rest of the scientists' needs at the lab itself rather than at the national scale. We traditionally have a two-tier system where we operate fairly beefy NASes: home directories, group directories, software module farms, et cetera. It's fairly, I won't say foolish, but it's a foolhardy endeavor to try to wedge those workloads into our parallel file system, given our costs, because we want to tune our parallel file systems for aggressive workloads, whereas we want this nice, stable, enterprise-style file system doing snapshots, backups, and everything else off to the side.

Traditionally we do NASes for that. In the last three to five years, that has started to change fairly aggressively. We're still in a two-tier setup, but we are very quickly moving towards all-flash file systems on the NAS side. We aren't at the scale yet where, like NERSC back when Glenn Lockwood was working on these things, we can just say we're going to buy all flash for our parallel file system and make it work. We still have to work our economies and split those two tiers for now.

Zane Hamilton:

Fernanda, you were talking a little bit about this and you nodded your head yes. What are you seeing?

Fernanda Foertter:

I answered tickets at Oak Ridge, right? I was part of the user assistance group, and there's nothing that will grind a file system to a halt like somebody trying to compile something, or doing many, many tiny file writes into a parallel file system. It's just not the way it's built. And we haven't really come up with an alternative, or something that can automatically switch when it notices that pattern; it's not quite smart enough yet. So we do have to have these two-, maybe even three-tiered systems. Summit came out with another kind of storage that was in the node and was available during the job, right? You can sort of load data onto that part of the storage system, so it ended up being a three-tier system. But NAS is fast. It works, it's dense, it's fast, it's cheap. You can do your CI there. You can even do a lot of your data analysis there, and a lot of people ended up doing data analysis on parts of the NAS, so why not?

Zane Hamilton:

Thank you. Jonathan.

Johnathan Anderson:

My experience is pretty much identical to what's been described. The way I tend to talk about it is that there's storage the technology, the thing that stores data somewhere, but I prefer to distinguish storage as a service from the part of the storage that is part of your compute service. For me, the NAS is whatever part of your storage infrastructure prioritizes things like redundancy and resiliency, things you would more associate with an enterprise focus, apart from the computational capability of your compute infrastructure. In theory, you could spec out a NAS that is performant enough to support your HPC computation, but it would be very expensive. Or you could make your parallel file system redundant and resilient enough to be your NAS, but that would also be expensive. It's more about separating the different technologies into the buckets that are most suitable for them.

Fernanda Foertter:

I want to add that the burden actually ends up being on the user, right? The user needs to know where they are. There's nothing yet that's quite smart enough, and lots of vendors will sell you that it is smart enough and can handle both workloads. And my answer to them every time was: I'll throw some biology problems at you and we'll see if your system can handle it.

John White:

Yeah. I think that biology bit really brings out the opposite side of it, where we build these big parallel file systems to handle the compute workloads. But there's always been biology, biology's always been the problem, and now there's AI as well. When we're talking about these NAS file systems, suddenly we're talking about folks like VAST, maybe WEKA to a point, and a few others like Qumulo, where you might tell these users, run out of your home directory. Whereas before, when we designed these systems, like Fernanda was saying, you had to be aware of where you were running your workloads. Well, maybe now you do some of your workloads over on this all-flash file system that we happen to be able to afford over here, and then do your big hero numbers over on the parallel side.

Zane Hamilton:

And that's something I think we should get back into too. Sorry, go ahead, Jonathan.

Johnathan Anderson:

Maybe this doesn't match either of your experiences; I'd be interested to know. I've seen that kind of workload become more applicable with the prevalence and emergence of HTC workflows, where you're not doing parallel computation across multiple nodes, but you have a big data set and jobs that each talk to a portion of it at a time. The all-flash arrays that have taken off in the NAS space have been suitable for at least that kind of workflow. And I think perhaps those workflows have often grown up in a cloud space and are anticipating object store capabilities, which end up mapping reasonably well to an all-flash NAS.

John White:

The honest answer, if there was a question there, is that with parallel file systems, especially in the mid-range space, we've never been handling hero MPI-style workloads. They've always been much smaller, maybe 50% non-MPI back in the day, but maybe 90% these days. For non-MPI work we don't specifically need the parallel locking of a parallel file system, but we do need these vendors who make hundred-gigabyte-per-second file systems to be able to serve the aggregate workloads we're talking about. So generally the parallel file systems handle the 90% of the compute workload, and the bio users don't get forgotten when we design these file systems, but they are the 10% and we just can't design for them most of the time. There's a new world now where suddenly we have an option for them.

Object [21:37]

Zane Hamilton:

Excellent. Thank you, John. So we brought up, a little bit, Object earlier, but where does Object fit in this whole thing? And I'll start with you, Fernanda.

Fernanda Foertter:

Actually, I'm going to bow out of that one. Object is beyond my experience. I'm out. I sort of understand it. I sort of see it come up, but it's beyond what I've had hands-on experience with. I'll let someone else go with that.

Zane Hamilton:

John, I'll come back to you. You brought it up first so you get to, you get to go for it.

John White:

Yeah, we deal a lot in Field of Dreams-style infrastructure where I'm working right now, and object is definitely one of those: we build a system and we hope people will come. Object's been sitting out there for 10, 15 years now. For the last eight years or so, we've worked closely with a lot of universities; a lot of our users come out of a pipeline from the academic world, and the academic world is teaching its students how to do object as just part of their education. So we are moving towards a world where object is a tier. We have a fair-sized Ceph deployment that we're just about to turn on. It's a complete unknown to us, really. We have a business backup plan where, if the customers don't come to us, our backups easily fit into an object store. So it's not going to be a business loss for us if that happens, but we sure hope people will adopt it and at least teach us where this fits in our portfolio.

Zane Hamilton:

Great. Thank you, John. Jonathan, I know we've spent some time with object stores around here, so talk to me about object and where you see it fitting today.

Johnathan Anderson:

I'm actually a little bit sad, because I was running a MinIO instance in my home lab to get more familiar with it and understand where it would fit into a scaled-down but tiered infrastructure. I've been migrating away from it because my needs have exceeded my single-node deployment's capabilities and I was getting timeouts. But my current thought process on object, if I can go back to my spectrum of computational storage versus enterprise storage, is that object sits on the far side of enterprise storage. Where it shines is being an even more resilient tier of storage than enterprise storage. But to get there, you have to put restrictions on what you can do with your data that are outside of what we generally associate with POSIX.

It's trying to remove the constraint of POSIX compatibility to give you the ability to do more data distribution and sharding, and it just gives you more capabilities on the management side of your data. So in the past it's been a difficult fit for me to understand, because people come in having been taught to use object, and they're doing it on the web. But it's easier if you have access to a POSIX file system, and if you aren't web-scale there's not really a need for an object store. But then you still have that user education piece: if the object store is what they're used to, now you have an interface problem. How do you make something that provides the interface people are used to, but is performant in a way that lets them take advantage of all the hardware associated with your HPC workload? If it were me, I would probably go to a Ceph deployment, which I think John said is what you're doing. I've been interested to experiment with MinIO, but not in a production environment. But yeah, it's almost the opposite of "build it and they will come" sometimes: we did build a thing called POSIX and gave them all this capability, but they want to get and put, and I don't know, it's an interesting problem.
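
Since the conversation keeps returning to the "get and put" interface users expect, here is a minimal sketch of what that looks like against an S3-compatible gateway such as MinIO or a Ceph RADOS Gateway, using boto3. The endpoint, credentials, bucket, and key are placeholders, not real services.

```python
import boto3

# Client for an S3-compatible gateway; endpoint and credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# Put: no UID/GID or permission bits, just a bucket, a key, and bytes.
with open("output.parquet", "rb") as f:
    s3.put_object(Bucket="results", Key="run-042/output.parquet", Body=f)

# Get the same object back later by key.
obj = s3.get_object(Bucket="results", Key="run-042/output.parquet")
data = obj["Body"].read()
print(len(data), "bytes retrieved")
```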

Zane Hamilton:

So do you think some of that actually stems from the fact that it's easy to get cloud-based resources that are object storage? For most people, that's where they can do it; they can't afford to do it at their house. Storage can be really expensive, so the cloud gives them the ability to go play with something, and that's kind of where people are just going natively now. I'm sorry I cut you off, Fernanda.

Fernanda Foertter:

No, I was actually going to play the role of naive user. I downplayed my knowledge of object a little bit; I understand it somewhat. But what I'm wondering is: is object store something that requires users to come to it the way we see with parallel file systems and NAS, where there's an interaction? Or would the sweet spot really be, especially for the national labs, just the serving of data and that's it? It's serving objects out, and there is no real interaction except through an interface. Is that who you're hoping will come, the people who will build those interfaces to those data sets?

John White:

Speaking personally, the reason we went with Ceph was largely because on the back end it is an object store, but on the front end it's everything. So with that, it's exactly that. We're hoping this is a play to generalize huge swaths of our stack and be able to slice off little protocol bits for whoever needs them, whether it's POSIX or actual object. And look, even when it's object, we probably aren't going to offer the native object access mechanism for Ceph; we're probably just going to do an S3 gateway for it. So everything's going to be a gateway to this object store. That was a lot of covering our butts, really, in what is frankly a huge investment for us.

Johnathan Anderson:

That to me is exactly where MinIO sits in my little toy environment; that's where it's remaining: somewhere to throw blobs that I want to serve out through a web portal. The thing that appealed to me about the object store in my environment in the first place was not having to worry about Unix permissions at all. I hate when I'm serving data out and I have to have a Unix UID and GID and set permission bits. I just want to put it in there and have it be done. And I liked the idea of shedding the bits of POSIX that I didn't need, like in a web service environment. So I don't know, I'm still learning.

Fernanda Foertter:

One of the comments that came up here is really interesting. Somebody basically said that we should treat it like something for digital pack rats. Yeah, exactly. I love it. That's how I see it. Maybe I'm not as up to date on the latest that has come out, where you're treating it a little more generally, but that's how I've always seen the object store: I put a bunch of stuff in there and I just serve it to people, and really, you just leave it there.

Johnathan Anderson:

Yeah, I've spent a lot of time advocating with a certain tape vendor that I have relationships with to have an S3 compatibility layer for their tape server. Because that's a perfect match for me. And I see the same benefits with what AWS calls like S3 Glacier now, where you don't have to interact with Glacier directly, you can just put things in and out. That's great. That's exactly what it should be for, in my opinion, cheap and deep and just put data in and you can get it back out later.

John White:

From a provider point of view, however, I take a lot of exception to the description of cheap. ZFS plus NFS is still going to beat the pants off running a full object store once you count the staffing and admin time. The CPU and the actual storage requirements for object stores are huge unless you're doing dangerously deep, dense shelves. Being on the provider side is suddenly very interesting from a cost perspective.

ZFS [30:08]

Zane Hamilton:

There was a question earlier asking if we could touch on some of the options outside of using a parallel file system: how to scale capacity and throughput with ZFS or products like VAST Data or other solutions. You kind of touched on the ZFS side of this, but how should people look at this, and what are the options out there they could consider?

John White:

So ZFS has always been the easy button for us. ZFS has always been the cheap solution for us. It's not the scalable solution; it is a scalable solution within the single-digit petabytes for us. When you start talking about tens of petabytes, you start talking about multi-headed situations, and multi-headed ZFS can get to be a real bear within a single namespace. But when a customer comes to us and says, I can buy a drive at OfficeMax, or I don't know if they even exist anymore, that's our solution for them. You want to buy sheet metal, you want to buy hard drives, and you want to buy a processor with some network? Fine, we'll do that. But when we're talking about multi-petabyte, when we're talking about massive performance, now we're starting to talk to weird new vendors like VAST. Then you're talking about some really strange enterprise licensing, and about not being able to just spin something up in a day. But you are talking about millions of IOPS, and much lower latencies than ZFS is ever going to be able to provide you, et cetera. So it's an interesting world out there right now when it comes to that.
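
For readers who want a picture of the "ZFS easy button" John describes, here is a rough sketch of a single-node build: one raidz2 pool over a handful of drives, a compressed dataset, and an NFS export. Device names, pool and dataset names, and the client subnet are all made up, and a production system would obviously need more thought about layout and redundancy.

```python
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# One raidz2 vdev across the drives; ashift=12 assumes 4K-sector disks.
run(["zpool", "create", "-o", "ashift=12", "tank", "raidz2", *DISKS])

# A dataset for home directories with compression enabled.
run(["zfs", "create", "-o", "compression=lz4", "tank/home"])

# Export it over NFS to a hypothetical cluster subnet.
run(["zfs", "set", "sharenfs=rw=@10.0.0.0/24", "tank/home"])
```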

Zane Hamilton:

So you're saying that when you get larger and you're trying to scale up, there are advantages to some of the other tools out there: they're maybe harder to add to or maintain, but they also give you the ability to do it in a way that will continue to be fast and scale. Okay.

Johnathan Anderson:

I really like tools that stack on top of each other, so I really like ZFS as your baseline. You've got some disks or some SSDs and you're going to provide a reasonable storage experience to someone. Great. Then, if you start needing to scale beyond that, I really like layering on something like BeeGFS or Lustre, both of which can back-end to ZFS now, so you get to utilize the skills you built up doing ZFS stuff and not have to have completely separate teams for that. But the cool thing the VAST guys have done, and I haven't used it in production, but I've talked with them a fair bit, is that since they know their platform is all flash in some way or another, it's all NAND or other solid-state storage, they can go back and redesign some of the software around those expectations. So they have a new NFS layer that doesn't expect hard-drive-level latencies, things like that. You get benefits from integration with a full-stack vendor like that which you don't get when you're taking things off the shelf and layering them. But it also means that you're going down that path and committing to it.

John White:

Yeah, going back to a previous comment, paying attention to your admin time is massively important. Your FTE time is probably going to be one of the more expensive pieces of the pie. And when we talk about layering and doing it yourself, you get to a point where we get scared off pretty quickly. There are several times where we say, no, we're not going to run an open source Ceph or Lustre solution for you. Sorry, we just don't have the bandwidth to do that. And if you get stuck, you're going to have to pay for help; if you get stuck with a real code problem, I'm not going to fix that, our colleagues aren't going to fix that, you're going to need a vendor to help you out with that. So we always have that in the back of our minds when it comes to providing a production-style solution to this sort of stuff. It's certainly very easy to do a Lustre setup, but it's when things break that the real cost shows up, in our experience.

Different Types of File System [34:38]

Zane Hamilton:

So you kind of started down the path of bringing up Lustre. I know we talked a little bit about Ceph, but Jonathan, you mentioned BeeGFS, and I know there's GPFS, there are all sorts of different types. Let's talk specifically about what those are good at and what they're not good at, because when we talk about biology, these different workloads can be better served by different types of file systems. Can we touch on that a little bit? What is Lustre good for? What is BeeGFS good for? How are they different? Everybody is smiling at me like I'm crazy.

John White:

Lustre pays the bills. That's what I've got to say about that. Lustre's always been great at hero numbers for large-throughput file operations. It's traditionally been weak on the bio side. That's changed quite a bit with a huge amount of development from a number of vendors and the Whamcloud team themselves.

Johnathan Anderson:

All the distributed namespace stuff?

John White:

Yeah, exactly. The DNE work, all the progressive file layout stuff, where you're suddenly able to actually leverage flash instead of just building a flash metadata target and hoping that the overkill overcomes a lot of the inefficiencies. It's a new world when it comes to Lustre and that sort of thing. I can't speak to BeeGFS, unfortunately. But the other side of it is that there are these other vendors, and I don't want to keep saying VAST, WEKA's out there too, and there are a couple of others, doing these redesigns, or really greenfield designs, of all-flash file systems. And like Jonathan said, they're coming up with some really intelligent client-side stuff so that suddenly these workloads aren't a concern at all. Not to downplay it, but I don't think about the bio side when I'm throwing down something like that. But then I get concerned about the people who need a lot of capacity; the financials for flash still aren't there for mid-range computing. So there are definitely still tradeoffs.
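
As a concrete illustration of the progressive file layout (PFL) feature John mentions, here is a hedged sketch of setting a composite layout on a Lustre directory so that the first extent of each file lands on a flash OST pool and the remainder stripes across a disk pool. The directory and the pool names "flash" and "disk" are hypothetical and assume those OST pools already exist.

```python
import subprocess

TARGET_DIR = "/lustre/project/bio_scratch"  # hypothetical Lustre directory

# First 64 MiB of each file: one stripe on the "flash" OST pool (small files stay on flash).
# Everything past that, to end of file: eight stripes on the "disk" OST pool for throughput.
subprocess.run(
    [
        "lfs", "setstripe",
        "-E", "64M", "-c", "1", "-p", "flash",
        "-E", "eof", "-c", "8", "-p", "disk",
        TARGET_DIR,
    ],
    check=True,
)
```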

Zane Hamilton:

So Jonathan, I know this comes up because we're working with customers who are asking these questions about the different types of file systems, and I know you mentioned BeeGFS earlier. How is it different from something like Lustre, or are they the same, and just one's commercialized and one's not?

Johnathan Anderson:

Well, they're all kind of converging on the same requirements; they've just started from different points. I think BeeGFS's biggest win is that it is really easy to get up and running with. I wouldn't have as much of a concern handing that to a general sysadmin and saying, you can set this up, you can put this on some hardware, and it will run. And it has a lot of the metadata benefit that I've typically looked to GPFS for in the past, although my understanding is that Lustre has dramatically improved in that realm since I last touched it. It's almost a perfect cost-versus-capability triangle: you have metadata, you have throughput, and you have price or complexity. I think of Lustre as the one that can do everything if you can put the time and money into it. GPFS is a bit more enterprise-friendly and has a bit more data management capability. And then BeeGFS is there and approachable, but not quite as mature as either of those two, in my experience.

Zane Hamilton:

Thank you. Fernanda, what are you seeing? From a file system perspective, what are you working with?

Fernanda Foertter:

You know, I've had these dreams… Jonathan was talking earlier about composability, and it's one thing for us to design a system with certain users in mind, but what if we designed systems with how the data will be read in mind? Not necessarily how a user is going to use it, but: this is served out, this is a stream of data, this is this type, and the file system would be smart enough to elevate it to the portions of this composable system that would serve it best. Maybe that's something that looks more NAS-like, or faster, bigger throughput, wider pipes, or maybe it's something that's object because of the way the data is formatted. It could be. Right now I'm working a lot with folks who are working with Parquet files, so it could be a ton of Parquet files somewhere else.

So I'm wondering if we could change the idea of a file system from something where the user has to decide how to use it properly to something where the system is designed to be more flexible and can store just about anything, where all of those components exist in harmony and the interconnects don't break. I know this is a make-believe world, but I think that's where we need to go, and places like WEKA are building toward that, right? Lots of different interfaces, lots of different interconnects, building a cluster on top of it. They're trying to do that.

Specialized Storage People [39:56]

Zane Hamilton:

That's very interesting. When we talk about composable infrastructure, it takes me back to what John said earlier. Are we looking at an environment where we have to have very specialized storage people? Because it seemed like for a while, really good storage people who could design a storage environment were hard to find. Then vendors started overlaying software that makes a lot of those decisions for you and lays it out, and of course the storage guys would argue with it and want to go back and change it, but it got easier, so that someone like me, a regular sysadmin, could come in and at least survive. But are we getting back to a place where it's getting complicated enough that we need those specialized storage people, at least to give us the ability to expose things in a composable way? Or are we still going down the road of let's just make it easy for somebody to manage? There's a lot to unpack in that one question, but I'll start with you, John, since you started the topic.

John White:

In my mind, you're always going to need the specialized storage guy. I mean, you're not asking the wrong person here; my livelihood depends on it. Fundamentally, yes, things have been changing in that direction over the last five years. People can still say running a Lustre file system is easy, but even with a support contract you're still going to need to pay attention, at least at the design phase, to a huge number of things that a generalist admin isn't going to be paying attention to. You're going to have to look at a lot of historical data on I/O sizes, file sizes, read patterns, et cetera.

That seems to be changing quite a bit, at least when it comes to the VASTs of the world. When it comes to WEKA, they still seem to be a very specifically parallel file system where you want to be paying attention to things. But the things you're paying attention to are more, like Fernanda said, interconnects and where you're going to want to route your data. Where the presence of the file system needs to be is where you want to focus more when it comes to that sort of thing.

Zane Hamilton:

Jonathan, I don't know if you had something you wanted to add.

Johnathan Anderson:

Your question makes me think of the last time I hired a storage admin. A big part of the conversation was trying to describe the difference between what he thought storage meant from an enterprise standpoint, oh, you want SAN and you need Fibre Channel and all of this very specific big-iron storage experience, versus what we were looking for, what I think of as storage, because I've always worked in HPC. I think the most specialized skills in storage, at least in the way we're talking about it right now, are the distributed systems part of it and managing state across a wider and wider multi-system environment. So I won't speak over what John has said about his particular set of skills and the need for them, but I think a bigger and bigger part of that is managing the distribution of the data. I don't mean management and data movement; I mean the distribution of data across a broader set of systems, rather than what people have historically thought of as the storage skill set, which is bespoke storage protocols and talking to this specific RAID controller, that kind of thing.

Zane Hamilton:

Yeah, along the same lines, Jonathan, my favorite storage guy, his name is Scott. If you're out there watching, Scott, I hope you're doing well. I miss you, buddy. It was interesting to watch whenever he would get a new system in: the amount of design and time he would spend trying to get it right for what it was supposed to be doing. He would look at the tools the software vendor gave him, they would spit out what they believed the layout should be, and he would completely tear it apart and start over from scratch every time. It always made me wonder why they even have the tool. Because if the tool is that wrong every time, to the point that this storage guy is tearing it up and starting over, what is the point?

And that's my curiosity: where, if at all, have we actually made progress in making this easier? Or, and maybe I'm off, does it truly depend on the environment and what you're trying to do with it? It's not one-size-fits-all; it's not an easy thing to just go stand up storage and say, hey, that's great, there it is. To Fernanda's point, having a storage solution, kind of as a service, that fits the right requirement and the right need sounds really complicated. I don't even know where I was going with that, but it's just interesting to me. Jonathan, don't laugh at me. Where was I going? I had another question I wanted to ask that, John, you touched on a little bit.

User Input [45:06]

Fernanda Foertter:

Before you go there, I just wanted to add one more thing. What I'm proposing is, let's get people the ... when people are designing file systems, they're designing them with a mix of users in mind and a mix of ideas, what's coming up or maybe what they've seen already. And we've talked about the systems we've built at the labs as multi-tier systems, but can we get more input from the user at the time the data set is created? Right now we don't have the ability to tag it as: it's written this way, but it's read this way, or accessed this way, or searched this way; I expect it to be used this way. And then the system goes, oh, okay, this is written transactionally but read as object. That is what I think is missing from this design. How do we design a system like that? I wish a vendor were trying to come up with that. The connectivity and the interface, those are nice to have. But the real step is: I'm going to write it here, but this is also always going to be read as an S3 bucket. And so the system goes, okay, we're going to write it here, and then later on at night I'll move it, organize it over there, create some metadata, and push it, right?

John White:

Yeah. And I think that's one of the biggest challenges, especially in a ludicrously multi-user environment like LBL, where we just don't have the bandwidth to be tracking our customers like that. If a vendor stepped up with something that somehow automated, in an intelligent way, both the access patterns and the access protocols, et cetera, plus the cost-efficient storage of those things on top of that, that would be a massive win. But I don't see a lot of development in that direction.
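
One way to picture the tagging Fernanda is asking for, which no vendor currently automates as John notes, is to attach an access hint to the data at creation time, for example as a user extended attribute that a hypothetical placement or tiering daemon could read later. The attribute name and value format below are invented purely for illustration.

```python
import json
import os

def tag_dataset(path, written, read_as):
    """Record the intended access pattern as a user.* extended attribute."""
    hint = json.dumps({"written": written, "read_as": read_as})
    os.setxattr(path, "user.access_hint", hint.encode())

def read_tag(path):
    return json.loads(os.getxattr(path, "user.access_hint"))

# Example: data written transactionally but expected to be read as objects over S3.
tag_dataset("/data/run-042/events.parquet", written="transactional", read_as="s3_object")
print(read_tag("/data/run-042/events.parquet"))
```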

Zane Hamilton:

So I still haven't thought of what I was going to ask you, John.

Fernanda Foertter:

Fund my startup then I think. I think we're going to start a startup. That's what we need.

Fitting the Workload [47:25]

Zane Hamilton:

I like it. So John, looking at the way clusters get built today, they get built in a specific way and they are what they are. As things change and different workloads come in with different needs, how easy or complicated is it to take what exists and make it fit, I'm not even sure if I'm saying this correctly, kind of fit that workload? Because it seems like everything gets built for a specific reason in a specific way, but everybody coming along two years from now may not want that anymore. How easy is it to change without causing complete chaos?

John White:

Over the last 10 years, that's another huge trend in the storage industry as far as I can see. Ten years ago, when you wanted to see access patterns, for instance, your traffic patterns, your I/O sizes, what your live file system was doing and how your clients were beating it down, it was a mess of RRDtool, custom scripts, horrible, disgusting graphs, et cetera. And frankly, I just waited. It seemed to me that since it existed, there was going to be development and it was going to get better. Five years ago, eight years ago, I went back and suddenly there was Grafana, there was Telegraf, there was Prometheus coming up and so on. And suddenly this world of data became available to the storage administrator.

Before that, frankly, we were just shooting at hero numbers and accepting that certain workloads were going to lose out. And we were shooting at hero numbers with assumptions about what those workloads were, not even measuring that, say, 75% of our users were doing this; we just assumed 90% of our users do this. When we got that instrumentation, and I think this is true of a lot of storage groups across the national labs and industry, we were shocked, just blown away, by what a read-heavy environment actually exists. We had no idea; we had always tuned towards write-throughput hero numbers because write was the hard thing to do, so if you do the hard thing, clearly the easy thing is just going to fall into place. And now, with the advent of the monitoring, but also instrumentation like Darshan and a few other analysis APIs, we suddenly have the tools to go to a vendor and say, look, we have a ludicrous breadth of customers here, but this is actually what we're seeing day-to-day. So let's combine all of this data, not to point out individual people, but this is our workload, this huge wealth of information is our workload; let's design towards that.
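
To ground the instrumentation John describes, here is a minimal sketch of a Prometheus exporter that publishes read and write byte counts for a file system, so a Grafana dashboard can show whether the workload really is read-heavy. The metric names and the stats source are placeholders; a real exporter would parse client-side counters such as Lustre /proc stats, NFS mountstats, or Darshan logs.

```python
import random
import time

from prometheus_client import Gauge, start_http_server

read_bytes = Gauge("fs_client_read_bytes", "Bytes read from the file system by clients")
write_bytes = Gauge("fs_client_write_bytes", "Bytes written to the file system by clients")

def sample_stats():
    # Placeholder standing in for parsing real client-side I/O counters.
    return random.randint(0, 10**9), random.randint(0, 10**8)

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        r, w = sample_stats()
        read_bytes.set(r)
        write_bytes.set(w)
        time.sleep(30)
```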

Changing the Future of Storage [50:22]

Zane Hamilton:

That's very interesting. So we're getting close to time and I want to ask this, and I want each of you to answer it individually, but I'm going to start with you, Fernanda. Have you seen things like AI becoming more and more prevalent, and how has that changed storage, whether it's people creating models, the amount of data that's stored, the amount of data that's accessed, whatever that would look like? How is that changing storage?

Fernanda Foertter:

So when I went to NVIDIA, I went to the healthcare team, and I had experience in biology from my first job, and then after NVIDIA I ended up with the folks at BioTeam. And the same thing kept coming up, and that's data integration. If I want to do AI, it's great for me to grab everything there is, let's say in healthcare, every image that's ever come from one particular MRI machine. But if I want to do this at a hospital level, city level, system level, country level, I can't do that. I can't integrate those data sets, because they're coming from different systems, different machines, different resolutions, different time periods, different metadata, different data associated with them, different formats, et cetera. That's the biggest barrier now to making use of any data, and my advice in bio-IT was: if you're going to start doing AI on anything, do not focus on patient- or customer-facing AI; focus on things that improve your processes, things within your business.

Because you can control much better how you access the data and you don't have all the constraints around it. Or if you do just want to do a research activity, then start from scratch and forget what you have, because that's going to be too hard; you're going to spend developer-years, maybe decades, just trying to organize and integrate the data. That's been a consistent problem throughout. In fact, Andrew Ng posted something yesterday to the same tune: data integration is really the hardest part. So how I see it is, as always, give users the ability to find ways to aggregate the data. And part of why I've moved jobs, the reason I follow what's interesting to me, is to work on how to solve this data integration problem.

How do I solve the connectivity? There are some really nice open source projects now in the Apache ecosystem that allow for that, that let you write Python, and then it generates a backend, whatever the backend might be, and you just point to it. So you don't have to think about how to write it against that particular API or backend: this is how I write it in Python, it compiles it to that backend and then connects to it. Now I have one single interface; I can connect to several backends, grab things from several backends, and put them together in something that looks like an optimized data frame, not slow pandas, maybe something that's distributed. And now I can do some AI. That's when we're going to start to see real progress and the ability to make use of legacy data, or just the data you have now, data coming from different systems, federated data, et cetera. What I see is: can we integrate that with storage? Can we partner with storage and say, we need to be able to communicate with all these systems; can you agree on a standard, a data standard? Can you serve Parquet? Can you just serve columnar data, because this is the way that AI ingests it? That's where I see it.
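
As a small example of the columnar, Arrow-based approach Fernanda describes, here is a sketch using pyarrow to scan a directory of Parquet files with a column selection and a filter pushed down into the scan, then hand the result to pandas. The path, column names, and filter values are invented for illustration.

```python
import pyarrow.dataset as ds

# A directory of Parquet files, e.g. on a NAS mount or behind an S3 gateway.
dataset = ds.dataset("/data/water_quality_parquet", format="parquet")

# Push the column selection and filter down into the scan instead of reading everything.
table = dataset.to_table(
    columns=["city", "sample_date", "lead_ppb"],
    filter=ds.field("lead_ppb") > 5,
)

df = table.to_pandas()  # columnar Arrow table -> pandas DataFrame
print(df.head())
```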

Opening Up Data [54:09]

Zane Hamilton:

Thank you. That's fascinating. So are you seeing people who have data they keep behind their four walls become more willing to open up, as long as they can get access to other data outside of that? Somebody may have a large amount of data on something, but it would be easier to share in some other way. Do you see people willing to do that?

Fernanda Foertter:

Yeah, I mean, they have to. There's just not enough data, right? Even all of the world's written material goes into this ChatGPT thing and it's still kind of dumb, right? We thought we fed it enough, and it just looks smart. But you can't take risks like that, especially in healthcare. It can't be a little bit dumb; it has to be equal to or a little bit better than what humans can do, and humans can see a lot of those things. So if you feed it all the world's data and it's still a little dumb, it's kind of a waste of time. And if you're feeding it a small amount of data, it's going to be even more dumb, so you're wasting your time. So you have to branch out.

And you have to talk to other people's data. Maybe that's anonymized data, or maybe it's public. Maybe that's me combining data sets from a city, including city water quality, and trying to identify whether I'm going to see more cancers over the long run because I know there's more lead in that water. Maybe that can help with predictions for, I don't know, government systems, making sure people get the right treatment, et cetera. Or climate change: what am I going to see out of climate change? What sorts of tropical diseases are going to come up? Those kinds of simulations combined with the population: who do I have here if I'm Japan and I have elderly people? What are the likelihoods of the diseases that will come up, given this additional data set I've integrated with it? That's really hard. That's almost impossible now.

Zane Hamilton:

It's fascinating. It's very cool. Jonathan, storage and AI.

Storage and AI [56:11]

Johnathan Anderson:

Honestly, I'm kind of caught up in the vision of what that would mean from Fernanda. The beginning of your story was really interesting to me because of how much it resonated: people would want to do something with AI in our environment, and the first thing they would try to do is apply it to the data we had internally, just like you were talking about, because all of the interesting data is otherwise encumbered. Where I've seen AI in storage has been trying to do things like that internally, things like fault prediction and failure analysis by feeding storage data into an AI system, and I haven't seen that pan out yet. But that's not quite the crux of the question. And I don't really have other experience with AI where I've noticed it; it's just been another workload.

Zane Hamilton:

Fernanda's answer was awesome. It's kinda hard to follow that, sorry. And now we have John. You get the last word here. You get to follow both of them.

John White:

It's interesting, because I'm coming from the opposite direction, as a service and architecture provider, I guess, is a description of it. There's the obvious speeds-and-feeds issue, and just the shape of the data, for which I have to be able to provide a system that can ingest and consume it. But the one thing I would echo from Fernanda's point is that the industry as a whole, not just storage, has had to become way more flexible about our users' concerns over the last 10 years. When I started, we were old-school UNIX admins and we could say, no, this is the way: you have to learn how to use our batch scheduler.

You have to learn how to do MPI. You have to learn how to use Fortran or compile in these specific languages. I hate to use this example, but Python came along and suddenly we were caught unaware, caught completely off guard, with our users finally saying, no, I can't do my work if you don't provide this option for me. So when it comes to AI in the storage world, 90% of the challenge is just being flexible enough for these people to do their work, and providing the systems, the protocols, and whatever flexibility they need to do it. If I can echo anything, it's absolutely that.

Zane Hamilton:

No, that's great. Thank you very much, John. Guys, we are up on time. I really want to thank Fernanda and John, appreciate you coming; it's always good to talk to you, always fun. Fernanda, thank you for that last thought; I'll probably spend the rest of the day trying to wrap my head around it. That was awesome. Jonathan, as always, it's great to see you. Everyone, we'll see you next week. Thank you again for joining us and spending your time with us; please like and subscribe. We'll see you then. See you, everybody. Thank you. Thanks a lot.