An In-Depth Look at the Testing Process for Rocky Releases

Webinar Synopsis:

Speakers:

  • Zane Hamilton, VP of Solutions Engineering, CIQ
  • Skip Grube, Senior Linux Engineer, CIQ
  • Chris Stackpole, Testing Team Lead, RESF
  • Wale Soyinka, Documentation Team Lead, RESF
  • Steven Spencer, Documentation Team Deputy, RESF
  • Lukas Magauer, Testing Team Member, RESF

Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Full Webinar Transcript:

Zane Hamilton:

Good morning, good afternoon, good evening, wherever you are. Welcome back for another CIQ webinar. My name is Zane Hamilton. I'm the Vice President of Solutions Engineering here at CIQ. At CIQ, we're focused on powering the next generation of software infrastructure, leveraging the capabilities of cloud, hyperscale, and HPC. From research to the enterprise, our customers rely on us for the ultimate Rocky Linux, Warewulf, and Apptainer support escalation. We provide deep development capabilities and solutions, all delivered in the collaborative spirit of open source. Today, we're going to be talking about Rocky Linux some more. We're going to actually be talking about the testing process for Rocky Linux, so we have a good group of people on from the Rocky Linux community. I'd like to bring them in. Excellent. Welcome, everyone. Good to see you guys. I know some of you have been on before, and some of you have not. I'm going to go ahead and go around. I'm going to start right next to me. I see Skip, introduce yourself and tell us what you do in the Rocky community, please.

Introduction of Panel [6:08]

Skip Grube:

Hey, I'm Skip Grube and I do whatever's needed to be done. Mostly, I work on the release engineering and development team, but I'm very testing curious. So I'm here as an asker of questions and a learner.

Zane Hamilton:

Excellent. Thanks, Skip. Stacks, good to see you.

Chris Stackpole:

Good to see you too. Yeah, I'm Stack. I am the testing team lead and just help out wherever I can and whenever people need help, I try to fill in. 

Zane Hamilton:

Excellent. Thank you, Stack. Wale, it's good to see you again.

Wale Soyinka:

Hi, my name's Wale. Wale Soyinka. Also, Skip just stole everything out of my mouth. I have nothing to say. Yeah, so mostly documentation on the Rocky Linux side, and I am very, very curious about testing as well.

Zane Hamilton:

Excellent. Thank you. Steven, good to see you. Welcome.

Steven Spencer:

How are you? Good to see you again. Yeah, I'm Steve. I work with the documentation team as well. I've done a little bit with testing, but only a little bit. I do some work with Lukas and Stack on release notes for the most part, and how those relate to testing. Then, the rest of it is just what I do with documentation.

Zane Hamilton:

Excellent. Thank you, Steve. Finally, Lukas?

Lukas Magauer:

Yeah, hi everyone. Yeah, I'm basically just a member of testing, but also I'm flailing around in cloud stuff and documentation lately. 

Zane Hamilton:

So that's great. I hear a lot of documentation; that's always good. Seems to be the first thing that gets dropped in a development environment, so I love that you guys are working on that and taking care of it. I'm going to go ahead and just ask the broad question. I know whenever you're building a Linux distro, obviously there's a lot of testing that goes on, but I think for someone like me who's never actually built something like that, it seems a little bit daunting. So how do you even get started doing something like that? What does that look like?

How to Get Started Building and Testing [8:20]

Chris Stackpole:

That is something that's still ongoing. We are still evolving that. But yeah, in a way, I wish Trevor was here. He took the initial lead on this and I came in to help out with a lot of that. But early on we realized that we needed to have some way of testing and verifying that the work we were doing not only matched our goal of being one-to-one compatible with upstream, but wasn't broken so that we weren't releasing bad products. We want to be respectful of our community and have the best work that we can. There was a lot that went into it and we evaluated a lot of other groups and this isn't necessarily a dig against any other groups, but we looked at shell scripting that we saw other projects do, and we thought, that's not going to be complete enough.

There are other groups that are doing CI/CD pipelines for some basic stuff, and we thought that's not going to be good enough either; we wanted something a lot more robust. That approach works great for a lot of application testing early on, but it doesn't capture anything of what the user ends up going through or seeing. We looked at other projects that were doing Kickstart builds, and a lot of the Kickstart builds were fantastic in that they covered all the basics of how to set up and do a quick install, but they weren't testing any of the usability, because Kickstart completely bypasses the interactive installation method. The more experienced admins are all going to use Kickstart, but a lot of the day-to-day users are going to install by going through that GUI, and you miss that entirely with Kickstart.

When we looked at these other options, they just did not cover the testing in as much detail as we would like. So we started looking at what other projects do. We really liked the model that Fedora and openSUSE have of using something called openQA: going through graphical and command line ways of doing an installation, testing applications not just during the installation but after we've installed, and testing different aspects of every bit of the Rocky project. It also allows us to test not just the standard install on x86, but other architectures, and other methods as well, like cloud images.

Zane Hamilton:

So that's a great way to start this, Stack. Whenever I think about open source communities and I think about testing, I think about: something gets built; the community goes and downloads it, starts playing with it, opening bugs. So hearing that something takes place before that and there's actually testing that goes on before, it's pretty fantastic and something that I hadn't really thought about before. But you mentioned testing this on different architectures. Where does all of this take place?

Testing Different Architectures [11:41]

Chris Stackpole:

So for the vast majority of the time, it's been taking place in our home labs. We've got several people in the testing team who have their own dedicated hardware sitting in their home lab server, whatever that is, wherever that is, whether that's a laptop or a desktop or underneath somebody's desk. We have a lot of scripts that the community has developed in order to assist in that. Lukas and I are just a small representation. There's been a lot of great work done by Trevor and Al and Alan and Rich. I mean, we've got a whole bunch of people who do a lot of work, and a lot of it's been on our personal gear. But recently we have set up an openQA system running in our cloud environment that we are hopefully going to open up to more of the community soon, so that more people can contribute without having a lot of gear. And we'll be showing that a little bit soon.

Zane Hamilton:

Very cool. On the openQA side, I mean, you talk about testing the installer, I'm assuming that it tests more than just the installer. Like how far will that go?

Chris Stackpole:

We try to test as many variations as we can of the installer, different options and different ways that people might go through it. So we test all the different flavors of the ISO that we release. So that would be the minimal, the DVD, things of that nature. As we find errors and problems in our testing, we're trying to add those in. I've got this long-term goal to automate all the things so that everything is automated and tested as best as we can, but that's a lot of work. We've still got a long way to go on that, but we're making good progress on it.

Lukas Magauer:

Yeah, to add on to that: basically, we find problems in our release pipeline, pick those problems up, and then mostly either put them into openQA or write shell scripts, Ansible scripts, whatever is needed to get that test done.

Zane Hamilton:

That's great. So Stack, you mentioned the different versions of the different ISOs. Since they're out there – you can get the one for GNOME, you can get KDE, you can get MATE – there's a lot of different ones. Are you doing that on all of the different installers, going through building up each of those different environments?

Chris Stackpole:

Not quite. We're still working on some of that.

Zane Hamilton:

Okay. And I guess to Skip, since I know that you're doing the Pi thing: where does the Pi image fit in all of this? Has it been integrated yet?

Integration of Pi Imager [14:46]

Skip Grube:

It has not been integrated into the testing conglomerate. I hope they accept me. The Pi image is a little... it's got some quirks to it. It's a little non-standard.

Chris Stackpole:

Yeah, and it also hits a bit of a sore spot for me at the moment because it's been on my to-do list to implement that and get that moving along. I swear I am going to do it really soon. I’ve been slammed recently, but yes, that is something I really, really, really want to have happen very soon. Well, but life.

Lukas Magauer:

Yeah, well, we just don't let it fall through; we just take the time to test that stuff manually. So yeah, it takes hours, but we at least test it.

Chris Stackpole:

That might be a good point to throw this in here, because it's a lot more than just the openQA aspect. When we have a release – and maybe I should back up a little bit... So let's say a new release comes out, like the last ones, 8.7 and 9.1. We used Mattermost Playbooks for managing all of our release stuff. It worked a whole lot better than a lot of the previous manual tracking and updating of a wiki, but there were some things that I'm actually working with Mattermost on in order to make some improvements going forward. We have in there different aspects of things – this part needs to be tested manually; this part can be done by anybody in the community. Anybody who wants to can join in and run that test, or multiple people can run that test.

There are a lot of aspects that are still very open to the community, who just need to have some sort of equipment to install it on, whether that is a virtual machine or a physical machine or a Raspberry Pi or something along those lines. Then we have a lot of the stuff that we're trying to automate, and that's within openQA. There are aspects of the manual testing that still get done. There are still a lot of things along the lines of "we need to check this," "we need to verify this," "hey, this bit us last time; let's make sure it doesn't bite us again" that we go through. We've got a very lengthy checklist of all the things that we know we need to verify before we do a release.

Lukas Magauer:

And on that, we also try to document every piece that we test manually, so that it can always be reproduced by somebody else if one of the testing members is not available, or even to expand the testing team.

Wale Soyinka:

I have a question for Stack and Lukas on this: are there some things that cannot be tested with openQA?

What Cannot Be Tested with openQA [17:46]

Chris Stackpole:

Some of the aspects that are going to be a lot harder to do are things like the Raspberry Pi quirks because it's not doing a hardware emulation. For most of the software, we should be able to eventually get automated with openQA. At least that's the hopeful goal. Some of those tasks, though, are going to be a lot harder to do than others. Do you have a thought, Lukas? 

Lukas Magauer:

The pieces we can't really test in openQA are, of course, hardware – so firmware stuff with RAID cards or something like that. And as well, there is this special test we always do, which is just burning an image to a DVD – or, if it doesn't fit anymore, to a Blu-ray – and then running it, because there are these really special occasions where the BIOS just flakes out because it can't boot the boot image.

Zane Hamilton:

Interesting. I haven't thought of actually burning anything to a disc in a long time. So glad to hear this still happens.

Skip Grube:

Yeah, it happens, man.

Wale Soyinka:

Wait, are you saying we can't use floppies anymore? We're trying to get the minimal one on there.

Lukas Magauer:

You can try.

Skip Grube:

He has 10,000 floppies at his house and he's going to do, hell or high water, an install.

Wale Soyinka:

It's called Minimal. So I thought he'll...

Lukas Magauer:

Minimal is definitely too big for a floppy, but maybe you can get the boot image working on that.

Chris Stackpole:

I think I'm going to have to throw out: if you really need a floppy, you might want to look up tomsrtbt. That's what I used way back in the day, and it's still functioning – I believe they're still releasing it. So you know what? Have at it. But don't expect Rocky to be on a floppy. I'm not testing that.

Crowdsourced Testing [20:17]

Skip Grube:

Hey, I had a curious question, because I remember vaguely, sleep-deprived, the early days of the project a couple years ago. This thing launched at the end of 2020 and I know we were all working hard, but I'm just wondering. I remember – again, I'm not on the testing team; I just interact with the testing team in the course of my work – that we started with almost a crowdsource type deal. And I know we've iterated – it sounds like we're iterating even further; it's great – but is it accurate to say that it was like, "Well, we released an alpha or a pre-release or whatever. Everybody go get it"? It was exciting, but perhaps too chaotic, I think, for an actual process.

Chris Stackpole:

Yeah. We crowdsourced a couple of different things. We had a report that we were gathering – the xsos report, I think is what it was – in order to get some feedback about the hardware people were running it on, trying to get an idea of what kind of systems, and oddball systems, people were having. We were also taking in a lot of feedback, and it was good to get the "hey, it worked," because that means that, hey, it worked for somebody. But trying to manage the "it broke in my scenario" was really chaotic, and we did not have a great way of pulling in a lot of that data. It's easier now, because we're doing so much testing that we catch a lot of these things. So those bug reports do come in now. We also have a little bit more presence within our Git repositories. We've got the forums, the Mattermost, and we've got these different ways people communicate with us. So it's a little bit more structured and organized when we do have a release and people say, "Hey, I've got a problem." But yeah, in the early days, it was just a fire hose of things breaking, and we didn't have a whole lot of capacity to handle that.

Skip Grube:

Sounds like we've iterated a lot. We've come a long way, and it sounds like we're onward and upward. It's great.

Lukas Magauer:

Yeah, of course. We still encourage people to just chip in on the releases, to try to install it and try whatever they want to be running on their systems, so they can make sure that our future releases will also work on their machines.

Zane Hamilton:

Thank you. So we have a question from Dave. It's good to see you, Dave. How will you guard against abusive use of openQA to do something other than testing (think Bitcoin or DDoS)?

How to Protect Against Abusive Use of openQA [23:22]

Chris Stackpole:

So granted, there's still a lot of this to be worked out. But yes, we do have processes that we're working on right now so that people can submit code into openQA through our Git pipelines, where we're verifying and checking. We do have a lot of really good processes around making sure that the code we produce is signed; it's checked off by multiple testing members before it ever gets to that level. And so the eyeballs and processes we've got for making sure that the code coming into the openQA repository is good and not malicious will hopefully guard against a lot of that. Then, when we do open it up for more users to go through the openQA interface, we're working on some of those protections and guards: how much people can use, what kind of resources are going to be allocated. They're going to have to have accounts, and those accounts are going to come through the Rocky infrastructure team. So they've got things that are filtering out and watching for abuse. I don't have a great answer for you right now because it's still early days, but it is not something that has been forgotten about. We're thinking about some of that stuff.

Skip Grube:

Well, it sounds to me like the rest of the project: whether it's Rocky infrastructure accounts, or you have a Rocky account and/or submit pull requests to Rocky code repositories, those are all subject to review. It's not open season out there. So, good stuff.

Zane Hamilton:

Wale, I believe you had a question?

Wale Soyinka:

Yes. So does the test end once an ISO is printed? Is that it? Or are there other things that you do in between releases, in between new ISOs?

Does the Test End Once an ISO Is Printed? [25:37]

Chris Stackpole:

So between releases, we are iterating through trying to make improvements and trying to ensure that we're catching more tests, that our process is more robust, that we've cleaned up any errors or things that have come up. So there's a lot of growth that still happens between releases. When a release actually happens, though, most of the testing team is just anxious for that first ISO. We want that first ISO because that gives us a starting point for all the different things that follow after that, so that we can kick off the different suite of tests to make sure that everything is functional. 

As we find problems – whether that's a problem because our test broke, or maybe a button got moved just slightly enough that it broke our test or whatever, or maybe it's a bad build – that's where we have an iterative cycle inside of our Mattermost Playbook to say, "Hey, we need to rerun these tests." We work very closely with infrastructure, so if we think it's something like a bad build, we let them know immediately: "Hey, this just happened. Can we verify this?" That's where it starts to get a little messy, because one test breaking on an ISO may actually require a second release candidate ISO to be generated, but we can still run other tests to make sure we haven't missed anything else. There are tests that will still run on the first ISO even though we're waiting on the release candidate two ISO in order to verify the previous tests that broke. And so it's a bit messy still, but as we automate more of this and we get better with the playbook, I'm hoping it gets cleaner and easier to follow along. Do you have a thought, Lukas?

Lukas Magauer:

Only to add that we also do other stuff between the releases – like documentation, of course. A lot of documentation.

Chris Stackpole:

We've had several people who have stepped up recently asking how they can do more documentation, and that is very much appreciated. I'm actually hoping to see some of that get merged in real soon. Because it's useful when we are in the release candidate phase of ISOs: say a community member comes in and asks, "How do I help?" and we're able to point them somewhere and say, "We need help with this." Well, if they don't know what that is, they're just going to be like, "I don't know." But if we can point to documentation that tells them, "this is what I need you to do in order for us to get a success or a failure" – and a failure is also very valid – that is a lot more useful. I'm hoping that for the next release we're actually going to have a lot more of that documentation available for users to get started with.

Zane Hamilton:

So another question, maybe this is for Steven or Wale, but whenever you look at documentation and how things come through this process, how do you guys get involved? When do you guys get involved? And then at what level? I mean, are you going to try to document what breaks or what exactly is it that you're trying to look for? And then how do you document that?

Documentation Process for Testing [29:25]

Steven Spencer:

Honestly, when they talk about documentation on their end, they're talking about their process documentation and they've got people internally that are doing that. So we can't claim any fame or any problems with that documentation, unfortunately. We'd be happy to help, and we are, but we're not very involved in that particular process.

Zane Hamilton:

Got it.

Wale Soyinka:

Yeah. Where we chime in might be like right at the end when it comes to release notes. That's when all the hard work's been done and that's where we just come in and we just swoop in and just act like we did a lot with the release notes.

Zane Hamilton:

Thank you, Wale. Alan, welcome.

Why don't you introduce yourself and tell us how you're involved in the Rocky community.

Alan Marshall:

Well, I'm actually an engineer, a professional engineer. I'm retired. As you can tell by my accent, I live in Scotland. I was doing some voluntary work for one of our local health organizations and was working on their Linux systems, their VMware systems, but most of the stuff was on Microsoft. It was all Windows. And I'm afraid Microsoft has been one of my pet hates for quite a long time, so I really got away from it when I retired. Then I saw the thing about CentOS being demised, suddenly ending its life, and I thought, "Hey, that's a bit off." So I got interested in the replacement, which Gregory had started up, and I just came along to the testing team and I've been there ever since. Although I'm not a sysadmin like the rest of them, I do have a fair background in real-time control systems in large industrial organizations. So I'm always hoping that the skills I have will be useful – they've been of limited use so far, but I'm certainly getting well into all the things that need to be done in the testing team, and hopefully it's all moving along. So, that's my story.

Zane Hamilton:

That's fantastic. Thank you for joining, Alan; we really appreciate it.

Alan Marshall:

Okay, thanks.

Zane Hamilton:

So I guess back to some documentation, when you talk about release notes, you guys have to be involved a lot along the way too, I'm assuming, as things get tested and then you get to the release notes time. Do you have to sit down and start just installing and playing and writing, or is this kind of building off of something else? I mean that, in and of itself, for a QA process to me is pretty fascinating.

Testing Timeline Process [32:55]

Steven Spencer:

We start with a framework based on the previous notes, and then we'll work with the release notes from upstream. And from the comments that come from Stack and Lukas and Alan and some of the other members of the team on the testing side, we emphasize things that should be more important to the Rocky Linux community. So that's where our process starts and where we get involved with those release notes. During the end of the testing process, when the release is about to come out, there's a lot of fine-tuning, and a lot of conversations go back and forth between the teams.

Wale Soyinka:

Yes, we lean a lot on other groups for what goes into the release notes. We'll also lean a lot on upstream, because they've done it; they've been there before us. So we'll lean a lot on folks like Louis and Neil – they're hands-on, saying, "Oh no, no, you can't. You have to change that. This is broken; you have to…" So we lean a lot on others for what goes into the release notes and, of course, the testing team.

Zane Hamilton:

Great. Thank you, Wale. Thanks, Steve. We have one question from Anthony. Is anyone allowed to volunteer time or information to the testing team?

Open Testing Meetings [34:26]

Chris Stackpole:

Absolutely. Yeah. So we have very public open meetings every week, and they rotate between an afternoon and a morning session for me – which is actually reversed for somebody like Lukas and Alan, because they're on the other side of the world from me. In fact, about five hours from now, we're going to have a testing meeting. It's open to anybody, and it's just a Zoom call that you can find on the public RESF calendar. You can join us on the Mattermost chat. And anything that comes up, you're welcome to join in and help out with.

Additionally, if you're busy at the moment, but come release day or release week you're like, "Hey, I want to help out with this," we will have the playbook running in our Mattermost channel with different information, so you can follow along with the release. When we say, "Hey, a new ISO has been released – we need somebody to do X," anybody in the community can jump in and do that, whether that is grabbing a cloud image and verifying that it works if you have access to any cloud infrastructure, or grabbing the VM image, or burning a DVD – well, granted, the DVD ISO itself is too big now, but you can still burn the boot or minimal images to DVDs and CDs. And we have a community for whom that is the way they interact with it – it's actually surprising how many systems not connected to the internet are running Rocky, from what we hear, so it's actually important that they can burn it to a DVD that gets passed through their security team.

And so if you've got a rewritable disc and a burner, that is an easy way we can get another person to say, "I burned it successfully; I booted off of it. It works." Especially if you can boot both legacy BIOS and UEFI, that's a huge thing that a lot of people can do. There are a lot of these testing items on release week that we need feedback on, and getting a lot more eyeballs on them is an easy thing for a lot of people to do. Just hang out in that channel, watch it all go down, and when we say, "hey, we need help with this" and you think, "I could do that" – perfect.

Skip Grube:

Release weeks are very interesting because, if you follow Rocky Linux and you're on our chat channels, you can tell when it's a release week: the development, infrastructure, documentation, and testing channel activity goes crazy. I guarantee you will see it. There are lots and lots of messages passed, because there's a lot going on.

Lukas Magauer:

Yeah, and to mention it: Mattermost is mostly used for the main parts, but there are also some channels bridged to IRC. The exception is the release channel, because that one is only on Mattermost, but the other ones you will definitely also see on IRC.

Zane Hamilton:

Thank you. So Stack, earlier you talked about having things signed, and I know everything going through the RESF gets signed as it gets built by the build system. Is there a mechanism, or is there a desire, to also sign packages to show they've been QA'd as they come through testing? Or is there even a need, because once you test, it just goes back in – if it's been signed by the RESF, it's been tested?

RESF Signed Testing [38:22]

Chris Stackpole:

At that point, that's all infrastructure. What I was talking about with 'signed' is making sure our code commits are signed – we want to make sure the code part of it is – but mostly what we're testing are things that are release ready and have already been through infrastructure, so they've already gone through that part of the signing. For the testing bit, other than the verbal or Mattermost posts saying, "Hey, we sign off on this and this is good" – that's about it.

Zane Hamilton:

Okay. I think there's another question from Anthony. He's asking: he's got a lot of spare servers and equipment around, sounds like he's tested a lot of Kubernetes type work and containers, and is interested in helping out. So, go join the community, Anthony. I'd love to have you help. Really appreciate it. All right. I know that there are other questions. Skip, you've always got questions. I'm going to let you ask something. Come on.

Skip Grube:

Actually my next question involves a practical type thing. Is there any way we can see openQA in action? I know there's a site for it, right? Like we can see tests as they happen.

Watching openQA In Action [39:43]

Wale Soyinka:

I think it's vaporware, Skip. I don't think they have it; they have nothing.

Skip Grube:

No, wait a minute. Wait a minute. Somebody mute him.

Zane Hamilton:

There’s nothing like on-the-spot demos.

Skip Grube:

Oh, we don't have to see a test. If I, as a community member or an interested party, were to go – and in fact I'm going to segue… let me look at my old links here, because I think there's an openQA Rocky Linux site, right? openqa.rockylinux.org, maybe.

Alan Marshall:

We do have an infrastructure system that we're just in the process of getting set up. Well, it is actually running, but we haven't used it all that much just yet, although we've had it for a couple of weeks. We're now in the process of getting it up to the stage where a whole load of our tests will be done on it, and they will be visible publicly. I think openqa.rockylinux.org is the website, and you go straight into openQA. So if I run something at any particular time, you can actually see it as it happens.

Lukas Magauer:

I guess we are just showing it off. Let's see if I can pull it up.

Skip Grube:

Let me see if I can. I have a follow up to this too, but we'll see if, one second. Yeah. 

Lukas Magauer:

Okay, is it visible or do I have to switch?

Skip Grube:

Here you go. There it is.

Lukas Magauer:

Ah, there it is. Basically, I'm already logged in, and right now I'm an operator, so I don't see everything; this is only the bare minimum management part. In the interface you basically have all the tests, and you start on this page – or let's go back here, then we also see the filters correctly. It's grouped into test suites, and a test suite is basically a group of tests which are run together. Then, if we go to some of the bigger tests, you can see which tests will get run in this one. Let me see if I can pull it up – what exactly has been run here?

Yeah, this is a minimal package set on the DVD ISO. And this is basically the interface; it gets triggered by the API, and the API can be easily triggered by a script which is provided by openQA. To see it in action, we would have to kick off something new, but you can also look at the results afterwards, by looking at the pictures it took – these are basically the needles. So it takes a picture of the current screen of the machine it's testing and then compares it to a picture which has been taken before.
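
For readers following along: the "needles" Lukas describes are reference screenshots, each paired with a small JSON metadata file that lists the needle's tags and the region of the screen to compare. Below is a minimal sketch of what such a metadata file looks like; the tag name and coordinates are illustrative, not taken from Rocky's actual needle set.

    {
      "tags": ["rocky_installer_summary"],
      "properties": [],
      "area": [
        { "xpos": 420, "ypos": 310, "width": 260, "height": 48, "type": "match" }
      ]
    }

When a test asserts a screen, openQA looks up the needles carrying the requested tag and compares only the declared area against the live screenshot, which is why a test can keep passing even when unrelated parts of the screen change.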

Skip Grube:

So, this is so cool. 

Alan Marshall:

I've got a test I can kick off just now, Lukas, if you'd like to issue it.

Lukas Magauer:

Yeah, one second. I will just show that you can also look at the recording of it. So this is basically the whole test, which was done some time ago.

Skip Grube:

This is kind of what I'm getting at, because I secretly already know a little bit about openQA. The way it works, for people who aren't familiar, is that it literally inputs things automatically just as a human would, right? There's a virtual mouse and keyboard. And so if there are issues with the GUI or with something going on, we will pick them up, as you can see, very quickly. We can launch as many of these as we want, with as many variations on what we install or the different ISOs we use or whatever. I just think this is really cool, especially since I didn't realize it saved a video of it. That's awesome. We can watch it as it installs.

Lukas Magauer:

We once even had a problem which we only found because the order of the steps that had been done was exactly what brought up the problem. So this is pretty cool.

Skip Grube:

And the other cool part, which I also did not realize we had, was not only does it take screenshots of it, but it compares like pixel-for-pixel against known good screenshots.

Lukas Magauer:

I can also show that. Basically, let's go to one test, and if I click here and here, you can see how the new needles are taken, and you can basically look at the previous ones. As you can see, we still have the images from Fedora in there, because some of them also match Rocky.

Skip Grube:

Well, it's Anaconda for everybody, right?

Lukas Magauer:

Yes.

Wale Soyinka:

I have a new question here for this. So is there some AI behind this, or does a human being have to go through the install process first so that openQA has something to compare with? What does it compare against?

How Does openQA Work [46:04]

Lukas Magauer:

So, in the background, openQA is a really big stack of Perl scripts. It just runs through sequential tests, which are basically just Perl that gets run, and at every step – can I show it? One second... there – this is basically the part which gets run here. So there is another Perl script which starts this one, but here it basically checks if it's a PXE boot, and then it only runs this stuff. And when we see something like 'get_var' – 'get_var' is just grabbing a variable from the environment. But if you see 'wait_serial,' 'wait_serial' is for the console. So there is not only a connection to the screen, but also to the console, and it looks at that as well.

Then we also have these 'assert_and_click' and 'assert_screen' calls here, and they really just pull up the needle – that's what it's called in the background. It's a picture with metadata, and it looks for that entry in the database or the folder structure, and then just compares what this image is to what it expected it to be, specifically in the block which is mentioned in the metadata. So there is a whole picture, and then it will only compare this section here, which is marked in green.
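
To make the pieces Lukas is pointing at concrete, here is a minimal sketch of what one of these openQA test modules can look like. The needle tags, variable names, and typed values are hypothetical, but assert_screen, assert_and_click, type_string, send_key, wait_serial, get_var, and record_info are real calls from the os-autoinst test API he is describing.

    # Minimal sketch of an openQA test module (hypothetical tags and values).
    use base "basetest";
    use strict;
    use warnings;
    use testapi;

    sub run {
        my ($self) = @_;

        # Job settings arrive as variables; FLAVOR here is an assumed example.
        my $flavor = get_var("FLAVOR", "dvd-iso");

        # Wait (up to 300 seconds) for the screen to match a needle with this tag.
        assert_screen "anaconda_main_hub", 300;

        # Click the area a needle defines, then drive the virtual keyboard.
        assert_and_click "anaconda_user_creation";
        type_string "rockytester";
        send_key "ret";

        record_info("flavor", "Ran against the $flavor image");

        # The worker also watches the serial console, not only the screen.
        wait_serial "login:";
    }

    sub test_flags {
        # fatal: stop the remaining modules in this suite if this one fails.
        return { fatal => 1 };
    }

    1;

The assert_* calls are what tie the Perl back to the needles: each tag names one or more needle files, and the step fails if nothing matching that tag appears within the timeout.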

Zane Hamilton:

To expand on Wale's question: somebody has to go through and actually do this – create the screenshots, build this entire flow, write the code. So there's a lot of work involved in doing this, at least the first time, and maybe some upkeep after that.

Does openQA Eliminate the Need for Personal Build? [48:22]

Chris Stackpole:

I'm actually going to loop this in with the previous question about how they can get involved, and also tie it into a question that Steven had, which was, "Does this instance of openQA eliminate the need for a personal build?" So I'm going to tie all three of these together fast. It doesn't remove the need for personal setups for a lot of us, because we are doing a lot of heavy development and we want to be able to test and tweak and break – and we can break things in our home environments. Versus this one, which is going to be a) more available for, hopefully, more automated testing from infrastructure, but b) we want to bring the community in on this. And setting up all the stuff with openQA, even with our scripts, is still a bit of a challenge.

It's not simple software; there's a lot going on here. But with an interface available, we are hoping that we can create an environment where people who are interested in helping can figure out, "hey, this test failed because the button went from the right-hand side of the screen to the left-hand side," and then, great, somebody can go look at that and redraw the needle boxes for us. That's a way people can participate: by looking at where the failures are happening versus the expected result. And so we're hoping this allows for more community engagement. We also strive for transparency, so people get to see the tests that we're actually doing. When we say that we do a lot of testing – which is why our releases are a little slower – people can see these are all the tests that we're doing.

It's not that other ways are bad. It's just we really want to be thorough. We want to release a great product and therefore look at all the different tests that we run and verify against. And now, this is all public as we start to move through it. And so this is a great start for being able to get involved with the community in openQA. Then, when you get to the spot of like, “hey, I want to do something else that possibly might break this,” great, we'll help you set up a home lab.

Zane Hamilton:

Alan, I think you said you had possibly one you could kick off to show us what it looks like.

Alan Marshall:

Yep. It's all ready to go. If you want to see it.

Lukas Magauer:

Okay. Did you kick one off? Oh, okay.

Alan Marshall:

There it goes.

Lukas Magauer:

Oh yeah. Okay. Then we will just go to one and we will basically see it live.

Skip Grube:

This is a cool gizmo.

Zane Hamilton:

Thank you very much for installing Rocky; I really appreciate it, and it's wonderful.

Skip Grube:

And this is cool because, especially during release time – but really anytime we want to test anything – we can kick off many of these at once if we want to. It's whatever our hardware allows for. So yeah, this is way better than just giving everybody an ISO variant and saying, "go test it."

Lukas Magauer:

Yeah, we still do that, but this runs in the background, and it saves us a lot of time.

Skip Grube:

This gives us a good baseline though. Like, it should work, because it works automatically. If it doesn't work for you, let's figure out what's different about your setup, you know? Yes. So this is very cool.

Lukas Magauer:

And you can see here it not only shows you the screen, but it will also show you, on the upper right, exactly where it is – what it's doing right now. So it's looking at the mouse button, or looking at whether the main page is open and such, then it's waiting for some time and pulling this other stuff up.

Skip Grube:

Yeah. That's neat. It's doing it much faster than we could – and, well, it can do this while we sleep. That's the important part.

Lukas Magauer:

Yeah. Especially one of our test suites, which has most of the more special stuff – on a normal system from our labs, it takes a minimum of four hours to run through.

Skip Grube:

Nice. Interesting. And that's because we have lots of variants and lots of different options and things we try. But it's automatic – that's great.

Lukas Magauer:

And currently we mostly run it for x86-64, but we have also already implemented it for aarch64, where we still need to look for more machines – if someone wants to contribute those or wants to help there.

Chris Stackpole:

Yep. We need more aarch64 machines; we're working on adding a lot more of them into our testing suite automatically. We've also got a few people in the community who are interested in PowerPC – that is still a little further off on some of this. But the hope is that if we cover x86-64, we would catch most of the same errors on the others. And eventually we'll get to the spot – at least I really want us to be at the spot – where we're testing the full suite against everything that we're supporting, as well as a lot of the common functionality. Just a minute or two ago there was a comment – I think it was Cody – about HPC up there, and I've got grand plans, and I've already started on testing Rocky in a lot of known HPC-type environments, because that's what I do.

So it's going to be great because as we get more of the community involved with things like the special interest groups, I want to pull a lot of that into this as well. I would love for us to be able to be at a spot where a special interest group says, “Hey, we want to make sure that on release day our versions are working,” and we can pull that in. But again, these are some of the grand plans. We still have a lot of work ahead of us and we're still moving forward on it. But if you're at all interested in the testing team meetings, join the Mattermost testing group, and let us know what you've got and how you want to help, and we'll do what we can to get you involved and start it up and going.

Zane Hamilton:

So if someone actually had some of that infrastructure that maybe you guys don't have access to today, how would you integrate and work with those different pieces? Do you have to install something? Do you just have to open access to it? How does that work?

How to Incorporate Infrastructure [55:22]

Chris Stackpole:

So that's a great question, in that there are two different ways for things like PowerPC or S/390, where those aren't just things laying around idle in somebody's basement; it's harder to do testing on some of that. If you have the ability to set up a test machine on site where you could run openQA, that is one way that's very easy, because you could be somebody who comes in and says, "Hey, I just ran all these tests against PowerPC or S/390" – or RISC-V, which we were joking about. You want to set up something like that? Fantastic. That is an easy way to get started. And by "easy," I mean you're still setting up a lot of it, but we are happy to come along and help you with it. There is the aspect of a remote worker, though, where we would just need to have a machine that can tie in from our cloud infrastructure to your system. That is still being worked on through infrastructure, to figure out how we do that securely with that kind of remote connection. That's a little harder because of some of the security concerns there, but we're working on that as well.

Skip Grube:

Anybody got a mainframe handy? Talk to us. We like that.

Zane Hamilton:

Cheap and readily available.

Lukas Magauer:

To explain how we are running it: it's basically a Fedora machine, because for us it runs best on Fedora right now. So we are using that as the basis.

Chris Stackpole:

There are a lot of interesting packages that have to be built that aren't readily supplied for Enterprise Linux. We've attempted a couple of times to build the entire infrastructure for openQA on top of Rocky. It's gotten pretty messy every time, and we've run out of time when we've done it. But we are not above working with other groups. We've taken a lot from Fedora and openSUSE, and we are so thankful for a lot of the work that they've done in paving the path. So running Fedora is the easiest way of getting started with openQA.

Zane Hamilton:

Thanks, Stack. Wale, do you have something you want to add?

Wale Soyinka:

Oh, on the subject of workers, I was going to ask: how many do we have? How many on the architecture side of openQA do we have? How many workers do we have that are doing this?

Chris Stackpole:

A worker doesn't need to have a lot – a couple of cores and, I think it was, 8 GB of memory, something like that?

Lukas Magauer:

Every worker needs two and a half cores and 4 GB of memory at a minimum, or it will just lag and crash all the time. But on this machine right now – our production machine – I think we have 9 or 10.

Chris Stackpole:

And so you can have multiple workers on the same host, especially if you have virtualization capabilities on top of virtualization. If your hardware supports it, you can run multiple virtual workers to do these things in parallel. But then we can also set up multiple physical workers as well, which is helpful when you have multiple architectures.

Alan Marshall:

The rule I use is to take the number of cores in the machine, divide by two, and subtract one; that leaves two cores for each worker, plus some to spare for the other things the machine has to do. And I've found that works pretty well. If I take more than half the number of cores, then I tend to get failures in tests – the tests just hang and they fail. So it's best to keep the number of workers down a bit, and you don't have the problem.
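
As a quick illustration of Alan's rule of thumb, here is a tiny Perl sketch with an assumed core count; the numbers are just an example, not a recommendation from the team.

    # Alan's rule of thumb: workers = floor(cores / 2) - 1, leaving roughly
    # two cores per worker plus spare capacity for everything else on the host.
    use strict;
    use warnings;
    use POSIX qw(floor);

    my $cores   = 16;                     # assumed core count for this example
    my $workers = floor($cores / 2) - 1;  # 16 cores -> 7 worker instances
    printf "A %d-core host runs about %d openQA workers\n", $cores, $workers;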

Lukas Magauer:

Yes, you will definitely start to see lockups.

Alan Marshall:

Yeah.

Skip Grube:

This is great. You guys sound like me building packages now. "Well, if we take that version of OpenSSL, that didn't work very well. So we'll have to go…"

Chris Stackpole:

There's the aspect that this can be really slow – the thing I found was that I was hitting the disk really hard trying to pull data. There can be limitations. Don't just go grab a monster CPU and think, ah, this will be good enough.

Wale Soyinka:

I was going to ask a question about the previous screen, but that's gone. It was the administrator/root/admin user creation. I was going to ask: are the tests that granular, where we go through one installation process where we say, "make this user administrator," check the box, let it run to the end, and then start again without checking the box? Are the tests that granular, where we check every box and uncheck it and see what happens?

Chris Stackpole:

Not every test. There are several tests that we want to make sure are done once with a checkbox checked and once without. And there is an option inside of openQA for having a certain number of tests that are the same and then forking them to go in different directions. We're still developing a lot of that stuff. But yes, there are some tests where we know we need to test both A and B. Other tests, we don't necessarily need that. But again, if we get a question or a bug or something that comes through, then it goes back into the test queue. And so we're adding a lot of tests because something was a known problem at some point.

Wale Soyinka:

Thank you.

Lukas Magauer:

So as you can see, this is one of the previous runs, and this is the part you mentioned, right?

Wale Soyinka:

Correct. That is the exact screen. Yes, that one – the "make this user administrator" checkbox.

Lukas Magauer:

Yep.

Zane Hamilton:

Now this is fantastic, and I know I could sit here and watch this stuff all day. It's really cool. So it's amazing how much work goes into this that you don't really think about until you actually see it. All of the screenshots, all the code behind it.

Skip Grube:

I just like the videos. That's awesome.

Zane Hamilton:

Time-lapse videos are cool. It'd be kind of cool to see one fail, just to see where it failed. So we are up on time. I really appreciate everyone's time, and I really appreciate the Rocky community and everything that you do. Thank you for spending the time, putting this out there, working on it, and creating these scripts. You guys are amazing; love the work you do. If you want to get involved, go to the Mattermost site and volunteer – the testing team needs a lot of help, as you can tell; there's a lot of work in it, and we really appreciate it. Guys, we will see you again next week. Like and subscribe. Alan, Lukas, Stack, Steve, Wale, Skip, it's always good to see you guys. Really appreciate it. See you next week.

Chris Stackpole:

Thank you all. And, if you don't mind, really fast: just a huge shout out to the testing team, because without them – I feel like I don't do nearly as much as the rest of these guys. They're fantastic. Lukas, Alan, Al, Trevor, Rich, and a lot of other people help out so much, and we build off a lot within the community itself. And a big shout out to openSUSE, Fedora, openQA, and all of the groups whose tools we're building on and using. This is fantastic. So thank you so much for the time, and for letting us show off what we're doing in our little world.

Zane Hamilton:

Absolutely. Thank you, guys.