Generate random data from lists
This article is going to cover generating random data in Excel but limiting the data to predefined lists.
The first step is to prepare the lists of data
Lists
In a worksheet named Lists there are separate predefined data in columns for:
- First Name
- Last name
and related data for:
- City
- Country

Data
To generate testing data, a random value for First Name, Last Name and City are going to be selected for each row.
The Country is going to be looked up based on the City.
A nine digit number is going to be generated for the Code.
First Name
To select a random first name from the list of two hundred and eighty names, the RANDBETWEEN function is going to be used. It will generate a random number between 1 and 280. This number is used as a parameter for the INDEX function to select the matching row from the data.
=INDEX(Lists!$A$4:$A$283,RANDBETWEEN(1,280))
Last Name
The random last name is selected in the same way as the first name.
=INDEX(Lists!$C$4:$C$283,RANDBETWEEN(1,280))
Code
A nine digit code is generated using the RANDBETWEEN function.
=RANDBETWEEN(100000000,999999999)
City
The random city is selected from the list of two hundred and three cities in the same way as the first name.
=INDEX(Lists!$E$4:$E$206,RANDBETWEEN(1,203))
Country
The country value is looked up using XLOOKUP based on the city value.
=XLOOKUP($E4,Lists!$E$4:$E$206,Lists!$F$4:$F$206)

Invalid Data
If you want to introduce invalid data into the random data for testing purposes, it can be done with an IF function and another RANDBETWEEN function.
We are also going to add a user input field to allow them to set the percentage of data that should be invalid. For the example, the percentage is going to be set at forty percent.
Lists of invalid data
New lists of only invalid data need to be created. In addition to the valid data.
For the example, only ten invalid rows per list have been created. More should be added to prevent too much duplication.

First Name
The first step is to generate a random number between 1 and 100 using RANDBETWEEN.
This number is compared to the percentage set by the user.
If the random percentage is more, valid data is returned.
If the random percentage is less, invalid data is returned.
For valid data, a random first name from the list of two hundred and eighty names is selected using the RANDBETWEEN function. It will generate a random number between 1 and 280. This number is used as a parameter for the INDEX function to select the matching row from the data.
For the invalid data, the same process is followed but using the invalid data list of ten rows.
=IF(RANDBETWEEN(1,100)>$E$1,INDEX(Lists!$A$4:$A$283,RANDBETWEEN(1,280)),INDEX('Invalid Lists'!$A$4:$A$13,RANDBETWEEN(1,10)))
Last Name
The random last name is selected in the same way as the first name.
=IF(RANDBETWEEN(1,100)>$E$1,INDEX(Lists!$C$4:$C$283,RANDBETWEEN(1,280)),INDEX('Invalid Lists'!$C$4:$C$13,RANDBETWEEN(1,10)))
Code
The same process used to decide whether to return valid or invalid data for the first name is followed.
For valid data, a nine digit code is generated using the RANDBETWEEN function.
For invalid data, an eight digit code is generated using the RANDBETWEEN function.
=IF(RANDBETWEEN(1,100)>$E$1,RANDBETWEEN(100000000,999999999),RANDBETWEEN(10000000,99999999))
City
The random city is selected in the same way as the first name.
=IF(RANDBETWEEN(1,100)>$E$1,INDEX(Lists!$E$4:$E$206,RANDBETWEEN(1,203)),INDEX('Invalid Lists'!$E$4:$E$13,RANDBETWEEN(1,10)))
Country
The country value is looked up using XLOOKUP based on the city value.
To make sure that the invalid selections are included in the lookup, a range is created by using the VSTACK function to join the valid and invalid lists together.
=XLOOKUP($E4,VSTACK(Lists!$E$4:$E$206,'Invalid Lists'!$E$4:$E$13),VSTACK(Lists!$F$4:$F$206,'Invalid Lists'!$F$4:$F$13))

Highlighting invalid data
To display which data is invalid, conditional formatting can be used.
A rule compares the cell value to see if it matches against an XLOOKUP of the invalid data list. If there is a match, the cells are highlighted in red.


Final thoughts
The RANDBETWEEN function is volatile. This means that every time the workbook is recalculated, the RANDBETWEEN function will produce different results. The workbook is recalculated when data is entered into a cell or edited.
To create a static version of the data, use Copy and then Paste Special as Values.