Parallel computing on Quest (NW cluster)
From Bayesian Behavior Lab
Parallel Computing with Matlab on Quest
First, some general tips for writing a good parfor loop.
1) Pre-define a structure called WhatToFit that specifies which inputs go into the fitting function, which models to fit, and which results to return or write to disk (e.g. which neuron, which model type, the indices of the cross-validation training and test sets, whether to return pseudo-R2s or not). This minimizes the inputs to the function, and WhatToFit can be parsed within each parallel execution of the function.
2) Make your fitting function write its result to a file, or collect the results in a cell array. parfor restricts how output variables can be indexed inside the loop, so avoid indexing there as much as possible.
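Now, make sure your parfor loop runs as expected by testing it on your local machine. This catches a lot of bugs before you ever get to Quest. Write a simple 'wrapper.m' file that calls the parallelized function doing all the heavy lifting. For local testing, the wrapper can open a small pool, for example with two workers: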
matlabpool open 2
Once you're ready, copy all the necessary data and m files from your local machine to your home directory on Quest. Use an FTP GUI if you have one. Otherwise, type the following in a shell on your local machine and enter your Quest password when prompted. If the paths are right, you should see a list of the files that were successfully copied.
scp -rp <localpath>/<localfile> <user>@quest.it.northwestern.edu:/home/<user>/<questpath>/.
Also make sure your Quest home directory has the 'Quest2_b1024.settings' file, which specifies what the parallel toolbox needs to distribute jobs to workers. If it doesn't exist, you can grab a copy from '/home/pry194' by running the following on Quest:
cp /home/pry194/Quest2_b1024.settings ~/.
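If you would rather not overwrite a settings file that is already in place, the copy can be made conditional. The following is a minimal sketch, not part of the original instructions; `ensure_settings` is a hypothetical helper name, and the destination is assumed to default to your home directory.

```shell
# Hypothetical helper: copy the settings file into a directory only if it is missing.
# $1 = path to the source settings file
# $2 = destination directory (assumed to default to $HOME)
ensure_settings() {
    src=$1
    dest_dir=${2:-$HOME}
    name=$(basename "$src")
    if [ -f "$dest_dir/$name" ]; then
        # Leave an existing file untouched.
        echo "$name already present in $dest_dir"
    else
        cp "$src" "$dest_dir/" && echo "copied $name to $dest_dir"
    fi
}

# On Quest, this could be called as:
# ensure_settings /home/pry194/Quest2_b1024.settings
```

Running it a second time reports that the file is already present instead of copying again.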
Now ssh into Quest and load the Matlab module:
ssh -X <user>@quest.it.northwestern.edu
module load matlab
Ok, now edit your wrapper.m file by replacing the line:
with the line:
This makes sure that we use the correct profile, which gives access to all the workers we requested.
Finally, make a simple text file 'runMyCode.txt' with the following content:
And same for python
Make sure that the txt file is executable. At the shell, type:
chmod u+x runMyCode.txt
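To confirm that the permission change took effect, you can test the executable bit explicitly. This is a minimal sketch; `make_executable` is a hypothetical helper name, not a Quest command.

```shell
# Hypothetical helper: add the user-execute bit and report whether it took effect.
# $1 = file to make executable
make_executable() {
    chmod u+x "$1" || return 1
    if [ -x "$1" ]; then
        echo "$1 is executable"
    else
        echo "$1 is NOT executable" >&2
        return 1
    fi
}

# Usage:
# make_executable runMyCode.txt
```

The same information is visible in the output of `ls -l runMyCode.txt`, where an `x` in the owner permissions indicates that the file can be executed.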
Now actually run the file:
That's it! Your code should be queued to run on as many workers as you requested. The console output from Matlab will be saved in 'log.txt'.
Ok, now you need some basic tools to monitor what the hell is going on.
showq | grep $USER
gives a list of the jobs started by the user. From it you can get your job number (usually an 8-digit number) as well as its status: 'idle', 'running', 'completed', etc. You can get more details on your job by using the job number:
If you started a job by mistake and want to cancel it, try:
You can continuously monitor the console output by peering into 'log.txt':
more log.txt (OR)
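For a log that is still being written, the standard `tail` utility is a common way to look at only the most recent output. The sketch below is an illustration rather than the command originally listed here; `log_tail` and `log_follow` are hypothetical helper names, and the default of 20 lines is an arbitrary choice.

```shell
# Hypothetical helper: print the last N lines of a log file as a one-shot snapshot.
# $1 = log file, $2 = number of lines (assumed default: 20)
log_tail() {
    tail -n "${2:-20}" "$1"
}

# Hypothetical helper: keep printing new lines as they are appended to the log.
# This blocks until interrupted with Ctrl-C.
log_follow() {
    tail -f "$1"
}

# Usage:
# log_tail log.txt 50
# log_follow log.txt
```

`log_tail` returns immediately, which makes it convenient for a quick check, while `log_follow` stays attached to the file until you interrupt it.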