Parallel computing on Quest (NW cluster)


Parallel Computing with Matlab on Quest

First, some general tips for writing a good parfor loop.

1) Pre-define a structure array called WhatToFit that specifies the inputs to the fitting function, the models to fit, and the results to return or write to disk (e.g. which neuron, which model type, the indices of the cross-validation training and test sets, whether to return pseudo-R2s, etc.). This keeps the function's input list minimal, and WhatToFit can be parsed within each parallel execution of the function.

2) Make your fitting function write its result out to file, or collect the results in a cell array. Indexing is complicated within parfor and you want to avoid it as much as possible. A minimal sketch of both tips follows.
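
Here is a minimal sketch of both tips. The field names, sizes, and the body of fitMyModel are placeholders for whatever your own analysis needs, not a prescription:


% Tip 1: one WhatToFit entry per job (here, one per neuron/model combination)
numNeurons = 10;                          % placeholder values
modelNames = {'glm', 'glm_history'};
k = 1;
for n = 1:numNeurons
    for m = 1:numel(modelNames)
        WhatToFit(k).neuron   = n;
        WhatToFit(k).model    = modelNames{m};
        WhatToFit(k).trainIdx = 1:800;      % cross-validation indices (placeholders)
        WhatToFit(k).testIdx  = 801:1000;
        WhatToFit(k).returnPseudoR2 = true;
        k = k + 1;
    end
end

% Tip 2: fitMyModel.m (its own file) parses one entry, fits, and saves to disk
function Result = fitMyModel(W)
Result = struct('neuron', W.neuron, 'model', W.model, 'pseudoR2', NaN);   % replace NaN with the actual fit
save(sprintf('fit_neuron%02d_%s.mat', W.neuron, W.model), 'Result');      % one file per job, no parfor indexing needed
end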

Now, make sure your parfor loop runs as expected by testing it on your local machine; this catches a lot of bugs before they reach Quest. Write a simple 'wrapper.m' file that calls the parallelized function doing all the heavy lifting:



matlabpool open 2                             % start a local pool with 2 workers
parfor i = 1:100
    fprintf('Fitting iteration %d\n', i);
    Result{i} = fitMyModel(WhatToFit(i));     % Result is collected as a sliced cell output
end
matlabpool close


Once you're ready, copy all your necessary data and .m files from your local machine to your home directory on Quest. Use an FTP GUI if you have one; otherwise, type the following in your shell and enter your Quest password when prompted. If the paths are right, you should see a list of the files that were successfully copied.


scp -rp <localpath>/<localfile> <user>@quest.it.northwestern.edu:/home/<user>/<questpath>/.
scp Quest1n13c8_prod.settings <user>@quest.it.northwestern.edu:/home/<user>/.


Also make sure your Quest home directory has the 'Quest2_b1024.settings' file, which specifies the cluster configuration the parallel toolbox needs to distribute jobs to workers. If it isn't there, you can grab a copy from '/home/pry194' on Quest:


      cp /home/pry194/Quest2_b1024.settings ~/.

Now ssh into Quest:


       ssh -X <user>@quest.it.northwestern.edu

cd <questpath>



First, launch Matlab from the shell and make a parallel computing profile using the settings file.



module load matlab
matlab &



In the Matlab GUI, go to the parallel settings and import the settings file, then set this profile as the default. Strictly speaking, you should also have a validation settings file and validate the parallel computing toolbox, but let's skip this and assume the toolbox works.
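
If you prefer to skip the GUI, the same import can be done at the Matlab prompt. This is only a sketch, assuming your version of the Parallel Computing Toolbox provides parallel.importProfile (R2012a or later); point it at whichever settings file you copied over:


profName = parallel.importProfile(fullfile(getenv('HOME'), 'Quest1n13c8_prod.settings'));
parallel.defaultClusterProfile(profName);   % make the imported profile the default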

Ok, now edit your wrapper.m file by replacing the line:
matlabpool open 2

with the line:
matlabpool('open', 'Quest1n13c8_prod');

This makes sure we use the correct profile, which gives access to all the workers we requested.

Finally, make a simple text file 'runMyCode.txt' with the following content:


#!/bin/bash
module load matlab
matlab -nodisplay -nosplash -r wrapper > log.txt &
exit


And the same for Python:


#!/bin/bash
module load python/anaconda
python simplecode.py > simplecode_result.txt &
exit


Make sure that the txt file is executable. At the shell, type:

chmod u+x runMyCode.txt

Now actually run the file:

./runMyCode.txt

That's it! Your code should be queued to run on as many workers as you requested. The console output from Matlab will be saved in 'log.txt'.

Ok, now you need some basic tools to monitor what the hell is going on.

showq | grep $USER

gives a list of the jobs started by you. You can get your job number (usually an 8-digit number) as well as its status: 'idle', 'running', 'completed', etc. You can get more details on your job using the job number:

checkjob <jobnumber>

If you started a job by mistake and want to cancel it, try:

canceljob <jobnumber>

You can monitor the console output by peering into 'log.txt':

more log.txt
vi log.txt
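
If you want the output to scroll by continuously as Matlab writes it, tail works too:

tail -f log.txt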
