Matlab (R2012a) on Star
1. Copy the files from (MATLABROOT)/toolbox/distcomp/examples/integration/sge/nonshared to (MATLABROOT)/toolbox/local on your local computer. I had already done this for another, LSF-based cluster, so I had to rename the files and the calls inside them. If you only want to use the Star cluster, ignore the uge_ prefixes in the pictures below and copy the files as just described. If, however, you also use a cluster with another, non-gridengine scheduler, you must
a) copy the files to toolbox/local under new names (I used the uge_ prefix) and
b) change the calls inside the uge_* files to call the renamed files. For instance, in uge_communicatingSubmitFcn.m change getRemoteConnection to uge_getRemoteConnection, createSubmitScript to uge_createSubmitScript and extractJobId to uge_extractJobId. A full list is given in an appendix.
(All descriptions use linux conventions for the local computer; Windows or Mac should be very similar. The cluster itself runs linux.)
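The copy-and-rename in steps a) and b) can be sketched as a small shell function. This is only a sketch under stated assumptions: it assumes GNU sed (for \b), and it assumes the four helper names listed in the function are the only cross-file calls, which is what the grep output in the appendix suggests. The function name prefix_integration_files is hypothetical, and the example paths are placeholders for your own installation.

```shell
#!/usr/bin/env bash
# Sketch: copy the SGE "nonshared" integration files under a new prefix
# and rewrite the helper calls inside them to use the same prefix.
# Assumes GNU sed and that the helper list below is complete.

prefix_integration_files() {
    local src="$1" dest="$2" prefix="$3"
    local helpers='getRemoteConnection|createSubmitScript|extractJobId|getSubmitString'
    local f base
    mkdir -p "$dest"
    for f in "$src"/*.m "$src"/*.sh; do
        [ -e "$f" ] || continue                 # skip globs that matched nothing
        base=$(basename "$f")
        # Prefix each helper name wherever it appears as a whole word.
        sed -E "s/\\b(${helpers})\\b/${prefix}\\1/g" "$f" > "$dest/${prefix}${base}"
        # Redirection drops the execute bit, so restore it for wrapper scripts.
        if [ -x "$f" ]; then
            chmod +x "$dest/${prefix}${base}"
        fi
    done
}

# Example call (placeholder paths, adjust MATLABROOT for your machine):
# MATLABROOT=/usr/local/MATLAB/R2012a
# prefix_integration_files \
#     "$MATLABROOT/toolbox/distcomp/examples/integration/sge/nonshared" \
#     "$MATLABROOT/toolbox/local" uge_
```

Every output file is named with the prefix followed by the original name. If you prefer a different name for one of them (ugeSubmitFcn.m appears in the appendix listing, for instance), rename that file by hand afterwards and use the same name in the cluster profile.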
2. Start matlab on your local computer.
3. Run the command (in your matlab command window)
pctconfig('hostname', 'local_FQDN')
where local_FQDN is the Fully Qualified Domain Name of your local computer.
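If you are unsure of your computer's fully qualified name, a linux shell can usually report it. The sketch below assumes `hostname -f` is available (it falls back to `uname -n` otherwise); the function name pctconfig_command is hypothetical. It only prints the line to paste into the matlab command window.

```shell
# Print the pctconfig line for step 3.
# With no argument, ask the system for its fully qualified name.
pctconfig_command() {
    local fqdn="${1:-$(hostname -f 2>/dev/null || uname -n)}"
    printf "pctconfig('hostname', '%s')\n" "$fqdn"
}

pctconfig_command
```

Check that the printed name contains the domain part (dots). A short name usually means the local name resolution is not set up to return the full name, and you will need to supply it yourself, for example `pctconfig_command myhost.example.edu` with your own name substituted.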
4. Set up a generic scheduler: Parallel->Manage Cluster Profiles. The pictures below show the local and remote data locations into which matlab will insert job data. Note that I've explicitly used the 'star' profile in the examples below. The number of workers is optional, but be aware that a job requesting more cores than are currently available will wait in the queue, possibly for a long time. It is usually easier to first make sure that your job runs locally (perhaps with multiple processes) and then start correspondingly small on Star. (You must have an account on Star for this to work.) The command
qstat -u '*'
on Star lists all running and queued jobs on the cluster; qstat by itself lists just yours.
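To get a quick feel for how busy the cluster is before choosing a number of workers, the qstat listing can be tallied by job state. This is a sketch that assumes the usual Grid Engine qstat layout: two header lines, then one line per job with the state (r, qw, ...) in the fifth column. The function name summarize_qstat is hypothetical.

```shell
# Count jobs per state from a qstat listing read on standard input.
# Assumes two header lines and the job state in column 5.
summarize_qstat() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) printf "%s %d\n", s, n[s] }' | sort
}

# On Star:
# qstat -u '*' | summarize_qstat
```

Many jobs in state qw suggest that a large request will also sit in the queue for a while.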
The data locations shown are set according to your configuration. Star_username is a placeholder for your cluster login name and local_username is a placeholder for your user name on the matlab client desktop. The functions referred to above with the '@...' designations must be in $matlabroot/toolbox/local. I found them in $matlabroot/toolbox/distcomp/examples/integration/sge/nonshared; 'nonshared' indicates that your workstation/desktop does not share a filesystem with the cluster. If your matlab parallel computing toolbox does not have them, they are available on the Star cluster under /share/apps/mathworks.
The matlab scripts below are examples I found to work with the software. The hpccLinpack.m file performs an spmd job.
t=createTask(j,@hpccLinpack, 2, {})
I added the hpccLinpack.m file to this document. I had to change hpccLinpack.m so that it returns two output arguments; that is the 2 in the createTask command. If the script(s) are 'parallel' scripts you can substitute
The wait function causes matlab to wait until the job is finished. The above set runs 9 separate worker processes: the 8 listed above + 1 as a driver.
If you have questions or issues, please let me know.
Dr. Marty Bylander
System Support Analyst-HPC
512-245-7866 office
512-245-5806 fax
The original versions of the hpcchallenge files are also on Star under /share/apps/mathworks/toolbox/distcomp/examples/benchmark/hpcchallenge. The altered hpccLinpack.m is shown below:
function [datasize, perf ] = hpccLinpack( m )
%HPCCLINPACK An implementation of the HPCC Global HPL benchmark
% hpccLinpack(m) creates a random codistributed real matrix A of size
% m-by-m and a real random codistributed vector B of length m. It then
% measures the time to perform the matrix division of A into B (X = A\B,
% which is the solution to the equation A*X = B) in a parallel way using
% the currently available resources (MATLAB pool). This time indicates
% the performance metric. Finally the function computes the scaled
% residuals to ensure that the error on the computation is within
% acceptable bounds.
% If you do not specify m, the default value is that returned from
% hpccGetProblemSize('hpl'), which assumes that each process in the pool
% has 256 MB of memory available. This is expected to be smaller than
% the actual memory available.
% Details of the HPC Challenge benchmarks can be found at
% and the specific Class 2 specs are linked off
% that page. (At the time of writing, the specs are linked at
% Examples:
% % Without a matlabpool open
% tic; hpccLinpack; toc
% Data size: 0.108665 GB
% Performance: 16.351622 GFlops
% Elapsed time is 2.791896 seconds.
% % With a local matlabpool of size 4
% tic; hpccLinpack; toc
% Data size: 0.434774 GB
% Performance: 18.650758 GFlops
% Elapsed time is 21.647003 seconds.
% See also: hpccGetProblemSize, matlabpool
% Copyright 2008-2009 The MathWorks, Inc.
% If no size provided then get a default size
if nargin < 1
    m = hpccGetProblemSize( 'hpl' );
end
spmd
    % Create a distributed matrix in the 2d block cyclic distribution and a
    % distributed column vector in 1d
    A = codistributed.randn(m, m, codistributor2dbc);
    b = codistributed.rand(m, 1);
    % Time the solution of the linear system
    tic;
    x = A\b;
    t = toc;
    % Need to convert to a 1d distribution for the checking code below
    A = redistribute(A, codistributor1d);
    % Compute scaled residuals
    r1 = norm(A*x-b,inf)/(eps*norm(A,1)*m);
    r2 = norm(A*x-b,inf)/(eps*norm(A,1)*norm(x,1));
    r3 = norm(A*x-b,inf)/(eps*norm(A,inf)*norm(x,inf)*m);
    % This test is specified in the benchmark definition
    if max([r1 r2 r3]) > 16
        error('Failed the HPC HPL Benchmark');
    end
end
% Data size in GB (8 bytes per matrix element) and performance in gigaflops;
% t is a Composite holding one timing per worker, so take the slowest
datasize = 8*m^2/(1024^3);
perf = (2/3*m^3 + 3/2*m^2)/max([t{:}])/1.e9;
fprintf('Data size: %f GB\nPerformance: %f GFlops\n', 8*m^2/(1024^3), perf);
A list of changes is shown below, using uge_ as a prefix. Note that I used ugeSubmitFcn instead of uge_SubmitFcn, as seen above in the first picture. grep is a standard linux command that lists the lines in files that match a string (use the command `man grep` for the man page). A list of the gridengine-related files is shown below that.
bylander@/usr/local/MATLAB/R2012a/toolbox/local> grep -i uge *
uge_communicatingSubmitFcn.m:remoteConnection = uge_getRemoteConnection(cluster, clusterHost, remoteJobStorageLocation);
uge_communicatingSubmitFcn.m:scriptName = '';
uge_communicatingSubmitFcn.m:uge_createSubmitScript(localScriptName, jobName, quotedLogFile, quotedScriptName, ...
uge_communicatingSubmitFcn.m:jobIDs = uge_extractJobId(cmdOut);
uge_createSubmitScript.m:commandToRun = uge_getSubmitString(jobName, quotedLogFile, quotedScriptName, ...
uge_deleteJobFcn.m:remoteConnection =uge_getRemoteConnection(cluster, clusterHost, remoteJobStorageLocation);
uge_getJobStateFcn.m:remoteConnection = uge_getRemoteConnection(cluster, clusterHost, remoteJobStorageLocation);
ugeSubmitFcn.m:remoteConnection = uge_getRemoteConnection(cluster, clusterHost, remoteJobStorageLocation);
ugeSubmitFcn.m:scriptName = '';
ugeSubmitFcn.m: uge_createSubmitScript(localScriptName, jobName, quotedLogFile, quotedScriptName, ...
ugeSubmitFcn.m: jobIDs{ii} = uge_extractJobId(cmdOut);
bylander@/usr/local/MATLAB/R2012a/toolbox/local> ls -lt uge*
-r-xr-xr-x 1 root root 5050 Feb 15 11:54
-r--r--r-- 1 root root 6676 Feb 15 10:24 uge_communicatingSubmitFcn.m
-r--r--r-- 1 root root 1446 Feb 15 10:20 uge_createSubmitScript.m
-r--r--r-- 1 root root 3343 Feb 15 07:34 uge_deleteJobFcn.m
-r--r--r-- 1 root root 7885 Feb 15 07:32 uge_getJobStateFcn.m
-r--r--r-- 1 root root 6820 Feb 15 07:30 ugeSubmitFcn.m
-r-xr-xr-x 1 root root 416 Feb 15 07:22
-r--r--r-- 1 root root 8352 Feb 15 07:20 uge_getRemoteConnection.m
-r--r--r-- 1 root root 792 Feb 15 07:19 uge_getSubmitString.m
-r--r--r-- 1 root root 482 Feb 15 07:18 uge_extractJobId.m
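After renaming, it is worth checking that no call to an unprefixed helper is left behind, since such a call would silently pick up the other scheduler's file. The following is a sketch of such a check, assuming the same four helper names that appear in the grep output above; the function name find_unprefixed_calls is hypothetical.

```shell
# Print any line in DIR/PREFIX* that still uses a helper name without a
# prefix. Returns 1 if such lines were found, 0 if the files look clean.
find_unprefixed_calls() {
    local dir="$1" prefix="$2"
    local helpers='getRemoteConnection|createSubmitScript|extractJobId|getSubmitString'
    # A helper name preceded by a non-word character (or the start of the
    # line) carries no prefix.
    if grep -nE "(^|[^A-Za-z0-9_])(${helpers})([^A-Za-z0-9_]|\$)" "$dir"/"${prefix}"*; then
        return 1
    fi
    return 0
}

# Example call (placeholder path); the prefix "uge" matches both uge_* and ugeSubmitFcn.m:
# find_unprefixed_calls /usr/local/MATLAB/R2012a/toolbox/local uge
```

Comment lines that merely mention a helper by its old name are reported as well; those are harmless and can be ignored or updated.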