I am trying to run a nonlinear finite element code in MATLAB whose basic structure is shown below. It has an outer for-loop that contains two parfor loops.
for i = 1 : total_load_steps
    parfor w = 1 : number_of_elements
        % Elemental matrices and vectors computation
    end
    % Sparse assembly and solve
    parfor w = 1 : number_of_elements
        % Update variables
    end
end
The main script described above and the functions it calls are stored on my local computer.
'userpath' is set to the directory on my local computer from which I run MATLAB.
The HPC cluster is accessed from my local machine using the 'parcluster' command, and the MATLAB script named 'My_model' is submitted to the cluster from my local computer. The following commands are used in the process:
configCluster;
% Enter my user name and password
cp = parcluster;
cp.AdditionalProperties.QueueName = 'highmemory';
cp.AdditionalProperties.WallTime = '10:00:00';
cp.AdditionalProperties.ProcsPerNode = 48;
cp.saveProfile
job_1 = cp.batch('My_model', 'Pool', 96, 'CurrentFolder' ,'.');
Strangely, even with 96 cores assigned on the HPC cluster, the computation runs slower than on my local computer, which uses only 10 cores. Moreover, when I increase the number of cores on the cluster, the computation becomes even slower.
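To narrow down which phase is responsible, each phase of a load step could be timed separately and the totals compared between the local run and the cluster run. Below is a minimal sketch of that instrumentation. The sizes and the computations inside the loops are hypothetical stand-ins, not my actual model; only the loop structure matches the code above.

```matlab
% Timing sketch. All sizes and computations are hypothetical placeholders;
% only the for/parfor structure mirrors the real script.
total_load_steps   = 3;     % assumed toy value
number_of_elements = 200;   % assumed toy value
n_dof              = 8;     % assumed toy value

t_elem   = zeros(total_load_steps, 1);  % time in first parfor, per load step
t_solve  = zeros(total_load_steps, 1);  % time in assembly and solve
t_update = zeros(total_load_steps, 1);  % time in second parfor

for i = 1 : total_load_steps
    ke = cell(number_of_elements, 1);

    t0 = tic;
    parfor w = 1 : number_of_elements
        ke{w} = rand(n_dof);            % stand-in for elemental computation
    end
    t_elem(i) = toc(t0);

    t0 = tic;
    K = speye(n_dof);
    for w = 1 : number_of_elements
        K = K + sparse(ke{w});          % stand-in for sparse assembly
    end
    u = K \ ones(n_dof, 1);             % stand-in for the solve
    t_solve(i) = toc(t0);

    state = zeros(number_of_elements, 1);
    t0 = tic;
    parfor w = 1 : number_of_elements
        state(w) = sum(u);              % stand-in for the variable update
    end
    t_update(i) = toc(t0);
end

fprintf('elemental: %.3f s, assembly+solve: %.3f s, update: %.3f s\n', ...
        sum(t_elem), sum(t_solve), sum(t_update));
```

If the two parfor phases dominate on the cluster while the work inside each iteration is small, that would point towards communication overhead between the 96 workers rather than the computation itself, but this is only a guess until measured.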
I am unable to find the exact reason for this behaviour. Could anyone provide some insight into why this is happening and how I can resolve it to speed up the computation on the HPC cluster?