Since you have access to the Parallel toolbox, I suggest that you first check whether you can do it the easy way.
Basically, instead of writing
for i=1:lots
out(:,i)=do(something);
end
You write
parfor i=1:lots
out(:,i)=do(something);
end
Then, you use matlabpool to create a number of workers (you can have a maximum of 8 on your local machine with the toolbox, and tons on a remote cluster if you also have a Distributed Computing Server license), and you run the code, and see nice speed gains when your iterations are run by 8 cores instead of one.
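To make the steps above concrete, here is a minimal sketch, assuming a release that has parpool and gcp (in older releases matlabpool plays the same role). The workload, filling column i with powers of i, is purely a hypothetical stand-in for do(something).

```matlab
% Hypothetical workload: column i holds the first three powers of i.
lots = 1000;
out = zeros(3, lots);          % preallocate so each iteration writes its own column

% Start a pool only if none is running yet (default profile, default worker count).
if isempty(gcp('nocreate'))
    parpool;
end

% Each iteration writes only out(:, i), so the iterations are independent.
parfor i = 1:lots
    out(:, i) = [i; i^2; i^3];
end
```

Timing the loop with tic/toc once as for and once as parfor is a quick way to check whether the pool actually pays off for your workload.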
Even though the parfor route is the easiest, it may not work right out of the box: you might do your indexing wrong, or you may be referencing an array in a problematic way, etc. Look at the mlint warnings in the editor, read the documentation, and rely on good old trial and error, and you should figure it out reasonably fast. If you have nested loops, it's often best to parallelize only the innermost one and ensure it does tons of iterations - this is not only good design, it also reduces the amount of code that could give you trouble.
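As an illustration of the kind of indexing parfor accepts and rejects, here is a small sketch. The variable names (x, y, total, z) and the random data are made up for the example; they are not from the question.

```matlab
n = 1e4;
x = rand(1, n);                % hypothetical input data

% Two patterns parfor accepts: a sliced output and a reduction variable.
y = zeros(1, n);
total = 0;
parfor i = 1:n
    y(i) = x(i)^2;             % sliced: iteration i touches only element i
    total = total + x(i);      % reduction: accumulation order does not matter
end

% A loop-carried dependence (iteration i needs the result of i-1) is rejected
% by parfor, so this running sum stays an ordinary for-loop.
z = zeros(1, n);
z(1) = x(1);
for i = 2:n
    z(i) = z(i-1) + x(i);
end
```

If the editor flags a variable inside parfor, it usually means that variable fits neither of the first two patterns.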
Note that especially if you run the code on a local machine, you may run into memory issues (which might manifest as really slow execution in parallel mode because you're paging): every worker gets a copy of the workspace, so if your calculation involves creating a 500 MB array, 8 workers will need a total of 4 GB of RAM - and then you haven't even started counting the RAM of the parent process! In addition, it can be good to use only N-1 cores on your machine, so that there is still one core left for other processes that may run on the computer (such as a mandatory antivirus...).
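A rough sketch of the N-1 rule and the memory arithmetic, assuming the default cluster profile is the local machine and that each worker really does hold its own 500 MB array (perWorkerGB is a made-up figure for the example):

```matlab
c = parcluster();                        % default profile, assumed here to be the local machine
nWorkers = max(1, c.NumWorkers - 1);     % leave one core for everything else

perWorkerGB = 0.5;                       % assumption: one 500 MB array per worker
neededGB = nWorkers * perWorkerGB;       % the parent process comes on top of this
fprintf('%d workers need about %.1f GB for the array alone\n', nWorkers, neededGB);

% Open a pool of that size unless one is already running.
if isempty(gcp('nocreate'))
    parpool(c, nWorkers);
end
```

If the estimate gets close to your physical RAM, reducing nWorkers further is usually cheaper than letting the machine page.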
You have to do several things if you want to run parallel calculations with MATLAB. First of all, you have to rework the code. The main change is the "for" loop: for parallel calculation, you replace "for" with "parfor", just as @Hedayatpoor said. However, you also need to pay close attention to the code inside the loop. For example, if you use i as the index (parfor i=1:N), each iteration has to be independent of the others: reading b=a(i) is fine, but something like a(i)=a(i-1)+1 is not legal. Next, you can use the Parallel toolbox. Typing "matlabpool 4" starts a pool of 4 workers (labs).
Actually, the parallel calculation capability of MATLAB is limited. If you really have such a large amount of computation, FORTRAN may be a better choice.
I concur that parfor works well if you adjust your code to its specifications. I will add that in the newer versions of MATLAB, matlabpool() is replaced by parpool().
Hello, Kouba! Since you have an NVIDIA graphics card in your PC, it is very simple. I suggest you use MATLAB R2013a. It already contains the PCT (Parallel Computing Toolbox), and you only need to use the gpuArray function to copy the data to the device memory. After that, you can use the same MATLAB built-in functions to run your code in parallel. All the parallelization is done implicitly on the GPU cores by the PCT. You don't have to worry about the number of blocks or threads per block, or other details of parallel programming.
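A minimal sketch of that workflow, assuming the Parallel Computing Toolbox is installed. The matrix size and the choice of fft are arbitrary placeholders, and the CPU fallback is only there so the snippet still runs on a machine without a supported GPU.

```matlab
A = rand(2000, 'single');          % hypothetical input data on the host

if gpuDeviceCount > 0
    G = gpuArray(A);               % copy the data to device memory
    F = abs(fft(G));               % built-ins run on the GPU for gpuArray input
    result = gather(F);            % copy the result back to host memory
else
    result = abs(fft(A));          % no supported GPU found: same call on the CPU
end
```

Copying to and from the device takes time, so it tends to pay off to keep data on the GPU across several operations and call gather only once at the end.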
If you need more details, please read one of my papers called "Parallelization of a Modified Firefly Algorithm...".